pywiki: a Python web app without frameworks
A couple of weeks ago I went through Writing Web Applications in Go and it blew my mind: I used to think that you need a web framework (at least a very minimal one like Flask) to make web applications (possibly because of my Python background). So I set out to do something similar in Python 3 to figure out why its standard library is not enough.
gowiki
Gowiki is an example of a very tiny web application that is already useful: it can show text files from ./pages/ (GET /view/<PAGE>), edit them using a <textarea> (GET /edit/<PAGE>) and save them back (POST /save/<PAGE>).
http.server: the basics
First of all, we need a web server. I use python3 -m http.server pretty often to quickly serve a directory over HTTP, so it looks like a promising thing to extend.
The documentation page for http.server greets us with a warning:
http.server is meant for demo purposes and does not implement the stringent security checks needed of real HTTP server. We do not recommend using this module directly in production.
Fine, that must be one of the reasons why nobody uses it. Let’s proceed at our own risk.
Let’s run the initial example:
import http.server as http

address = ('', 8000)
# SimpleHTTPRequestHandler serves files from the current directory
with http.HTTPServer(address, http.SimpleHTTPRequestHandler) as httpd:
    print('Listening on %s:%d' % address)
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
Extending the http.server
The documentation looked pretty massive and intimidating at first, and after some initial reading I had a lot of questions:
- should I extend HTTPServer, BaseHTTPRequestHandler, or one of its subclasses? Turns out that HTTPServer is the “transport” part of the application (a subclass of TCPServer); it takes a request handler class (subclassed from BaseHTTPRequestHandler) and uses it to actually handle the requests. SimpleHTTPRequestHandler is a subclass of BaseHTTPRequestHandler that maps requests to files under the current directory; CGIHTTPRequestHandler extends SimpleHTTPRequestHandler with the ability to run CGI scripts from ./cgi-bin (in the very primitive CGI way: running a process for each request);
- is there any request routing mechanism in the standard library? Nope, none: you need to handle routing manually, unlike in Go;
- where do I get the requested path (does do_GET get any parameters)? I had to check the source code and debug it to understand that do_GET takes no parameters and relies on the state of its object, such as self.path;
- what are send_response(), send_response_only(), send_error() and end_headers(), and in what order should they be called? Which of those are internal methods and which can I call? I had to read the source code of http.server to understand what exactly they do;
- how do I write the response? I wrote a Python FastCGI script some time ago, so I knew about wfile file objects that can be used for this: indeed, there are self.rfile and self.wfile.

Here is a first handler that serves pages as plain text:
import shutil
from pathlib import Path
from http import HTTPStatus as status

pages_dir = Path('./pages/')

class RequestHandlers(http.BaseHTTPRequestHandler):

    def do_GET(self):
        if self.path.startswith('/view/'):
            return self.get_view()
        return self.send_error(status.NOT_FOUND)

    def respond_ok(self, body):
        self.send_response(status.OK)
        self.end_headers()
        shutil.copyfileobj(body, self.wfile)

    def get_view(self):
        title = self.path[len('/view/'):]
        p = pages_dir / (title + '.txt')
        if not p.exists():
            return self.send_error(status.NOT_FOUND)
        with p.open('rb') as f:
            self.respond_ok(f)
Plug it into http.HTTPServer(address, RequestHandlers) and we are ready to go.
Serving HTML
Now let’s serve HTML instead of plain text. At first, I hard-coded HTML snippets in the source code using multi-line string literals and inserted values using string formatting, but soon I needed to move the templates into separate files and do more advanced templating using string.Template:
...
import io
import string

templ_dir = Path('./tmpl/')

class RequestHandlers(http.BaseHTTPRequestHandler):
    # ...

    def respond_ok(self, body, mime='plain'):
        # body may be a str, bytes, or an open binary file
        if isinstance(body, str):
            body = body.encode('utf8')
        if isinstance(body, bytes):
            body = io.BytesIO(body)
        self.send_response(status.OK)
        self.send_header('Content-Type', 'text/%s; charset=utf-8' % mime)
        self.end_headers()
        shutil.copyfileobj(body, self.wfile)

    def format_tmpl(self, tmpl, **values):
        # tmpl is a file name relative to templ_dir
        template = (templ_dir / tmpl).read_text(encoding='utf8')
        t = string.Template(template)
        return t.substitute(**values)

    def get_view(self):
        title = self.path[len('/view/'):]
        p = pages_dir / (title + '.txt')
        if not p.exists():
            return self.send_error(status.NOT_FOUND)
        with p.open('rb') as f:
            body = f.read().decode('utf8')
        page = self.format_tmpl('view.html', title=title, body=body)
        self.respond_ok(page, mime='html')
And now we have a new file, tmpl/view.html:
<h1>${title}</h1>
<ul id="menu">
<li>[<a href="/">main</a>]</li>
<li>[<a href="/edit/${title}">edit</a>]</li>
</ul>
<hr>
<div><pre>${body}</pre></div>
<link rel="stylesheet" href="/file/style.css">
Wow, that’s some rather low-level trickery going on in respond_ok() now, but it will pay off later, when we need to serve files.
Editing…
The frontend part is just a simple HTML page, tmpl/edit.html:
<h1>${title}</h1>
<ul id="menu"></ul>
<hr>
<div>
<form action="/save/${title}" method="POST">
<textarea cols="80" rows="20" autofocus name="body">${body}</textarea><br>
<input type="submit" value="Save">
</form>
</div>
and serving it is easy:
class RequestHandlers(http.BaseHTTPRequestHandler):
    # ...

    def do_GET(self):
        if self.path.startswith('/view/'):
            return self.get_view()
        if self.path.startswith('/edit/'):
            return self.get_edit()
        return self.send_error(status.NOT_FOUND)

    def get_edit(self):
        title = self.path[len("/edit/"):]
        body = ''  # a page that does not exist yet starts out empty
        p = pages_dir / (title + '.txt')
        if p.exists():
            body = p.read_text(encoding='utf8')
        page = self.format_tmpl('edit.html', title=title, body=body)
        self.respond_ok(page, mime='html')
…and POSTing
Now, how do I do POST requests? The shocking answer is that http.server only provides do_POST and leaves you on your own. You need to read the submitted form yourself, parse it (thanks to the standard library, there’s urllib.parse.parse_qs) and save it.
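To get a feel for what urllib.parse.parse_qs returns, here is a small sketch. The form body below is a made-up example, not something captured from pywiki, and first() is a hypothetical helper name. Every value comes back as a list, because a key may appear more than once in a form, and blank fields are dropped unless keep_blank_values=True is passed.

```python
from urllib.parse import parse_qs

# A hypothetical urlencoded form body, roughly what a browser would POST.
raw = b'body=Hello+wiki%21%0D%0ASecond+line&tag=a&tag=b'

form = parse_qs(raw.decode('ascii'))

# '+' is decoded to a space and %XX escapes to the characters they encode.
print(form['body'])   # a one-element list
print(form['tag'])    # repeated keys are collected into one list


def first(form, key):
    """Return the first value for key, or None if the key is absent."""
    return form.get(key, [None])[0]


# By default an empty field disappears from the result entirely.
print(first(parse_qs('body='), 'body'))
print(parse_qs('body=', keep_blank_values=True))
```

The dropped blank field matters for a wiki: saving an empty textarea yields no 'body' key at all, so the saving code has to cope with None.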
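I spent some time debugging why self.rfile.read() just hangs the app: it turns out that you need to read the Content-Length header and then read exactly that many bytes from self.rfile. The connection is not closed after the request body (HTTP/1.1 can reuse connections for new requests), so reading until EOF blocks forever.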
from urllib.parse import parse_qs

class RequestHandlers(http.BaseHTTPRequestHandler):
    # ...

    def _is_form(self):
        """ checks if the request is a form sent from a page """
        formmime = 'application/x-www-form-urlencoded'
        return self.headers.get('content-type') == formmime

    def _parse_form(self):
        """ parse and save form into `self.form` """
        size = int(self.headers['content-length'])
        # read exactly `size` bytes; a bare read() would wait for EOF
        form = self.rfile.read(size)
        form = form.decode('ascii')
        form = parse_qs(form)
        self.form = form

    def form_value(self, key):
        """ get a form parameter by key or None """
        return self.form.get(key, [None])[0]
– and now we are ready to read form parameters!
    def do_POST(self):
        self.form = {}  # default, so form_value() works for non-form requests
        if self._is_form():
            self._parse_form()
        if self.path.startswith('/save/'):
            return self.post_save()
        return self.send_error(status.NOT_FOUND)

    def post_save(self):
        title = self.path[len("/save/"):]
        # parse_qs drops blank fields, so an empty textarea gives None
        body = self.form_value('body') or ''
        p = pages_dir / (title + '.txt')
        with p.open('w', encoding='utf8') as f:
            f.write(body)
        self.respond_redirect('/view/' + title)
Redirecting requires reinventing the wheel again:
    def respond_redirect(self, url):
        self.send_response(status.FOUND)
        self.send_header('Location', url)
        self.end_headers()
Now we can view, create and edit pages.
Main page
Adding the index page is very straightforward now. The only small problem is that there are no loops in string.Template, so the HTML representation of the list of pages must be built in Python:
    def get_main(self):
        pages = []
        for p in pages_dir.iterdir():
            if p.suffix == '.txt':
                pages.append(p.stem)
        pagelist = []
        for page in pages:
            pagelist.append(f'<li><a href="/view/{page}">{page}</a></li>')
        pagelist = '\n'.join(pagelist)
        body = self.format_tmpl('main.html', pagelist=pagelist)
        self.respond_ok(body, mime='html')

    def post_goto(self):
        page = self.form_value('page')
        self.respond_redirect("/edit/" + page)
And don’t forget to add get_main to do_GET and post_goto to do_POST! This definitely could be automated using getattr and dynamic method calls, but I am a little wary of calling code dynamically based on external requests.
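One way to automate the dispatch without looking up request-supplied names is an explicit routing table. This is only a sketch: ROUTES and resolve() are hypothetical names, not part of pywiki, and the paths are the ones used in this article (with / assumed to be the main page, as linked from view.html).

```python
# (HTTP method, path prefix) -> name of the handler method.
# Only names listed here can ever be returned.
ROUTES = {
    ('GET', '/view/'): 'get_view',
    ('GET', '/edit/'): 'get_edit',
    ('POST', '/save/'): 'post_save',
    ('POST', '/goto'): 'post_goto',
}


def resolve(method, path):
    """Return the handler method name for a request, or None."""
    if method == 'GET' and path == '/':
        return 'get_main'
    for (route_method, prefix), name in ROUTES.items():
        if route_method == method and path.startswith(prefix):
            return name
    return None


print(resolve('GET', '/view/Home'))
print(resolve('GET', '/no/such/page'))
```

Inside do_GET this could be used as name = resolve('GET', self.path), responding with 404 when name is None and calling getattr(self, name)() otherwise. The getattr call is then driven by the table, never directly by the URL.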
main.html is:
<h1>pywiki</h1>
<hr>
<form action="/goto" method="POST">
<input type="text" name="page" required autofocus>
<input type="submit" value="Go">
</form>
<ul>
${pagelist}
</ul>
Serving files
What about serving files, e.g. adding a bit of CSS through /file/style.css? I could just fall back onto SimpleHTTPRequestHandler and use its do_GET, but let’s reinvent this too:
files_dir = Path('./file/')

class RequestHandlers(http.BaseHTTPRequestHandler):
    # ...

    def get_file(self):
        """ just serve files from ./file/ """
        fname = self.path[len("/file/"):]
        if '/' in fname:
            return self.send_error(status.BAD_REQUEST)
        fpath = files_dir / fname
        # is_file() also rejects names like '..' that point at a directory
        if not fpath.is_file():
            return self.send_error(status.NOT_FOUND)
        contenttype = 'application/octet-stream'
        if fname.endswith('.css'):
            contenttype = 'text/css'
        with fpath.open('rb') as f:
            # respond_ok() above only builds text/* types, so send headers here
            self.send_response(status.OK)
            self.send_header('Content-Type', contenttype)
            self.end_headers()
            shutil.copyfileobj(f, self.wfile)
A nice side effect here is that it’s not necessary to read the whole file into memory: shutil.copyfileobj() copies it to the socket in fixed-size chunks.
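Instead of hard-coding the .css case, the content type could be guessed with the standard mimetypes module. A minimal sketch, where guess_content_type() is a hypothetical helper name and the fallback mirrors the application/octet-stream default used above:

```python
import mimetypes


def guess_content_type(fname, default='application/octet-stream'):
    """Guess a Content-Type from the file name, falling back to default."""
    ctype, _encoding = mimetypes.guess_type(fname)
    return ctype or default


print(guess_content_type('style.css'))
print(guess_content_type('blob.nosuchext'))
```

The result depends on the extension only; the file itself is never opened.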
General ergonomics compared to Go
- absence of any complex examples in the documentation. This might be intentional, to discourage production usage, which is assumed by default in Go;
- massive documentation with several big classes with directly accessible state; you need to read through it and consult the source code to understand what is going on: which methods to call, in what sequence, and which methods are low-level or high-level. It was not immediately clear how to read the request body and write the response (which is immediately obvious in Go: http.Request and http.ResponseWriter are parameters to a handler);
- a not very complex, but non-trivial hierarchy of classes which are used to access complex state and run custom methods (instead of clean and isolated callbacks in Go with clear inputs and outputs);
- a very bare HTTP implementation, without essentials like parsing form fields (Go’s http.Request.FormValue()) and redirects (just http.Redirect() in Go);
- no HTML templating with loops and no HTML character escaping, compared to html/template in Go;
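The missing escaping can be patched by hand with html.escape from the standard library. The sketch below is an assumption about how it could be wired in, not what pywiki does: render() is a hypothetical helper and VIEW is a cut-down stand-in for tmpl/view.html.

```python
import html
import string

# A cut-down stand-in for tmpl/view.html.
VIEW = string.Template('<h1>${title}</h1>\n<pre>${body}</pre>')


def render(template, **values):
    """Substitute values into template, HTML-escaping every one of them."""
    safe = {key: html.escape(str(value)) for key, value in values.items()}
    return template.substitute(safe)


page = render(VIEW, title='Home', body='<script>alert(1)</script> & more')
print(page)
```

Values that are already HTML, like the pagelist built for main.html, would have to bypass this helper, with the page names inside them escaped individually.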
Conclusion
The source code of pywiki may be found here: https://git.dmytrish.net/lang-learn/pywiki.
I feel like I have written my own buggy and incomplete micro web framework at this point: low-level HTTP manipulation, absence of good URL routing, manual parsing, etc. http.server is clearly not meant to be extended further; it is better seen as a finished demo of what can be done using other parts of the standard library.
There is no concurrency here: everything is blocking (although e.g. Flask works the same way); even though Python has asyncio now, the standard library does not ship an asyncio HTTP server, which is a shame. Go gives you production-grade concurrency for free.
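There is no asyncio server, but since Python 3.7 http.server does include ThreadingHTTPServer, which handles each request in its own thread. A minimal sketch: PingHandler is a hypothetical stand-in for the wiki’s RequestHandlers class so that the snippet is self-contained, and make_server() is an invented helper name.

```python
import http.server
from http import HTTPStatus


class PingHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in for the wiki's RequestHandlers class."""

    def do_GET(self):
        body = b'pong\n'
        self.send_response(HTTPStatus.OK)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(address=('127.0.0.1', 8000)):
    # Takes the same arguments as HTTPServer, but each request is
    # handled in a separate thread.
    return http.server.ThreadingHTTPServer(address, PingHandler)
```

It is used the same way as before: with make_server() as httpd: httpd.serve_forever(). Threads are not what Go offers, but a slow request no longer blocks all the others.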
I haven’t built any security checks into my code, which might be disastrous.
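The first check worth adding is probably validating page titles, since title is taken straight from the URL and joined onto pages_dir. The sketch below is similar in spirit to the path validation in the Go tutorial; the exact pattern and the safe_title() name are my assumptions, not code from pywiki.

```python
import re

# Allowed characters in a page title (an assumption; adjust as needed).
TITLE_RE = re.compile(r'[A-Za-z0-9_-]+')


def safe_title(path, prefix):
    """Return the page title from path, or None if it looks unsafe."""
    if not path.startswith(prefix):
        return None
    title = path[len(prefix):]
    # fullmatch rejects empty titles, slashes, dots and anything else
    # outside the allowed set, which rules out '../' tricks.
    return title if TITLE_RE.fullmatch(title) else None


print(safe_title('/view/Home', '/view/'))
print(safe_title('/view/../../etc/passwd', '/view/'))
```

A handler would respond with 400 or 404 whenever safe_title() returns None instead of touching the file system.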
So: you can write primitive web-applications using only the Python standard library. Should you do it? Judge for yourself.