john chesley

uploading files and in-memory file processing

in all my web development experience i’ve never needed to include a file upload in a form, until the other day. i needed to be able to upload a file (an image file), convert it to PDF format, and hand that back to the browser.

using python’s imaging library, it’s pretty easy to do the conversion:


>>> from PIL import Image
>>> img = Image.open("image.png")
>>> img = img.convert('RGB')
>>> img.save("newfile.pdf")

since PNG images can have a transparency layer, i convert the image from “RGBA” to “RGB” (the A stands for Alpha, that transparency layer). PIL’s PDF writer has traditionally rejected RGBA images, so it’s ok to just drop it.
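
dropping the alpha layer outright means fully transparent pixels keep whatever colour was stored underneath them, which can look odd on paper. a minimal sketch of an alternative, compositing the image onto a solid background first, might look like this. it assumes Pillow; flatten_to_rgb is a made-up helper name and white is just an assumed background colour:

```python
from PIL import Image


def flatten_to_rgb(img, background=(255, 255, 255)):
    """Return an RGB copy of img with any alpha channel composited over a solid colour."""
    has_alpha = img.mode in ("RGBA", "LA") or (
        img.mode == "P" and "transparency" in img.info
    )
    if not has_alpha:
        return img.convert("RGB")

    rgba = img.convert("RGBA")
    canvas = Image.new("RGB", rgba.size, background)
    # Use the alpha band as the paste mask so transparent areas show the background.
    canvas.paste(rgba, mask=rgba.split()[3])
    return canvas


# A fully transparent red pixel: dropping alpha keeps the red,
# flattening shows the white background instead.
transparent_red = Image.new("RGBA", (1, 1), (255, 0, 0, 0))
dropped = transparent_red.convert("RGB")
flattened = flatten_to_rgb(transparent_red)
```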

PIL automagically detects the format from the “.pdf” extension and saves a PDF file. Beauty! But in processing a web form, i don’t want to be writing to the hard drive on every request. i needed a way to write a file without accessing the disk. python’s built-in StringIO library came in handy:


>>> import StringIO
>>> buf = StringIO.StringIO()
>>> img.save(buf, 'pdf')

this time i had to tell the imaging library to save the image as a PDF, since it isn’t able to infer that from the filename. there’s no file name to infer from!
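
StringIO is a python 2 module. on python 3 the same trick would most likely use io.BytesIO, since a PDF is binary data. a small sketch under that assumption (Pillow on python 3; image_to_pdf_bytes is a hypothetical helper name, not part of any library):

```python
import io

from PIL import Image


def image_to_pdf_bytes(img):
    """Render a PIL image to PDF entirely in memory and return the raw bytes."""
    buf = io.BytesIO()
    # There is no filename to infer from, so the format is passed explicitly.
    img.convert("RGB").save(buf, format="PDF")
    return buf.getvalue()


# Stand-in for a real image; nothing here touches the disk.
pdf_bytes = image_to_pdf_bytes(Image.new("RGBA", (10, 10), (0, 128, 255, 255)))
```

the returned bytes can be handed straight to whatever builds the HTTP response.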

the PDF file is now stored in buf, and can be accessed with buf.getvalue(). now, reading the file from an uploaded form, and sending the PDF file back are pretty simple:


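from django.http import HttpResponse
from PIL import Image
import StringIO

def someview(request):
    f = request.FILES['file']
    img = Image.open(f)
    img.load()  # force the lazy open() to read the upload now
    img = img.convert('RGB')  # drop the alpha layer

    buf = StringIO.StringIO()
    img.save(buf, 'pdf')  # no filename, so name the format explicitly
    response = HttpResponse(buf.getvalue(), content_type="application/pdf")
    response['Content-Disposition'] = "attachment; filename=%s" % (f.name + ".pdf")
    return response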

this stuff is all in the django view code that handles my form page. notice i used img.load() here, while i didn’t earlier. this is because the open function is lazy, and the file isn’t actually loaded until it’s accessed. i’m still not sure why this wasn’t needed when doing things from the command line, but i was getting an “unknown raw format” error without it. interesting…
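
the laziness of open() can be seen in isolation. this sketch doesn't explain the “unknown raw format” error, it only illustrates that open() reads the header while load() pulls in the pixel data. it assumes Pillow on python 3 and uses an in-memory PNG as a stand-in for the uploaded file:

```python
import io

from PIL import Image

# Build a small PNG in memory to stand in for an uploaded file.
src = io.BytesIO()
Image.new("RGB", (4, 4), (200, 30, 30)).save(src, format="PNG")
src.seek(0)

img = Image.open(src)  # reads the header only; size and mode are known already
size_before_load = img.size

img.load()   # forces the pixel data to be decoded now
src.close()  # the underlying stream is no longer needed after load()

rgb = img.convert("RGB")  # works from the pixel data already held in memory
```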

setting the Content-Disposition header makes the browser download the file, and also gives it a filename with .pdf appended, which is nice. the default filename (if you don’t set Content-Disposition) is the last part of the URL of the page, which in my case wasn’t desirable at all.
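
one thing to watch with that header: an uploaded name containing spaces or quotes can confuse the browser if it's dropped in unquoted. a rough sketch of building the value more defensively, using only the standard library. pdf_download_header is a hypothetical helper, and swapping the original extension for .pdf (rather than appending to it) is a variation on what the view does, not the same behaviour:

```python
import os


def pdf_download_header(upload_name):
    """Build a Content-Disposition value that offers upload_name as a .pdf download."""
    base = os.path.basename(upload_name)  # ignore any directory part
    stem, _ext = os.path.splitext(base)   # "photo.png" -> "photo"
    # Strip characters that would break the quoted filename parameter.
    stem = stem.replace('"', "").replace("\\", "")
    if not stem:
        stem = "download"  # assumed fallback name for empty input
    return 'attachment; filename="%s.pdf"' % stem


print(pdf_download_header("holiday photo.png"))
# prints: attachment; filename="holiday photo.pdf"
```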

now whoever needs to can go to this form, upload their image file, and get an easily-printable PDF file handed right back. The nice thing about using PIL is that, presumably, any image format that is supported can be uploaded, which means just about any. :)