2012
3
May

Blogofile Improvements

I finally got around to fixing a couple of minor annoyances I have with Blogofile. These fixes apply to the plugins development branch of Blogofile and the Blogofile_blog plugin, but they should be easy to backport to the Blogofile master branch. I opened pull requests for these changes on GitHub and I'm happy to report that @EnigmaCurry merged the Blogofile_blog ones within hours! But since Blogofile development and this blog have been languishing for a while, I figured I should write about the changes here.

Using Python 2.7 and 3.2 with the PYTHONWARNINGS environment variable set to default reveals that both Blogofile and the Blogofile_blog plugin trigger ResourceWarning messages when the blogofile build command is run. Admittedly, this is a really minor issue, but seeing a screen full of warnings every time I build my site is annoying, and it can obscure more serious problems. Those warnings are easily silenced by changing the offending open calls to use with statement context managers. The fix for Blogofile is in pull request 119 and the one for Blogofile_blog is in pull request 7.
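
For anyone who hasn't run into this pattern, here is a minimal sketch of the kind of change involved. It is not the actual Blogofile code; the read_text helper and the post.rst file name are made up for illustration.

```python
import os
import tempfile


def read_text(path):
    """Read a whole file, closing it even if reading raises."""
    # Before: open(path).read() leaves the file open until it is
    # garbage collected, which is what ResourceWarning complains about.
    # After: the with statement closes the file when the block exits.
    with open(path, encoding='utf-8') as f:
        return f.read()


# Hypothetical file, created in a temporary directory for the demo
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'post.rst')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Title\n=====\n')
    text = read_text(path)
```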

The blogofile blog create post command creates a file with the extension .markdown by default but Blogofile also supports RST and Textile markup. I use RST and really want my newly created post files to have the extension .rst so that emacs goes to rst-mode automatically when I open a post file for editing. Again, a minor annoyance, and my fix was easily implemented. I chose to add a blog.post.default_markup config option. With blog.post.default_markup = 'rst' in my site's _config.py file my new posts get the .rst extension I want. If blog.post.default_markup is not set the created post file extension defaults to .markdown as before. This feature is in pull request 8.

I really like Blogofile and using its plugins branch was my spur to get serious about using Python 3 and blogging again. So, I'm really happy to see @EnigmaCurry accepting pull requests again on the project. Hooray!!

2012
9
Feb

CSV Downloads from Web Apps

One of the users of an intranet app I maintain was using copy/paste to put data from the app into Excel. She asked if there was a better way. The pages she was copying from have tables of datastore objects, so adding a CSV download feature was an obvious solution. Providing that feature turned out to be pretty easy with the help of the StringIO and csv modules in the standard library.

While the code below is from a Pylons controller class, I can't see it being difficult to implement this in Django, Pyramid, or other web framework stacks.

The request handler for the CSV download looks like:

import cStringIO
import csv

def csv_download(self, ...):
    # Write the CSV into an in-memory, file-like buffer
    csv_buffer = cStringIO.StringIO()
    csv_writer = csv.writer(csv_buffer)
    header = self._build_csv_header(...)
    csv_writer.writerow(header)
    # One CSV row per datastore query result
    query_result = self.get_data_for_csv(...)
    for result in query_result:
        row = self._build_csv_row(result)
        csv_writer.writerow(row)
    content = csv_buffer.getvalue()
    csv_buffer.close()
    # Tell the browser to download the response as a CSV file
    response.content_type = 'text/csv; charset=utf-8'
    response.content_disposition = (
        'attachment; filename="your_file.csv"')
    return content

This method uses StringIO to set up a file-like memory buffer, and instantiates a default CSV writer to write to the buffer. Next we build the header row and write it to the buffer. Then we get an iterator from the datastore for the content that we want to write to the CSV file, and format and write it to the buffer, one row at a time. Finally we get the CSV data from the buffer, set the response headers appropriately, and return the CSV data for download.
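
Stripped of the web framework bits, the buffer handling can be tried on its own. The following is only a sketch, written for Python 3 where io.StringIO takes the place of cStringIO; build_csv and the sample data are hypothetical names, not part of the app described above.

```python
import csv
import io
from datetime import date


def build_csv(header, rows):
    """Return CSV text assembled in an in-memory buffer."""
    # newline='' leaves line endings entirely to the csv module
    with io.StringIO(newline='') as csv_buffer:
        csv_writer = csv.writer(csv_buffer)
        csv_writer.writerow(header)
        for row in rows:
            csv_writer.writerow(row)
        # getvalue() has to be called before the buffer is closed
        return csv_buffer.getvalue()


# Hypothetical sample data
header = ['Name', 'Date']
rows = [['Zoë', '{:%Y-%m-%d}'.format(date(2012, 2, 9))]]
content = build_csv(header, rows)
```

In Python 3 the buffer works as a context manager, so it is closed when the with block exits and no explicit close call is needed.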

One thing I'm uncertain about: Is it necessary to explicitly call the close method on a StringIO instance, or could I just do:

return csv_buffer.getvalue()

and let garbage collection take care of releasing the memory allocated for csv_buffer?

To get Excel to play nice with UTF-8 encoded data it's necessary to include 3 specific bytes as a Byte Order Mark (BOM) at the beginning of the file. I did that by prepending them to the heading string for the first column:

def _build_csv_header(self, ...):
    UTF_8_BOM = '\xef\xbb\xbf'
    header = [
        UTF_8_BOM + 'Column 1 Heading',
        ...
    ]
    return header
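
As an aside, and only as a sketch for Python 3 rather than what the app above does: the standard library's utf-8-sig codec writes the same 3 bytes when encoding, so the BOM need not be typed by hand. The encode_for_excel name is made up for this example.

```python
import codecs


def encode_for_excel(csv_text):
    """Encode CSV text as UTF-8 with a leading byte order mark."""
    # The 'utf-8-sig' codec prepends the 3-byte BOM when encoding
    return csv_text.encode('utf-8-sig')


data = encode_for_excel('Column 1 Heading\r\n')
# The leading bytes are the same ones used in UTF_8_BOM above
has_bom = data.startswith(codecs.BOM_UTF8)
```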

Building the content for each row of the CSV file is just a matter of formatting each query result into an array of strings. Fields containing non-ASCII characters stored as Unicode need to be encoded to UTF-8:

def _build_csv_row(self, result):
    row = [
        result.column_1_value,
        '{:%Y-%m-%d}'.format(result.some_date),
        ...
        result.unicode_value.encode('utf-8'),
        ...
    ]
    return row

I had an additional complication to deal with. The data for one of the CSV columns is stored in the database as HTML fragments that may contain non-ASCII characters. That data had to be rendered to Unicode before it could be added to the CSV row (encoded as UTF-8). It turns out that the Python standard library provides a fairly painless way of handling that complication too, but I'll leave that for another post.

2012
23
Jan

YAML Fixtures in Django Tests

I have a Django project called RandoPony that handles event registration for the BC Randonneurs Cycling Club. It's on an annual release cycle; i.e. I spend the few weeks that pass for winter in Vancouver updating the project. That's when I bump it to the latest version of Django, fix minor bugs, and add new features that I and other users have come up with during the preceding year. Once I release a new version for the new year, I usually don't have to worry about the code until the next winter. The pony just works, facilitating people doing hundreds of thousands of kilometres of crazy long cycling events, and we like it that way!

My workflow at the beginning of the annual update looks something like:

  • Create a new virtualenv
  • Install the latest version of Django and other project dependencies
  • Read the release notes for the Django releases since the one I was working with last
  • Run the RandoPony test suite to find deprecations and other obvious breakage
  • Start hacking

I recently started working on the 2012 release of RandoPony and was blown away when I ran the test suite because there were over 60 failing tests! It took me way longer than it should have to figure out why things were so massively broken.

The problem was that the test fixtures weren't being installed. They weren't being installed because they are YAML files and I had forgotten to install PyYAML in the virtualenv. What's really annoying is that the fixtures files were being ignored silently.

It turns out that if you specify a YAML fixture for a Django TestCase:

class TestPopulairesListView(django.test.TestCase):
    """Functional tests for populaires-list view.
    """
    fixtures = ['populaires']

without giving the fixture file a .yaml extension, the fixture will be silently ignored if PyYAML isn't installed. Really, Django?!

So, the number 1 thing that I should have done to save myself from this thrash was to explicitly specify the serialization format of my fixtures:

class TestPopulairesListView(django.test.TestCase):
    """Functional tests for populaires-list view.
    """
    fixtures = ['populaires.yaml']

Then the Django test runner would have told me:

Problem installing fixture 'populaires': yaml is not a known
serialization format.

I'll take the hit for ignoring the PEP 20 aphorism "Explicit is better than implicit". But shouldn't Django get docked for "Errors should never pass silently"?
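
Another way to make a missing dependency fail loudly is to check for it before the tests run. This is just a sketch of the idea, not something from RandoPony; require_modules and missing_modules are hypothetical helpers, and yaml is the import name that PyYAML installs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported."""
    return [
        name for name in names
        if importlib.util.find_spec(name) is None
    ]


def require_modules(names):
    """Raise ImportError naming every missing module."""
    missing = missing_modules(names)
    if missing:
        raise ImportError(
            'missing test dependencies: ' + ', '.join(missing))


# In a test settings module one might call, for example:
#     require_modules(['yaml'])
```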

The other thing I should have done was use a pip requirements file for the project.

RandoPony has 2 requirements files now. requirements.txt for the packages required for the production deployment, and requirements-dev.txt for the additional packages, like PyYAML, required for development work. Now I just have to hope that I'm smart enough when I start work on the 2013 release to do:

(randopony)$ pip install -r requirements.txt
(randopony)$ pip install -r requirements-dev.txt
2012
8
Jan

Stayin' Alive

MySQL Connection Timeout and SQLAlchemy Connection Pool Recycling

Recently a Pyramid web app surprised me with some odd behaviour. The app uses SQLAlchemy to interface with a MySQL database. The first request of the day would always fail with a traceback that ended with some sort of database connection failure, though not always the same error. This was happening in the development environment where I was the only one sending requests to the server. Hitting refresh, or sending another request resulted in the expected response, and things would continue to work that way - until the next morning...

The subtitle gives lots of hints about what was going on, so if you're still reading at this point I'll assume that you're still as puzzled as I was.

It took a surprising number of tries on Google before I found this section of the SQLAlchemy docs which explains all. To summarize, the default configuration of MySQL drops connections on which there has been no activity for 8 hours. SQLAlchemy provides the pool_recycle parameter for its engine creation functions as a way of working around that behaviour. (Although, as noted in the full description of pool_recycle in the SQLAlchemy Engine creation API docs, the behaviour is also configurable at the MySQLdb connection and database configuration levels.)

Since my Pyramid app uses the sqlalchemy.engine_from_config function, all I had to do was add:

sqlalchemy.pool_recycle = 3600

to my development.ini and production.ini config files, and ... problem solved.
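
For the curious, the idea behind connection recycling can be sketched with nothing but the standard library. This is not SQLAlchemy's implementation; the RecyclingConnection class and its parameters are made up for illustration, and sqlite3 stands in for MySQL. A connection older than the recycle interval is closed and replaced the next time it is asked for, so a stale one is never handed out.

```python
import sqlite3
import time


class RecyclingConnection:
    """Hold one connection, replacing it once it is too old."""

    def __init__(self, connect, recycle, clock=time.monotonic):
        self._connect = connect    # callable that opens a connection
        self._recycle = recycle    # maximum connection age in seconds
        self._clock = clock        # injectable so the demo can fake time
        self._conn = None
        self._created = None

    def get(self):
        now = self._clock()
        if self._conn is not None and now - self._created > self._recycle:
            # Too old: close it rather than risk a server-side timeout
            self._conn.close()
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self._created = now
        return self._conn


# Demo with a fake clock so that no real waiting is needed
fake_now = [0.0]
holder = RecyclingConnection(
    lambda: sqlite3.connect(':memory:'),
    recycle=3600,
    clock=lambda: fake_now[0])
first = holder.get()
fake_now[0] = 1800.0
same = holder.get()       # still young enough, so it is reused
fake_now[0] = 3601.0
second = holder.get()     # older than 3600 seconds, so it is replaced
```

When creating an engine directly instead of from a config file, the same setting is passed as the pool_recycle keyword argument to sqlalchemy.create_engine.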

P.S. Sorry - I couldn't resist the Bee Gees reference in the title. It's an age thing...

Read and Post Comments
2011
30
Dec

2012 Python Meme

Following Tarek Ziade's lead:

  1. What’s the coolest Python application, framework or library you have discovered in 2011?

    It's a toss-up between Pyramid and requests.

    For the admittedly small, specialized application that I've used Pyramid for, I like how the framework interface is largely confined to the __init__.py module. Outside of that I find myself just writing Python code to get stuff done and throwing in a few decorators here and there to link into the framework. I find the Pyramid docs to be really well organized, informative, and complete. They worked for me at the introductory level (though I did arrive with quite a bit of experience of other Python web frameworks) and continue to work as I dig deeper and do more advanced things.

    Requests has made my life so much better in several projects, whether it's collecting data from well-structured web services or scraping hydrometric data from a particularly annoying Government of Canada site.

  2. What new programming technique did you learn in 2011?

    I got a lot better at writing simple, clean, uncoupled unit tests, inspired in large part by the Pyramid unit testing guidelines. I shifted from being a skeptic to a proponent of mocking thanks to the mock library. I was spurred in that by the need to refactor a test suite that had become way too slow to be useful.

    I also got a lot more proficient in JavaScript, using it for client-side stuff in web apps, database views for CouchDB, and a Firefox add-on.

  3. What’s the name of the open source project you contributed the most in 2011? What did you do?

    CouchDBkit. I spent most of my time at the PyCon 2011 sprints adding the SetProperty and LazySet classes to store Python sets as lists of unique elements in CouchDB. I also added some missing Python list methods to the LazyList class. It was a good learning experience in the realm of subclassing Python builtins. It was also cool to work sitting beside @benoitc (the CouchDBkit lead developer) in contrast to communicating electronically across 9 time zones.

  4. What was the Python blog or website you read the most in 2011?

    The Planet Python feed, by far.

  5. What are the three top things you want to learn in 2012?

    • Message passing systems. I've got a Django project that needs to get some asynchronicity and I'm planning to explore Celery for that.
    • More message passing systems. I've got some devops issues around server synchronization and failover that I think I might be able to address with ZeroMQ.
    • Python 3. Because it's time! Most of the libraries that I use have already been ported, so the transition shouldn't be overwhelmingly difficult, and perhaps I can lend a hand porting some of the libraries that I need that haven't made the jump yet.
  6. What are the top software, app or lib you wish someone would write in 2012?

    I wish there were more open source libraries and applications in the home automation realm. One of my first personal Python projects was a wrapper around heyu to make our house look lived in when we are away. It's still working okay after nearly 6 years, but the X10 hardware it interfaces with is getting rather long in the tooth. Sadly, newer home automation hardware and protocols seem to be mired in the muck of "vendor associations" that only pay lip service to openness.

Want to do your own list ? here’s how:

  • copy-paste the questions and answer to them in your blog
  • tweet it with the #2012pythonmeme hashtag