Friday, 23 May 2008

Export of member data from Plone as a CSV file

I needed to export member data from Plone for mail merging etc. There was nothing really suitable available so I wrote this small product for it.

The manual is:

Install "portal_memberdata_export" with the quickinstaller.

Export by calling the memberdataExportCSV script, like

You need to have the "Manage Portal" permission.

So far it is only available from the collective.

Sunday, 30 March 2008

Python Unicode lessons from the school of hard knocks

Unicode is becoming increasingly popular, but is often misunderstood. That is a pity, as it is really not that difficult. Especially since unicode is a really powerful tool to know, and it becomes the standard string type in Python 3K.

There are many unicode articles around, but they are often very "theoretical" and computer science like. So I have collected a few practically oriented examples that I have learned in the school of hard knocks.

I will use "unicode" as shorthand for a unicode string, and "string" as a shorthand for a plain Python string. I live in Europe and am used to using latin-1 as default encoding. Whenever I write latin-1 just substitute that with your own language default encoding. Should work just fine.

I remember when I tried to understand unicode that I could not get my head around when to use encode and decode. In these examples I practically only go from unicode to string. So I use the encode() method for most examples. I believe this makes it easier to understand and remember.

If there is popular demand I might write an article using only the decode() method.

Working with unicode in code

First examples from a python console:

>>> 'this is a python string'
'this is a python string'

>>> u'this is a python unicode string'
u'this is a python unicode string'

Not much difference there. That is because they both contain only ascii characters. When I try to insert a Danish character it changes:

>>> 'this is a pythøn string'
'this is a pyth\xc3\xb8n string'

>>> u'this is a pythøn unicode string'
u'this is a pyth\xf8n unicode string'

The string example shows something interesting, as it shows the ø as '\xc3\xb8'. Whenever you see international characters showing up as two encoded characters/bytes like this, it is usually a sign that you are seeing a utf-8 encoded string.

It is by no means the law, but it is a good rule of thumb.

In this example the string is encoded to utf-8 because that is the default I use in the console window that runs the examples.
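The rule of thumb above can be turned into a quick check. This is only a minimal sketch, written for Python 3 (where the article's "unicode" is `str` and its "string" is `bytes`); the helper name `looks_like_utf8` is made up for illustration. Note that plain ascii bytes also pass the check, since ascii is a subset of utf-8.

```python
# Python 3 sketch: bytes is the article's "string", str is its "unicode".
def looks_like_utf8(data):
    """Return True if the bytes can be decoded as utf-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_bytes = "this is a pythøn string".encode("utf-8")      # ø -> 0xC3 0xB8
latin1_bytes = "this is a pythøn string".encode("latin-1")  # ø -> 0xF8

print(looks_like_utf8(utf8_bytes))
print(looks_like_utf8(latin1_bytes))
```

A lone 0xF8 byte is not valid utf-8, so the latin-1 version is rejected while the two-byte version is accepted.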

Thinking about unicode

A good way to think about the difference between unicode and string is as a text and a binary file.

Unicode is the text and the string is the binary file format. So when you want to save your text somewhere you save it in a file format. Different file formats you can use are latin-1 (iso-8859-1), ascii, utf-8 etc.

You convert to the correct file format by using the 'encode()' method. Like this:

>>> u'this is a pythøn unicode string'.encode('latin-1')
'this is a pyth\xf8n unicode string'

>>> u'this is a pythøn unicode string'.encode('utf-8')
'this is a pyth\xc3\xb8n unicode string'

>>> u'this is a pythøn unicode string'.encode('ascii')
Traceback (most recent call last):
File "", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 14: ordinal not in range(128)

Whoops. ascii is an illegal file format for text with international characters. That is an important lesson.

You can get from unicode to every other encoding without any other information about the unicode string, as long as the encoding supports the characters your unicode uses. So unicode is the single type from and to which all the other text formats can be converted.
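The idea of unicode as the hub between encodings can be sketched as a round trip. This is an illustration for Python 3 only (there `str` plays the role of "unicode" and `bytes` the role of "string"); the variable names are chosen for the example.

```python
# Python 3 sketch: encode one text to several encodings, then decode it back.
text = "this is a pythøn unicode string"

results = {}
for encoding in ("latin-1", "utf-8", "ascii"):
    try:
        results[encoding] = text.encode(encoding)
    except UnicodeEncodeError:
        # The target encoding has no byte for "ø".
        results[encoding] = None

# Decoding with the same encoding gives the original text back.
round_trip_ok = all(
    data.decode(encoding) == text
    for encoding, data in results.items()
    if data is not None
)

print(results)
print(round_trip_ok)
```

Going from one byte encoding to another always passes through the text: decode with the old encoding, then encode with the new one.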

unicode to anything

String encodings are archeological. Well sometimes you even get the feeling that they are geological. Especially when writing something to support an RFC. Each geological layer is there for backwards compatibility.

First there was ascii, then iso-8859-1 (latin-1), iso-8859-15 (adds the € sign and more) and finally utf-8. They are mostly supported in that order too. The older the encoding, the better support it has in software out there in the wild.

Theoretically you can just use utf-8 for it all and be done with it. But I have had to rewrite almost every email/maillist application I have written to support latin-1 or other encodings. Many mail clients support utf-8 correctly these days. But many web based clients do not. Neither hotmail nor gmail does actually.

Most likely you will be writing software that works perfectly, and passes every test and mail client you and your customer have in your organisations. But when the brand new maillist goes live, customers complain about unreadable characters.

So it can make good sense to try to encode unicode into a gradually more "modern" encoding: try the older encodings first, and if that fails try the newer ones. This function does that::

# -*- coding: utf-8 -*-
>>> def optimal_encode(st):
...     for encoding in ['ascii', 'iso-8859-1', 'iso-8859-15', 'utf-8']:
...         try:
...             return (encoding, st.encode(encoding))
...         except UnicodeEncodeError:
...             pass
...     raise UnicodeError('Could not find encoding')

>>> st = u'this is a pythøn unicod€ string'
>>> print optimal_encode(st)

('iso-8859-15', 'this is a pyth\xf8n unicod\xa4 string')

The € sign makes it return 'iso-8859-15'. If that encoding was not in the list, it would return::

('utf-8', 'this is a pyth\xc3\xb8n unicod\xe2\x82\xac string')
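For readers on Python 3, the same fallback idea might look like the sketch below. It is an adaptation, not the original function: the `encodings` parameter is an addition for illustration, and the function takes a `str` and returns `bytes`.

```python
# Python 3 sketch of the "oldest encoding that works" idea.
def optimal_encode(text, encodings=("ascii", "iso-8859-1", "iso-8859-15", "utf-8")):
    """Return (encoding, encoded_bytes) for the first encoding that can hold text."""
    for encoding in encodings:
        try:
            return encoding, text.encode(encoding)
        except UnicodeEncodeError:
            continue  # this encoding lacks a character, try the next one
    raise UnicodeError("Could not find encoding")


for sample in ("plain text", "pythøn", "unicod€", "日本語"):
    print(optimal_encode(sample))
```

Passing a shorter `encodings` tuple, for example without 'iso-8859-15', makes the € sign fall through to utf-8 as described above.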

Working with unicode in your editor

Any modern text editor can handle utf-8 encoding. So preferably you should use that in your Python files. You tell Python that your files are utf-8 encoded by adding an encoding declaration to the top of your file.

# -*- coding: utf-8 -*-

When this is done, you can write international characters directly in your source code and every string in your file is utf-8 encoded. This makes it true that:

u'pythøn'.encode('utf-8') == 'pythøn'

Saving unicode in files

Unicode only exists in memory. You cannot write it to a text file unless you write it as pickled data. But nobody else would then be able to read it, and you cannot look at it in an ordinary text editor.

To put your data into a file, you must encode it first.

>>> st
u'this is a pyth\xf8n unicode string'
>>> f = open('unicodetest.txt', 'w')
>>> f.write(st)
Traceback (most recent call last):
File "", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 14: ordinal not in range(128)
>>> st_utf8 = st.encode('utf-8')
>>> f.write(st_utf8)
>>> f.close()

When you try to write the unicode string to a file, it tries to convert it to a string and fails. But when you encode it first there is no problem.
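In Python 3 the encode step can be handed to the file object. The sketch below assumes Python 3 and writes to a temporary directory; the path and variable names are only for illustration.

```python
# Python 3 sketch: let open() do the encoding, then inspect the raw bytes.
import os
import tempfile

text = "this is a pythøn unicode string"
path = os.path.join(tempfile.mkdtemp(), "unicodetest.txt")

# Text mode with an explicit encoding: the text is encoded on write.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Binary mode shows the bytes that actually ended up on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
print(raw.decode("utf-8") == text)
```

Stating the encoding explicitly avoids depending on the platform default, which is the same lesson as the failing f.write(st) above.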

There is a special byte sequence that can be inserted at the beginning of a text file to mark it as some kind of unicode encoded string. It is called a Byte Order Mark (BOM). It is 2 bytes long for utf-16 and 3 bytes long for utf-8.

There are a few of those for the different encodings, but I have only ever had to use utf-8

>>> import codecs
>>> codecs.BOM_UTF8
>>> f.write(codecs.BOM_UTF8 + st_utf8)

The Byte Order Mark (BOM) is used especially on Windows. Frankly it is a bit of a mess in Python until 2.5. But many text editors can recognize it and then know automatically that the file is utf-8 encoded.
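Python also ships a codec that adds and strips the utf-8 BOM for you. A small sketch, assuming Python 3, comparing the manual approach with the 'utf-8-sig' codec:

```python
# Python 3 sketch: the utf-8 BOM by hand and via the 'utf-8-sig' codec.
import codecs

text = "this is a pythøn unicode string"

with_bom = codecs.BOM_UTF8 + text.encode("utf-8")  # BOM added by hand
via_codec = text.encode("utf-8-sig")                # BOM added by the codec

print(with_bom[:3])                          # the three BOM bytes
print(with_bom == via_codec)
print(with_bom.decode("utf-8-sig") == text)  # BOM stripped on decode
print(with_bom.decode("utf-8") == text)      # plain utf-8 keeps the BOM as U+FEFF
```

Decoding with plain 'utf-8' leaves an invisible U+FEFF character at the start of the text, which is a common source of strange first-line bugs.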

As far as I can figure it out, the Page Template skin system in Zope does recognize the BOM. So you can use international characters in Page Templates.

At least I have tried editing files that were utf-8 encoded, but the international characters displayed wrong. They looked like: 'this is a pyth\xc3\xb8n string'. So Zope's Page Template system got them wrong. Other times I have written them and they looked like 'this is a pythøn string'.

I assume the difference is whether or not the file has a correct BOM that Zope can recognize.

Generally though I work on many different Zope/OS system combos. So when I see these kinds of problems in html, I generally take the lazy way out and just use html entities so it looks like: 'this is a pyth&oslash;n string' :-s

Unicode in html

In html you can represent international characters as both html entities (e.g. &oslash;) and as encoded strings. Normally you will use utf-8 for encoded strings.

If you choose the encoded strings you must tell what the encoding is. You do this by setting it as meta data in the head:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

You can then use normal procedures for converting unicode html to utf-8 strings.
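Both html options can be produced from the same unicode text. The following is a sketch for Python 3 only; it uses numeric character references (such as &#248;) rather than named entities, because that is what the built-in 'xmlcharrefreplace' error handler emits.

```python
# Python 3 sketch: the same text as utf-8 bytes and as ascii with character references.
import html

text = "this is a pythøn string"

as_utf8 = text.encode("utf-8")                        # needs charset=UTF-8 in the head
as_ascii = text.encode("ascii", "xmlcharrefreplace")  # safe in any charset

print(as_utf8)
print(as_ascii)

# html.unescape understands both numeric references and named entities.
print(html.unescape(as_ascii.decode("ascii")) == text)
print(html.unescape("this is a pyth&oslash;n string") == text)
```

The character reference version is the automated form of the "lazy way out": it survives even when the charset declaration is wrong or missing.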

Encoding strings

Unicode is not the only python class to have an encode() method. The string object also has one. That can be handy in many cases.

If you need to use a url, or any other string with special characters, as a filename it will cause you problems. Any string can be encoded in a simpler format that can be used as a filename or even as part of a path.

The hex encoding is the simplest, but base64 is the most space efficient.

>>> ''.encode('hex')

>>> ''.encode('base64')

A caveat here is that base64 always has a newline character at the end, and it might have '=' as padding. So you need to modify it a bit to remove those:

>>> ''.encode('base64')[:-1].strip('=')

And when you want to decode it again you must restore the padding first, so that the length is a multiple of 4 again. Otherwise decoding fails whenever padding was stripped (some strings need one '=', some need two):

>>> ('aHR0cDovL214bS1tYWQtc2NpZW5jZS5ibG9nc3BvdC5jb20v' + '=').decode('base64')

I once made some mailinglist software in Zope, where the best solution was to save the subscribers with their emails as the id of the subscriber object. But Zope does not accept @ in a path. All I then had to do to make it work was:

>>> id = ''.encode('hex')
>>> id

What has this got to do with unicode you might ask? Well you can use this method to use unicode strings as filenames without needing to remove special characters.

First make the unicode string, then convert it to utf-8 and then convert that to hex or base64.

>>> st = u'this is a pythøn unicod€ string/text'
>>> st_utf8 = st.encode('utf-8')
>>> st_utf8
'this is a pyth\xc3\xb8n unicod\xe2\x82\xac string/text'
>>> st_hex = st_utf8.encode('hex')
>>> st_hex
>>> st == st_hex.decode('hex').decode('utf-8')
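In Python 3 the 'hex' and 'base64' string codecs are gone from str.encode(), so the same trick needs the bytes methods and the base64 module. The sketch below is one possible way to do it; the helper names `to_filename` and `from_filename` are made up for the example. It uses the url-safe base64 alphabet because standard base64 can contain '/', which is a problem in a filename.

```python
# Python 3 sketch: unicode text -> filename-safe ascii and back.
import base64


def to_filename(text):
    """Encode text as url-safe base64 without padding (hypothetical helper)."""
    raw = text.encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def from_filename(name):
    """Restore the stripped padding, then decode back to text."""
    padding = "=" * (-len(name) % 4)  # 0, 1 or 2 padding characters
    return base64.urlsafe_b64decode(name + padding).decode("utf-8")


st = "this is a pythøn unicod€ string/text"

name = to_filename(st)
print(name)
print(from_filename(name) == st)

# The hex variant: longer, but even simpler.
st_hex = st.encode("utf-8").hex()
print(bytes.fromhex(st_hex).decode("utf-8") == st)
```

Computing the padding from the length, instead of always appending one '=', makes the decode work for every input length.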

Unicode in emails

Once upon a time when Bill Gates invented email he thought "7 bits is more than enough for every character." Well ok. Maybe it was not Bill Gates. But someone must have thought it. As that is the foundation that email is built upon.

Body Text

Actually, email messages can contain 8 bit characters. But SMTP can only transmit 7 bit messages. And as Dolly Parton once said: You cannot put 10 pounds of potatoes into a five pound sack. So if you are sending your email over SMTP, like everyone is, you must convert your message to 7 bits. This is called content transfer encoding.

First you make the body text in unicode.

>>> st = u'this is a pythøn unicode string'

Then you must convert it to the string encoding you want to send it as. Like latin-1, utf-8 etc.

>>> st_latin1 = st.encode('iso-8859-1')

Python recognises 'latin-1' as an encoding. Mail systems do not, so it is safer to use 'iso-8859-1'. Python will translate it automatically in the email module, but I have been bitten when composing emails without it, so I have made it a habit to always use the long form.

Latin-1 is still an 8 bit string. So we must content transfer encode that to a 7 bit ascii string.

A simple email message is made like this. Note that the set_payload() method does not do any encoding. You merely tell it what encoding your string is already in:

>>> st = u'this is a pythøn unicode string'
>>> st_latin1 = st.encode('iso-8859-1')
>>> from email.Message import Message
>>> msg = Message()
>>> msg.set_payload(st_latin1, 'iso-8859-1')
>>> str(msg)
'From nobody Sun Mar 30 15:13:31 2008\nMIME-Version: 1.0\nContent-Type: text/plain; charset="iso-8859-1"\nContent-Transfer-Encoding: quoted-printable\n\nthis is a pyth=F8n unicode string'

The email module can also encode as base64, but that is mostly used for 8 bit binary content as it makes the files smaller than quoted-printable would.

You could use base64 for all email content types, but keeping text messages human readable is simply more practical.
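With the newer email API in Python 3 the charset encoding and the content transfer encoding can be requested in one call. This is a sketch under that assumption (Python 3.6 or later, `email.message.EmailMessage`), not the Python 2 API used in the session above.

```python
# Python 3 sketch: a latin-1 body sent as quoted-printable.
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content(
    "this is a pythøn unicode string",
    charset="iso-8859-1",    # the string encoding of the body
    cte="quoted-printable",  # the 7 bit content transfer encoding
)

print(msg.as_string())

# Reading the body back undoes both encodings.
print(msg.get_content())
```

Unlike set_payload() above, set_content() takes the unicode text itself and does both encoding steps, so there is no separate encode() call.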

Sending HTML

Sending HTML works exactly like plain text from a unicode point of view. HTML files can also be made as unicode and then encoded as utf-8 or latin-1 etc. Just set the content type:

>>> del msg['Content-Type']
>>> msg.add_header('Content-Type', 'text/html', charset='iso-8859-1')
>>> str(msg)
'From nobody Sun Mar 30 15:23:14 2008\nMIME-Version: 1.0\nContent-Transfer-Encoding: quoted-printable\nContent-Transfer-Encoding: base64\nContent-Type: text/html; charset="iso-8859-1"\n\ndGhpcyBpcyBhIHB5dGg9RjhuIHVuaWNvZGUgc3RyaW5n'

Based on the content type the email module chooses to convert html to base64 for you.

Email Headers

Email headers are special, and especially ugly. You press send on your new maillist software that has worked correctly during testing. Then all of a sudden there is one of them nasty international characters in the subject.

You did set the charset to iso-8859-1 in the add_header() method. So why does it fail?

Well because each header field needs to be set individually. The content of the headers has nothing to do with the message content of the text (payload).

You can set it like this:

>>> from email.Header import Header
>>> from email.Message import Message
>>> msg = Message()
>>> subject = u'This is a pythøn subject'
>>> subject_latin1 = subject.encode('latin-1')
>>> h = Header(subject_latin1, 'iso-8859-1')
>>> msg['Subject'] = h
>>> msg.as_string()
'Subject: =?iso-8859-1?q?This_is_a_pyth=F8n_subject?=\n\n'

The email headers like To, From etc. are a little bit tricky.

>>> from email.Utils import formataddr
>>> email = u''.encode('latin-1')
>>> name = u'max møller'.encode('latin-1')
>>> From = formataddr( (name, email) )
>>> From
'max m\xf8ller '
>>> from email.Header import Header
>>> h = Header(From, 'iso-8859-1')
>>> str(h)
>>> msg = Message()
>>> msg['From'] = From
>>> msg.as_string()
'From: max m\xf8ller \n\n'
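In Python 3 both kinds of header can be encoded without manual encode() calls. The sketch below assumes Python 3.3 or later, where formataddr() accepts a charset argument; the address max@example.com is a placeholder invented for the example.

```python
# Python 3 sketch: encoded Subject and From headers.
from email.header import Header
from email.utils import formataddr

subject = Header("This is a pythøn subject", "iso-8859-1").encode()
print(subject)

# Only the display name is encoded; the address itself must stay plain ascii.
sender = formataddr(("max møller", "max@example.com"), charset="iso-8859-1")
print(sender)
```

This shows why address headers are tricky: encoding the whole 'name <address>' value as one Header would also wrap the address, while formataddr() encodes just the name part.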

UTF-7 and IMAP

If you are writing an email client in Python you will most likely need to support IMAP. IMAP has folders, and their names use a special encoding not seen anywhere else. It is an encoding based on utf-7. It is not a common problem so I will not use much space on it here. But I have written an encoder for it that you can get here:

>>> from imapUTF7 import imapUTF7Encode
>>> st_utf7_imap = imapUTF7Encode(st)

More about the issue can be found here:

5.1.3. Mailbox International Naming Convention
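To give an idea of what such an encoder has to do, here is a minimal sketch for Python 3. It is not the imapUTF7 module mentioned above, just an illustration of the rules in that section: printable ascii is kept, '&' becomes '&-', and everything else is written as utf-16 in a modified base64 (',' instead of '/', no padding) between '&' and '-'. The function name `imap_utf7_encode` is made up for the example.

```python
# Python 3 sketch of the modified utf-7 encoding used for IMAP mailbox names.
import base64


def imap_utf7_encode(name):
    """Encode a mailbox name to modified utf-7 (illustrative helper)."""
    out = []
    pending = []  # characters waiting to be base64 encoded

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:  # printable ascii is sent as-is
            flush()
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(imap_utf7_encode("Søgning"))
print(imap_utf7_encode("A&B"))
```

Runs of non-ascii characters are collected and encoded together, which keeps the encoded names as short as the format allows.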


Wednesday, 6 February 2008

Python eggs - a Simple Introduction

Python eggs used to be the wave of the future. But for Zope and Plone developers this has evolved into a true tsunami. They are everywhere now.

Yet there is a lot of confusion about what they are and how to use them.

To understand them, you need to understand Python's way of organizing code files.


The basic unit of code reusability in Python: a block of code imported by some other code. It is most often a module written in Python and contained in a single .py file. Also called a script.

Let us say that this contains a function:

def helloworld():
    print 'Hello World'

Then it is possible to import that function like this:

from hello import helloworld


A module that contains other modules; typically contained in a directory in the filesystem and distinguished from other directories by the presence of a file

A step up from a script is a module, which is a library with an file in it.


You can then put the helloworld function into the script, and import it like you did before:

from hello import helloworld

You could also keep it in the file from before.


But then you must import it like this:

from hello.hello import helloworld

Unless you import it into the module namespace. You do this in the script:

from hello import helloworld

Then you will once more be able to write:

from hello import helloworld

This ensures that you can reorganize your code and still retain backwards compatibility.

You can have modules inside modules. A python library is just a module, or a structure of modules.

A structure of modules is called a package.
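The re-export trick can be tried out without touching your own source tree. The sketch below assumes Python 3 and builds a throwaway package in a temporary directory; the file names hello/__init__.py and hello/hello.py are assumptions made for the example. In Python 3 the re-export line inside the package has to be the relative import `from .hello import helloworld`.

```python
# Python 3 sketch: a package that re-exports a function from a submodule.
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
package_dir = os.path.join(root, "hello")
os.mkdir(package_dir)

# hello/hello.py holds the actual function.
with open(os.path.join(package_dir, "hello.py"), "w") as f:
    f.write("def helloworld():\n    return 'Hello World'\n")

# hello/__init__.py pulls it into the package namespace.
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("from .hello import helloworld\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

from hello import helloworld                         # short form, via __init__.py
from hello.hello import helloworld as same_function  # long form still works

print(helloworld())
print(helloworld is same_function)
```

Both import paths give the very same function object, which is what lets you move code around inside the package without breaking callers.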


So far it has all been about writing and organizing Python code. But the next step is distribution of said code. The first step in this direction is distutils.

Distutils was written to have a single unified way to install Python modules and packages. Basically you just cd to the directory of the module and write:

python install

Then the module will automagically install itself in the python it was invoked with.

Distutils defines a directory/file structure outside your module, that has nothing to do with the module per se, but is used to distribute the module.

If you want to make a distribution of the hello module you must put it inside a directory that also contains a file.


The file could contain this code, which runs the setup function:

from distutils.core import setup


You then run the code like this:

python sdist

And it will create a new directory structure like this:


The hello-1.0.tar.gz then contains the package distribution. It has this structure when unpacked:


The hello package is inside it. It is just a copy of your own package with no changes. The setup script is there too. It is also just a copy of the one you wrote to create the package with. The clever thing about distutils is that it can use the same script to create the distribution as it uses to install the package.

PKG-INFO is a new file and it just contains some metadata for the package. Those can be set in the setup script.


Setuptools is built on top of distutils. It makes it possible to save modules in pypi, or somewhere else. It uses eggs for distribution.


An egg is created very much like a distutils package. You just have to change a line in your setup script:

from setuptools import setup # this is new


Then you call it with:

python bdist_egg

And you get a new file in your dist directory:


This is the egg that you can put on your website, or even better, publish to pypi. You can get an account on pypi, and then you will be able to add your eggs via the command line like: bdist_egg upload

Easy Install

When you have uploaded your egg, all the world is able to use it by installing it with easy_install:

easy_install hello

Easy install will then find the egg on pypi, download it, compile if necessary and add it to your sys.path so that Python will find it.


Buildout is a configuration based system for making complicated but repeatable setups for large systems.

Phew. That sounds complicated. Well buildout can be. But what is interesting from an eggs based point of view is that you configure what eggs are to be installed in your system.

Inside your buildout.cfg you can have a line like:

eggs =

Then buildout will automatically download and install the hello package in your system.

Buildouts can themselves be distributed as eggs, and you can extend a buildout to add new packages. This is how you can install a Plone buildout and then add your own packages to it. Basically creating your own custom Plone distributions.



Thursday, 31 January 2008

Installing ZMySQLDA in Zope using buildout on Ubuntu and perhaps Debian in general

Another note to future self. This is how I did it.


To make Zope work with MySQL I needed to compile the database adapter. For this I needed the mysql dev libraries. On Ubuntu that is done like this:
aptitude install libmysql++-dev


For ZMySQLDA to work I needed the plain Python MySQL-Python library installed and compiled.

Luckily the MySQL-Python package is distributed as an egg, so it can just be added to buildout.cfg
eggs =
... other packages
Buildout will then download, compile and install the package automatically.


ZMySQLDA is old. Latest release 2.0.8 is from 2001. But that is just because it is stable. No worry.

When I tried to install it by adding it in the buildout.cfg:
urls =
It made a small tarbomb and put itself inside a directory structure like:
And Zope was not able to find it.

Buildout to the rescue again. Radim Novotny has made a buildout recipe for doing it right:
So all I needed was a few additions to the buildout.cfg

parts =
recipe = cns.recipe.zmysqlda
target = ${buildout:directory}/products

And then I was good to go.

Wednesday, 30 January 2008

Installing Python, Zope and Plone on Ubuntu 7.10 - Gutsy Gibbon

Last time I installed Plone on an Ubuntu system, I wrote a log of what I did. It might be that I am getting old, but I cannot seem to remember how I do it from one time to the next. So this is as much a note to future me as a help to others.

It is a newer version of this article on my main site.

With a little luck you should be able to run it as a script on your own server, or at least copy/paste the lines into your terminal, and then see where it fails.

# System installs, libraries etc.

# Installing python and plone on ubuntu server 7.10 Gutsy Gibbon
# much of it can probably be used on any debian system.

sudo bash
# as root: install system packages and libraries for compiling python
aptitude install gcc g++ make

# readline is needed for an interactive python prompt
aptitude install libreadline5 libreadline5-dev

# zlib is needed for zope & PIL
aptitude install zlib1g zlib1g-dev

# libjpeg is needed for PIL
aptitude install libjpeg62 libjpeg62-dev

# libssl is needed by buildout for downloading with the https protocol
aptitude install libssl0.9.8 libssl-dev

# create a user called zope.
adduser zope

# zope, plone specific install

# Do the rest as this user
su zope
# from the users home dir /home/zope
cd ~
# create the directory structure
mkdir downloads
mkdir pythons
mkdir instances

# download and install python in the zope users home dirs pythons directory
cd ~/downloads
tar xzf Python-2.4.4.tgz
cd Python-2.4.4
./configure --prefix=/home/zope/pythons/python-2.4.4
make altinstall

# make a symbolic link for this python. This is optional. But you avoid having
# to type "python2.4" instead of "python"
ln -s /home/zope/pythons/python-2.4.4/bin/python2.4 /home/zope/pythons/python-2.4.4/bin/python
# make this the default python to use by pushing it to the front of the path with this command
# you can add the command to the zope users ~/.bashrc
# to automatically use that python at login. I normally do this.

# download and install PIL
cd ~/downloads
tar xfz Imaging-1.1.6.tar.gz
cd Imaging-1.1.6
python build
python install

# download and install easy install
cd ~/downloads

# create a default buildout config file. This way your eggs and downloads will
# automatically end up in the same dir.
mkdir ~/.buildout
vi ~/.buildout/default.cfg

# insert this text (without the "## ") and save
## [buildout]
## executable = /home/zope/pythons/python-2.4.4/bin/python2.4
## eggs-directory = /home/zope/.buildout/eggs
## download-directory = /home/zope/.buildout/downloads

# then install ZopeSkel with easy install
easy_install -U ZopeSkel

# you can see what buildouts can be created with this command
paster create --list-templates

# as I write this, the list looks like this.
#Available templates:
# archetype: A Plone project that uses Archetypes
# basic_namespace: A project with a namespace package
# basic_package: A basic setuptools-enabled package
# basic_zope: A Zope project
# nested_namespace: A project with two nested namespaces.
# paste_deploy: A web application deployed through paste.deploy
# plone: A Plone project
# plone2.5_buildout: A buildout for Plone 2.5 projects
# plone2.5_theme: A Theme for Plone 2.5
# plone2_theme: A Theme Product for Plone 2.1 & Plone 2.5
# plone3_buildout: A buildout for Plone 3 projects
# plone3_portlet: A Plone 3 portlet
# plone3_theme: A Theme for Plone 3.0
# plone_app: A Plone App project
# plone_hosting: Plone hosting: buildout with ZEO and any Plone version
# recipe: A recipe project for zc.buildout

# so to create a plone 3 server you write

cd ~/instances
paster create -t plone3_buildout p3test

# It asks a few questions. If you are in doubt just press enter for default values.
# now run the bootstrap script

cd p3test

# at this point you should check out the buildout documentation for your options,
# if you want to run anything but a plain Plone site.
# you can edit your buildout.cfg file in p3test. There is a good intro at:

# now let buildout install and compile Plone. Zope will be compiled automatically.

./bin/buildout -v

# then start up and test the site with
./bin/instance fg

Good luck!