Monday, August 27, 2012

PostgreSQL 9.1 Change data base encoding from ASCII to UTF-8

In my particular case, I have an ASCII data base that is being updated from a windows application and also from a Linux web application, in python.

In this scenario, the key is to understand that the data base contains UNICODE characters and also contains non-ascii characters, say LATIN1. This mix is a problem.

Best solution I found on Linux:

Precondition: Perform a dump
pg_dump -h localhost MyASCIIDataBase > /tmp/db.sql

1-  Gnu iconv

Let iconv do the magic:
iconv -f iso-8859-1 -t utf-8 /tmp/db.sql -o /tmp/iconv.sql

Create an empty Unicode(UTF-8) database and run:
psql -h localhost MyIconvUTFDataBase < iconv.sql

Note: This solution will work for non-ascii characters, such as latin1 or windows charsets, but you will have trouble with UTF-8 characters.

2- Gnu recode (Under Ubuntu, run apt-get install recode)

Lets make recode work on the dump file:
recode -f UTF-8 /tmp/db.sql

Next, create another empty Unicode(UTF-8) database and run:
psql -h localhost MyUTFDataBase < db.sql

Note: This solution will work for all UTF-8 characters, but non-ascii characters will be deleted.


Finally, you can combine these two solutions to get what you want, one database with all UTF-8 characters.
The way I did it was to dump each single table from MyIconvUTFDataBase that was updated from windows (with latin1), and then restore each single table on MyUTFDataBase (restored from recode):


Steps to do it for a single table:

1- Remove table form MyUTFDataBase
2- Dump the table from the iconv data base:
pg_dump -h localhost -t '"schema"."tableName"' MyIconvUTFDataBase -f /tmp/single_table.sql
3- Restore the table to the "recode" data base (MyUTFDataBase):
psql -f single_table.sql MyUTFDataBase


Hope this post might be helpful for you.

No comments:

Post a Comment