[MLUG] perl, postgres, and Russian UTF-8

Zbigniew Koziol softquake at gmail.com
Sat May 10 02:02:40 MSD 2008


As some of you may remember, I am going to live in Russia some day
soon. I know Russian, but I cannot type in it.

I am quite good at Perl, but I had no idea that it is so enormously
difficult to program in Perl when using the UTF-8 encoding. I am
learning that slowly and painfully. The problem probably concerns
other programming languages as well, possibly PHP too. What may be
obvious to many of you is not obvious to me, however.

I was thinking that there must be good Perl programmers on this list
who already know the solutions to these problems, and the solutions
might even be obvious to them. That's why I am asking.

Are there perhaps libraries/packages around that help with handling
UTF-8 in Perl, especially in the context of the Russian language? For
instance, I would happily use an existing library for converting to
uppercase or lowercase (actually, these I already wrote myself; this
is just an example). But another example: how do I take a substring
of a string when the input may contain both UTF-8 Russian characters
and ASCII? That's very tricky! Postgres, for instance, enforces a
maximum length on varchar columns, but I see that I cannot just cut
the string after a fixed number of bytes, because that will often
split a two-byte character in the middle, and the smart Postgres
programmers do not allow inserting such garbage into the database...
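
To make the substring problem concrete, here is a minimal sketch of
one possible approach, assuming the input arrives as raw UTF-8 bytes.
It uses the Encode module, which ships with Perl 5.8 and later: decode
the bytes into a Perl character string first, so that substr, uc, lc
and length work on characters rather than bytes, and encode back to
UTF-8 only on output. The helper name truncate_chars and the sample
string are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                         # the literals in this file are UTF-8
use Encode qw(decode encode);

# Hypothetical helper: cut a UTF-8 byte string to at most $max characters
# without ever splitting a multi-byte character.
sub truncate_chars {
    my ($bytes, $max) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> character string
    my $cut   = substr($chars, 0, $max);   # substr now counts characters
    return encode('UTF-8', $cut);          # characters -> UTF-8 bytes
}

# Stand-in for raw input bytes (mixed Russian and ASCII).
my $bytes = encode('UTF-8', "Привет, world");

my $short = truncate_chars($bytes, 6);     # the six Cyrillic letters only

# On a decoded string the built-in uc/lc handle Cyrillic as well.
my $text  = decode('UTF-8', $bytes);
my $upper = uc $text;
my $lower = lc $text;

binmode(STDOUT, ':encoding(UTF-8)');       # encode characters on output
print decode('UTF-8', $short), "\n";
print "$upper\n$lower\n";
print length($bytes), " bytes, ", length($text), " characters\n";
```

The last line shows the difference that causes the trouble: the same
string has 19 bytes but only 13 characters, so any length limit has to
be applied after decoding, not before.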

I would appreciate some advice.


More information about the MLUG mailing list