No, this post is not about Ruby on Rails (they just released version 1.0, by the way), but it’s half about rails, as in “railways”, and half about Ruby the language.
With respect to the former, I was finally able to book my railway trip to Rome, despite my previously reported problems. I enjoyed traveling by train much more than traveling by plane. True, it takes longer, but compare waiting in line at the check-in counter, at security, at the gate and at the plane door, then spending an hour in a cramped seat and finally waiting for your luggage, with sitting quite comfortably for four hours at half the price: the choice is clear.
Ruby die-hards might comment that traveling by plane is akin to programming in Java, which brings us to the second half of this post. While traveling, I did a bit of Ruby programming, just for fun. One thing that struck me negatively was learning that Ruby strings are sequences of 8-bit bytes. Uh-oh, I smell trouble ahead. Indeed, I hit trouble as soon as I tried to parse (using Lucas Carlson’s SimpleRSS library) some RSS feeds that used different encodings (UTF-8 vs. ISO-8859-1, for example). If I were using Java and a Java XML parser, I’m pretty sure that the text values extracted from the feeds would all have been Unicode strings, and I would have had no problems mixing them or storing them in a UTF-8 encoded database.
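Why byte strings from differently encoded feeds can't simply be mixed is easy to see at the byte level: the same text comes out as different bytes. A small illustration, assuming Ruby 1.9 or later (String#encode arrived with 1.9); the word "café" is just an example:

```ruby
# The same word, encoded two ways.
text = "café"

utf8_bytes   = text.bytes                       # é is two bytes in UTF-8
latin1_bytes = text.encode("ISO-8859-1").bytes  # é is one byte in ISO-8859-1

p utf8_bytes    # prints [99, 97, 102, 195, 169]
p latin1_bytes  # prints [99, 97, 102, 233]
```

Gluing the raw bytes of one feed to those of the other, as a byte-oriented string type allows, produces text that reads correctly in neither encoding.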
I’m pretty sure this problem is mostly due to my ignorance of Ruby, but I still wonder whether using 8-bit characters in the era of globalization was a wise decision. I’d rather have Java’s 16-bit characters, if possible.
Update: I got it to work by determining each feed’s original encoding with open-uri’s charset method and using iconv to convert from that encoding to UTF-8. Suboptimal, but it works.
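The update above relies on Iconv, which was dropped from Ruby’s standard library in 2.0; String#encode performs the same kind of conversion. A minimal sketch, assuming the raw feed bytes and the charset name (as reported by open-uri’s charset method) are already in hand. The helper name to_utf8 and the fallback to UTF-8 for feeds that declare no charset are illustrative assumptions, not part of the original fix:

```ruby
# Hypothetical helper: turn raw feed bytes plus the charset the server
# declared into a UTF-8 String (String#encode standing in for Iconv).
def to_utf8(raw, charset)
  # Assumption: treat feeds that declare no charset as UTF-8.
  charset ||= "UTF-8"

  # Label the bytes with their declared encoding; the bytes themselves
  # are not changed. dup avoids mutating (or failing on) a frozen string.
  tagged = raw.dup.force_encoding(charset)

  # Transcode to UTF-8; when converting from another encoding,
  # invalid or unmappable bytes become "?".
  tagged.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

latin1_title = [99, 97, 102, 233].pack("C*")  # "café" as ISO-8859-1 bytes
puts to_utf8(latin1_title, "ISO-8859-1")      # prints café
```

In feed-reading code, the charset argument would come from the open-uri stream itself, e.g. to_utf8(f.read, f.charset) inside the block passed to URI.open, so every feed ends up as UTF-8 before being mixed or stored.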


On the encoding issue, Ruby is far behind other platforms as of now, but as for XML, REXML handles encoding very well.
I have a feeling simple-rss just bypasses REXML for reasons of speed or simplicity.
The world is 7-bit ASCII for many people.