Japanese Language, Buddhist Sutras and Ruby Programming

A while back, I talked about my efforts to get a full, liturgical version of the Amitabha Sutra, one of my favorite Buddhist texts, online with both the Chinese characters and their romanized Japanese readings. Because the sutra is so long, copying, pasting and writing the HTML by hand is not practical. So, I wrote a Perl script that would parse the romanized text and add all the necessary HTML tags.

Trouble is, I couldn’t make it parse the Chinese characters because they’re encoded in UTF-8, not ASCII. A UTF-8 character can be several bytes long (most Chinese characters take three), and using simple tools like split() in Perl on the raw bytes can cut a single Chinese character into unusable fragments of gibberish. Perl can process Unicode, but it doesn’t come naturally, and I eventually gave up. I tried to copy/paste the Chinese characters by hand for a while, but gave up on that too. The text was just too long.
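To make the byte problem concrete, here is a minimal sketch in Ruby (the language this post ends up using). The sample character is just an illustrative pick from the sutra text and the variable names are my own; it assumes Ruby 1.9.3 or later, where strings know their encoding.

```ruby
# encoding: UTF-8

# A sample character from the sutra text (an illustrative choice).
char = "衆"

# Count the bytes used to store the character, and the characters themselves.
byte_count = char.bytesize
char_count = char.length

# Cutting the string at a byte boundary leaves a fragment that is
# not valid UTF-8 on its own.
first_byte = char.byteslice(0, 1)
fragment_valid = first_byte.valid_encoding?

puts "bytes: #{byte_count}, characters: #{char_count}"
puts "first byte alone is valid UTF-8? #{fragment_valid}"
```

Any tool that splits on bytes rather than characters ends up with fragments like that first byte, which is why the text has to be split per character instead.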

But lately, after exploring the Python language, I tried to revive this old project, and got much closer. However, Python’s Japanese text processing requires modules I couldn’t use on my Linux distribution (Linux Mint), so I decided to try a different language again: Ruby.

Ruby, ironically, was designed by a Japanese developer. It’s designed for English, but still handles UTF-8 a lot more easily, and is a pretty nice language to learn in general. So, after playing around on the Web for a couple of nights, I came up with this amateur script:

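# encoding: UTF-8

# Wrap each character of the input file in <td> tags, ten characters per row,
# with a spacer cell after the fifth character of each row.
# Requires Ruby 1.9 or later, where strings are encoding-aware.
File.open(ARGV[0], "r:UTF-8") do |file|
  file.each_line do |line|
    # chomp drops the trailing newline so it does not get a <td> of its own
    line.chomp.each_char.with_index(1) do |char, count|
      print "<td>#{char}</td>"
      if count % 10 == 0
        print "\n"                 # end of a ten-character row
      elsif count % 5 == 0
        print "<td>&nbsp;</td>"    # spacer after the fifth character
      end
    end
  end
end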

If I take the Amitabha Sutra text from the Japanese Wikipedia, copy it into a text file, and remove all spaces and unwanted characters, I have a plain-text file with a long, long string of Chinese characters. Using the script above, I can parse that file and wrap each character in HTML tags, like so:

<td>等</td><td>法</td><td>其</td><td>土</td><td>衆</td><td>&nbsp;</td><td>生</td><td>聞</td><td>是</td><td>音</td><td>已</td>
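The same row logic can also be written as a small function, which makes it easy to check against the sample line above. This is only a sketch: the helper names sutra_row and sutra_rows are hypothetical, and the ten characters are the ones from the sample output.

```ruby
# encoding: UTF-8

# Hypothetical helper: wrap up to ten characters in <td> tags,
# adding a spacer cell after the fifth character.
def sutra_row(chars)
  cells = chars.map { |c| "<td>#{c}</td>" }
  cells.insert(5, "<td>&nbsp;</td>") if cells.length >= 5
  cells.join
end

# Hypothetical helper: split the text into rows of ten characters each.
def sutra_rows(text)
  text.chars.each_slice(10).map { |slice| sutra_row(slice) }
end

sample = "等法其土衆生聞是音已"
rows = sutra_rows(sample)
puts rows
```

Building each row as a string first, instead of printing cell by cell, means the result can be compared against a known-good line before anything is pasted into the blog.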

Then it’s simply a matter of copying and pasting each line into the Amitabha Sutra page I am writing for the blog! This approach took more work up front, but saved me weeks, probably months, of copying and pasting each character by hand! At some point, I hope to move on to other sutras as well and get them “stamped out” for liturgical use by other people, but first I want to revise the script so that the Chinese and romanized text both get organized into HTML correctly the first time. Then it’s a simple copy-paste right into the blog! :)
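As a rough idea of what that revision might look like (not the finished script), here is a minimal sketch. It assumes a hypothetical input format where the romanized reading is given as space-separated syllables, one per character, and it wraps each row in <tr> tags, which the current script does not do. The helper name paired_rows is made up for illustration.

```ruby
# encoding: UTF-8

# Hypothetical helper: build two table rows, one of characters and one of
# their readings. Assumes the readings are space-separated, one per character.
def paired_rows(characters, readings)
  chars = characters.chars.to_a
  syllables = readings.split(" ")
  unless chars.length == syllables.length
    raise ArgumentError,
          "#{chars.length} characters but #{syllables.length} readings"
  end
  char_row = chars.map { |c| "<td>#{c}</td>" }.join
  reading_row = syllables.map { |s| "<td>#{s}</td>" }.join
  ["<tr>#{char_row}</tr>", "<tr>#{reading_row}</tr>"]
end

pair = paired_rows("南無阿弥陀仏", "na mu a mi da butsu")
puts pair
```

Raising an error when the two counts differ would catch a missing or extra syllable early, rather than letting the readings drift out of line with the characters halfway through a long sutra.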

I haven’t finished copying this one yet, but already I’ve made a lot more progress than before. As my old boss used to say: work smarter, not harder. He was right. :)

Namu Amida Butsu

About Doug 陀愚

A Buddhist, Father and Japanophile / Koreaphile. The total package.
This entry was posted in Buddhism, Japanese, Jodo Shinshu, Jodo Shu, Language, Linux, Technology, Uncategorized.

4 Responses to Japanese Language, Buddhist Sutras and Ruby Programming

  1. John says:

    Hello,

    Sorry, this is not tested, but it’s a bit shorter:

    https://gist.github.com/1138995

    Happy hacking!

  2. Doug 陀愚 says:

    Oh! I love it! I liked how you collapsed the ARGV[0] value right into the File.readlines part. I am new to Ruby, so I definitely appreciate the example and I’ll have to test it out. I’ll update and let you know.

    Thank you and happy hacking to you too!

  3. gdsdfg says:

    Please, could you do the same with a Chinese-romanized version?

  4. Doug 陀愚 says:

    Hello and welcome to the JLR. I would love to post in other languages, but I don’t have the time or language resources to do it at the moment, though I will definitely consider this in the future.
