Monday, February 26, 2007

Introducing translipi

English is inadequate to correctly represent Indian-language words. This problem is especially painful for me, since this blog often deals with subjects whose vocabulary abounds in Indic terms. "Mayamalavagaula." "Kadanakutuhalam." "Nadanamakriya." I rest my case.

I could write these terms in (say) Tamil, but this would put non-Tamil readers at a loss. Each of us is most comfortable in a different script.

This problem is now solved.

translipi (see the side-bar, if you have not noticed it already) transliterates these terms into the script you are most familiar with. For now, there is dEvanAgari, kannaDa, malayALam, tamizh and telugu. For folks like me who are most comfortable in English, there is also the Roman script with enough diacritical marks stuffed in to specify (almost) every Indic character uniquely.

Do let me know your thoughts!
* * * * *

A doubt: Is the pronunciation of the Tamil characters ந and ன identical (as I have always believed)? If so, is there any grammatical rule which specifies when to use one and when the other? For instance, ன is never used at the beginning of a word and ந never at the end.
- - - - -

Update [3 Mar]: Once Ambarish (see the comments) pointed out how ந and ன are different, it seemed so obvious and logical that I wished to kick myself for not figuring it out before. So we use:
  • ந at the beginning of words (நலம்) and when immediately followed by a dental consonant (தந்தை).
  • ன elsewhere (தினம்).
But on further cogitation, I am a bit confused. Now, by the rules above, how do we explain குடிநீர் (which is unlike தன்னீர் — or is it தண்ணீர்)? Help!
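
As a rough illustration, the two rules above can be written out in a few lines of Python. This is only a sketch of the rule as stated in this update, not the actual transLipi code; the function name `apply_update_rule` and the constants are made up for the example.

```python
DENTAL_N = "\u0BA8"    # ந
ALVEOLAR_N = "\u0BA9"  # ன
PULLI = "\u0BCD"       # the dot that removes the inherent vowel
DENTAL_T = "\u0BA4"    # த

def apply_update_rule(word: str) -> str:
    """Rewrite every ந/ன in a Tamil word according to the two rules above."""
    out = []
    for i, ch in enumerate(word):
        if ch not in (DENTAL_N, ALVEOLAR_N):
            out.append(ch)
            continue
        at_start = i == 0
        # "Immediately followed by a dental consonant" means ந் + த in the script.
        before_dental = word[i + 1 : i + 3] == PULLI + DENTAL_T
        out.append(DENTAL_N if at_start or before_dental else ALVEOLAR_N)
    return "".join(out)

for w in ["நலம்", "தந்தை", "தினம்", "குடிநீர்"]:
    print(w, "->", apply_update_rule(w))
```

The first three words pass through unchanged, but குடிநீர் is turned into குடினீர், which is exactly the case the rules cannot explain.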

52 comments :

Manjunath said...

I think Kannada is okay. But the plural forms are not properly represented (with the English plural form of 's'... and I find it annoying when I have to add 's' to native terms). I could see that anusvara has been kicked out. I don't quite agree with that!

Ambarish said...

Nope, ந and ன are not identically pronounced. ன is the alveolar nasal, while ந is the dental nasal.

http://en.wikipedia.org/wiki/Alveolar_nasal
http://en.wikipedia.org/wiki/Dental_nasal

This also explains why the dental nasal is always initial, while the alveolar is always medial or terminal.

Only Dravidian languages have both dental and alveolar consonants. Sanskrit, Marathi, Hindi etc. (to my knowledge) lack the alveolar ones.

Manjunath said...

In Kannada, there is a strange consonant which probably is the only letter without a vowel. It is sometimes used (only at the end of a word) to represent 'na' (dental nasal) without the vowel 'a'. You may remember I once talked about Chillaksharam in Malayalam, and said that there is one letter in Kannada which probably is equal to the Malayalam Chillaksharam. This letter is written like the Kannada numeral nine (೯) but with an extra coil in between. It is called 'nakarapillu' in Kannada. Now, I think that is in fact the alveolar nasal. However, considering the very subtle difference in sound between the dental nasal and the alveolar nasal, the latter has now been completely taken out of the Kannada script.

By the way, do you know why some names have this alveolar nasal but some have 'm' at the end of their name? I thought 'n' is just a short form of 'avan' (amma + avan -> amman -> father) but I do not understand this 'm'.

eg. I have seen both,
Subrahmanyam
Subrahmanyan
Velayuthan
Velayutham

Vijayanand said...

very nice.

How did you do it? Do you have an exhaustive lookup table?

Or do you do it post by post?

Srikanth said...

Manjunath(a),

But the plural forms are not properly represented( with English plural form of 's'...
That's because the language of the blog is still English! Specific words are now transliterated into different scripts (not translated into different languages).

I could see that anusvara has been kicked out.
Nope! There is no anuswara because I haven't put one!

Srikanth said...

Ambarish,

Thanks a lot! Among the Dravidian languages, only Tamil (currently) seems to have different characters for alveolar and dental.

I still would like to know the rules for when to use either, so that I can fix the tool accordingly.

Srikanth said...

Manjunath(a),

This letter is written like Kannada numerial nine (೯) but with an extra coil in between. It is called 'nakarapillu' in Kannada.
Great! In a way, I am glad the distinction between them is lost since it makes my work simpler for the tool.

Regarding names ending in M:
1) In Tamil, neuter/abstract nouns end in M (not anuswara). pazham* (fruit), guNam (quality) etc. So "Velayudham" may be named after the spear (Vel) of Muruga/Subrahmanya. But "Velayudhan" is named after the wielder of the spear, Muruga himself.
2) The short-form of names ends in M. E.g., Rajam (for, say, Rajalakshmi), Kalyanam (for Kalyanaraman).
3) Places (esp. those named after a deity) end in M. Sriranga -> Srirangam. Rameswara -> Rameswaram. Pattabhirama -> Pattabhiramam (now corrupted to Pattabiram). Our Finance Minister may have been named after the place (Chidambaram) or his name may be short for Chidambaranathan.

Subramaniam is mostly used in Andhra (SPB) and Subramanian in TN. But then, what about Ramanujam or Somasundaram? Probably short for Ramanujachari and Somasundaresan, respectively. Generally, one just goes with the form most common in convention -- not much thought necessarily goes into which form (N-ending or M-ending) is chosen.

- - -
* transLipi cannot be used in comments.

Srikanth said...

transLipi cannot be used in comments.
I should probably clarify that it is because of Blogger restrictions, rather than a tool limitation.

Srikanth said...

Vijayanand,

Thanks! The thingy is general enough - the words are not stored in a table, etc.

Do you (or anybody) wish to use the tool in your blog? Mail me!

Vijayanand said...

I suppose you have represented the translation via several rules... that's what I meant by lookup table.

I think it's an excellent tool.

Although I don't think I will use it frequently in my blog - can I have a look at it nevertheless?

I think you should publicise this tool - there may be several potential users.

Srikanth said...

Vijayanand,
Thanks for the feedback!

I will mail you the details soon (i.e., by this weekend max).

Manjunath said...

Thanks for the detailed explanation. I would like to use 'translipi' in my blog. My Gmail id is m.manjunatha. I must say, without anusvara Kannada words look bit unfamiliar!

By the way, how do you pronounce the alveolar nasal? With a 'svara' (or 'a') or as 'inn'? (I think Malayalis pronounce it that way, and unfortunately I have never heard anybody pronouncing the Kannada alveolar nasal independently... and when added in a word it is pronounced just like the dental nasal.)

Srikanth said...

Manjunath,
I would like to use 'translipi' in my blog.
Sure! Will mail you once I have compiled the instructions to install it.

I must say, without anusvara Kannada words look bit unfamiliar!
It is up to us to indicate to the tool whether to use an anuswara or a nasal consonant at a particular place -- transLipi is a willing slave to our commands.

Check out this post now!

By the way,How do you pronounce alveolar nasal?
Since Ambarish is the expert in this area, I will transfer the question to him. :-)

Balaji said...

very cool feature, awesome!

Srikanth said...

Balaji, thank you!

Ambarish said...

Srikanth,

It's indeed தண்ணீர் from தண்மை (coolness) + நீர், but conjuncts where the initial ந becomes a ன do exist - நன்னாள் from நன்மை + நாள்; I'm not entirely sure why குடி + நீர் doesn't become குடினீர்.

To solve your specific issue with Translipi, I can recommend 2 options:

a. Write குடிநீர் as 2 separate words; it's probably grammatical albeit not entirely satisfactory.

b. Invent separate symbols for the two nasals.

At the end of the day, distinguishing between the 2 nasals involves tons of other heuristics as well. For instance, traditionally, words derived from Sanskrit are transliterated using the dental nasal because there's no retroflex nasal in Sanskrit, e.g. பந்நக, அந்நம் (rice) as opposed to அன்னம் (swan), அநந்த etc.

I tried this exercise when I wanted to transliterate my lyrics database from Roman to Tamil, and eventually ended up manually marking up the Roman input because automatically figuring out the right nasals was just too difficult.

Ambarish said...

Manjunath,

It's kind of hard to explain how to pronounce consonants, but let me try.

The dental consonants (त, द, न etc.) are pronounced with the tongue touching the teeth or in some cases protruding from the teeth (I've seen Marathi speakers pronounce the voiceless dental plosive with their tongue between the teeth).

Alveolar consonants are pronounced with the tongue touching the front of the alveolar ridge on the top of the mouth.

Retroflex consonants (ट, ड, ण etc.) on the other hand, are pronounced with the tongue behind the alveolar ridge.

There's tons more information at:

http://en.wikipedia.org/wiki/Dental_consonant
http://en.wikipedia.org/wiki/Alveolar_consonant
http://en.wikipedia.org/wiki/Retroflex_consonant
http://en.wikipedia.org/wiki/Place_of_articulation

Srikanth said...

Ambarish,
Thank you for the informative comments! Will reply in detail after some time -- it's been a busy day...

Sunil said...

srikanth, can you email me on how to use translipi on a blog??

this is great.

Srikanth said...

Ambarish,

Indeed there are too many exceptions for any rule we can think of for the nasals.

I have provided separate symbols for each nasal. I began looking for a rule when considering the case where the original text is written in a non-Tamil perspective. E.g., take the Sanskrit word Anandam (where both nasals are dental). Isn't it generally written ஆனந்தம் in Tamil?

But the rule breaks down very easily. I mentioned குடிநீர், which can be solved easily by hyphenating. However, what about this word (the first that fell upon my eyes after updating the post, from a newspaper) - கர்நாடகம்.

I think I will disable the nasal conversion rule (mentioned in the update) from Translipi.

I tried this exercise when I wanted to transliterate my lyrics database from Roman to Tamil
Interesting! Is your database viewable online? If so, can you give the URL? I'd love to have a look.

Sanskrit are transliterated using the dental nasal because there's no retroflex nasal in Sanskrit
I think you meant "alveolar" when you said "retroflex."

Srikanth said...

Sunil,

Thanks! I'll email the instructions to you.

Balaji said...

Hi Srikanth,
Blogger now has Hindi transliteration built in. You can enable it on the settings page. The help page is here: http://help.blogger.com/bin/answer.py?answer=58226

Srikanth said...

Balaji,
Thanks. That's a good useful feature for those who want Hindi text on their blogs.
If I may add, the functionality and the audience of Translipi are different. The tool is not rendered redundant.

Balaji said...

>>The tool is not rendered redundant.
Translipi is super cool, it does not/won't become redundant.
In fact, it is even more useful when you type in Hindi using Blogger's transliteration and then convert the words to Tamil or Kannada using Translipi! It is as good as typing in Tamil or Kannada using the English keyboard itself :)

Srikanth said...

Balaji, ;-)

Ambarish said...

[Sorry for the really late response; I didn't realise there were more comments here. Srikanth, is there an RSS/Atom feed for the comments as well?]

I have provided separate symbols for each nasal. I began looking for a rule when considering the case where the original text is written in a non-Tamil perspective. E.g., take the Sanskrit word Anandam (where both nasals are dental). Isn't it generally written ஆனந்தம் in Tamil?

So ஆனந்தம் as a loan-word in Tamil is written as above, but if you look at text transliterated from Devanagari to Tamil, it's written ஆநந்தம் (with some sort of superscript/subscript over the voiced dental to distinguish it from the voiceless dental). At least, I've seen it written like that, so I'm guessing it's a question of how careful the author wants to be.

But the rule breaks down very easily. I mentioned குடிநீர், which can be solved easily by hyphenating. However, what about this word (the first that fell upon my eyes after updating the post, from a newspaper) - கர்நாடகம்.

கர்நாடகம் is an interesting story altogether. I've heard multiple theories about the etymology, of both the music form and the state.

One of them is that Karnataka is from karu + nadu meaning dark land (referring to either the people or the soil), and Karnataka sangitam originated there. Notice that this would make the hyphen before நா appropriate :-)

Another theory is that कर्णे अटतीति कर्णाटकम् - karnatakam is that which pleases the ear. I'm not sure if this theory claims to explain the etymology of Karnataka the state. In this case, of course, it would be கர்ணாடகம் in Tamil, and you wouldn't need a hyphen anyway!

Interesting! Is your database viewable online? If so, can you give the URL? I'd love to have a look.

Not at the moment, sorry :-( There's a web-interface, but it's intranet-only. I'll be sure to let you know when I'm ready to put it up.

I think you meant "alveolar" when you said "retroflex."

Yes, of course. Good catch :-)

Srikanth said...

Ambarish,
Srikanth, is there an RSS/Atom feed for the comments as well?
I will explore this option. But this will have to wait a while...

Regarding the karnatakam etymologies: Yeah, have heard them -- but they seemed a little fantastic to me. :-)
The karu+nadu sounds good; but then, wouldn't an appended aham now be redundant?

And in all languages, the N is never retroflex, so the Sanskrit theory seems improbable too!

Ambarish said...

Regarding the karnatakam etymologies: Yeah, have heard them -- but they seemed a little fantastic to me. :-)

True that.

The karu+nadu sounds good; but then, wouldn't an appended aham now be redundant?

Yeah, aham in the sense of sthanam would be redundant. Questions, questions, no answers :-(

And in all languages, the N is never retroflex, so the Sanskrit theory seems improbable too!

Are you saying that the 'n' in Karnataka is never retroflex when pronounced? In fact, I'm not sure there exist Sanskrit words with the 'rn' combination where the 'n' is dental. Or am I reading you wrong?

Srikanth said...

Are you saying that the 'n' in Karnataka is never retroflex when pronounced?
I meant that the word "Karnataka", in all south Indian languages (including Kannada), is not written with retroflex N. On the other hand, in Sanskrit, (after R) the N is always retroflex. So this theory seemed unsound.

But then, on further thought, maybe the retroflex N became a dental with time in other languages.

Maybe Manjunath can educate us (on the etymology of "Karnataka")?

Manjunath said...

Thank you very much for your explanation, Ambarish.

Regarding the name Karnataka, well, I am not very clear about it. According to Wikipedia article Karnata or Karnataka has been consistently used since 5th century BCE! I believe, all those Sanskrit works must have used retroflex rather than dental. In fact, even in Karnataka there were linguists arguing for retroflex in the name. And the names with retroflex are used though not very common.

But I am confused by the so-called tatbhava form. The tatbhava of karnata is kannada. If it were retroflex, shouldn't the tatbhava be karaNaDa? Or if karnata is a Sanskritised form of kannaDa, then shouldn't it be karnATa rather than karNATa? I am confused by this tatsama/tatbhava business.

Marathis call us kanadi. Well, I wonder if it is kANaDi or kAnaDi and also when it was first attested.

Not much helpful, I suppose!

Ravi Mundkur said...

Srikanth
I am interested in the newly introduced translipi.Please send me a copy with instructions for usage.
bmravindra@gmail.com

Srikanth said...

Ravi,
I sent you the instructions to deploy translipi on March 26.

Penti said...

I think the discussion on ந and ன is already over. But a contribution from my side:

As has been said, ந் and ன் represent the dental and alveolar nasals respectively. But modern pronunciation does not follow this pattern. Modern Tamils tend to alveolarize the ந் when it is not followed by its corresponding plosive த்.

I suppose this tendency has been there for quite sometime in tamil. That might be the reason why Sanskrit borrowings like ஆனந்தம் have the alveolar nasal instead of the original Sanskrit dental nasal.

Arun said...

Ran into your blog yesterday - very interesting!

Echoing the last poster, I would also opine that w.r.t. pronunciation there is no difference between tamizh ந and ன, at least in practice today, where both are alveolar nasals. However, this at least explains to me the mystery of why current tamizh uses two letters that in practice carry the same sound! They must have been different earlier and merged, as the last comment implies.

Of course, ண is the retroflex (?) nasal and for water, it is indeed தண்ணீர், and also நீர். I think the first one is really cool water as in தண் (cool) + நீர் (water) = தண்ணீர் (cool water)

The usage of ந and ன in the script is of course different, as you have already outlined. The former is always used at the start of a word; the latter is never used at the start of a word.

In the middle of words, ந must be used in mei form when preceding த (where it implies the softer da sound as opposed to the harder ta sound), as in வந்தாய் (you came). A usage like தன்தாய் (one's mother) can appear, but there it is still two words that just happen to run together, and தாய் starts with the harder ta sound, unlike the da in வந்தாய்.

As you noted, ந can also occur in middle of words which are conjunction of two words where the latter starts with "na" - e.g, திருநாமம்.

Everywhere else it is ன and its variants.

Srikanth said...

Penti (sorry for the delay) and Arun:

Thanks for the explanations! I now understand the reason for "ஆனந்தம்."

As I said before, I needed heuristics for the alveolar nasal mainly when the original text is not in Tamil but needs to be transliterated into Tamil. Based on your thoughts and Ambarish's suggestions, here is how transLipi now deals with it:

* Provide separate symbol for alveolar nasal. This can be used when writing with Tamil in mind.
* Change N from dental to alveolar (while transliterating into Tamil from a different script) when N occurs at the end of a word. No change otherwise. It looks strange in Tamil to have the dental nasal at the end of a word (e.g., while transliterating the Telugu word manasuna "in the mind" into Tamil).
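
The second point could be sketched roughly as below. This is an illustration under my own assumptions, not the transLipi source: the names `fix_word_final_n` and `fix_text` are hypothetical, "end of a word" is read as "the last consonant letter of the word, with or without a vowel sign or pulli after it" (as in manasuna), and a guard is added so that a word-initial ந (as in நீ) is left alone.

```python
import re

DENTAL_N = "\u0BA8"    # ந
ALVEOLAR_N = "\u0BA9"  # ன

def fix_word_final_n(word: str) -> str:
    """Turn a word-final dental n into the alveolar n (Tamil output only)."""
    i = len(word)
    # Skip trailing dependent signs: Tamil vowel signs and the pulli
    # all sit in the range U+0BBE..U+0BCD.
    while i > 0 and "\u0BBE" <= word[i - 1] <= "\u0BCD":
        i -= 1
    # i - 1 is now the last base letter; i - 1 > 0 keeps a word-initial ந.
    if i - 1 > 0 and word[i - 1] == DENTAL_N:
        return word[: i - 1] + ALVEOLAR_N + word[i:]
    return word

def fix_text(text: str) -> str:
    """Apply the rule to every run of Tamil characters in a text."""
    return re.sub(r"[\u0B80-\u0BFF]+", lambda m: fix_word_final_n(m.group()), text)

# manasuna, naively transliterated with the dental n in both places.
print(fix_text("மநஸுந"))
```

Only the last nasal is touched; the one in the middle of the word stays dental, in line with "no change otherwise".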

Arunk said...

Srikanth,

I should first confess/warn that I am zero w.r.t linguistic terms (i.e. alveolar, dental etc.) and what I used in my last post was simply borrow appropriate terms from others :)

Also, I have done something similar to your translipi (btw, I love what you have done - great work!). I have come up with a single transliteration scheme for the languages used in carnatic music. This was mainly done to have a single representation of carnatic music compositions that can then be transcribed to the various languages of the carnatic world (although malayalam is yet to be supported). So sort of similar to translipi. However, the representation is geared towards conveying proper pronunciation rather than conveying how it is represented in each script. So for my solution, the phoneme representation is in general more important than the script representation.

You can check it out at http://arunk.freepgs.com/cmtranslit

However, in that scheme too, dealing with ந vs ன was a problem as in current tamizh one can view them as an idiosyncrasy of the script given that both carry the same pronunciation.

What I did was to have an explicit representation (^n) for ந but that is needed only in contexts where it is not readily unambiguous. "n" by itself can mean ந or ன depending on context. I think that in most cases, the context unambiguously points to one of the two nas.

For example, திருநாமம் must be represented as tiru^nAmam as opposed to tirunAmam. An explicit specifier is required here. But it is not needed at the start of a word and it is not needed in words like tandAi ("nd" implies only one na). Note that in the representation tandAi and tantAi are different (da vs ta) and both unambiguously point to one of the two na's.

So perhaps you can do something similar.
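
To make the idea concrete, here is a minimal Python sketch of that context rule. It is a simplified illustration only, not the code behind the cmtranslit site: the function name `resolve_nasals` is made up, and it only knows the three cases mentioned above (explicit ^n, word-initial n, and n before d).

```python
DENTAL_N = "\u0BA8"    # ந
ALVEOLAR_N = "\u0BA9"  # ன

def resolve_nasals(word: str) -> str:
    """Return, in order, the Tamil letter chosen for each n in a Roman word."""
    chosen = []
    i = 0
    while i < len(word):
        if word.startswith("^n", i):
            chosen.append(DENTAL_N)  # explicit marker always means ந
            i += 2
            continue
        if word[i] == "n":
            at_start = i == 0
            before_d = word[i + 1 : i + 2] == "d"
            chosen.append(DENTAL_N if at_start or before_d else ALVEOLAR_N)
        i += 1
    return "".join(chosen)

for w in ["tiru^nAmam", "tirunAmam", "tandAi", "tantAi"]:
    print(w, "->", resolve_nasals(w))
```

With this, tiru^nAmam and tandAi resolve to ந, while tirunAmam and tantAi resolve to ன.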

BTW, if you don't mind, how does one use translipi in a blog? Also how does one input/create the special characters in the translipi representation?

Arun

Srikanth said...

Arun,

Also, I have done something similar to your translipi
I checked out your site -- it is indeed quite similar in idea to translipi!

One of the seeds of the translipi project was also Carnatic lyrics. So yes, our works are indeed twin-brothers separated at birth!

I'll send you the instructions to deploy translipi. Can you let me know (or mail me) your email id?

Arunk said...

:). I thought, though, that your translipi scales more easily w.r.t. extending to other languages, as the representations seem to be based on linguistics?

my email address: arunk underscore the-number-fifteen at yahoo dot com

Thanks!

Srinivas said...

Hello,
Translipi is very good and I find it very useful; it is one of the best pieces of software I have found on the Net. I like the Sanskrit transliteration best.


Srinivas

రాకేశ్వర రావు said...

great tool.
I liked its use in Sahityam.net.

This is a very late comment,
I just hope you are sticking to some sort of standard.
Like Diacritic Latin or IPA.

A similar tool is at this site also.
e-maata

ader45 said...

is there anyway i can learn tamil??



Anonymous said...

Please add GUJARATI SCRIPT Which is very very simplified script of Devnagari Script.

Rangan said...

Wonderful Tool. Thanks to you

Can I use this in my wordpress site.
Can you send me the installation package and how to install this.

Thanks
Raja

gautam said...

Namaskar. Found this site "by accident" (!) searching for the SriSri Prahlada Stuti. You have done an amazing task, something comparable to prapatti.com, and something divinely inspired for certain.

So beautifully organized and accurately presented, and the translipi transliteration is most helpful. For one thing, even the Kanchi Kamakoti Peetham website offers its own idiosyncratic transliteration style, and the highly esteemed P. Ramachunder and others use whatever they deem to be phonetically suitable to their own ears. This has created a jungle of confusion, even for some of the Shringeri disciples' compositions about their Holy Gurunathas, since the aspirated and unaspirated Sanskrit consonants are a sticking point for many Tamil speakers.

Even the various puja paddhatis, e.g. Holy Mother Durga, published by the Kamakoti Mandali, a most learned source, suffer in this regard to a significant degree, in their mantrabhaga.

Works published from the Sringeri Sharada Peetham are also carelessly edited, if at all.

Since the Vedic period we have expended a huge part of our civilization's energy in keeping vowels, consonants, meter, pitch etc. correct, and it is extremely admirable and welcome that websites like this continue this sacred tradition of keeping pure the shastras for swadhyaya and for any other reason. There cannot be sufficient praise and gratitude for the effort this has entailed, at least from my end. God bless all associated with this work.