What does Unicode compatibility mean in terms of glyph choices?
The glyph choices must be such that a one-to-one mapping table can be established between the alphabet/character definitions of the present scheme and those of Unicode/ISCII. A draft of one such mapping table is presented in the annex section. Using such a table, it will be possible to save a TSC-based file in either format.
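What "one-to-one" buys in practice can be sketched as a lookup table that inverts without collisions, so a file can be converted to Unicode and back unchanged. This is a minimal sketch under stated assumptions: the byte values are hypothetical placeholders, not the slot allocations of the proposed scheme, and every glyph is assumed to correspond to a single Unicode character (real Tamil glyph encodings also need glyph-to-sequence entries, which the same dictionary shape can hold as longer strings).

```python
# HYPOTHETICAL glyph slots: placeholders for illustration only,
# NOT the slot allocations of the proposed scheme.
GLYPH_TO_UNICODE = {
    0xAB: "\u0B85",  # TAMIL LETTER A
    0xAC: "\u0B86",  # TAMIL LETTER AA
    0xB8: "\u0B95",  # TAMIL LETTER KA
    0xC4: "\u0BAE",  # TAMIL LETTER MA
}

# Invert the table; a one-to-one table inverts without losing entries.
UNICODE_TO_GLYPH = {u: b for b, u in GLYPH_TO_UNICODE.items()}
if len(UNICODE_TO_GLYPH) != len(GLYPH_TO_UNICODE):
    raise ValueError("mapping table is not one-to-one")


def glyphs_to_unicode(data: bytes) -> str:
    """Translate glyph-encoded bytes into a Unicode string."""
    # Iterating over bytes yields ints, which are the dictionary keys.
    return "".join(GLYPH_TO_UNICODE[b] for b in data)


def unicode_to_glyphs(text: str) -> bytes:
    """Translate a Unicode string back into glyph-encoded bytes."""
    return bytes(UNICODE_TO_GLYPH[ch] for ch in text)
```

For example, `unicode_to_glyphs(glyphs_to_unicode(b"\xb8\xc4"))` returns the original bytes; if two slots mapped to the same character, the inversion check would fail and the round trip could not be guaranteed.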
Both the Unicode and ISCII schemes include a number of Tamil numerals, so the present scheme needs to include these numerals as well. Otherwise there cannot be a one-to-one correspondence with these forthcoming standards.
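One way to check this requirement is to list which Unicode Tamil numerals a draft mapping table cannot produce. The sketch below assumes the table's Unicode side is available as an iterable of characters; the code point range U+0BE6 to U+0BF2 is the Unicode Tamil digits and numbers block as it stands today (U+0BE6, digit zero, was added to Unicode later than the other numerals).

```python
import unicodedata

# Tamil digits zero to nine plus the numbers ten, hundred and thousand
# (U+0BE6..U+0BF2 in the Unicode Tamil block).
TAMIL_NUMERALS = [chr(cp) for cp in range(0x0BE6, 0x0BF3)]


def missing_numerals(table_targets):
    """Return the names of Unicode Tamil numerals absent from a mapping table."""
    covered = set(table_targets)
    return [unicodedata.name(ch) for ch in TAMIL_NUMERALS if ch not in covered]
```

A table covering only the digits one to nine would report the four numerals it lacks, each of which would break the one-to-one correspondence.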
Why Unicode/ISCII compatibility?
There are major advantages to ensuring compatibility with these "emerging" standards.
i) It is an undeniable fact that the world is heading towards multilingualism. This is particularly true for a country like India, where the migration of people among the different constituent states is very pronounced. The encoding standards for multilingualism, Unicode and ISCII, are still "evolving" and are not fully established (particularly in the implementation of most of the widely used Internet information-exchange protocols and formats, such as SMTP, HTTP, NNTP, POP and PDF). Hence it is proposed that the Tamil community start using TSCII as an "interim standard" and move on to a multilingual standard (either Unicode or ISCII) at a later date. Clean compatibility will ensure that all Tamil material generated in TSCII format can be made available in Unicode/ISCII format at all times, present and future. None of the TSCII-based resources will be lost when Unicode/ISCII become fully functional.
ii) Second, the present glyph encoding scheme can happily co-exist with the more sophisticated Unicode/ISCII schemes and can even pave the way for a smooth transition to Unicode at a future date. Indian-language packages for Unicode and ISCII are very expensive and have started appearing on the market only very recently. Fool-proof implementation is still a largely under-explored domain.
6. The standard must be usable and co-exist with other existing software until Unicode-compliant software becomes available.
A one-to-one correspondence table between the character definitions of the proposed standard and those of the popular Tamil fonts/DTP packages will ensure a smooth transition and the recovery of all archived Tamil text material produced to date. Conversion software already exists that allows inter-conversion of Tamil text files prepared using different font encoding schemes. Such conversion software, based on the proposed Tamil standard, will be made available to promote a rapid and smooth transition to the new standard.
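When two font encodings place the same glyphs in different byte slots, conversion reduces to a byte-for-byte translation table. The sketch below uses Python's `bytes.maketrans` and `bytes.translate`; the two slot lists are hypothetical stand-ins for a legacy DTP font and the proposed standard, not real font tables. It also assumes every glyph maps to exactly one glyph; fonts that split or merge glyphs need a table of byte sequences instead.

```python
# HYPOTHETICAL slot assignments: the same three glyphs in two fonts.
OLD_FONT_SLOTS = bytes([0x61, 0x62, 0x63])  # legacy font (illustrative)
NEW_FONT_SLOTS = bytes([0xAB, 0xB8, 0xC4])  # proposed standard (illustrative)

# 256-entry translation tables; bytes not listed are left unchanged.
OLD_TO_NEW = bytes.maketrans(OLD_FONT_SLOTS, NEW_FONT_SLOTS)
NEW_TO_OLD = bytes.maketrans(NEW_FONT_SLOTS, OLD_FONT_SLOTS)


def convert(data: bytes, table: bytes) -> bytes:
    """Re-encode a file's bytes using a 256-entry translation table."""
    return data.translate(table)
```

Because each table is a flat 256-byte lookup, conversion is a single pass over the file; the reverse table makes the conversion recoverable in the other direction.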
7. The Tamil encoding standard should allow rapid implementation of many of the routine tasks required in large databases (such as search or sort).
It is very likely that, with the widespread adoption of a true international standard for Tamil, large databases (library catalogues, electronic telephone directories, land/property registries, inventories of materials in departmental stores, etc.) will be built based on Tamil script. Routine use of these databases often requires search or sort routines. The encoding scheme should allow the development of such software without unnecessary demands on computer memory or processing capacity.
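Sorting is cheap only if the encoding order matches the alphabet or a small key table can correct it. As an illustration (not the proposed scheme's collation), Unicode places the consonants ற, ள, ழ and ன in a different order from the traditional Tamil sequence, so a plain code point sort misorders them. The sketch below assumes a simplified key: only the 18 base consonants are ranked, and every other character (vowels, vowel signs, pulli) falls back to its code point after them.

```python
# The 18 Tamil consonants in traditional order (க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன),
# written as Unicode escapes. Unicode code point order differs from this.
TRADITIONAL_ORDER = (
    "\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
    "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9"
)
RANK = {ch: i for i, ch in enumerate(TRADITIONAL_ORDER)}


def collation_key(word: str) -> list:
    """Map each character to its traditional rank (one lookup per character)."""
    # Simplification: unranked characters sort after all consonants, by code point.
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]
```

`sorted(words)` orders க, ன, ற by code point, whereas `sorted(words, key=collation_key)` gives the traditional order க, ற, ன. The key table costs only a few dozen entries, which is in line with the requirement of no unnecessary memory or processing demands.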
8. The output of the Tamil standard (Tamil text) should be independent of the input mode.
It is important to realize that, with font faces based on glyph encoding, the keyboard text-input process is totally independent of the glyph choices that constitute the font encoding scheme. Using different keyboard editors, it is possible to use the same Tamil font face and input the text with one of several keyboard layout options: those based on the Tamil typewriter layout, phonetic, romanized, transliterated, and so on. The resulting Tamil text will be identical in all these cases, its format being determined by the encoding scheme. Keyboard editors allow easy toggling between the Roman and Tamil segments, and the Tamil characters can be accessed directly through the Roman keyboard. For European languages based on the ISO 8859-X encoding schemes, several keyboard editors for toggling from the standard US mode to French, German, Finnish, Swedish, etc. already come as part of the OS software. Once the proposed encoding scheme for Tamil becomes the standard, similar keyboard editors for Tamil text input can be made available as system software.
There are several popular methods of input for Tamil, and these are considered under different keyboard layouts: classical Tamil typewriter, romanized, and phonetic or transliterated. Several keyboard editors that allow input according to these methods have already been developed, and these can readily be adapted to use the proposed encoding scheme as the reference chart for the font in question.
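The independence of input mode and stored text can be sketched with two keymaps that send different keystrokes to the same encoded characters. Both keymaps below are hypothetical and cover only three letters; the "typewriter" key assignments in particular are invented for illustration and do not reproduce the real Tamil typewriter layout. The editor uses greedy longest-match lookup so that multi-key phonetic sequences such as `ka` are recognized.

```python
# HYPOTHETICAL keymaps: different keystrokes, same output characters.
PHONETIC = {"a": "\u0B85", "ka": "\u0B95", "ma": "\u0BAE", "la": "\u0BB2"}
TYPEWRITER = {"m": "\u0B85", "h": "\u0B95", "k": "\u0BAE", "n": "\u0BB2"}


def transcribe(keys: str, keymap: dict) -> str:
    """Convert keystrokes to text using greedy longest-match lookup."""
    longest = max(map(len, keymap))
    out, i = [], 0
    while i < len(keys):
        # Try the longest possible key sequence first, then shorter ones.
        for n in range(min(longest, len(keys) - i), 0, -1):
            chunk = keys[i:i + n]
            if chunk in keymap:
                out.append(keymap[chunk])
                i += n
                break
        else:
            raise ValueError(f"no mapping for {keys[i]!r} at position {i}")
    return "".join(out)
```

Typing `kamala` with the phonetic map and `hkn` with the typewriter map yields the same string க ம ல; the stored text depends only on the encoding, not on the layout used to enter it.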
9. As with the Unicode standard, the "proposed standard" does not encode idiosyncratic, personal, novel, rarely exchanged, or private-use characters, nor does it encode logos or graphics. Artificial entities, whose sole function is to serve transiently in the input of text, are excluded. Graphologies unrelated to text, such as musical and dance notations, are outside the scope of the proposed standard.
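Of these exclusions, only private-use characters can be detected mechanically; whether a symbol is idiosyncratic or a logo is an editorial judgment. A minimal sketch of such a check, assuming the text is available as Unicode, uses the general category `Co` (private use) reported by Python's `unicodedata` module.

```python
import unicodedata


def out_of_scope(text: str) -> list:
    """Return (index, code point) pairs for private-use characters in text."""
    # General category "Co" marks private-use code points (e.g. U+E000..U+F8FF).
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Co"]
```

A converter could run this check before saving a file, flagging characters that have no standard counterpart and therefore cannot round-trip through the mapping table.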
One possibility would be to agree on a supplementary dingbat-type font for exclusive use within the Tamil community, one that contains symbols such as OM, religious symbols, arrows, Greek symbols, etc. If all Tamil web pages use these two fonts (one official Tamil font and a second, de facto standard dingbat-style font), we can easily add some color and liveliness to the world of Tamil computing.
Part II of this proposal provides a description of the encoding scheme and the rationale for glyph choices and slot allocations. The annexes are on a separate web page.
A draft version of the proposal was first posted on the Internet on December 2, 1997, and this file was last revised on October 27, 1998.
Please send your comments to Dr. K. Kalyanasundaram