A ranged possibility: two-letter codes

Signs and symbols: Writing systems (hieroglyphs, nail writing) and Signed Toki Pona; unofficial scripts too
jan-ante
Posts: 541
Joined: Fri Oct 02, 2009 4:05 pm

A ranged possibility: two-letter codes

Post by jan-ante »

With only 123 words, toki pona offers a unique possibility: assigning a two-letter code to every word and using it for writing. Here I propose such a code for tp words, and I would appreciate all your criticism and suggestions. I also invite everybody to try this code just as an experiment, because no other language can be encoded like this. The words (except separators and interjections) are encoded by 2 letters, e.g.: tk - toki, po - pona. The codes are summarised in the table below. The interjection and the separators are encoded as follows:
a --- a
la --- l ^ >
li --- i * <
o --- o !
e --- e ~
pi --- @
For example: tkpo*po twal. If one uses a single-letter encoding for separators, the spaces may not be omitted: tkpo i po twal
The words "toki pona" could also be encoded as tp, for example: !kmso~tp. mi pn~nnstsi@tp twni: jn*wistmt^on*kestll. ke^nnstni*ns. mi soaa.
If the word "jan" precedes a proper name, one may encode it as "j": ts jSonja*so~ni. on*mm@tp *jnlw@kp@tp.
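To make the scheme concrete, here is a minimal Python sketch of an encoder, assuming that word codes inside a phrase are simply concatenated (as in "tkpo"). Only the word codes stated explicitly in this thread are included (tk, po, kk, kp, st, sp); the full table is only in the image, so `WORD_CODES` is a hypothetical excerpt. The names `encode`, `SYMBOLS` and `LETTERS` are illustrative, and "p" for pi in single-letter mode comes from a later suggestion in this thread.

```python
# Hypothetical excerpt: only codes stated explicitly in this thread.
WORD_CODES = {
    "toki": "tk", "pona": "po", "kepeken": "kk",
    "kulupu": "kp", "sitelen": "st", "sinpin": "sp",
}

# Separators and the interjection "a" in symbol mode.
SYMBOLS = {"la": "^", "li": "*", "o": "!", "e": "~", "pi": "@", "a": "a"}
# Single-letter mode; "p" for pi follows a later suggestion in the thread.
LETTERS = {"la": "l", "li": "i", "o": "o", "e": "e", "pi": "p", "a": "a"}


def encode(sentence: str, symbols: bool = True) -> str:
    """Encode a space-separated toki pona sentence into two-letter codes."""
    seps = SYMBOLS if symbols else LETTERS
    words = sentence.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        # "jan" before a capitalised proper name collapses to "j": jan Sonja -> jSonja
        if word == "jan" and i + 1 < len(words) and words[i + 1][:1].isupper():
            out.append("j" + words[i + 1])
            i += 2
            continue
        if word in seps:
            # Symbols need no surrounding spaces; single letters must keep them.
            out.append(seps[word] if symbols else f" {seps[word]} ")
        elif word in WORD_CODES:
            # Assumption: codes within a phrase are simply concatenated.
            out.append(WORD_CODES[word])
        else:
            raise KeyError(f"no code known for {word!r}")
        i += 1
    return "".join(out).strip()
```

With this sketch, `encode("toki pona li pona")` gives `tkpo*po`, and `encode("toki pona li pona", symbols=False)` gives `tkpo i po`, matching the beginning of the examples above.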

codes, version 00:
[Image: table of codes, version 00]
Sorry for the graphics, but this forum does not support tables. You can get a text version of this table here (Excel-editable)
The general rules of the code assignment were as follows: (i) the first letter should be kept as in the original word, (ii) the second letter should be present in the word, (iii) if that is impossible, the code should consist of two letters in the order of their appearance in the word (these words are marked in red in the table). The two exceptions to these rules are the codes pm for pan and kj for kili. One could use the codes an for pan and il for kili, but I chose to keep them free. The word mu is not encoded.
From the table we can see jan Sonja's phonetic preferences. She likes to start words with k, s, l, p, m. She dislikes starting words with e, u, j, o, i. She likes to have k, p, n inside a word. She dislikes having e, j, t inside a word. I kindly ask jan Sonja to shift the dictionary toward phonetic equilibrium when she introduces a new word. Perhaps she could replace kipisi with something else, e.g. with ipisi.
Also (especially for jan Kipo), one can use the two-letter monograms for calligraphy or other aesthetic purposes. I also suggest several styles of writing below.
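Rules (i) to (iii) can be stated as a small check. This is a minimal sketch, not the procedure actually used to build the table; the function names are hypothetical. It confirms the point about the exceptions: pm and kj satisfy none of the rules, while an and il would satisfy rule (iii).

```python
def primary_candidates(word: str) -> list[str]:
    """Rules (i) and (ii): keep the first letter, take the second from later in the word."""
    # dict.fromkeys removes duplicates while keeping order of appearance
    return list(dict.fromkeys(word[0] + ch for ch in word[1:]))


def fallback_candidates(word: str) -> list[str]:
    """Rule (iii): any two letters of the word, kept in their order of appearance."""
    pairs = (word[i] + word[j]
             for i in range(len(word))
             for j in range(i + 1, len(word)))
    return list(dict.fromkeys(pairs))


def follows_rules(word: str, code: str) -> bool:
    """True if the code satisfies rules (i) and (ii), or at least rule (iii)."""
    return code in primary_candidates(word) or code in fallback_candidates(word)
```

For example, `primary_candidates("toki")` lists `to`, `tk`, `ti`; `follows_rules("pan", "pm")` is False, while `follows_rules("pan", "an")` is True.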
[Image: several suggested writing styles]
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: A ranged possibility: two-letter codes

Post by janKipo »

Tachygraphy has its charms, and even a short-word language like tp can probably use it for recording speech and, of course, for tweeting, twittering and generally using telephones in places where God didn't intend them, for purposes She didn't intend either. In measured, slow, considered communication, however, it is a pain in the butt. I like this system just fine, but don't want to have to read messages in it, as it takes another order of time to read. That said, I will try to learn this (or invent a better one).
jan-ante
Posts: 541
Joined: Fri Oct 02, 2009 4:05 pm

Re: A ranged possibility: two-letter codes

Post by jan-ante »

janKipo wrote: In measured, slow, considered communication, however, it is a pain in the butt. I like this system just fine, but don't want to have to read messages in it, as it takes another order of time to read.
but what about the hieroglyphs? they are a bigger pain in the butt. even planned systems such as Bliss or LoCoS require much more effort to learn than this one. if anything, this could help people study the tp logographics (if there are any)
That said, I will try to learn this (or invent a better one).
not yet! first criticise it, please
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: A ranged possibility: two-letter codes

Post by janKipo »

Well, the basic criticism is that we don't need it, and using it just puts another roadblock in the way of learning tp. That is the same problem as with hieroglyphs, kana, ideograms, etc. etc. The Latin alphabet, for all its faults, is familiar, easy to use, and hardly misrepresents tp at all. The others are nice for artistic purposes -- hanging scrolls in the atrium, say -- but they will not do for everyday use, if this is to be a learned language rather than an artifact in itself (an ongoing problem in the conlanger world, btw).
jan Ote
Posts: 424
Joined: Thu Oct 08, 2009 1:15 am
Location: ma Posuka

Re: A ranged possibility: two-letter codes

Post by jan Ote »

jan-ante wrote: Also (especially for jan Kipo) one can use the two-letter monograms for calligraphy or other aesthetic purposes.
janKipo wrote: The others are nice for artistic purposes -- hanging scrolls in the atrium, say -- but they will not do for everyday.
Agreed. And for toki pona I would like clearer, simpler (as in 'pona') and more rounded calligraphic shapes of signs than those of Far Eastern writing systems. Like tengwar, for example.
Even for artistic purposes I would prefer a simple phonemic alphabet or a syllabic alphabet. Not a syllabary: too many symbols to memorize. And no semantic script: it appears that even toki pona acquires some new words. A writing system with a special symbol for a nasal consonant at the end of a syllable (n|m), somewhat similar to the signs for 'n' and 'm'. And with some nice ligatures for 'li' and 'la'. Familiar shapes, similar to those of the Latin script, would be an advantage.
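To see where such a syllable-final nasal sign would appear, one can split words into syllables. This is a minimal sketch assuming the usual toki pona syllable shape (C)V(n), where an "n" closes a syllable unless a vowel follows it; the function names are hypothetical.

```python
VOWELS = set("aeiou")


def syllables(word: str) -> list[str]:
    """Split a toki pona word into (C)V(n) syllables."""
    out, i = [], 0
    while i < len(word):
        start = i
        if word[i] not in VOWELS:
            i += 1  # optional onset consonant
        if i >= len(word) or word[i] not in VOWELS:
            raise ValueError(f"not a valid toki pona word: {word!r}")
        i += 1  # the vowel nucleus
        # A following "n" is a coda unless it begins the next syllable (n + vowel).
        if i < len(word) and word[i] == "n" and (i + 1 == len(word) or word[i + 1] not in VOWELS):
            i += 1
        out.append(word[start:i])
    return out


def coda_syllables(word: str) -> list[str]:
    """Syllables that would carry the proposed syllable-final nasal sign."""
    # A syllable can only end in "n" when that "n" is a coda.
    return [s for s in syllables(word) if s.endswith("n")]
```

Under these assumptions, `syllables("sinpin")` is `["sin", "pin"]` (two final-nasal signs), while `syllables("ona")` is `["o", "na"]` (none).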
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: A ranged possibility: two-letter codes

Post by janKipo »

Sounds like the Portuguese version of the Latin alphabet would do fine. The present script is already phonemic. Ptg may lack w or y or some other letter (I just don't remember), but it has a high tilde to go over vowels (I think all of them, but again, ...). By syllabic, I take it you mean something like Hangul, which is strictly alphabetic but tends to write syllables in stacked blocks, or like Devanagari or pointed Hebrew -- again, this hurts learning.

Back to the presented system. I tend to find the consonants more significant than the vowels (one of the reasons I hate the pile sewi, suwi, suli, seli). Of course, this doesn't break up the logjams any better than the vowels do, though it might assign the contested ones differently (pn would go to pona, for example -- one of the problems I had reading some of the tachygrams). Nothing is going to be totally satisfactory (i.e. without unprincipled assignments). The symbols for the grammatical particles are for the most part hard to draw and missing from at least some twitter pages. I think just using the single letters would be better (including p for pi -- maybe especially, since that is hardest to do by hand).
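A consonant-first code is easy to test for logjams. The sketch below assumes a hypothetical rule "first letter plus the next consonant" (falling back to the first two letters) and groups words that collide. It shows that the sewi/suwi/suli/seli pile still collapses into two codes, and that pona, pana and pini would all compete for pn.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def consonant_code(word: str) -> str:
    """Hypothetical consonant-first code: first letter plus the next consonant."""
    for ch in word[1:]:
        if ch not in VOWELS:
            return word[0] + ch
    return word[:2]  # no later consonant, e.g. "li" stays "li"


def collisions(words: list[str]) -> dict[str, list[str]]:
    """Map each code shared by two or more words to those words."""
    groups = defaultdict(list)
    for w in words:
        groups[consonant_code(w)].append(w)
    return {code: ws for code, ws in groups.items() if len(ws) > 1}
```

Here `collisions(["sewi", "suwi", "suli", "seli"])` returns `{"sw": ["sewi", "suwi"], "sl": ["suli", "seli"]}`.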
jan-ante
Posts: 541
Joined: Fri Oct 02, 2009 4:05 pm

Re: A ranged possibility: two-letter codes

Post by jan-ante »

janKipo wrote: Well, the basic criticism is that we don't need it, and using it just puts another roadblock in the way of learning tp. That is the same problem as with hieroglyphs, kana, ideograms, etc. etc. The Latin alphabet, for all its faults, is familiar, easy to use, and hardly misrepresents tp at all.
well, these codes are actually the Latin alphabet, even just its 14-letter subset. it suits me because i don't have to write the ugly word "kepeken" anymore, just kk. the same goes for other ugly words, like kulupu (kp), sitelen (st), sinpin (sp), etc. it lets you write less while expressing the same information. another point concerns the separators. when i analyse a tp sentence, i first look for the separators la, li, o: A li B la C li D. then i analyse parts A, B, C, D separately. they in turn contain the separators e and pi. so it is important to find all the separators quickly and with minimal effort, that is, they should be graphically distinct from the rest of the text. in the classic tp writing system i see too many unnecessary letters, which makes the separators (consisting of the same letters) harder to find.
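The separator-first reading described above can be sketched in a few lines, assuming the symbol mode of version 00 (la = ^, li = *, o = !, e = ~, pi = @). Because the separators are distinct characters, each level is a plain split. The level ordering and the name `split_levels` are illustrative assumptions, not part of the proposal.

```python
import re

# Separator levels from outermost to innermost (assumed ordering):
# la (^), then li and o (* and !), then e (~), then pi (@).
LEVELS = ["^", "*!", "~", "@"]


def split_levels(text: str, level: int = 0):
    """Recursively split symbol-mode code text at each separator level."""
    if level == len(LEVELS):
        return text
    # Build a character class such as [\*!] from this level's separators.
    pattern = "[" + "".join(re.escape(ch) for ch in LEVELS[level]) + "]"
    return [split_levels(part, level + 1) for part in re.split(pattern, text)]
```

For example, `split_levels("on*mm@tp")` returns `[[[["on"]], [["mm", "tp"]]]]`: one la-part, split by li into "on" and "mm@tp", the latter split by pi into "mm" and "tp". A leading separator (as in "!kmso~tp") yields an empty first part.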
I think just using the single letters would be better (including p for pi -- maybe especially, since that is hardest to do by hand)
good suggestion, we should keep this option. we should also allow capitals for the separators (to make them distinct), i.e. pi -- p, P, @
pn would go to pona, for example -- one of the problems I had reading some of the tachygrams
this entails some other changes:
[Image: revised table of codes]

do you like this version better?
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: A ranged possibility: two-letter codes

Post by janKipo »

jan-ante wrote:
janKipo wrote: Well, the basic criticism is that we don't need it, and using it just puts another roadblock in the way of learning tp. That is the same problem as with hieroglyphs, kana, ideograms, etc. etc. The Latin alphabet, for all its faults, is familiar, easy to use, and hardly misrepresents tp at all.
well, these codes are actually the Latin alphabet, even just its 14-letter subset. it suits me because i don't have to write the ugly word "kepeken" anymore, just kk. the same goes for other ugly words, like kulupu (kp), sitelen (st), sinpin (sp), etc. it lets you write less while expressing the same information. another point concerns the separators. when i analyse a tp sentence, i first look for the separators la, li, o: A li B la C li D. then i analyse parts A, B, C, D separately. they in turn contain the separators e and pi. so it is important to find all the separators quickly and with minimal effort, that is, they should be graphically distinct from the rest of the text. in the classic tp writing system i see too many unnecessary letters, which makes the separators (consisting of the same letters) harder to find.
The point about ease of finding separators is a good one, but I would suggest simpler markers (slash, plus dash, etc. -- which are also lower case).
I think just using the single letters would be better (including p for pi -- maybe especially, since that is hardest to do by hand)
good suggestion, we should keep this option. we should also allow capitals for the separators (to make them distinct), i.e. pi -- p, P, @
pn would go to pona, for example -- one of the problems I had reading some of the tachygrams
this entails some other changes:
[Image: revised table of codes]

do you like this version better?
I would like to do statistics on actual usage and give the clearest symbol -- maybe even a single letter -- to the most common words. These stats will change somewhat over time, but at the top they are pretty stable.
jan-ante
Posts: 541
Joined: Fri Oct 02, 2009 4:05 pm

Re: A ranged possibility: two-letter codes

Post by jan-ante »

janKipo wrote: The point about ease of finding separators is a good one, but I would suggest simpler markers (slash, plus dash, etc. -- which are also lower case).
simpler?? you mean that / is simpler than ^ or * ?? why? actually, using slashes / | \ was my first idea; i tried it, but to me it does not look nice. ^ and * are more distinct. they are also different from the second-level separators ~ and @.
I would like to do the statistics on actual usage and give the clearest symbol -- maybe even a single letter -- to the most common
here you can see some stats (in the second message of that topic):
rank  word   count
1     toki   1136
2     li     1116
3     e      1007
4     pona   1003
5     mi     860
6     jan    633
7     la     525
8     ni     435
9     ona    367
10    tawa   337
...
32    pali
...
45    pana
...
100   poki
113   poka
115   supa
116   akesi
117   suwi

from this, version 00 was better. if we use po for pona, it will cause confusion with poki and poka (#100 and #113); if we use pn, it will conflict with pana and pini (#45 and #70), and as a consequence with pali (#32)
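janKipo's idea of giving the clearest code to the most common word can be sketched as a greedy assignment in rank order, using the ranks quoted above (pini is #70). This is a hypothetical illustration, not how either version of the table was built: each word takes its first free code under rules (i) and (ii), falling back to rule (iii).

```python
def candidates(word: str) -> list[str]:
    """Rules (i)+(ii) first, then rule (iii); duplicates removed, order kept."""
    primary = [word[0] + ch for ch in word[1:]]
    fallback = [word[i] + word[j]
                for i in range(len(word))
                for j in range(i + 1, len(word))]
    return list(dict.fromkeys(primary + fallback))


def assign_by_rank(ranked: list[tuple[str, int]]) -> dict[str, str]:
    """Give each word, most frequent (lowest rank) first, its first free code."""
    taken: set[str] = set()
    codes: dict[str, str] = {}
    for word, _rank in sorted(ranked, key=lambda pair: pair[1]):
        for code in candidates(word):
            if code not in taken:
                taken.add(code)
                codes[word] = code
                break
        else:
            raise ValueError(f"no free code for {word!r}")
    return codes


RANKS = [("pona", 4), ("pali", 32), ("pana", 45), ("pini", 70), ("poki", 100), ("poka", 113)]
```

With these ranks, pona keeps po, pana is pushed to pn, and poka, the rarest, runs out of p-initial codes and falls back to ok. This makes the trade-off described above visible: every choice for pona displaces some less frequent word.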
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: A ranged possibility: two-letter codes

Post by janKipo »

Well, slash -- and back-slash -- also dash and equals are easy to write longhand and are lower case on (American) keyboards. I agree that slash and backslash should not both be used, but slash and dash take care of two major breaks, with maybe equals for the less common 'la'. And, again, I like just 'p' for 'pi'.

Nice stats. What corpus (I'm sure it says in there, but I don't even read Cyrillic very well)?
The high score for 'pona' is due to 'toki pona'; otherwise it would be much lower, though still pretty high. Ditto for 'toki', of course. Some at the low end are rather surprising, but they reflect the sorts of things we talk about here. There is a fairly complete corpus -- from a variety of sources -- perpetually in preparation.

The line of argument is nice and clear and convincing (I just have to adjust my intuitions a bit).