Here is the list, with some things surely missing, some things suspected of being misplaced (preceding *), and some things perhaps requiring a separate group (following *). More generalities about each group eagerly sought.
Group F
Principal use: functions: mark syntactic boundaries
No other uses
li beginning of verb phrase
e beginning of DO
la end of condition
o beginning of imperative (replaces subject and li)
beginning of optative (complete sentence)
end of vocative
en and, coordinate head nouns (only?) (subject and in PP and ‘pi’ phrases)
anu or, coordinate head nouns, VP, sentences, modifiers …
pi right grouping within modifier string
pu unknown function (?)
kin emphasize preceding word (indeed, moreover, …)
Group I
Principal use: interjections, vocal noises outside language proper, though perhaps meaningful.
As noun: the sound, the uttering of the sound
As verb i: to utter the sound, be the sound
As verb t: to utter the sound with the meaningful content DO
As modifier: given to uttering the sound, like the emotions (if any) of the sound
a general person exclamation, useful oral interp practice giving it various shadings
a a more extended version of a, generally negative, but with strong stress on second also “Eureka”
a a a laugh
mu general animal sound
derogatory description of human speech: nonsense, Duckspeak, etc.
Group M
Principal use: modals, take VP complement in verb position (others?), placing action under condition of some sort.
As noun: the condition/condition force
As verb i = modal
As verb t: variable – see separate entries
As modifiers: under force of
wile strong force driving toward: must, need, ought
ken no strong force against: can, may
kama come to (become, happen, not arrive at)
*open begin, open
*pini end, finish
*awen keep on, continue
Group P
Principal use: prepositions: take NP complement in all positions (?)
As noun: general class of the complement, details at specific entry
As verb i: to be or be in the process of fulfilling the prepositional placement (special interps for missing complements)
As verb t: cause DO to be subject of verb i.
As modifier: given to verb i
tawa to, toward
tan from
lon at
sama same
Group A
Principal use: modifiers (adjectives, adverbs)
As nouns: the property, something with that property
As verb i: has the property
As verb t: cause DO to be subject of verb i
ala not
ale/ali all
ante different
ike bad
jaki vile
jelo yellow
kiwen hard
lete cold
lili little
laso blue
loje red
moli dead
mute much
nasa unusual
*ni this
pimeja dark
pona good
*seme what?
*sewi high
sin new
suli big
suwi sweet
*taso only
*tu two
walo white
*wan one
Group N
Principal use: nouns
As verb i: to be a specimen of (either literally or figuratively – caution about culture here)
As verb t: to make DO subject of verb i; to apply specimen to DO (perhaps two groups)
As modifier: pertaining to (yes, that vague)
akesi reptile, amphibian, small ugly animal
*anpa bottom
*esun shop, market
ijo thing
ilo tool
*insa inside
jan person
*kala fish
kalama noise
*kasi plant
ko goo
kon air
kule color
*kulupu group
*lawa head
len cloth
linja string
lipu sheet
luka arm (foreleg) (problems with all but jan & soweli)
lupa hole
ma land
mama parent
mani wealth
meli female
mije male
*monsi back
mun moon
*namako seasoning (?)
nanpa number
nasin way
nena bump
nimi word
noka (hind) leg (see luka)
oko eye
*pakala harm
palisa rod
pan grain
pipi bug
poka side
poki container
*seli heat
selo skin
sijelo body
*sike circle
*sinpin front
*soweli beast
suno light
*supa surface
telo water
tenpo time
tomo building
uta mouth
*utala conflict
waso bird
wawa power
Group Vt
Principal use: verb t: take e + DO
As nouns: the activity, generic class of DO
As verbs i: assume suppressed generic DO
As modifier: given to the activity
alasa hunt
jo have
*kapisi cut(?)
kepeken use
kute* hear
lukin* see
moku eat
*musi play
olin love
pali make
pana emit
pilin* feel
sitelen draw
sona* know
toki* say
unpa f..k
weka distance
Group Vi
Principal use: verb i:
As nouns: the activity
As verb t: cause DO to do activity
As modifier: given to activity
lape sleep
Group X
Principal use: pronouns: nouns of a contextually determined meaning
As verb i: identification
As modifier: possessive
mi 1st, speaker and those for whom speaks, I, we, me, us
sina 2nd, audience, you
ona anaphoric 3rd same as some earlier noun reference
ni deictic, words or things pointed to in context.
POS - first pass
Re: POS - first pass
Prototypical VT by corpus. Sorry if your browser doesn't wrap the code block in a scroller.
li sama e is an odd-ball. I think one person was using it to mean "is the same as", and it showed up a lot in a math document meaning "is equal to".
I used the regex \bli [a-z]+ e\b to find these.
Two lists follow: first the corpus ranking (rank, occurrence, li + verb + e), then jan Kipo's Group Vt list with each word's corpus rank.
1 438 li toki e Probably people speaking Inli
2 333 li jo e
3 243 li pana e
4 224 li pali e
5 164 li lukin e
6 120 li pilin e
7 110 li sama e copying? is equal to? is like?
8 82 li wile e
9 80 li moku e
10 68 li sona e
11 59 li weka e
12 58 li pakala e
13 57 li tawa e
14 54 li kama e to cause/become something
15 51 li kepeken e
16 41 li lawa e
17 39 li moli e
18 33 li utala e
19 30 li olin e
20 29 li anpa e defeat
21 27 li sitelen e
22 20 li pona e
23 15 li len e
24 14 li ante e
25 13 li pini e
26 13 li seli e cook
27 12 li kute e
28 11 li nimi e
29 10 li awen e
30 10 li sike e
31 10 li telo e
32 10 li tu e divide
33 9 li nasa e
34 8 li kalama e
35 8 li poka e
36 7 li open e
37 7 li unpa e
38 7 li wan e
39 5 li esun e
40 5 li lon e
41 5 li toke e
42 4 li ala e
43 4 li kulupu e
44 4 li nanpa e count
45 4 li pimeja e
46 4 li sewi e
47 3 li ike e ??
48 3 li lape e
49 2 li ken e enable?
50 2 li kon e
51 2 li lete e
52 2 li moki e
53 2 li mute e
54 2 li pinli e
55 2 li poki e
56 2 li sin e
57 2 li suno e
58 2 li tan e to cause something
59 2 li wawa e ??
60 1 li insa e ??
61 1 li ko e ??
62 1 li kule e
63 1 li lili e
64 1 li mama e to parent
65 1 li pine e spelling
66 1 li selo e ??
67 1 li sijelo e ??
68 1 li sinpin e
69 1 li sitelin e spelling
70 1 li suli e to raise
71 1 li token e Spelling
72 1 li weke e Spelling
jan Kipo's intuition + rank:
alasa hunt [not enough usage]
jo have #2
*kapisi cut(?) [not enough usage]
kepeken use #15
kute* hear #27
lukin* see #5
moku eat #9
*musi play [not on the list! not used as a single-word vt]
olin love #19
pali make #4
pana emit #3
pilin* feel #6
sitelen draw #21
sona* know #10
toki* say #1
unpa f..k #37
weka distance #11
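For anyone who wants to rerun this kind of count, here is a minimal Python sketch of the li + verb + e query. The function name count_li_verb_e and the sample sentences are made up for illustration; the actual corpus and tooling behind the numbers above are not shown in this thread, so treat this as one possible way to do it, not the method that was used.

```python
import re
from collections import Counter

# "li", one lowercase word, then "e", all as whole words.
LI_VERB_E = re.compile(r"\bli ([a-z]+) e\b")

def count_li_verb_e(text):
    """Count how often each word fills the slot in 'li ___ e'."""
    return Counter(m.group(1) for m in LI_VERB_E.finditer(text.lower()))

# Hypothetical three-sentence corpus, only for demonstration.
sample = "jan li toki e ni. soweli li moku e kili. mi mute li toki e ijo."
counts = count_li_verb_e(sample)

# Same column order as the table: rank, occurrence, phrase.
for rank, (verb, n) in enumerate(counts.most_common(), start=1):
    print(rank, n, f"li {verb} e")
```

Note that misspellings such as "toke" are counted as separate words, which is why they show up as their own rows in the corpus table.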
Re: POS - first pass
Prototypical nouns by corpus, assuming "e" is followed by a noun. Almost everything at one time or another has been used as a noun, if infrequently. Compare this to VT, where at most about 70 words have been used.
Rank, occurrence, phrase
1 1185 e ni
2 384 e toki
3 323 e jan
4 310 e ona
5 211 e ijo
6 190 e telo
7 182 e ma
8 175 e nimi
9 160 e tomo
10 144 e mi
11 132 e soweli
12 125 e kasi
13 119 e lipu
14 118 e mani
15 114 e sina
16 99 e sitelen
17 91 e ilo
18 86 e sona
19 84 e nasin
20 77 e moku
21 69 e pilin
22 67 e pona
23 66 e len
24 58 e mute
25 58 e sike
26 54 e kili
27 53 e kalama
28 53 e seme
29 52 e ike
30 51 e meli
31 49 e kon
32 49 e suno
33 46 e pali
34 44 e poki
35 43 e sijelo
36 40 e kulupu
37 39 e kiwen
38 38 e nena
39 38 e tenpo
40 34 e ali
41 34 e waso
42 33 e linja
43 33 e lupa
44 32 e luka
45 30 e lon
46 29 e mije
47 28 e akesi
48 27 e sinpin
49 26 e musi
50 26 e nanpa
51 25 e lawa
52 24 e ko
53 24 e palisa
54 24 e sewi
55 23 e utala
56 22 e ala
57 22 e olin
58 22 e wawa
59 22 e wile
60 20 e lape
61 18 e kala
62 17 e ale
63 17 e pan
64 17 e wan
65 16 e monsi
66 16 e pakala
67 15 e oko
68 14 e noka
69 13 e supa
70 13 e tan
71 12 e ante
72 12 e lili
73 12 e pipi
74 12 e sama
75 12 e seli
76 12 e suli
77 12 e tawa
78 12 e uta
79 11 e mama
80 11 e tu
81 8 e ken
82 8 e lukin
83 8 e suwi
84 7 e esun
85 7 e insa
86 6 e kule
87 6 e mun
88 6 e nasim spelling
89 6 e pana
90 6 e weka
91 5 e awen
92 5 e jo
93 5 e pimeja
94 5 e pini
95 5 e unpa
96 5 e walo
97 4 e selo
98 3 e anpa
99 3 e kama
100 3 e kin
101 3 e nasa
102 2 e a ??
103 2 e e ??
104 2 e kute
105 2 e loje
106 2 e oka
107 2 e sitelin
108 2 e tempo
109 1 e jaki
110 1 e kawa
111 1 e kepeken
112 1 e laso
113 1 e lupo
114 1 e meji spelling
115 1 e mole ??
116 1 e moli
117 1 e noki spelling
118 1 e open
119 1 e pi ??
120 1 e poka ??
121 1 e sikelo spelling
122 1 e suna spelling
123 1 e tosu ??
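A rough sketch of how the e + word count and the "spelling" flags could be produced. The regex \be ([a-z]+)\b is my guess at the query, and KNOWN_WORDS is a tiny hypothetical subset of the vocabulary; a real run would need the full word list.

```python
import re
from collections import Counter

# "e" as a whole word, followed by one lowercase word.
E_WORD = re.compile(r"\be ([a-z]+)\b")

def count_object_heads(text):
    """Count the first word after each 'e'."""
    return Counter(E_WORD.findall(text.lower()))

def flag_unknown(counts, vocabulary):
    """Return counted words missing from the vocabulary (possible misspellings)."""
    return sorted(word for word in counts if word not in vocabulary)

# Hypothetical data for demonstration only.
KNOWN_WORDS = {"ni", "telo", "tenpo", "jan"}
sample = "mi lukin e ni. ona li jo e telo. sina pana e tempo. mi sona e ni."

counts = count_object_heads(sample)
suspects = flag_unknown(counts, KNOWN_WORDS)
print(counts.most_common())
print("check spelling:", suspects)
```

The flag only says a word is not in the list. Deciding whether it is a misspelling or something else still takes a human look at the sentence.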
Re: POS - first pass
Next I looked for signs of modals using \bli [a-z]+ [a-z]{2,7} e\b
which is li followed by two words followed by e. I can't tell the difference between predicates and intransitive verbs, at least not with regex.
I manually removed the negatives (li X ala e) and the adverbials (li X (pona|ike|lili|mute|kin|taso|wawa) e).
Three words are very clearly modals and productive (at least when used with vt verbs):
wile appears with 19 other words. li wile awen e, etc.
ken appears with 19 other words. li ken anpa e, etc.
kama appears with 8 other words. li kama jo e, etc.
If anything, we have more modals of extremely recent usage than of ancient usage.
I did find a bunch of these odd-balls. I'm not sure what to make of them.
li awen jo e - infrequent modal?
li lukin sama e - Ambivalence about using prepositional sama or accusative phrase?
li toki sama e - same as above
li lon kasi e - transitive lon to mean create? But with DO incorporated. (And I didn't do it!)
li lon lupa e - same as above.
li pana sona e - DO incorporation.
li sona weka e - DO incorporation with DO first.
li soweli tomo e - huh?
li kalama musi e - huh?
li toki sin e - re-talk, maybe a calque of respond.
li toki utala e - double verb
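The modal search described in this post could be sketched like this in Python. The filtering here is automatic, whereas the post says it was done by hand, and the names modal_candidates and SKIP_SECOND plus the sample text are my own inventions for illustration.

```python
import re
from collections import defaultdict

# "li", two words (the second 2 to 7 letters long), then "e".
LI_TWO_E = re.compile(r"\bli ([a-z]+) ([a-z]{2,7}) e\b")

# Second words treated as negation or adverbials rather than verbs.
SKIP_SECOND = {"ala", "pona", "ike", "lili", "mute", "kin", "taso", "wawa"}

def modal_candidates(text):
    """Map each first word to the set of distinct second words it precedes."""
    partners = defaultdict(set)
    for first, second in LI_TWO_E.findall(text.lower()):
        if second not in SKIP_SECOND:
            partners[first].add(second)
    return partners

# Hypothetical sentences for demonstration only.
sample = ("ona li wile awen e tomo. mi mute li wile jo e mani. "
          "jan li ken anpa e soweli. ona li kama jo e ilo. "
          "ona li sona ala e ni. jan li pali pona e tomo.")

partners = modal_candidates(sample)
# Words that combine with many different partners are the productive ones.
for word, others in sorted(partners.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(word, len(others), sorted(others))
```

Counting distinct partners rather than raw hits is what separates a productive modal like wile from a one-off pair like soweli tomo.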
Re: POS - first pass
Thanks for the stats; I'll have to see what mods they require.
Two items are clearly (in afterthought mode) misplaced: lape should be in M as "asleep" and ni should also be in M as "this/that", otherwise the rules don't work well. So the whole class of intransitive verbs disappears into M in verb position (which seems basically right).
Apparently 'musi' is misplaced, but is it N or M? probably M, but that needs some looking at.
The next interesting question is about whether nouns need to be divided into those which mean "apply to" as vt and those which mean "change to" or some such thing.
Other suggestions?
Can you provide full text for the anomalous cases?
Re: POS - first pass
So, 'musi' goes into N as "amusement" and N as transitive verbs mean "provide DO with " (the "change DO into" seems to have seeped in from A)
The first word (after 'la' or 'taso' or 'o' when there is a 'li' later) is also a pretty clear noun as is the first word after a preposition.
I get the sense that 'pilin' and maybe some others regularly take an adjectival complement, separate from the object.
The verbs with following stars ought to be the main ones to take 'e ni:' (maybe 'tan' and 'kama' too).
Other adjustments?
Have I missed any words?
Re: POS - first pass
I'm currently reading The Stuff of Thought (http://www.amazon.com/Stuff-Thought-Lan ... 0670063274)
I'm fascinated by Pinker's idea that verbs fall into micro-classes that appear to be sort-of universal. The gist of what he says is that verbs that are semantically similar take similar constructions. So the classic
mi olin e meli
and
meli li pona tawa mi
would be resolved by saying they are different microclasses, usually having something to do with causality, spatio-temporal relationships and the like. Interestingly if one verb uses a prepositional construction, then all the semantically similar verbs will also do so. If a verb takes an accusative, then semantically similar verbs take the accusative, too.
mi toki kepeken Inli and mi toki e Inli are another pair.
I'm not sure how well the same technique would work in tp, since there are so many fewer verbs to start with.
Anyhow, I intend to run some queries on the occurrence rate of verb + kepeken, verb + poka, and verb + lon, as possible indicators that people think a verb in tp takes a particular oblique.
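One way such a query might look in Python, purely as a sketch: it only sees verbs that follow li, so sentences with a bare mi or sina subject (like mi toki kepeken Inli) are missed. The function name oblique_rates and the sample text are hypothetical.

```python
import re
from collections import Counter

PREPS = ("kepeken", "poka", "lon")

# "li", a verb, then one of the prepositions of interest.
VERB_PREP = re.compile(r"\bli ([a-z]+) (%s)\b" % "|".join(PREPS))
# Any word directly after "li", used as the denominator.
VERB = re.compile(r"\bli ([a-z]+)\b")

def oblique_rates(text):
    """For each (verb, preposition) pair, the share of 'li verb' hits it covers."""
    text = text.lower()
    totals = Counter(VERB.findall(text))
    pairs = Counter(VERB_PREP.findall(text))
    return {(verb, prep): n / totals[verb] for (verb, prep), n in pairs.items()}

# Hypothetical sentences for demonstration only.
sample = ("jan li toki kepeken toki pona. ona li toki e ijo. "
          "soweli li lape lon tomo. jan li toki kepeken ilo.")

rates = oblique_rates(sample)
for (verb, prep), rate in sorted(rates.items()):
    print(verb, prep, round(rate, 2))
```

A high rate for one preposition across a group of semantically similar verbs would be the kind of evidence for a microclass that the post is after.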
Re: POS - first pass
li awen jo e - infrequent modal?
Well, I think 'awen' should be a modal and here the whole thing just means "keep, continue having"
li lukin sama e - Ambivalence about using prepositional sama or accusative phrase?
li toki sama e - same as above
given the problems with 'sama' this seems a natural confusion to turn up.
li lon kasi e - transitive lon to mean create? But with DO incorporated. (And I didn't do it!)
No, just "put DO in the plant" or some such. 'kasi' as NP complement of a P
li lon lupa e - same as above.
ditto --"in a hole"
li pana sona e - DO incorporation.
Yeah, common for "teach" with both subject and student (common not, I suppose, statistically)
li sona weka e - DO incorporation with DO first.
Even then I don't see what it should have meant. Context?
li soweli tomo e - huh
maybe shooting for "pet" the vt (back to the "turn into" sense after all)
li kalama musi e - huh?
"perform for"? "provide with amusing sound"
li toki sin e - re-talk, maybe a calque of respond.
not quite a real calque, but maybe something like that (we don't have a "return", after all)
li toki utala e - double verb
idiom for "argue", isn't it? (or is that 'utala toki'?)
Re: POS - first pass
Prototypical adjectives by corpus, identified as the 2nd word after "pi". Regex is \bpi [a-z]{2,7} [a-z]{2,7}\b
-pona appears a lot because everyone keeps saying "toki pona" in the corpus.
-mute is plural. Structurally the same as a modifier, but its frequency suggests that it is somehow different.
-ali, tu, wan. Quantity words are prototypical modifiers.
-mi, sina, ona. Structurally the same as modifiers, but obviously meant as possessives.
-jan, meli, mije. These could be either prototypical modifiers or possessives.
We see plenty of prototypical verbs here, acting like participles.
-tawa, pali, jo: moving, working, having/owning
I've said that some modifiers act like periphrastic derivational modifiers. After removing the numbers, possessives, and plurals, then counting how many *different* words get modified by a word, one could see how "productive" they are. My attention is wandering, so suffice it to say that the most productive modifiers are lili, suli, ike, pona, ala, ale, pimeja, walo, telo, toki, and a few more. Obviously, my counts would have a mixture of ordinary modifiers and things intended as "compound words" (telo suli = ocean, and not a lot of water).
Interestingly, only 78 words show up in the adjective slot *in this query*. I'm too lazy to check for other patterns where modifiers must be (2nd word after preposition, word before pi if preceded by 2 words, etc.).
Rank, word, # of times this word was found 2 places after pi.
1 pona 142
2 mute 118 Plural
3 sewi 107
4 ali 106
5 tu 82 number and modifier
6 tomo 64
7 suli 56
8 wan 56 number and modifier
9 ni 52 deixis grammatical particle
10 lawa 50
11 sona 48
12 lili 47
13 tawa 45
14 ante 36
15 ma 36
16 mi 36 possessive
17 pini 36
18 suno 34
19 soweli 30
20 ike 28
21 toki 27
22 nasa 25
23 musi 24
24 telo 23
25 pimeja 21
26 lukin 19
27 luka 18
28 kama 16
29 pali 15
30 jo 13
31 loje 13
32 mama 13 (also possessive)
33 sike 13
34 lon 12
35 moli 12
36 palisa 12
37 walo 12
38 ona 11 possessive
39 jelo 10
40 sina 10 possessive
41 noka 9
42 sama 9
43 sijelo 9
44 sin 9
45 jan 8 (also possessive)
46 seli 8
47 wawa 8
48 seme 7
49 utala 7
50 anpa 6
51 kasi 6
52 kiwen 6
53 nanpa 6
54 weka 6
55 linja 5
56 moku 5
57 poka 5
58 kulupu 4
59 meli 4 (also possessive)
60 mije 4 (also possessive)
61 pan 4
62 waso 4
63 kon 3
64 kule 3
65 lete 3
66 oko 3
67 awen 2
68 ijo 2 (probably possessive)
69 kalama 2
70 mani 2
71 mun 2
72 nimi 2
73 olin 2
74 pakala 2
75 pana 2
76 sitelen 2
77 suwi 2
78 laso 1
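Both the raw count and the "productivity" idea (how many different words a modifier attaches to) can be sketched together in Python. The particle filter SKIP is my own assumption, added because the bare regex would also pick up things like "pi jan li"; the function name and sample sentences are hypothetical as well.

```python
import re
from collections import Counter, defaultdict

# "pi", then two words of 2 to 7 letters each.
PI_PAIR = re.compile(r"\bpi ([a-z]{2,7}) ([a-z]{2,7})\b")

# Assumed filter: particles that can follow a one-word pi phrase by accident.
SKIP = {"li", "la", "pi", "en", "anu"}

def modifiers_after_pi(text):
    """Return (count per modifier, set of distinct heads per modifier)."""
    counts = Counter()
    heads = defaultdict(set)
    for head, modifier in PI_PAIR.findall(text.lower()):
        if modifier in SKIP:
            continue
        counts[modifier] += 1
        heads[modifier].add(head)
    return counts, heads

# Hypothetical sentences for demonstration only.
sample = ("jan pi toki pona li kama. tomo pi telo suli li lon. "
          "ilo pi toki pona li pona. jan pi ma suli li tawa.")

counts, heads = modifiers_after_pi(sample)
for modifier, n in counts.most_common():
    print(modifier, n, "distinct heads:", len(heads[modifier]))
```

In this toy sample pona and suli both occur twice, but suli modifies two different heads while pona only ever follows toki, which is the "toki pona" effect mentioned in the post.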
Re: POS - first pass
Usual (to me) transitives
kasi li pana e kon pona tawa tomo mi.
mi mute li lon kasi e ilo suno.
mi mute li lon kasi e ijo musi pona. (aikidave)
telo li kama la telo li kama lon sinpin.
mi toki e jan pi sona tomo.
jan pi sona tomo li pona e sinpin.
jan li lon lupa e ko wawa.
ko wawa li pini e telo.
jan li kalama.
jan li telo e sinpin kepeken ilo wawa.(aikidave)
a! mi kama sona. mi sona e kasi loje sama ni. ona li soweli tomo e mi. (little prince)
jan Majumi: mi mute li sona ale e ni: sike kon, tomo tawa, li tawa sewi mute tan seme? ona li wile awen lon sewi lili. mi mute li pilin a a! jan Wiko en mi li toki utala e ni: mi awen ala awen pona e sike kon? mi mute li lon toki li sona weka e tomo tawa. tenpo ni la jan Patosa li toki e "jan Palakon li lon insa, jan Palakon li lon insa". mi mute li ken kute ala. taso mi kama sona e seme? tenpo ni la mi mute li kama sona e seme li lon.(sitelen sitelen)