POS - first pass

Posted: Mon Jul 26, 2010 1:56 pm
by janKipo
Here is the list: some things are surely missing, some are suspected of being misplaced (preceding *), and some perhaps require a separate group (following *). More generalities about each group are eagerly sought.

Group F
Principal use: functions: mark syntactic boundaries
No other uses
li beginning of verb phrase
e beginning of DO
la end of condition
o beginning of imperative (replaces subject and li)
beginning of optative (complete sentence)
end of vocative
en and, coordinate head nouns (only?) (subject and in PP and ‘pi’ phrases)
anu or, coordinate head nouns, VP, sentences, modifiers …
pi right grouping within modifier string
pu unknown function (?)
kin emphasize preceding word (indeed, moreover, …)


Group I
Principal use: interjections, vocal noises outside language proper, though perhaps meaningful.
As noun: the sound, the uttering of the sound
As verb i: to utter the sound, be the sound
As verb t: to utter the sound with the meaningful content DO
As modifier: given to uttering the sound, like the emotions (if any) of the sound
a general person exclamation, useful oral interp practice giving it various shadings
a a more extended version of a, generally negative, but with strong stress on second also “Eureka”
a a a laugh

mu general animal sound
derogatory description of human speech: nonsense, Duckspeak, etc.


Group M
Principal use: modals, take VP complement in verb position (others?), placing action under condition of some sort.
As noun: the condition/condition force
As verb i = modal
As verb t: variable – see separate entries
As modifiers: under force of
wile strong force driving toward: must, need, ought
ken no strong force against: can, may
kama come to (become, happen, not arrive at)
*open begin, open
*pini end, finish
*awen keep on, continue

Group P
Principal use: prepositions: take NP complement in all positions (?)
As noun: general class of the complement, details at specific entry
As verb i: to be or be in the process of fulfilling the prepositional placement (special interps for missing complements)
As verb t: cause DO to be subject of verb i.
As modifier: given to verb i
tawa to, toward
tan from
lon at
sama same


Group A
Principal use: modifiers (adjectives, adverbs)
As nouns: the property, something with that property
As verb i: has the property
As verb t: cause DO to be subject of verb i
ala not
ale/ali all
ante different
ike bad
jaki vile
jelo yellow
kiwen hard
lete cold
lili little
laso blue
loje red
moli dead
mute much
nasa unusual
*ni this
pimeja dark
pona good
*seme what?
*sewi high
sin new
suli big
suwi sweet
*taso only
*tu two
walo white
*wan one


Group N
Principal use: nouns
As verb i: to be a specimen of (either literally or figuratively – caution about culture here)
As verb t: to make DO subject of verb i; to apply specimen to DO (perhaps two groups)
As modifier: pertaining to (yes, that vague)
akesi reptile, amphibian, small ugly animal
*anpa bottom
*esun shop, market
ijo thing
ilo tool
*insa inside
jan person
*kala fish
kalama noise
*kasi plant
ko goo
kon air
kule color
*kulupu group
*lawa head
len cloth
linja string
lipu sheet
luka arm (foreleg)(problems with all but jan&soweli)
lupa hole
ma land
mama parent
mani wealth
meli female
mije male
*monsi back
mun moon
*namako seasoning (?)
nanpa number
nasin way
nena bump
nimi word
noka (hind) leg (see luka)
oko eye
*pakala harm
palisa rod
pan grain
pipi bug
poka side
poki container
*seli heat
selo skin
sijelo body
*sike circle
*sinpin front
*soweli beast
suno light
*supa surface
telo water
tenpo time
tomo building
uta mouth
*utala conflict
waso bird
wawa power


Group Vt
Principal use: verb t: take e + DO
As nouns: the activity, generic class of DO
As verbs i: assume suppressed generic DO
As modifier: given to the activity
alasa hunt
jo have
*kapisi cut(?)
kepeken use
kute* hear
lukin* see
moku eat
*musi play
olin love
pali make
pana emit
pilin* feel
sitelen draw
sona* know
toki* say
unpa f..k
weka distance

Group Vi
Principal use: verb i:
As nouns: the activity
As verb t: cause DO to do activity
As modifier: given to activity
lape sleep


Group X
Principal use: pronouns: nouns of a contextually determined meaning
As verb i: identification
As modifier: possessive
mi 1st, speaker and those for whom speaks, I, we, me, us
sina 2nd, audience, you
ona anaphoric 3rd same as some earlier noun reference
ni deictic, words or things pointed to in context.

Re: POS - first pass

Posted: Mon Jul 26, 2010 10:08 pm
by janMato
Prototypical VT by corpus. Sorry if your browser doesn't wrap the code block in a scroller.

li sama e is an oddball. I think one person was using it to mean "is the same as", and it showed up a lot in a math document meaning "is equal to".

I used the regex \bli [a-z]+ e\b to find these.
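For anyone who wants to try this kind of count themselves, here is a minimal Python sketch. It assumes the intended pattern allows a whole word between li and e (`[a-z]+`), and the sample string is made up for illustration; it is not the corpus or the tool actually used for the table.

```python
import re
from collections import Counter

# "li <one word> e"; [a-z]+ lets the verb be a word of any length.
LI_VERB_E = re.compile(r"\bli ([a-z]+) e\b")

def rank_transitive_verbs(text):
    """Return (verb, count) pairs, most frequent first."""
    return Counter(LI_VERB_E.findall(text)).most_common()

# Made-up sample standing in for a real corpus.
sample = "jan li toki e ni. ona li jo e ilo. mi mute li toki e toki pona."

for rank, (verb, n) in enumerate(rank_transitive_verbs(sample), start=1):
    print(rank, n, f"li {verb} e")
```

Like the regex itself, this only sees lowercase words and only clauses that contain li, so sentences with mi or sina as subject are not counted.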

Rank, Occurrence, li + verb + e

1	438	li toki e	Probably people speaking Inli
2	333	li jo e	
3	243	li pana e	
4	224	li pali e	
5	164	li lukin e	
6	120	li pilin e	
7	110	li sama e	copying? is equal to? is like?
8	82	li wile e	
9	80	li moku e	
10	68	li sona e	
11	59	li weka e	
12	58	li pakala e	
13	57	li tawa e	
14	54	li kama e	to cause/become something
15	51	li kepeken e	
16	41	li lawa e	
17	39	li moli e	
18	33	li utala e	
19	30	li olin e	
20	29	li anpa e	defeat
21	27	li sitelen e	
22	20	li pona e	
23	15	li len e	
24	14	li ante e	
25	13	li pini e	
26	13	li seli e	cook
27	12	li kute e	
28	11	li nimi e	
29	10	li awen e	
30	10	li sike e	
31	10	li telo e	
32	10	li tu e	divide
33	9	li nasa e	
34	8	li kalama e	
35	8	li poka e	
36	7	li open e	
37	7	li unpa e	
38	7	li wan e	
39	5	li esun e	
40	5	li lon e	
41	5	li toke e	
42	4	li ala e	
43	4	li kulupu e	
44	4	li nanpa e	count
45	4	li pimeja e	
46	4	li sewi e	
47	3	li ike e	??
48	3	li lape e	
49	2	li ken e	enable?
50	2	li kon e	
51	2	li lete e	
52	2	li moki e	
53	2	li mute e	
54	2	li pinli e	
55	2	li poki e	
56	2	li sin e	
57	2	li suno e	
58	2	li tan e	to cause something
59	2	li wawa e	??
60	1	li insa e	??
61	1	li ko e	??
62	1	li kule e	
63	1	li lili e	
64	1	li mama e	to parent
65	1	li pine e	spelling
66	1	li selo e	??
67	1	li sijelo e	??
68	1	li sinpin e	
69	1	li sitelin e	spelling
70	1	li suli e	to raise
71	1	li token e	Spelling
72	1	li weke e	Spelling
jan Kipo's intuition + rank

alasa hunt  [not enough usage]
jo have  #2
*kapisi cut(?) [not enough usage]
kepeken use #15
kute* hear #27
lukin* see #5 
moku eat #9
*musi play [not on the list! not used as a single-word vt]
olin love #19
pali make #4
pana emit #3
pilin* feel #6
sitelen draw  #21
sona* know #10
toki* say #1
unpa f..k #37
weka distance #11

Re: POS - first pass

Posted: Mon Jul 26, 2010 10:19 pm
by janMato
Prototypical nouns by corpus, assuming "e" is followed by a noun. Almost everything at one time or another has been used as a noun, if infrequently. Compare this to VT, where at most 70 words have been used as VT.
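The comparison between the two slots can be sketched roughly as follows. The noun-slot pattern `\be ([a-z]+)\b` is my assumption (the post does not give the regex used), and the sample text is invented, so treat this as an illustration of the idea rather than the actual query.

```python
import re

VERB_SLOT = re.compile(r"\bli ([a-z]+) e\b")  # word between "li" and "e"
NOUN_SLOT = re.compile(r"\be ([a-z]+)\b")     # word right after "e" (assumed pattern)

def slot_vocabulary(text):
    """Return the sets of distinct words seen in the verb slot and the noun slot."""
    return set(VERB_SLOT.findall(text)), set(NOUN_SLOT.findall(text))

# Made-up sample standing in for a real corpus.
sample = "jan li moku e kili. ona li moku e pan. mi mute li jo e moku."

verbs, nouns = slot_vocabulary(sample)
print(len(verbs), sorted(verbs))
print(len(nouns), sorted(nouns))
```

Comparing the sizes of the two sets over a whole corpus gives the "how many different words ever show up here" figure for each slot.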

Rank, occurrence, phrase

1	1185	e ni
2	384	e toki
3	323	e jan
4	310	e ona
5	211	e ijo
6	190	e telo
7	182	e ma
8	175	e nimi
9	160	e tomo
10	144	e mi
11	132	e soweli
12	125	e kasi
13	119	e lipu
14	118	e mani
15	114	e sina
16	99	e sitelen
17	91	e ilo
18	86	e sona
19	84	e nasin
20	77	e moku
21	69	e pilin
22	67	e pona
23	66	e len
24	58	e mute
25	58	e sike
26	54	e kili
27	53	e kalama
28	53	e seme
29	52	e ike
30	51	e meli
31	49	e kon
32	49	e suno
33	46	e pali
34	44	e poki
35	43	e sijelo
36	40	e kulupu
37	39	e kiwen
38	38	e nena
39	38	e tenpo
40	34	e ali
41	34	e waso
42	33	e linja
43	33	e lupa
44	32	e luka
45	30	e lon
46	29	e mije
47	28	e akesi
48	27	e sinpin
49	26	e musi
50	26	e nanpa
51	25	e lawa
52	24	e ko
53	24	e palisa
54	24	e sewi
55	23	e utala
56	22	e ala
57	22	e olin
58	22	e wawa
59	22	e wile
60	20	e lape
61	18	e kala
62	17	e ale
63	17	e pan
64	17	e wan
65	16	e monsi
66	16	e pakala
67	15	e oko
68	14	e noka
69	13	e supa
70	13	e tan
71	12	e ante
72	12	e lili
73	12	e pipi
74	12	e sama
75	12	e seli
76	12	e suli
77	12	e tawa
78	12	e uta
79	11	e mama
80	11	e tu
81	8	e ken	
82	8	e lukin	
83	8	e suwi	
84	7	e esun	
85	7	e insa	
86	6	e kule	
87	6	e mun	
88	6	e nasim	spelling
89	6	e pana	
90	6	e weka	
91	5	e awen	
92	5	e jo	
93	5	e pimeja	
94	5	e pini	
95	5	e unpa	
96	5	e walo	
97	4	e selo
98	3	e anpa
99	3	e kama
100	3	e kin
101	3	e nasa
102	2	e a ??
103	2	e e ??
104	2	e kute
105	2	e loje
106	2	e oka
107	2	e sitelin
108	2	e tempo
109	1	e jaki
110	1	e kawa
111	1	e kepeken
112	1	e laso
113	1	e lupo	
114	1	e meji	spelling
115	1	e mole	??
116	1	e moli	
117	1	e noki	spelling	
118	1	e open	
119	1	e pi	??
120	1	e poka	??
121	1	e sikelo	spelling
122	1	e suna	spelling
123	1	e tosu	??

Re: POS - first pass

Posted: Mon Jul 26, 2010 10:46 pm
by janMato
Next I looked for signs of modals using \bli [a-z]+ [a-z]{2,7} e\b

Which is li followed by 2 words followed by e. I can't tell the difference between predicates and intransitive verbs, at least not with regex.

I manually removed the negative li X ala e and the adverbials li X (pona|ike|lili|mute|kin|taso|wawa) e.

3 words are very clearly modals and productive (at least when used with vt verbs)
wile appears with 19 other words. li wile awen e, etc.
ken appears with 19 other words. li ken anpa e, etc.
kama appears with 8 other words. li kama jo e, etc.
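A rough Python sketch of this step, for illustration only: it adds capture groups to the regex above, drops the negative and adverbial second words automatically instead of by hand, and counts how many different words follow each candidate. The function name and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# "li" + two words + "e", as in the regex above, with capture groups added.
LI_TWO_E = re.compile(r"\bli ([a-z]+) ([a-z]{2,7}) e\b")

# Second-position words set aside in the post: negation and adverbials.
EXCLUDED_SECOND = {"ala", "pona", "ike", "lili", "mute", "kin", "taso", "wawa"}

def modal_candidates(text):
    """Map each first word to the set of distinct second words it precedes."""
    partners = defaultdict(set)
    for first, second in LI_TWO_E.findall(text):
        if second not in EXCLUDED_SECOND:
            partners[first].add(second)
    return dict(partners)

# Made-up sample standing in for a real corpus.
sample = ("ona li wile jo e ni. mi mute li wile moku e kili. "
          "jan li ken lukin e ma. jan li toki ala e ni.")

found = modal_candidates(sample)
for first in sorted(found, key=lambda w: -len(found[w])):
    print(first, len(found[first]), sorted(found[first]))
```

A first word that combines with many different second words is a plausible modal; one that shows up with a single partner is more likely one of the oddballs.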

If anything, we have more modals of extremely recent usage than of ancient usage.

I did find a bunch of these odd-balls. I'm not sure what to make of them.

li awen jo e - infrequent modal?
li lukin sama e - Ambivalence about using prepositional sama or accusative phrase?
li toki sama e - same as above
li lon kasi e - transitive lon to mean create? But with DO incorporated. (And I didn't do it!)
li lon lupa e - same as above.
li pana sona e - DO incorporation.
li sona weka e - DO incorporation with DO first.
li soweli tomo e - huh?
li kalama musi e - huh?
li toki sin e - re-talk, maybe a calque of respond.
li toki utala e - double verb

Re: POS - first pass

Posted: Tue Jul 27, 2010 10:03 am
by janKipo
Thanks for the stats; I'll have to see what mods they require.
Two items are clearly (in afterthought mode) misplaced: lape should be in M as "asleep" and ni should also be in M as "this/that", otherwise the rules don't work well. So the whole class of intransitive verbs disappears into M in verb position (which seems basically right).
Apparently 'musi' is misplaced, but is it N or M? Probably M, but that needs some looking at.
The next interesting question is about whether nouns need to be divided into those which mean "apply to" as vt and those which mean "change to" or some such thing.
Other suggestions?
Can you provide full text for the anomalous cases?

Re: POS - first pass

Posted: Tue Jul 27, 2010 6:12 pm
by janKipo
So, 'musi' goes into N as "amusement" and N as transitive verbs mean "provide DO with " (the "change DO into" seems to have seeped in from A)
The first word (after 'la' or 'taso' or 'o' when there is a 'li' later) is also a pretty clear noun as is the first word after a preposition.
I get the sense that 'pilin' and maybe some others regularly take an adjectival complement, separate from the object.
The verbs with following stars ought to be the main ones to take 'e ni:' (maybe 'tan' and 'kama' too).
Other adjustments?
Have I missed any words?

Re: POS - first pass

Posted: Tue Jul 27, 2010 6:26 pm
by janMato
I'm currently reading The Stuff of Thought (http://www.amazon.com/Stuff-Thought-Lan ... 0670063274)

I'm fascinated by Pinker's idea that verbs fall into micro-classes that appear to be sort-of universal. The gist of what he says is that verbs that are semantically similar take similar constructions. So the classic
mi olin e meli
and
meli li pona tawa mi

would be resolved by saying they are different microclasses, usually having something to do with causality, spatio-temporal relationships and the like. Interestingly if one verb uses a prepositional construction, then all the semantically similar verbs will also do so. If a verb takes an accusative, then semantically similar verbs take the accusative, too.

mi toki kepeken Inli and mi toki e Inli are another pair.

I'm not sure how well the same technique would work in tp, since there are so many fewer verbs to start with.

Anyhow, I intend to run some queries on the occurrence rates of verb + kepeken, verb + poka, and verb + lon as possible indicators that people think a verb in tp takes a particular oblique.
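One way such a query could look, as a hedged sketch: the pattern below assumes the verb is the single word right after li, mi, or sina, which will miss longer predicates and will sometimes misfire, and the sample sentences are invented.

```python
import re
from collections import Counter

# Assumed pattern: subject marker (li, or bare mi/sina), one verb, then a preposition.
VERB_PREP = re.compile(r"\b(?:li|mi|sina) ([a-z]+) (kepeken|poka|lon)\b")

def verb_preposition_counts(text):
    """Count (verb, preposition) pairs found in the text."""
    return Counter(VERB_PREP.findall(text))

# Made-up sample standing in for a real corpus.
sample = ("mi toki kepeken toki Inli. ona li toki kepeken toki pona. "
          "jan li lape lon tomo.")

for (verb, prep), n in verb_preposition_counts(sample).most_common():
    print(n, verb, prep)
```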

Re: POS - first pass

Posted: Tue Jul 27, 2010 6:30 pm
by janKipo
li awen jo e - infrequent modal?
Well, I think 'awen' should be a modal and here the whole thing just means "keep, continue having"
li lukin sama e - Ambivalence about using prepositional sama or accusative phrase?
li toki sama e - same as above
given the problems with 'sama' this seems a natural confusion to turn up.
li lon kasi e - transitive lon to mean create? But with DO incorporated. (And I didn't do it!)
No, just "put DO in the plant" or some such. 'kasi' as NP complement of a P
li lon lupa e - same as above.
ditto --"in a hole"
li pana sona e - DO incorporation.
Yeah, common for "teach" with both subject and student (common not, I suppose, statistically)
li sona weka e - DO incorporation with DO first.
Even then I don't see what it should have meant. Context?
li soweli tomo e - huh
maybe shooting for "pet" the vt (back to the "turn into" sense after all)
li kalama musi e - huh?
"perform for"? "provide with amusing sound"
li toki sin e - re-talk, maybe a calque of respond.
not quite a real calque, but maybe something like that (we don't have a "return", after all)
li toki utala e - double verb
idiom for "argue", isn't it? (or is that 'utala toki'?)

Re: POS - first pass

Posted: Tue Jul 27, 2010 8:44 pm
by janMato
Prototypical adjectives by corpus, identified as the 2nd word after "pi". The regex is \bpi [a-z]{2,7} [a-z]{2,7}\b

-pona appears a lot because everyone keeps saying "toki pona" in the corpus.
-mute is plural. Structurally the same as a modifier, but its frequency suggests that this is somehow different.
-ali, tu, wan. Quantity words are prototypical modifiers.
-mi, sina, ona. Structurally same as modifiers, but obviously mean possessives
-jan, meli, mije. These could be either prototypical modifiers or possessives

We see plenty of prototypical verbs here, acting like participles.
-tawa, pali, jo: moving, working, having/owning

I've said that some modifiers act like periphrastic derivational modifiers. After removing the numbers, possessives, and plurals, and then counting how many *different* words get modified by a word, one could see how "productive" they are. My attention is wandering, so suffice it to say that the most productive modifiers are lili, suli, ike, pona, ala, ale, pimeja, walo, telo, toki, and a few more. Obviously, my counts would have a mixture of ordinary modifiers and things intended as "compound words" (telo suli = ocean, and not a lot of water).
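The productivity count could be sketched like this. It assumes the same pi pattern is reused to pair each modifier with the word it modifies, and the exact set of words to skip (numbers, possessives, plural) is my guess at what was removed; the sample text is invented.

```python
import re
from collections import defaultdict

# "pi <head> <modifier>", same shape as the pi regex, with capture groups added.
PI_PHRASE = re.compile(r"\bpi ([a-z]{2,7}) ([a-z]{2,7})\b")

# Assumed skip list: numbers, possessive pronouns, and the plural marker.
SKIPPED = {"tu", "wan", "mi", "sina", "ona", "mute"}

def modifier_productivity(text):
    """Map each modifier to the number of different heads it modifies."""
    heads = defaultdict(set)
    for head, mod in PI_PHRASE.findall(text):
        if mod not in SKIPPED:
            heads[mod].add(head)
    return {mod: len(hs) for mod, hs in heads.items()}

# Made-up sample standing in for a real corpus.
sample = ("jan pi telo suli li kama. jan pi tomo suli li lon. "
          "jan pi tomo mi li pona. ilo pi telo suli li pakala.")

print(modifier_productivity(sample))
```

Counting distinct heads rather than raw occurrences keeps a frequent fixed phrase from making a modifier look more productive than it is.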

Interestingly, only 78 words show up in the adjective slot *in this query*. I'm too lazy to check for other patterns where modifiers must be (2nd word after preposition, word before pi if preceded by 2 words, etc).

Rank, word, # of times this word was found 2 places after pi.

1	pona	142	
2	mute	118	Plural
3	sewi	107	
4	ali	106	
5	tu	82	number and modifier
6	tomo	64	
7	suli	56	
8	wan	56	number and modifier
9	ni	52	deixis grammatical particle
10	lawa	50	
11	sona	48	
12	lili	47	
13	tawa	45	
14	ante	36	
15	ma	36	
16	mi	36	possessive
17	pini	36	
18	suno	34	
19	soweli	30	
20	ike	28	
21	toki	27	
22	nasa	25	
23	musi	24	
24	telo	23	
25	pimeja	21	
26	lukin	19	
27	luka	18	
28	kama	16	
29	pali	15	
30	jo	13	
31	loje	13	
32	mama	13	(also possessive)
33	sike	13	
34	lon	12	
35	moli	12	
36	palisa	12	
37	walo	12	
38	ona	11	possessive
39	jelo	10	
40	sina	10	possessive
41	noka	9	
42	sama	9	
43	sijelo	9	
44	sin	9	
45	jan	8	(also possessive)
46	seli	8	
47	wawa	8	
48	seme	7	
49	utala	7	
50	anpa	6	
51	kasi	6	
52	kiwen	6	
53	nanpa	6	
54	weka	6	
55	linja	5	
56	moku	5	
57	poka	5	
58	kulupu	4	
59	meli	4	(also possessive)
60	mije	4	(also possessive)
61	pan	4	
62	waso	4	
63	kon	3	
64	kule	3	
65	lete	3	
66	oko	3	
67	awen	2	
68	ijo	2	(probably possessive)
69	kalama	2	
70	mani	2	
71	mun	2	
72	nimi	2	
73	olin	2	
74	pakala	2	
75	pana	2	
76	sitelen	2	
77	suwi	2	
78	laso	1	

Re: POS - first pass

Posted: Tue Jul 27, 2010 8:55 pm
by janMato
Unusual (to me) transitives
kasi li pana e kon pona tawa tomo mi.
mi mute li lon kasi e ilo suno.
mi mute li lon kasi e ijo musi pona. (aikidave)
telo li kama la telo li kama lon sinpin.
mi toki e jan pi sona tomo.
jan pi sona tomo li pona e sinpin.
jan li lon lupa e ko wawa.
ko wawa li pini e telo.
jan li kalama.
jan li telo e sinpin kepeken ilo wawa.(aikidave)
a! mi kama sona. mi sona e kasi loje sama ni. ona li soweli tomo e mi. (little prince)
jan Majumi: mi mute li sona ale e ni: sike kon, tomo tawa, li tawa sewi mute tan seme? ona li wile awen lon sewi lili. mi mute li pilin a a! jan Wiko en mi li toki utala e ni: mi awen ala awen pona e sike kon? mi mute li lon toki li sona weka e tomo tawa. tenpo ni la jan Patosa li toki e "jan Palakon li lon insa, jan Palakon li lon insa". mi mute li ken kute ala. taso mi kama sona e seme? tenpo ni la mi mute li kama sona e seme li lon.(sitelen sitelen)