
Cuss Word Generation in Toki Pona

Posted: Mon Feb 01, 2010 12:43 am
by janMato
pakala nena uta!

I'm trying to create a cuss word generator in toki pona. The idea comes from the observation that if you generate words whose sounds have the same frequencies as the rest of the words in a given language, the most common outputs are cuss words. So a random English word generator would generate four-letter words most often, because they embody the most common phonotactic patterns. (Sorry, I can't find the reference for this anymore; I read it in a book somewhere.)

I already wrote one version that didn't use a Markov chain, and the results weren't interesting: nana, lana, nala, nani, nina. That just reflects that, in the distribution stats I was using, n, l, a, and i were the most common letters, so valid combinations of n, l, a, and i were generated most often. Since four-letter words in English aren't so tightly bunched in clusters of common sounds (some of the most common cuss words don't even contain English's most common vowel, the schwa), I figure I'm doing something wrong, probably by not using Markov chains.
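The context-free generator described above can be sketched roughly as follows. This is a minimal illustration, not the original program: it assumes each letter is drawn independently in proportion to a frequency weight and slotted into a CVCV shape, and the weights are made-up placeholders rather than measured toki pona statistics. Because no letter is conditioned on the one before it, the most probable outputs are simply combinations of the highest-weighted letters, which is the nana/lana/nala effect.

```python
import random
from collections import Counter

# Hypothetical letter weights: placeholders, NOT measured from a real corpus.
CONSONANT_WEIGHTS = {"n": 10, "l": 9, "m": 6, "p": 6, "k": 5, "s": 5, "t": 5, "w": 3, "j": 3}
VOWEL_WEIGHTS = {"a": 12, "i": 10, "e": 7, "o": 7, "u": 4}

# CV syllables that toki pona phonotactics rule out.
FORBIDDEN = {"ji", "ti", "wo", "wu"}


def pick(weights, rng):
    """Draw one letter in proportion to its weight, with no regard for context."""
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


def unigram_syllable(rng):
    """Draw a consonant and a vowel independently, retrying forbidden syllables."""
    while True:
        syllable = pick(CONSONANT_WEIGHTS, rng) + pick(VOWEL_WEIGHTS, rng)
        if syllable not in FORBIDDEN:
            return syllable


def unigram_word(rng, n_syllables=2):
    """Concatenate independently drawn CV syllables (CVCV by default)."""
    return "".join(unigram_syllable(rng) for _ in range(n_syllables))


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(unigram_word(rng) for _ in range(20_000))
    print(counts.most_common(5))
```

The sketch ignores vowel-initial words and syllable-final n; it only exists to show why independent letter draws collapse onto the most frequent letters.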

Markov chain transition matrix for toki pona
Should the transition matrix for toki pona be syllable to syllable (say, the odds of "ka" following "la" are 1%) or letter by letter (the odds of "a" following "k" are 1%)? And for further speculation: given that toki pona has a mostly closed set of isolated morphemes, how would toki pona speakers cuss? Would they use common strings of words instead of common strings of phonemes? Maybe I should also work out a word-to-word transition matrix (the odds of "walo" following "laso" are 0.001%).
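Both options can share one implementation, since a first-order Markov chain only needs some way to cut words into tokens. Below is a minimal Python sketch under a few stated assumptions: the word list is a tiny stand-in rather than a real corpus, toki pona syllables are taken to have the shape (C)V(n) with a final n belonging to the syllable only when no vowel follows, and all function names are hypothetical. Feeding `train` sentences split on spaces would give the word-to-word matrix as well.

```python
import random
import re
from collections import Counter, defaultdict

START, END = "^", "$"  # boundary markers so the chain learns word starts and ends

# Assumed syllable shape (C)V(n): n is a coda only when no vowel follows it.
SYLLABLE = re.compile(r"[jklmnpstw]?[aeiou](?:n(?![aeiou]))?")

# Tiny stand-in word list; a real model would be trained on a corpus.
WORDS = ["pakala", "nena", "uta", "lupa", "monsi", "unpa", "pona", "jaki",
         "anpa", "sona", "toki", "jan", "tenpo", "olin", "kon"]


def syllables(word):
    """Split a lowercase toki pona word into (C)V(n) syllables."""
    return SYLLABLE.findall(word)


def train(sequences):
    """Count token-to-token transitions (one row of the matrix per token)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [START, *seq, END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def transition_probs(counts):
    """Normalize each row of counts into P(next | prev)."""
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: n / total for nxt, n in row.items()}
    return probs


def generate(counts, rng, max_len=8):
    """Walk the chain from START until END is drawn or max_len tokens are emitted."""
    out, state = [], START
    while len(out) < max_len:
        row = counts[state]
        state = rng.choices(list(row), weights=list(row.values()), k=1)[0]
        if state == END:
            break
        out.append(state)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    letter_model = train([list(w) for w in WORDS])        # P("a" | "k"), etc.
    syllable_model = train([syllables(w) for w in WORDS])  # P("ka" | "la"), etc.
    print("letters:  ", ["".join(generate(letter_model, rng)) for _ in range(5)])
    print("syllables:", ["".join(generate(syllable_model, rng)) for _ in range(5)])
```

The syllable chain conditions each step on a whole previous syllable, so it carries more context than the letter chain, at the cost of a much larger matrix that needs more data to estimate.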

Final question: what corpus are people using? If one doesn't exist, I plan to compile one from this site and the wikia site, since both have licenses permissive enough for republishing content.
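If a corpus does get compiled, a first counting pass might look like this minimal Python sketch. It assumes the posts are already available as plain text, and it treats a token as vocabulary only when its lowercase form appears in a word list you supply (the short list in the usage example is a placeholder, not the official dictionary). Everything else, such as capitalized names like "Josan" or stray English, is tallied separately so it can be reviewed by hand instead of skewing the statistics. The function names are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[A-Za-z]+")


def count_words(text, vocabulary):
    """Count known toki pona words; set aside every other token for review."""
    known, other = Counter(), Counter()
    for token in TOKEN.findall(text):
        if token.lower() in vocabulary:
            known[token.lower()] += 1  # sentence-initial capitals still count
        else:
            other[token] += 1  # names, English, typos
    return known, other


def letter_frequencies(word_counts):
    """Letter counts weighted by how often each word occurs in the corpus."""
    letters = Counter()
    for word, n in word_counts.items():
        for letter in word:
            letters[letter] += n
    return letters


if __name__ == "__main__":
    vocabulary = {"toki", "pona", "li", "jan", "mi", "wile"}  # placeholder subset
    known, other = count_words("toki pona li pona. jan Josan li wile toki.", vocabulary)
    print(known.most_common())
    print(other)
    print(letter_frequencies(known).most_common(3))
```

Weighting letters by word frequency (rather than counting each dictionary entry once) is a design choice; which one better matches "the sounds of the language" is an open question.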

Re: Cuss Word Generation in Toki Pona

Posted: Mon Feb 01, 2010 8:27 pm
by janKipo
Cussing is generally semantically based, isn't it? So you need to find the potential semantic elements among the words of tp to use, and modify them. Unfortunately, the semantic content of cuss words varies from culture to culture and from time to time. Religion is often a basis, as is sex, but very strange things may turn up, depending on the taboos of the culture. I think it is probably very hard to cuss effectively in a language without a taboo-ridden culture.

Re: Cuss Word Generation in Toki Pona

Posted: Mon Feb 01, 2010 9:25 pm
by janMato
janKipo wrote: "Cussing is generally semantically based, isn't it?"
Yes, but which words will get picked up to carry the load? If there's any truth in the dice, it is that the most prototypical-sounding words will be chosen as the cuss words; cthulhu, for example, will never be an important insult in English. As I just looked up, the F word started out as an ordinary PIE word meaning "to strike", and the S word was the ordinary PIE word for "to split". So if cuss words are constantly moving from euphemism to taboo word, the taboo words should be drawn seemingly at random from common, ordinary words.
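One way to make "prototypical-sounding" measurable, offered here as an assumption rather than anything established in this thread, is to score each candidate by its average log-probability per letter transition under a letter-bigram model of the existing vocabulary. The Python sketch below uses add-one smoothing so that letters toki pona lacks (like the c and h in cthulhu) get a small floor probability instead of breaking the calculation. The training list is a tiny stand-in, and `typicality` is a hypothetical name.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"
ALPHABET = "aeijklmnopstuw"  # the 14 letters of toki pona

# Tiny stand-in vocabulary; a real model would use the full word list or a corpus.
WORDS = ["pakala", "nena", "uta", "lupa", "monsi", "unpa", "pona", "jaki",
         "anpa", "sona", "toki", "jan", "tenpo", "olin", "kon"]


def bigram_counts(words):
    """Count letter-to-letter transitions, including word start and end."""
    counts = defaultdict(Counter)
    for word in words:
        padded = [START, *word, END]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def typicality(word, counts):
    """Mean log P(next | prev) with add-one smoothing; higher means more typical.

    Smoothing normalizes over toki pona letters plus END only, so foreign
    letters just receive the floor value: a ranking score, not a true probability.
    """
    outcomes = len(ALPHABET) + 1
    padded = [START, *word, END]
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + 1) / (sum(row.values()) + outcomes))
    return total / (len(padded) - 1)


if __name__ == "__main__":
    counts = bigram_counts(WORDS)
    candidates = ["pana", "nala", "kiwen", "cthulhu"]
    for word in sorted(candidates, key=lambda w: typicality(w, counts), reverse=True):
        print(f"{word:8} {typicality(word, counts):.2f}")
```

Averaging per transition keeps long words from being penalized merely for their length.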

According to Pinker, cuss words fall into only five categories; I have no clue whether this is the result of a careful cross-cultural study or a linguistic universal.

The Supernatural- kon anpa, o tawa anpa!
Disease, Death, & Infirmity- sijelo jaki, jan moli, palisa sina o anpa!
Sexuality- unpa jaki, jan unpa, sina o unpa e sama!
Disfavoured people or groups- jan unpa, jan sona ala, jan pi mani olin

I read that in post-collapse, post-cannibalism Easter Island, a common insult was something like
"I pick your mom's flesh from my teeth"
mi weka e moku pi mama meli suli sina lon anpa walo uta mi!

I guess one has to be from Easter Island for that one to really hurt.

Re: Cuss Word Generation in Toki Pona

Posted: Mon Feb 01, 2010 11:10 pm
by janKipo
Here and now it would probably be taken as a sex curse.
I don't know the source of this list, but it looks pretty cross-cultural, allowing for some flexibility as to what goes in those groups. Of course, which ones are favored shifts.
I think there may be a phonological component (at least some have said so), but I'm not sure whether that varies by culture or whether there are universals (a fondness for k's, say).
I recall a couple of books, neither of which I can name or even attribute, on the future of swearing and its past in English; they might be worth looking up. Robert Graves? Eric Partridge?

Re: Cuss Word Generation in Toki Pona

Posted: Tue Feb 02, 2010 1:55 am
by jan Josan
sina wile pana e nimi jaki kepeken toki pona anu seme? sina wile pana e ni tan seme? pilin seme li pakala ala pakala? sina jo e sona ala ala ala? pakala! toki pona li toki pona! mi wile ala e ko jaki sina! mi wile ala lukin e toki unpa jaki pi pakala sina. o pakala Kapata! jan Pisin Papa pipi jaki pi akesi anpa! sina sama soweli Sapinkata pi linja ala! o toki ala e jaki Pasapi pi sina Konkon Akapa utala mama meli pi ko Kaka! o tawa weka! toki jaki sina li kalama Takata. o unpa e kasi. o unpa e pipi. o unpa e kala tan ni: sina li wile unpa moli e ma ale! sina li jan Taka pipi sama lili Topi Toto Potopopo...

I don't know, it's a little like a growling puppy or trying to create tension in the pentatonic scale... :twisted:

Re: Cuss Word Generation in Toki Pona

Posted: Tue Feb 02, 2010 9:54 am
by janKipo
But impressive nonetheless. As you note in passing, the one curse word we have been given (like the one emotional noise and the one animal sound) is 'pakala.'

Does TP have enough words to have taboo words?

Posted: Sun Feb 21, 2010 10:08 pm
by janMato
Since we only have 125 words, would we expect unpa, nena, lupa, monsi, and pakala ever to become taboo words, unavailable except in times of anger or at biker bars, in the real or a hypothetical toki pona community? Losing 5 of the 125 words in a language (4% of the vocabulary) to taboo would be a significant loss.

If they're not taboo words, then I'd suspect speakers would hear "reproduce", "bump", "hole", "back", and "broke" first, and only on a double take would they notice that the sentence has words that sometimes mean f*k, boob, a*h*le, a*, and damn.

Re: Cuss Word Generation in Toki Pona

Posted: Mon Feb 22, 2010 12:16 am
by janKipo
Well, I still don't think 'unpa' means "reproduce", but otherwise I see the point here. I suppose that ideally, in tp culture, these objects and activities would not be taboo, and so neither would the words for them. This would be a very Daoist move, to be sure, but a hard one for some of us. For now, the negative "cuss" words seem to be limited to those that refer to negative things: 'ike', 'jaki', 'pakala', maybe 'utala', and I don't know what else. That is, there is no negative connotation added to the existing designation. It would be nice to keep it that way.