just try it:
http://ngrams.googlelabs.com/
it would be great to have toki pona in the list one day.
New Google tool
Re: New Google tool
toki pona doesn't even register. In fact, the only fake language names that register are Klingon and Esperanto.
Some more conlang phrases.
It seems "aux lang" was a common phrase during the war-- maybe it had a military meaning outside of the fake language sense.
Re: New Google tool
janMato wrote: toki pona doesn't even register. In fact, the only fake language names that register are Klingon and Esperanto.
no, i mean the tool to trace the language evolution over time
e.g. in english you can see the frequency drop for many function words, as well as in russian. this may reflect a compaction of language structure, the use of longer chains of modifiers, etc. but for some words english and russian evolve differently
so, what about tp? by measuring f(li) we could estimate how the length of tp sentences changes. f(pi) might indicate the change in the complexity of modifier chains, etc
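A minimal sketch of the f(li) / f(pi) idea in Python, assuming plain-text toki pona input. The names `tokenize` and `relative_frequency` and the sample sentences are made up for illustration; they are not part of any existing tool. It just counts how often a word occurs per token, so a rising f(pi) across dated texts would hint at longer modifier chains.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def relative_frequency(text, word):
    """Share of tokens equal to `word` (0.0 for an empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(word) / len(tokens)

# Hypothetical sample, not taken from the corpus.
sample = "jan li moku. tomo pi jan pona li suli."
f_li = relative_frequency(sample, "li")
f_pi = relative_frequency(sample, "pi")
```

To get a trend, one would compute these values per text and plot them against the year each text was written.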
Re: New Google tool
It is interesting how the Russian Revolution had such an impact on basic things like how many clause-introducing words people used in published texts.
jan-ante wrote: so, what about tp? by measuring f(li) we could estimate how the length of tp sentences changes. f(pi) might indicate the change in the complexity of modifier chains, etc
I wrote some code to try to come up with some measure of readability for toki pona.
I plan to write some code that will assign readability metrics to each text file in the toki pona corpus-- mostly so that I can sort them and publish them in a graded reader. If I extend that code to spit out the date of the source material, then a graph wouldn't be too much more work.
Outside of sentence length, what metric would you use to measure complexity, or what other metrics would be of interest?
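For what it's worth, here is a rough Python sketch of that kind of per-text scoring, not the actual corpus code. The set `FUNCTION_WORDS`, the two metrics, and the sample texts are assumptions chosen for illustration only.

```python
import re

# Hypothetical choice of "function" words; a real metric may use another set.
FUNCTION_WORDS = {"li", "e", "pi", "la", "o"}

def metrics(text):
    """Return words per sentence and the share of function words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "function_share": 0.0}
    function_count = sum(1 for w in words if w.lower() in FUNCTION_WORDS)
    return {
        "words_per_sentence": len(words) / len(sentences),
        "function_share": function_count / len(words),
    }

def rank_by_sentence_length(corpus):
    """Return titles sorted from shortest to longest average sentence."""
    return sorted(corpus, key=lambda t: metrics(corpus[t])["words_per_sentence"])

# Made-up two-text corpus: title -> text.
corpus = {
    "harder": "jan pi pali mute li pana e moku tawa jan lili.",
    "easy": "mi moku. sina pona.",
}
ranking = rank_by_sentence_length(corpus)
```

Sorting on a single metric keeps the example short; a graded reader would probably combine several.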
Re: New Google tool
janMato wrote: It is interesting how the Russian Revolution had such an impact on basic things like how many clause-introducing words people used in published texts.
it was the biggest revolution of minds in russian history. it brought precise thinking to the broad masses of people. from that time every schoolchild studied literature, mathematics, chemistry, darwinism, etc. but ngrams could be even more interesting than you expect. look how defeats in both world wars affected german thinking. you could separately try without sie, so and wenn to view the effect for low-frequency words. compare how the wars affected english speakers: the effect was the opposite (except for "will"). then go "back to the USSR". you can see peaks at 1928, 1942, 1953 and 1990. you probably know what the 2nd and 4th dates mean in soviet history. 1928 was the famine, 1953 was Stalin's death. the turning point in the late soviet era was 1975-1977, when (probably) the accumulation of pakala started; 1990 was just the culmination. this refutes the theory of Gorbi's conspiracy as the cause of the soviet collapse.
note that these processes were probably subconscious. some evidently bad style (like starting a sentence with "Также" or "Далее" ("Also" and "Further")) dropped during the war, but increased abruptly with the advent of "freedom".
i wonder, could somebody check this for french, spanish and (if applicable) for chinese? it would be interesting to compare.
Re: New Google tool
Well, it doesn't span any historical periods and there isn't much political talk going on, but I have some metrics and I calculate them for a variety of documents. I'll have to go back to all of these to get what year they were written-- I didn't think to get that when I was gathering files for the corpus.
http://tokipona.net/tp/CorpusReadability.aspx
I also got a primitive corpus search that accepts regex searches
http://tokipona.net/tp/CorpusSearch.aspx
When I combine these and create a graph, I'll have something close to an N-Gram thingy.
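A regex search over a corpus could look roughly like this in Python. This is only a sketch under assumptions: the corpus is a plain dict of title to text, and `search_corpus` is a hypothetical helper, not the code behind CorpusSearch.aspx.

```python
import re

def search_corpus(corpus, pattern):
    """Yield (title, line) pairs for every line matching the regex `pattern`."""
    regex = re.compile(pattern)
    for title, text in corpus.items():
        for line in text.splitlines():
            if regex.search(line):
                yield title, line.strip()

# Made-up corpus: title -> text.
corpus = {
    "text one": "jan li moku.\ntomo pi jan pona li suli.",
    "text two": "mi tawa tomo.",
}

# Find "pi" followed by at least two more words (a modifier chain).
hits = list(search_corpus(corpus, r"\bpi \w+ \w+"))
```

Counting the hits per year of writing, once the dates are known, would give the data points for a graph.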
Re: New Google tool
janMato wrote: I have some metrics and I calculate them for a variety of documents. I'll have to go back to all of these to get what year they were written-- I didn't think to get that when I was gathering files for the corpus.
http://tokipona.net/tp/CorpusReadability.aspx
I looked for the easiest text in the corpus. And the winner is... surprise! surprise!... "advanced- jan Kikamesi- jan Enkitu li kama", with a combined readability score of 0.0, as all its metrics are zero.
(The file is empty).
The hardest to read is your "Troll" (8.9, where 1.0 is the average). Its Complex NP, Function and Words/Sentence measures are extremely high just because the sentences are delimited by commas instead of full stops.
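The comma effect is easy to reproduce. Below is a small Python sketch, assuming the metric splits sentences on punctuation; `words_per_sentence` and the sample text are hypothetical, not the site's actual calculation.

```python
import re

def words_per_sentence(text, delimiters=".!?"):
    """Average words per sentence, splitting on the given delimiter characters."""
    splitter = "[" + re.escape(delimiters) + "]+"
    sentences = [s for s in re.split(splitter, text) if s.strip()]
    if not sentences:
        return 0.0
    words = re.findall(r"[A-Za-z]+", text)
    return len(words) / len(sentences)

# Made-up text whose "sentences" are separated by commas.
text = "mi moku, sina lape, ona li pali."
strict = words_per_sentence(text)            # commas ignored: one long sentence
lenient = words_per_sentence(text, ".!?,")   # commas also end a sentence
```

With the strict splitter the whole text counts as one sentence, which is the kind of inflation that would push "Troll" to the top.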
Re: New Google tool
Empty file- fixed. There's actually a lot of work left to make this corpus usable for a variety of purposes. First is to come up with a system for metadata-- is it poetry, what year was it written, etc.
Crazy minima and maxima -- not fixed yet but addressed with more data. I included my entire compiled corpus including the stuff that isn't strictly redistributable-- I'm supposing I'll use YouTube's rules (post content and take it down when the owner complains) or "fair use" as a defense should I get any care bear stares.
Re: New Google tool
Ricky6 wrote: Good if google would come up with a Toki Pona version..
Done. http://tokipona.net/tp/ Enter your tp search words in the box and click search. The results are restricted to sites manually determined to have toki pona content.