Tyranny of Zipf's law

janMato
Posts: 1545
Joined: Wed Dec 02, 2009 12:21 pm
Location: Takoma Park, MD

Tyranny of Zipf's law

Post by janMato »

jan Kipo has already written about the Zipf Wall, roughly saying that common words are expected to be short, so as the vocabulary of word phrases grows, we need strategies to shorten the phrases.

Zipf distributions hug both axes. Where the curve hugs the y axis, a very high percentage of the words in a given corpus are typically the same 150 or so words. In the long tail of a natural language, where the curve hugs the x axis, the rest of the words are incredibly rare, but every time you learn another 100 incredibly rare words, there are another 99 incredibly rare words waiting to be memorized, and altogether the rare words make up a non-negligible part of the vocabulary.
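To put rough numbers on the "hugs both axes" picture, here is a minimal sketch, assuming an idealized Zipf distribution where frequency is proportional to 1/rank. The vocabulary size of 50,000 and the function name zipf_coverage are assumptions made up for illustration, not measurements from any corpus.

```python
def zipf_coverage(top_k: int, vocab_size: int, exponent: float = 1.0) -> float:
    """Fraction of all tokens covered by the top_k most frequent words,
    assuming frequency is proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    vocab_size = 50_000  # hypothetical vocabulary size, chosen only for illustration
    for k in (150, 1_000, 10_000):
        share = zipf_coverage(k, vocab_size)
        print(f"top {k:>6} words cover {share:.1%} of tokens")
```

Under these assumed numbers the top 150 ranks account for just under half of all tokens, while everything beyond rank 10,000 still adds up to more than a tenth. A real corpus will deviate from the ideal curve, so treat this only as an order-of-magnitude picture.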

Conlangs usually try to break out of the tyranny of Zipf's law by letting users generate their own words through derivational morphology. Toki pona hardly has any morphology going on; it's all syntax -- highly isolating. Has anyone worked out a list of the strategies permissible in toki pona? And linguistically speaking, what is derivational morphology in the context of isolating languages? (So far the closest things look like compounds and idioms, although at various times I've read that tp doesn't have, or shouldn't have, compounds or idioms.)

Here are the obvious syntactic routes:
Modified nouns, with and without pi: n + [m] pi + [n] [m]
Modified verbs, with and without pi: v + [m] pi + [n] [m]
Nouns modified by prep phrases: n [m] + pp
Verbs whose meaning changes depending on which prep phrase they're used with: v + [pp]
la fragments: np + la

Semantically, some modifiers seem to be more productive than others, e.g. ala as un-, jan as -er:
moku ala => un-eat/throw up (as distinct from "not food" or "not eating")
sona ala => forget, unlearn
jan kule. => Painter.
jan pali => Worker

kili jelo. Doesn't seem all that productive. Seems like an ordinary modifier.

Intuitively, it seems that kili jelo and moku ala were arrived at by different means, but at the moment I'm not sure what the diagnostic test is to differentiate them.

Anyhow, am I missing any? Surely this has been hashed out before, anyone have a suitable link?
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: Tyranny of Zipf's law

Post by janKipo »

Not off hand. But I must add that 'ala' as "reverse" is not attested outside jan Mato, so far as I can tell. But, then, I don't know how to do it any other way either. It may not be a general pattern. 'jan' is certainly productive, but also a first place for bahuvrihi to come in, a strategy that I have not yet worked with much.
jan Josan
Posts: 326
Joined: Sun Oct 18, 2009 12:41 pm
Location: ma tomo Nujoka

Re: Tyranny of Zipf's law

Post by jan Josan »

The other one I've seen used with varying degrees of success is the multiple modifier string:
[n][m1][m2]

If [n][m1] is in common usage (say 'jan pali'), then [m2] can often be understood to modify it ('jan pali mute') without much difficulty.

Occasionally I've seen something like
[n][m1][m2][m3], but as the possible meanings increase exponentially, these usually need to be rearranged and broken up with a 'pi'.
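As a rough way to see how fast the readings pile up, here is a sketch that counts the ways a head plus k modifiers can be grouped into nested pairs. It assumes, purely for illustration, that every binary grouping of adjacent words is a candidate reading; that is a simplification rather than a claim about how tp is actually parsed. The function name bracketings is made up, and the count is the Catalan number.

```python
from math import comb


def bracketings(num_modifiers: int) -> int:
    """Number of distinct binary groupings of a head followed by
    num_modifiers modifiers (the Catalan number C_n)."""
    n = num_modifiers
    return comb(2 * n, n) // (n + 1)


# Two modifiers already give two groupings:
# ((n m1) m2) versus (n (m1 m2)), the second being what 'pi' marks.
for k in range(1, 6):
    print(f"head + {k} modifier(s): {bracketings(k)} possible groupings")
```

The counts grow roughly fourfold with each added modifier in the long run, which fits the impression that strings of three or more modifiers get hard to read without 'pi'.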
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: Tyranny of Zipf's law

Post by janKipo »

'pi's don't exactly break things up but rather use other already existing chunks en bloc. The longest straight string I remember seeing (though I can't reproduce it) was a head and four modifiers, each modifying the whole preceding. I've seen longer, but they always were the result of forgetting 'li'. With 'pi', I have seen strings of three, usually with only the last having two content words following it (otherwise we get added ambiguities, in addition to those we get with the third following word).

As I have said, the ways around the wall for a language like tp are
borrowing, however cleverly disguised
forming new words from the existing stock by portmanteaux, acronyms, "gestapo" techniques, compounding, etc.
simplifying descriptions
bahuvrihi (dropping heads and doing with just the descriptive part)
killer metaphors -- the riskiest approach, little used in most conlangs, which seem committed to literalism (one of the jokes in the Laadan books, I recall, is convincing the Men that Laadan uses a very literal word construction method, with the inevitable unZipfy results, while, in fact, the Sisters do a great poetic, metaphoric, inspired number most of the time. Lack of such inspired numbers may have helped Laadan not take off.) The problem is that there are not a lot of these inspirations for us either -- we tend to calque or at least to fall into cultural patterns which are opaque to outsiders. Of course (as with Laadan), if we had a culture of our own, these metaphors might flow more smoothly. But, until then, ...
jan-ante
Posts: 541
Joined: Fri Oct 02, 2009 4:05 pm

Re: Tyranny of Zipf's law

Post by jan-ante »

zipf's law is even more tyrannical than you might suppose. here is the number of posts in this forum versus the rank of the participant:
[Image: number of posts versus participant rank]
it generally obeys the tyrant, except for jan Kipo (#1, hypozipfic), jan Ote (#4, hyperzipfic) and jan Josan (#5, hyperzipfic)
janKipo
Posts: 3064
Joined: Fri Oct 09, 2009 2:20 pm

Re: Tyranny of Zipf's law

Post by janKipo »

Gee, should I post more to get in line? What is the rank here? number of posts? Is the graph roughly the same as Zipf's (which I can't find in my files and haven't looked at for ages) mutatis mutandis?
janMato
Posts: 1545
Joined: Wed Dec 02, 2009 12:21 pm
Location: Takoma Park, MD

Re: Tyranny of Zipf's law

Post by janMato »

janKipo wrote:Gee, should I post more to get in line? What is the rank here? number of posts? Is the graph roughly the same as Zipf's (which I can't find in my files and haven't looked at for ages) mutatis mutandis?
I vaguely remember that normal distributions are normal because they represent maximum entropy, and so they apply to all sorts of domains, from the distribution of leaf sizes to the distribution of the number of particles in a piece of pocket lint.

Anyone know what's driving the general pattern of something that is following a Zipf curve? It isn't normal because the central tendency is skewed and the tail goes on for far too long before it dwindles to nothing.

And if forum posts were following Zipf's law, then the top poster should be incredibly common, as common as the word "the" in English or "li" in toki pona. And the least common posters wouldn't be posting 0; they'd be posting exactly once every year, 5 years or 50 years, just as that is how often one gets a chance to properly use the English words conchomancy or rubricate in a conversation (about once in a lifetime).
jan-ante
Posts: 541
Joined: Fri Oct 02, 2009 4:05 pm

Re: Tyranny of Zipf's law

Post by jan-ante »

janKipo wrote: What is the rank here? number of posts? Is the graph roughly the same as Zipf's (which I can't find in my files and haven't looked at for ages) mutatis mutandis?
rank is the position in the list of people sorted by the number of posts they have made in this forum. you are first. jan Mato is second. i am 3rd. and so on. the graph is roughly but not exactly the same as zipf's. it has a steeper slope:
#posts ~ rank^a
for classic zipf a = -1, here it is about -1.7
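for anyone who wants to try this on their own counts, here is a minimal sketch of one common way to estimate a: a least-squares line through log(#posts) against log(rank). the function name fit_power_law_exponent is made up, and the counts below are synthetic numbers generated to follow rank^-1.7, not the actual forum data. whether the figure above was obtained this way is not stated.

```python
import math


def fit_power_law_exponent(counts):
    """Estimate a in count ~ rank**a as the least-squares slope of
    log(count) against log(rank). Zero counts are dropped because
    their logarithm is undefined."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic post counts following rank**-1.7 (NOT the forum's real data).
    synthetic = [3000 * rank ** -1.7 for rank in range(1, 31)]
    print(f"estimated exponent: {fit_power_law_exponent(synthetic):.2f}")
```

a log-log line fit is easy but can be thrown off by the noisy low-count tail, so the estimate is best read as a rough slope.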