Linguistics question
Posted: Mon Jan 25, 2016 2:01 pm
How does one calculate the probability that a given character will appear in a word?
I'm trying to create a new tp word generator. With an even (uniform) distribution, 50% of the generated words end with n, which is not what we see in the dictionary or in typical text.
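One way around the uniform 50% is to estimate the final-n rate from a word list and sample with that weight instead. This is a minimal sketch, assuming a small hypothetical sample of tp words (`SAMPLE_WORDS` is illustrative, not the full dictionary); `final_n_rate` and `maybe_add_final_n` are hypothetical helpers, not part of any existing generator:

```python
import random

# Hypothetical sample of tp words; substitute the real dictionary.
SAMPLE_WORDS = ["jan", "pona", "sina", "moku", "tomo", "ijo", "lon", "kin", "sewi", "ona"]


def final_n_rate(words):
    """Fraction of words whose last letter is 'n'."""
    return sum(1 for w in words if w.endswith("n")) / len(words)


def maybe_add_final_n(stem, rate, rng):
    """Append 'n' to a generated stem with probability `rate` instead of 50%."""
    return stem + "n" if rng.random() < rate else stem


rate = final_n_rate(SAMPLE_WORDS)  # 3 of the 10 sample words end in n
rng = random.Random(0)             # fixed seed so runs are repeatable
print(rate)
print([maybe_add_final_n("ko", rate, rng) for _ in range(5)])
```

The same idea extends to any position: count how often each letter occurs in that slot across the word list and sample from those counts rather than from a flat distribution.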
Say the dictionary had three words: a, ab, and bb. Those contain five letters, two of which are a, so there is a 2/5 chance of an a.
But what if "a" made up 40% of the words in typical texts, "ab" 59%, and "bb" was rare at only 1%?
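The difference between counting each dictionary word once and weighting words by how often they occur in text can be sketched as follows. This is a minimal illustration using the three-word example; `letter_distribution` is a hypothetical helper, and weighting letters by word usage is one assumption about what "typical text" should mean:

```python
from collections import Counter


def letter_distribution(word_weights):
    """Return P(letter), where each word's letters count `weight` times.

    word_weights maps word -> weight. A weight of 1 for every word gives the
    dictionary count; usage frequencies give the running-text count.
    """
    counts = Counter()
    for word, weight in word_weights.items():
        for letter in word:
            counts[letter] += weight
    total = sum(counts.values())
    return {letter: c / total for letter, c in counts.items()}


# Dictionary view: each word counted once, so P(a) = 2/5.
print(letter_distribution({"a": 1, "ab": 1, "bb": 1}))
# Usage view with the 40% / 59% / 1% figures: a is about 0.62, b about 0.38.
print(letter_distribution({"a": 0.40, "ab": 0.59, "bb": 0.01}))
```

Under the usage weights, a rises from 0.4 to about 0.62 because the common words a and ab both contain it, while the rare bb contributes very little b.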
Also, we have tons of particles. If I leave the particles in a text when calculating the odds of a letter appearing in a new word, then the letters of li/pi/mi/e are going to be heavily overrepresented.
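One simple option is to drop the particles from the running text before counting letters. A minimal sketch, assuming the text is already split into words; `PARTICLES` holds only the four words named above and `letter_distribution_without` is a hypothetical helper:

```python
from collections import Counter

# The words named in the post; extend this with the rest of the particles.
PARTICLES = {"li", "pi", "mi", "e"}


def letter_distribution_without(tokens, excluded):
    """Letter probabilities over running text, skipping excluded words."""
    counts = Counter(
        letter
        for token in tokens
        if token not in excluded
        for letter in token
    )
    total = sum(counts.values())
    return {letter: c / total for letter, c in counts.items()}


tokens = "jan li moku e kili".split()
print(letter_distribution_without(tokens, PARTICLES))  # particles skipped
print(letter_distribution_without(tokens, set()))      # particles counted
```

With the particles skipped, i drops from 3/14 to 2/11 in this short sentence and e disappears entirely, since it only appeared in the particle e.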
Maybe I'll just wing it for now, but I was curious if there was a real answer for this.