From the sound of it, yes, a cross compiler is a production grammar. The input abstract structure is annotated toki pona with additional, meaning-specific synonyms for the particles. The output structure is a sentence of ordinary toki pona.
I get the general idea, at least, but I am not clear about what the purpose is, nor how to apply it. Given a corpus, what? We feed a sentence from it into a machine that produces, minimally, a declaration "tp" or "not tp".
That would be a byproduct of creating a cross compiler. Cross compiling ordinary toki pona to ordinary toki pona would fail on invalid toki pona. It's useful for basic grammar checking, but that's already been done.
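Here is a minimal sketch of that byproduct, assuming nothing about the real tool: a tp-to-tp pass that simply accepts or rejects already gives the "tp" / "not tp" verdict mentioned above. The rule shown is a deliberately crude fragment of the grammar (just the "li" placement), not the whole language:

    # Toy sketch (not the actual tool) of why a tp -> tp cross compile is
    # already a grammar check: anything the pass cannot parse is "not tp".
    def looks_like_tp(words):
        # Rule of thumb only: a sentence needs "li" before its predicate,
        # except when the whole subject is a bare "mi" or "sina".
        if not words:
            return False
        if words[:1] in (["mi"], ["sina"]):
            # Ignores modified subjects like "mi mute", which do keep "li".
            return "li" not in words
        return "li" in words[1:]

    print(looks_like_tp("mi moku e kili".split()))           # True  -> "tp"
    print(looks_like_tp("jan Mato li moku e kili".split()))  # True  -> "tp"
    print(looks_like_tp("jan Mato moku e kili".split()))     # False -> "not tp"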
The real purpose of a cross compiler of this sort is to allow advanced toki pona users to take advantage of more powerful syntax, but still be able to machine-convert their text into ordinary toki pona.
But, hopefully, also a set of parse trees that lay out all the possible valid readings for the sentence (and as much as it can manage of things that turn out not to be tp).
In both tp++ and tp, a pi chain parses to one data structure, even though it might mean a combinatorial explosion of different things based on different groupings.
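To make "one data structure, many readings" concrete, here is a rough illustration (my own representation, not anything from an actual tp++ parser): the chain is stored flat, while the number of ways a reader could group it grows roughly like the Catalan numbers:

    from functools import lru_cache

    # One flat structure for "soweli pi mun lili pi jan Mato":
    # a head plus an ungrouped list of pi chunks.
    flat_parse = ("soweli", ["mun lili", "jan Mato"])
    print(flat_parse)

    @lru_cache(maxsize=None)
    def groupings(n):
        # Number of binary bracketings of n chunks.
        if n <= 1:
            return 1
        return sum(groupings(k) * groupings(n - k) for k in range(1, n))

    for n in range(2, 7):
        print(n, "chunks ->", groupings(n), "possible groupings")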
The cross compiler doesn't know or care how the meaning of the sentence binds to elements in reality. The cross compiler might validate that pronouns bind to something in a predictable manner.
What does this new gizmo add? Apparently, if there are a bunch of pre-editing marks added to the text, it can guide the parser to more accurate parses (eliminating some unreasonable possibilities, for example).
The human reader would see fewer parsings (fewer ways to link the sentence up to reality), but the compiler would see only one of the possible ones. E.g., using mon instead of pi to indicate personal possession reduces the ways to interpret soweli pi mun lili mon jan Mato, which would compile down to soweli pi mun lili pi jan Mato. Both have problems determining what is being possessed (i.e., the mun or the soweli). But in the case of mon jan Mato, the tp++ writer can see that we don't have a soweli or a mun of a jan Mato sort. After compilation, soweli pi mun lili pi jan Mato is ordinary toki pona, presumably just as readable as any other toki pona.
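The compile step for that example can be sketched in a few lines (assuming, as above, that mon is a tp++-only possessive particle; the mapping table is the only interesting part):

    # Each tp++ particle maps to an ordinary tp particle; the extra meaning
    # ("this pi is possession") exists only on the tp++ side.
    def compile_to_tp(sentence):
        particle_map = {"mon": "pi"}
        return " ".join(particle_map.get(w, w) for w in sentence.split())

    print(compile_to_tp("soweli pi mun lili mon jan Mato"))
    # -> soweli pi mun lili pi jan Mato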
But that doesn't seem to add to the parser's power, since the text could have been written with those marks in place already and the marks dropped when the final sentence is reproduced after parsing is done. The question then becomes: who does the preediting?
People who want to work with a more powerful syntax, but continue to write texts that are readable by people unaware of tp++.
And, if it is humans, what does the parser actually do? Are there, in this new gizmo, devices to automatically add the preediting marks? Or, at least, to consider the possibilities both with and without them?
Inferring missing symbols is incredibly difficult to do by machine. This is why all programming languages have well-defined terminal symbols and spaces between words. Humans, it seems, can do okay without punctuation, vowels, and so on. But writing a compiler that can make sense of that is hard. I had to write a few hundred lines of ugly code to infer "li" when it is dropped between mi/sina and the verb phrase.
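A toy version of that inference, stripped of all the hard cases (the real code is long precisely because "mi" and "sina" also show up in places where no "li" should be inserted):

    def restore_li(words):
        # If the sentence starts with a bare "mi" or "sina" and contains no
        # "li", assume the predicate begins at the second word and re-insert
        # the dropped "li" for the internal parse.
        if words[:1] in (["mi"], ["sina"]) and "li" not in words:
            return words[:1] + ["li"] + words[1:]
        return list(words)

    print(restore_li("mi moku e kili".split()))
    # -> ['mi', 'li', 'moku', 'e', 'kili']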
I admit that I don't get linear grammars well and often find them uninformative. So, most of these rules just seem odd to me. The one that does make sense is the rule that always puts 'li' between subject and predicate and then erases it after 'mi' and 'sina' standing alone. The idea of shoving pieces around is appealing, but, except for moving PPs to the front with 'la' (and often losing the preposition), it doesn't seem to have much obvious relevance to tp.
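In the output direction that rule is easy to state: always generate the 'li', then erase it after a bare 'mi' or 'sina' subject. A sketch in my own phrasing, not anyone's actual generator:

    def emit_sentence(subject_words, predicate_words):
        words = subject_words + ["li"] + predicate_words
        if subject_words in (["mi"], ["sina"]):
            words.remove("li")   # "mi moku", not "mi li moku"
        return " ".join(words)

    print(emit_sentence(["mi"], ["moku", "e", "kili"]))
    # -> mi moku e kili
    print(emit_sentence(["jan", "Mato"], ["moku", "e", "kili"]))
    # -> jan Mato li moku e kili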
tp has a mono-sentence syntax: there is only one sentence shape, and all the phrases are required to come in one fixed order. This is an unnecessary constraint, a rule that can be dispensed with. That some phrases don't have head particles is another irregularity that makes it difficult to break the mono-sentence syntax. If there were a subject header, you could shuffle all the particle-headed phrases of a sentence and it would mean the same thing, while being more expressive.
A (smallish) language should be expressive and not too verbose. As writers, we've pushed tp to its limits. We alrea
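Purely as a thought experiment on that subject-header idea: suppose tp++ marked the subject with its own particle (call it "su" here, an invented placeholder, not a real toki pona word). Then every phrase carries a head particle, the phrases can arrive in any order, and compiling down to ordinary tp is just sorting them back into the fixed order and dropping the invented marker:

    CANONICAL_ORDER = {"la": 0, "su": 1, "li": 2, "e": 3}

    def normalize(phrases):
        # phrases: (head_particle, text) pairs in any order
        ordered = sorted(phrases, key=lambda p: CANONICAL_ORDER[p[0]])
        out = []
        for head, text in ordered:
            if head == "su":
                out.append(text)            # the subject loses its marker in tp
            elif head == "la":
                out.append(text + " la")    # "la" trails its context phrase
            else:
                out.append(head + " " + text)
        return " ".join(out)

    print(normalize([("e", "kili"), ("su", "jan Mato"), ("li", "moku")]))
    # -> jan Mato li moku e kili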
Unless, of course, we are actually stepping back from the surface text somehow to underlying structures. But that is an area where machines are notoriously unreliable (imagine all the structures that give rise to a 'pi' phrase, for example) or overproductive (imagine listing them all out).
Since this is a cross compiler, it passes long pi chains undigested from start to end. Only a human brain can reliably make sense of a long pi chain. But while writing, the tp++ author can avail themselves of alternatives to pi, which compile down to pi. For example, a particle that means possession instead of whatever generic thing pi stands for.
So, I hope your next paper on this says a bit about what is going to happen and where whatever you are suggesting fits in. I really want a machine to do the scut work, so keep on with this project. And thanks for what you have given so far.
It's an opportunity for me to learn about compilers. I ended up in programming sort of by accident, so if I finish it, this would be the computer science term paper I never wrote.