FLEx and Compound Rules

I have been dutifully adding items to my FLEx dictionary as I study Japanese. Additionally, I have entered some text for the parser to work with. Now, it appears that the parser expects spaces between words – which is a bit of a problem for Japanese. But it appears from the description of “compounding rules” that I should be able to use them in some measure to overcome this.

Here is a concrete example. Take the phrase 高い山. This should be parsed as “high (adj) non-past-suffix (adj) mountain (n).” However, FLEx refuses to recognize the non-past suffix. Here are some facts:

  • I have each of these lexemes entered properly as far as I can tell (adjectives have the headword without the final -い, though it is present in the citation form).
  • I have created a Left-Headed compound rule with Adjective as the right stem and Noun as the left stem.
  • I have created an Adjective template with an obligatory affix slot for time, and added the affix -い to it.
  • I have tested with this affix marked both as suffix and as suffixing-interfix. I think it must be marked suffixing-interfix to allow FLEx’s compound parsing to work, but that doesn’t seem to make the parse succeed, so I am not sure.

Okay, so it is actually more complicated than that. I have the adjective いい “good” entered in the system (with a similar difference between lexeme and citation form). Because of this, the parser is actually returning “high (adj) good (adj.) mountain (n.).” Basically, rather than treating the suffix as a suffix, it is mixing it up with “good”. (If this compound were actually written out, it would be 高いいい山 or 高い_いい_). It appears that the parser is using the lexeme forms of all terms without respecting their templates (which contain the required suffix).
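To make the ambiguity concrete, here is a minimal sketch in Python. It is not FLEx's parser and uses none of its APIs; it is a toy segmenter that matches lexeme forms against an unspaced string with no templates or categories applied. The lexicon contents and the function name `segmentations` are assumptions for illustration only, based on the entries described above (adjective lexemes stored without the final -い, so いい “good” has the lexeme form い).

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> list of (gloss, kind).
# Assumption: adjective lexemes are stored without the final -い,
# so the stem of いい "good" collides with the non-past suffix -い.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "高": [("high", "adj-stem")],
    "い": [("good", "adj-stem"), ("NONPAST", "adj-suffix")],
    "山": [("mountain", "noun")],
}


def segmentations(text: str) -> List[List[Tuple[str, str, str]]]:
    """Return every way to split `text` into lexicon entries.

    No templates or category checks are applied, which is exactly
    why the spurious reading survives.
    """
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        piece = text[:end]
        for gloss, kind in LEXICON.get(piece, []):
            for rest in segmentations(text[end:]):
                results.append([(piece, gloss, kind)] + rest)
    return results


parses = segmentations("高い山")
for parse in parses:
    print(" + ".join(f"{form}[{gloss}]" for form, gloss, _ in parse))
```

With this toy lexicon the string has two segmentations, one reading い as the stem “good” and one reading it as the non-past suffix. Without something that enforces the adjective template on the left member, nothing rules out the first reading.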

So, playing around, I marked the adjective template as “Requires More Derivation”, which actually did work, sort of. Now it returns BOTH of the previous parses – the proper one AND the one containing “good”. I certainly wasn’t expecting that!

If I were to write out 高い_ instead of 高い山, the parser works things out correctly – so it must be something I am not understanding about how compound rules are supposed to work, or about category templates. Any suggestions? Otherwise, I may just have to enter spaces between words, which I didn’t want to do because the texts I am entering don’t have them.

About George

I'm interested in theology, languages, translation and various sorts of fermentation.

5 Responses to FLEx and Compound Rules

  1. Paul D. says:

    It seems to me that if your parser works at the word level, it really needs some way of distinguishing words even without spaces.

  2. George says:

    Actually, FLEx works at the morpheme level (it is a morphosyntactic processor). So it really cares more about how words are built. It has some ability to categorize words and what sort of functions they perform in a phrase, but if it has more refined tools around phrase and discourse level grammar, I haven’t seen them.

    I guess I’m trying to push the boundary of what FLEx is really for. But I don’t know of a better tool for across-the-board language capture and analysis.

  3. Mike Aubrey says:

    The text charting tool is for discourse analysis, but it’s manual, not automated. Syntactic parsing is in the plans, but it’ll be quite some time before it’s available and usable.

  4. George says:

    Thanks, Mike. I haven’t played much with the charting tool. Seemed a little intimidating – but I probably should dive into it a bit.

    As far as compound rules go, though… can one have inflectional affixes in between a compound’s left and right stems? I cannot seem to get that to work. Derivational affixes seem to work just fine, but there are many instances where it seems perfectly appropriate to have a compound built with inflectional affixes in between. The example in FLEx’s parser help even seems to suggest an inflectional affix.

  5. George says:

    More than one way to skin a cat I suppose: I altered my “derivational” adjective template to remove all endings, duplicated the two inflectional affixes (-i and -katta) as derivational affixes to a new category “Noun Modifier” and modified the compound rule to only work off of Noun Modifier, not Adjective. That did the trick.

    In one sense this nicely differentiates between adjective root morphemes “acting” as verbs rather than nominal modifiers. Then again, I don’t like that I now have Adjective inflectional endings and the same endings duplicated as derivational endings to a whole new category.

    And in general, I don’t like it. Japanese is just a lucky case because the left-hand category in the compound only has two inflectional possibilities (that I am aware of at this point). I’m sure this approach could get messy quickly if that were not the case.
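
The workaround in the last comment can be sketched as a toy model in Python. Again, this is not FLEx's parser or its API. The data layout, the names `STEMS`, `DERIV_SUFFIXES`, `COMPOUND_RULE`, `left_members` and `parse_compound`, and the glosses are all assumptions for illustration. The idea it shows is the one described: the two endings are treated as derivational affixes that turn an Adjective into a “Noun Modifier”, and the compound rule accepts only a Noun Modifier as its left member.

```python
from typing import Iterator, List, Tuple

# Hypothetical toy entries: (form, gloss, category).
STEMS = [
    ("高", "high", "Adjective"),
    ("い", "good", "Adjective"),
    ("山", "mountain", "Noun"),
]

# Hypothetical derivational suffixes: (form, gloss, from_category, to_category).
DERIV_SUFFIXES = [
    ("い", "NONPAST", "Adjective", "Noun Modifier"),
    ("かった", "PAST", "Adjective", "Noun Modifier"),
]

# Toy compound rule: (left member category, right member category).
COMPOUND_RULE = ("Noun Modifier", "Noun")


def left_members(text: str) -> Iterator[Tuple[int, List[str]]]:
    """Yield (characters consumed, glosses) for each stem+suffix prefix
    of `text` whose derived category matches the rule's left member."""
    for s_form, s_gloss, s_cat in STEMS:
        if not text.startswith(s_form):
            continue
        rest = text[len(s_form):]
        for a_form, a_gloss, src, dst in DERIV_SUFFIXES:
            if src == s_cat and dst == COMPOUND_RULE[0] and rest.startswith(a_form):
                yield len(s_form) + len(a_form), [s_gloss, a_gloss]


def parse_compound(text: str) -> List[List[str]]:
    """Return gloss sequences for two-member compounds licensed by COMPOUND_RULE."""
    parses = []
    for consumed, glosses in left_members(text):
        remainder = text[consumed:]
        for form, gloss, cat in STEMS:
            if form == remainder and cat == COMPOUND_RULE[1]:
                parses.append(glosses + [gloss])
    return parses


print(parse_compound("高い山"))
print(parse_compound("高かった山"))
```

Because a bare Adjective stem is no longer a valid left member in this toy rule, the “high good mountain” reading has nowhere to come from. The cost is the one noted in the comment: every ending that can appear on the left member has to be listed a second time as a derivational suffix. The sketch only handles two-member compounds.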
