Building on Indexed Concept Parsing

The previous chapter described a technique for parsing called indexed concept parsing; in particular, it described the implementation of indexed concept parsing in the North West Water customer service representative tutor, Casper. The evaluation of the parser in Casper indicated that it was successful approximately 80% of the time, when judged by whether it suggested a correct answer among the top seven best matches, and approximately 60% of the time, when judged by whether it suggested a correct answer as the best match. In this chapter, we analyze the failures in the Casper parser and the ways we can augment indexed concept parsing to increase the parser's success.

Analysis of failures

Before one can make something better, one must understand what is wrong with it in the first place. Accordingly, we analyzed the parsing failures in the Casper beta test, examining cases in which the student did not accept the result from the parser but instead used the pull-down menus or tried the parser again. These failures fall into several categories: content errors, failure to resolve anaphoric references, spelling errors, and cross-talk.

Content

As is perhaps not surprising in building a conceptual parser, most of the failures were failures in representation--either something was represented in a way that caused a parsing failure, or (the more common case) something wasn't represented at all. Given the taxonomy we have been using for indexed concept parsing, content failures were failures in representing index concepts, phrasal patterns or target concepts.

Missing concepts

Many of the examples can be traced to failure to represent either what would be target concepts or index concepts. Consider the following statement:

Student: Do you have an immersion heater or is you (sic) water supplied from a multipoint heater?

The student wants to know what type of water heater the customer has. There aren't any target concepts in Casper that match this type of question, nor are there index concepts that specialize water heaters as to type (immersion heaters vs. multipoint heaters).

Missing phrasal patterns

A fairly minor problem in the beta test was the failure to associate appropriate phrasal patterns with indices. For example, one of the indices was WORKERS, which had phrasal patterns such as "workers" and "workmen" attached to it. One phrasal pattern missing, though, was "gang," and so the following statement did not activate the index WORKERS:

Student: already have a gang on site sorting out problem

Simple paraphrases and complex inferences

Another type of content error is the case of simple paraphrases of CSR statements. For example, to express the same intent as

TELL-INSTRUCT-WAIT-AND-SEE-SHORT-TERM Your water problem should clear itself up within a couple of hours

a few students typed in something like "it is just a temporary problem." But there was no set of index concepts close enough to bridge from a statement of this type to the CSR statement. As another example, consider the following pair, a student statement and a target concept which is a close paraphrase of it:

Student: This is probably caused by bubbles in the system

TELL-INFO-PROBLEM-AIR-IN-WATER Your water has air bubbles in it.

In this case, the student is expressing that the discoloration of the customer's water is caused by bubbles in the water.

But consider the following student statement:

Customer: But what can be done about it? (discoloration in the water, due to bubbles) ...

Student: The air is completely harmless

There is no TELL-INFO-AIR-IS-SAFE target concept in Casper, but there is the following target concept:

TELL-INFO-WATER-SAFE Your water is perfectly safe to drink.

It seems clear that TELL-INFO-WATER-SAFE is the closest match to the student's statement. But consider the inferential process that must occur for this match to be made, which is something like:

The water is safe to drink because, although it is often the case that stuff in the water is harmful to drink, air in water is not harmful. The only thing contained in this water is air. Therefore, this water is safe to drink.

Or perhaps:

Water is safe to drink if all of the stuff contained in the water is safe to drink. This water contains air (and only air). Air is safe to drink. Therefore, this water is safe to drink.

Or perhaps:

Water is safe to drink if all of the stuff contained in the water is safe to drink. This water contains air and perhaps other substances. We assume that the other substances are safe to drink. Air is safe to drink. Therefore, this water is safe to drink.

The point of dwelling on different ways to reach the same conclusion (and notice, by the way, that we have ignored all of the issues of going from natural language into some representational form that could support the conclusions above, including knowing that "harmless" means "safe") is this: at any point in our programs, we can expect, as a possibility, that deep inference and deep modeling will be required to make a best match. The inference chains above and the models behind them provide a worst-case scenario for building embedded conceptual parsers. If we need accurate models, and if we need to allow unlimited inferential capability, then we may not be able to build these parsers--the task will prove too difficult or expensive. But if we can limit, in some way, the need for these models and inferences, then the task may prove possible.

Phatic utterances

There is another class of student utterances that the parser (in the beta test) could not match. These were simple responses of the following type:

Customer: Well, do you know if my water is safe to drink?

Student: yes

The Casper system did not have CSR statements for simple yes and no responses. Therefore the parser, lacking any knowledge of how to connect "yes" to a CSR statement, failed to match any target concept. (The correct match is, of course, TELL-INFO-WATER-SAFE.)

Other sources of error

In addition to errors caused by faulty or missing content, there were errors from other sources: the failure of the parser to handle anaphoric reference correctly, spelling and typographic mistakes on the part of the student, and cross-talk within the parser.

Anaphora

Anaphoric disambiguation is generally considered to be a very difficult problem, and it is no surprise that it caused difficulty within the beta test. For example, failure to disambiguate the pronoun "they" (as BITS) in the following statement contributed to a parsing failure:

Customer: [The bits] look like tea leaves.

Student: Are they only in your cold water supply?

And in a more complicated interaction:

Customer: Okay. I'll be calling you if [the problem] doesn't clear up soon.

Student: Please do

the student's "Please do" is an anaphoric reference to the customer's calling back if the problem doesn't clear up. And in this example:

Customer: What are you going to do about [my water quality problem]?

Student: Being dealt with now.

there is a complex interaction between anaphoric presupposition ["(The problem is) being dealt with now (by us)"], inference and syntactic form.

Spelling and typographic errors

Not surprisingly, student customer service representatives made a number of spelling and typographical errors which resulted in parser failures. For example,

Customer: But what is wrong with my water?

Student: theproblem (sic) will be resolved in a few hours

the lack of a space between "the" and "problem" meant that the index PROBLEM was missed.

Cross-talk

Because of the scoring mechanism used in the indexed concept parser in Casper, the following situation sometimes occurred. There would be, say, two target concepts R1 and R2. The student would type "phrase1 and phrase2," where phrase1, if stated alone, would parse to R1, and phrase2, if stated alone, would parse to R2. Because the scoring mechanism counts seen index concepts against target concepts that are not indexed by them, cross-talk can occur, and neither R1 nor R2 may be returned.

For example,

Student: ring us back and we will arrange for a system controller to flush the main

where "ring us back" would normally parse to TELL-INSTRUCT-CALL-BACK-NIL and "we will arrange for a system controller to flush the main" would parse to TELL-INSTRUCT-SEND-SYSTEMS-CONTROLLER-FLUSH-MAIN. But enough points accumulated against each of these parses to make them unacceptable.

Table 6.1: Types of parsing failures in Casper

Category                        N     % of total errors
Content                         78    81.3%
  Missing concepts              44    46.81%
  Phrasal patterns              18    19.2%
  Simple inference               7    7.5%
  Complex inference              7    7.5%
  Phatic utterances missing      2    2.1%
Anaphoric failures              15    17.0%
Spelling errors                  2    2.1%
Cross-talk                       5    5.3%
Summary of parsing failures

Table 6.1 summarizes the parsing failures that occurred during the beta test. Recall that we examined those statements found unacceptable by the students (as evidenced by their trying to make a different statement with the parser, or by using the pull-down menus). There were a total of 94 such statements (out of 492). These numbers should be taken only as an indication of the overall types of errors, because the scoring is necessarily subjective. The percentages add up to more than 100% because a statement could fall into more than one error category. Clearly, the lion's share of the errors occur because of problems with content. The specific class of content errors that is most worrisome--complex inference errors--does not seem to account for a large percentage of the errors.

Improvements

As Table 6.1 indicates, the place to look for improvements is in the content--some four fifths of the errors appear to be due to content. As we stated previously, this should not be surprising as we develop conceptual parsers. In this section, we discuss specific measures taken to improve parsing in Casper, including adding content, building in expectations, resolving anaphoric references, and applying more sophisticated parsing techniques. Adding content is the place to start.

Additional content (I)

We have described three broad classes of content in building indexed concept parsers: target concepts, index concepts and phrasal patterns. In this section, we will discuss adding instances in each of these classes. In each case, we will use examples from the Casper tutor to illustrate.

Target concepts

One place in which we might add content is in additional target concepts. This is appropriate when, especially as a result of testing the parser in real-life situations, we notice users of the application program trying to express concepts that are not defined in the program[1]. For example, here are some of the sentences that novice customer service representatives typed during the beta testing of Casper:

Student: please run your water and the problem will clear in a short time

Student: if you leave your tap to run on reduced pressure it will clear itself

Student: if you leave your cold water tap running it will clear

Student: As the water is drawn off the problem will clear

Student: if you run the cold water tap for a while the water should clear

Student: could you try running your first cold water tap until it runs clear

Student: if you run your cold water for fifteen minutes you should find that it will clear

Student: yes, but if you flush the tap for 15 minutes it should clear up

What these statements have in common is something like this: "If you run the water from the cold tap for a while the problem should clear." That is, the solution to the problem is conditional on a customer action, namely running the cold water tap.

There are some related CSR statements in the Casper system:

TELL-INSTRUCT-WAIT-AND-SEE-SHORT-TERM Your water problem should clear itself up within a couple of hours.

TELL-INSTRUCT-RUNOFF-WATER-NIL Try running the water from the cold tap for a quarter hour.

Each of these is a close match: the first is a statement that the problem will clear up (the results clause of this new target concept); the second is an instruction to run the tap for a short time (the condition clause).

It may be that a match to either one of these closely related utterances is acceptable. But there are reasons to believe that this could cause problems. First of all, there are other, superficially close utterances that should not match. For example,

TELL-INSTRUCT-RUN-COLD-TAP-REPORT-RESULT Can you run the cold tap for a bit and tell me what you see?

TELL-CONDITIONAL-WATER-NOT-SAFE-IF-DOESNT-CLEAR If the problem doesn't clear up, I wouldn't recommend drinking the water.

For a variety of reasons, these may be scored as better matches than the two acceptable ones. This is likely to at least decrease the perfect match percentage, and at worst push the acceptable matches out of the top seven.

Secondly, the user of the parser may not agree that these statements are close enough in meaning. It's easy to imagine, for example, a student saying that "Your problem will clear up in a short time" does not mean the same as "If you run the water for a short time, your problem will clear up," simply because the conditional clause is missing.

The solution, of course, is to add a new CSR statement:

TELL-INSTRUCT-RUNOFF-WATER-THEN-WAIT-AND-SEE-SHORT-TERM If you run the water from the cold tap for a while the problem should clear.

Another example comes from the following two statements (both by the same person):

Student: what is the pipe before your internal pipe madeof (sic)

Student: what is you (sic) service pipe made of

These are closely related to two other CSR statements:

ASK-PIPES-LEAD Do you have lead pipework in your house?

ASK-PIPES-MATERIAL What kind of pipework do you have in your house?

But clearly these have very different intents. The student doesn't want to talk about the internal pipework; the student wants to ask about the pipe which leads from the internal pipes to the main pipework. Again, the feedback from the beta sessions is useful, and two CSR statements are added:

ASK-SERVICE-PIPE-LEAD Are your service pipes made of lead?

ASK-SERVICE-PIPE-MATERIAL What material are your service pipes made of?

Notice, by the way, that these statements don't have to be good statements to make to a customer in order to be good to have in the tutorial. In fact, a tutorial system will want to include just such bad statements as teaching points.

Additional indices

In addition to target concepts, we can also add index concepts to the system. For the aforementioned examples about service pipes:

ASK-SERVICE-PIPE-LEAD Are your service pipes made of lead?

ASK-SERVICE-PIPE-MATERIAL What material are your service pipes made of?

it is imperative that we create an index concept service-pipe if one does not already exist; otherwise, we would not be able to index either of these target concepts differently from the questions about pipes internal to the home.

But there are cases where additional index concept sets are important, especially when students use paraphrases of existing CSR statements for which the system was not prepared. For example, consider the following CSR statement:

TELL-INFO-PIPES-LEAD Your pipes are made of lead.

and the following student's paraphrase:

Student: you could have lead pipework to your property.

Before the beta test, there was only one index concept set associated with TELL-INFO-PIPES-LEAD, namely the set {pipes consist-of lead}. This student's statement indicates that there is another way to express that the pipes are lead: one can state that the property contains lead pipes. Therefore, the set {property contain lead pipes} was added to the index concept sets associated with TELL-INFO-PIPES-LEAD.

Additional phrasal patterns

A third place where additional content can be added is in the phrasal patterns associated with indices. For example, the phrasal patterns associated with the index CAUSE in the original beta test were {"cause", "caused by"}. But there were several statements by students that used "due to." For example:

Student: It is probanly (sic) due to air in the pipes.

Student: your discolouration is probably due to other utilities working in your area

Student: Your problem is more than likely due to the old lead pipework

Adding "due to" to the phrasal patterns associated with the index concept CAUSE assured that this index would be activated when "due to" was part of the input sentence.

Building in expectations

The basic indexed concept parsing algorithm assumes that, before parsing begins, each target concept is just as likely to be referenced as any other. Typically, though, at a given point in the running of the application program, certain target concepts are much more likely than others. This is especially true of interactive programs such as Casper, in which the novice customer service representative goes through different phases of discovering the cause of a problem and then making recommendations to the simulated customer. It is more likely that a student will ask "Is the problem in both taps?" after the customer says, "Hello, this is Mrs. Hughes in Liverpool. I've rung up to complain about bits in my water," than after the customer says, "I've got the problem in both taps." It's also likely that the customer service representative will ask the customer her address. In sum, at any point, certain target concepts will be more likely than others.

There are various methods of generating expectations using context. A very rich model of the interaction between the customer and the customer service representative could be built for the Casper system, for example. In the case of the Casper system, we have opted for a much simpler solution. Because the list of possible customer utterances is static (remember, a sound clip is created for each utterance), we treat each customer utterance as a one-level stimulus for possible student responses. That is to say, we build a simple stimulus/response record--if the customer says X, the student is likely to reply with Y or Z. For example, if the customer says, "Hello, this is Mrs. Hughes in Liverpool. I've rung up to complain about bits in my water," we predict that any of the following CSR statements will occur:

ASK-CUSTOMER-ADDRESS What is your address?

ASK-TAPS-BOTH Is it in both the hot and the cold taps?

ASK-WATER-BITS-DESCRIPTION What kind of bits are in your water?

ASK-PROBLEM-DURATION How long have you had the problem?

The actual student statements made in response to this customer statement were:

Student: are the bits coming from your cold water tap

Student: How long have you had this problem

Student: How long have you had this problem?

Student: could you describe them to me please?

Student: Please describe the bits

Student: Please can I have your address?

Student: how long have you had the problem mrs hughes

Student: may i have your address please

Student: i'm sorry mrs hughes, may i have your address please

Student: WHAT IS YOUR ADDRESS

Student: Could you describe what the bits look like?

Each of these student responses is a paraphrase of one of the predicted CSR statements.

In Chapter 3, we described three criteria for calculating the goodness of match between a probe index concept set and target concepts: information about the presence of references to index concepts associated with the target concept, information about the absence of references to index concepts associated with the target concept, and information about the presence of references to index concepts not associated with the target concept. The method we are describing here adds a fourth criterion: Is this target concept expected at this point in the dialog? (More specifically, we ask this question only if some index associated with the target concept is referenced.) Unlike the other three criteria, this criterion is binary: either the target concept is expected or it is not.
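A minimal sketch of how this fourth criterion might be folded into the score follows. The base scoring and the size of the expectation bonus are hypothetical; the essential properties are that the criterion is binary and that it is consulted only when at least one index associated with the target concept has been referenced.

# Hypothetical scoring with a binary expectation criterion.

def score(target_indices, probe_indices, expected, bonus=1.0,
          hit=1.0, miss=0.5, extra=0.5):
    matched = target_indices & probe_indices
    if not matched:
        return None                      # no evidence at all for this target
    base = (hit * len(matched)
            - miss * len(target_indices - probe_indices)
            - extra * len(probe_indices - target_indices))
    # Binary criterion: the target either is or is not expected here.
    return base + (bonus if expected else 0.0)

# Two targets tied on index evidence; only one is expected after the
# customer asks what can be done about the problem.
probe = {"RUN", "COLD-TAP", "CLEAR"}
tie_a = score({"RUN", "COLD-TAP", "CLEAR"}, probe, expected=True)
tie_b = score({"RUN", "COLD-TAP", "CLEAR"}, probe, expected=False)
print(tie_a > tie_b)   # True: the expectation breaks the tie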

For example, the customer has just asked what she can do about her problem, and the student replies:

Student: If you leave your tap to run on reduced pressure it will clear itself

Without expectations, the parser returned as the best match:

TELL-INSTRUCT-RUN-COLD-TAP-REPORT-RESULT Could you run the water from the cold tap and tell me what you see? (Paraphrase: Can you run the water for a bit and see if it clears?)

The real best match is only second in the list:

TELL-INSTRUCT-RUNOFF-WATER-THEN-WAIT-AND-SEE-SHORT-TERM If you run the water from the cold tap for a while the problem should clear.

With expectations, this CSR statement is returned as the best match. Expectations are especially good at improving the perfect match score and reducing the average distance from the top. Although two target concepts may have similar scores on the criteria that calculate information from the presence and absence of references to index concepts, the fact that one of them is expected in the current context is enough to break the tie.

Disambiguating pronominal reference

Not surprisingly, some of the failures in parsing are caused by students' use of pronouns instead of direct references to index concepts. For example, consider the following fragment:

Student: What do the bits look like?

Customer: They look like dark flakes.

Student: Are they brown, black, or another colour?

The two closest CSR statements are:

ASK-WATER-BITS-BLACK Do you have black bits in your water?

ASK-WATER-BITS-DESCRIPTION What kind of bits are in your water? (Paraphrase: Please describe the colour of the bits in your water.)

Even with expectations, the lack of an explicit reference to bits means neither of these is returned as the best match. On examining the dialog, however, it is clear that "they" refers to bits. Disambiguating pronominal reference is a very difficult problem, however, as evidenced by Winograd's famous example, "The city councilmen refused to give the women a permit for a demonstration because they feared violence/advocated revolution" (Winograd 1971, cited in Grishman 1986:130). In the limit, arbitrary amounts of inference are required to do pronoun disambiguation.

As usual, there is hope for a simpler solution. Each customer utterance is tagged with a set of index concepts a pronoun is likely to refer to. In the example above, "They look like dark flakes," this set is {bits flakes}. We also define a special index, m-pronoun, with the typical surface forms of pronouns (they, them, etc.) attached as phrasal patterns. If a pronoun is seen, the set of tagged index concepts is added to the set of indices seen in the sentence, and then the appraisers are run in the usual way.

Note that this does not require, as in the expectations case, an additional criterion. We simply add all of the index concepts tagged on a customer utterance when a pronoun is seen, and the existing criteria calculate the score for each target concept.
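A minimal sketch of this mechanism is shown below. The function and variable names are hypothetical, not the Casper implementation; it simply shows the m-pronoun index firing on a surface pronoun and the index concepts tagged on the preceding customer utterance being unioned into the set of seen indices before the appraisers run.

# Hypothetical sketch of the pronoun-expansion step.

PRONOUN_FORMS = {"they", "them", "it", "these", "those"}   # m-pronoun patterns

def seen_indices(student_words, literal_index, customer_pronoun_tags):
    """literal_index: word -> index concept it references (hypothetical).
    customer_pronoun_tags: indices tagged on the last customer utterance,
    e.g. {"BITS", "FLAKES"} for "They look like dark flakes."
    """
    seen = set()
    pronoun_seen = False
    for word in student_words:
        if word in PRONOUN_FORMS:
            pronoun_seen = True          # m-pronoun referenced
        elif word in literal_index:
            seen.add(literal_index[word])
    if pronoun_seen:
        # Add the tagged referents; the ordinary appraisers then score
        # target concepts exactly as before.
        seen |= customer_pronoun_tags
    return seen

words = "are they brown black or another colour".split()
print(seen_indices(words, {"brown": "BROWN", "black": "BLACK",
                           "colour": "COLOUR"}, {"BITS"}))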

Additional content (II)

Indexed concept parsing relies on recognizing index concepts associated with target concepts. The presence or absence of index concepts associated with a target concept determines whether the parser returns the target concept as its result. But notice a potential difficulty: indexed concept parsing does not take the ordering of the index concepts into account. Here is an example from the Casper system, with two target concepts pertaining to telephone calls:

TELL-INSTRUCT-NWW-WILL-CALL-NIL I will call you back.

TELL-INSTRUCT-CALL-BACK-NIL Please give us a call back.

Both target concepts have an identical index concept set associated with them: {NWW CALL BACK}, with first person personal pronouns ("I," "we," "me," "us") attached as phrasal patterns to the index concept NWW (that is, North West Water, the water utility). The indexed concept parsing algorithms we have discussed so far cannot distinguish between these target concepts, except by tagging one or the other as likely to occur in some context. On reading "I will call you back," the evaluation function will return a tie for these two target concepts (assuming they are equally likely to occur in a particular context). Although the two results will be close together in the results list, it would be a better result if TELL-INSTRUCT-NWW-WILL-CALL-NIL had a higher score than TELL-INSTRUCT-CALL-BACK-NIL upon reading "I will call you back."

Direct Memory Access Parsing

Direct Memory Access Parsing techniques can be used for finding either target concepts or index concepts. By "finding target concepts," we mean using DMAP in just the way described in the previous chapters--attaching phrasal patterns to target concepts and thus recognizing those concepts directly. By "finding index concepts," we mean using DMAP to recognize index concepts that contain structure (beyond hierarchical and partonomic relations), and therefore matching phrasal patterns that can contain references to that structure (not just strings, as we have discussed so far). We discuss using DMAP to recognize target concepts and index concepts in turn.

Looking for target concepts

When we are using DMAP to find target concepts, what we are saying, in effect, is that we have a strong theory of what phrasal pattern will trigger a reference to that concept. Using the example of

TELL-INSTRUCT-NWW-WILL-CALL-NIL I will call you back

we might have strong expectations that students will type in:

"NWW will CALL-BACK"

where NWW and CALL-BACK are themselves concepts, with (say) the following phrasal patterns:

NWW: "I", "we", "North West Water"

CALL-BACK: "call back" "return your call" "phone"

and, therefore, according to the algorithm presented previously, DMAP will recognize a reference to TELL-INSTRUCT-NWW-WILL-CALL-NIL with any of the following sentences (among others):

I will call back

We will call back

North West Water will call back

I will return your call

We will return your call

If this is the case, then we have very strong evidence that the concept being referenced is TELL-INSTRUCT-NWW-WILL-CALL-NIL and not, say, TELL-INSTRUCT-CALL-BACK-NIL. In effect, then, this becomes another criterion for evaluating the score for target concepts: Is this concept referenced by a complete DMAP phrasal pattern[2]?
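The following reduced sketch suggests what "referenced by a complete DMAP phrasal pattern" amounts to operationally. It is not the marker-passing implementation of Chapter 3, and its pattern tables are hypothetical; it simply walks a phrasal pattern whose items may be literal words or concepts recognized through their own patterns, and reports a reference only when the entire pattern has been seen.

# Reduced sketch of DMAP-style recognition of a target concept through
# a phrasal pattern whose items may themselves be concepts.

PATTERNS = {
    "NWW": [["i"], ["we"], ["north", "west", "water"]],
    "CALL-BACK": [["call", "back"], ["return", "your", "call"], ["phone"]],
    "TELL-INSTRUCT-NWW-WILL-CALL-NIL": [["NWW", "will", "CALL-BACK"]],
}

def match_item(item, words, pos):
    """Return positions just past a reference to item starting at pos."""
    ends = set()
    if item in PATTERNS:                         # a concept: try its patterns
        for pattern in PATTERNS[item]:
            ends |= match_pattern(pattern, words, pos)
    elif pos < len(words) and words[pos] == item:    # a literal word
        ends.add(pos + 1)
    return ends

def match_pattern(pattern, words, pos):
    positions = {pos}
    for item in pattern:
        positions = {e for p in positions for e in match_item(item, words, p)}
    return positions

words = "we will return your call".split()
complete = bool(match_pattern(
    PATTERNS["TELL-INSTRUCT-NWW-WILL-CALL-NIL"][0], words, 0))
print(complete)   # True: strong evidence for TELL-INSTRUCT-NWW-WILL-CALL-NIL

A complete match of this kind is strong evidence for the target concept, and so can be treated as an additional, heavily weighted criterion alongside the index-based appraisers.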

Looking for index concepts

We can use DMAP to look for index concepts, too. It should be clear that we can use DMAP to match phrasal patterns of the sort we have discussed previously, but we can do two other useful things with DMAP parsing of index concepts. First, we can use DMAP to find discontinuous phrases. Consider the index concept set we described for TELL-INSTRUCT-NWW-WILL-CALL-NIL: {NWW CALL BACK}. Having BACK as an index concept seems wrong. We would like an index concept like CALL-BACK instead, but there is difficulty in handling sentences such as:

I will call you back

because the "you" interrupts "call" and "back." One solution would be to create a structured index for CALL-BACK with an:OBJECT slot, and add the phrasal pattern:

call:OBJECT back

in which "you" can reference the:OBJECT of CALL-BACK in some way. But there is an easier way.

By relaxing a statement in the procedure advance-prediction, we can create discontinuous phrasal patterns. The algorithm for advancing a prediction is (as stated in Chapter 3):

To advance a prediction on item from start to end:

if the prediction is static or a dynamic prediction whose next value equals start, do:

if the phrasal pattern value is empty, do:

reference the base of the prediction from start to the end of the prediction

else do:

create a new (dynamic) prediction with the following values:

the base of the new prediction is the base of the existing prediction,

the phrasal pattern of the new prediction is all but the first item of the phrasal pattern of the existing prediction,

the start of the new prediction is one more than the start of the existing prediction,

key the new prediction on the target of the new prediction's base and the first item of the existing prediction's phrasal pattern.

If the first test in this procedure is changed to read:

if the prediction is static or a dynamic prediction whose next value >= start, do:

(that is, the test is changed from equals to greater than or equal to), then discontinuous phrases will be recognized. In our specific example, predictions will be advanced even if "you" intervenes between "call" and "back"[3].
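The following sketch isolates the effect of the relaxation. It is not the Chapter 3 implementation; it advances a pattern over an input either strictly (the next item must appear exactly at the prediction's next position) or in relaxed fashion (it may appear at or after that position), which is enough to show "call ... back" being recognized across the intervening "you."

# Sketch of strict vs. relaxed prediction advancement for the pattern
# ["call", "back"].  With the strict (equality) test, "back" must appear
# immediately after "call"; with the relaxed (>=) test, intervening words
# such as "you" are tolerated, giving a discontinuous phrase.

def references(pattern, words, relaxed):
    first = pattern[0]
    for start in range(len(words)):
        if words[start] != first:
            continue                  # the first item anchors the prediction
        pos = start + 1               # the prediction's "next" value
        ok = True
        for item in pattern[1:]:
            found = None
            for i in range(pos, len(words)):
                if words[i] == item and (relaxed or i == pos):
                    found = i
                    break
            if found is None:
                ok = False
                break
            pos = found + 1
        if ok:
            return True
    return False

words = "i will call you back".split()
print(references(["call", "back"], words, relaxed=False))  # False
print(references(["call", "back"], words, relaxed=True))   # True
# Caution (see footnote 3): the relaxed test also fires on
print(references(["call", "back"],
                 "call your mother she is back at the ranch".split(),
                 relaxed=True))                              # True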

We can also go a long way towards solving the problem of distinguishing between target concepts with identical index concept sets by creating index concepts which consist of other, conjoined index concepts. Returning to the "We'll call you/you call us" examples, we can create new index concepts for TELL-INSTRUCT-NWW-WILL-CALL-NIL and TELL-INSTRUCT-CALL-BACK-NIL. For TELL-INSTRUCT-NWW-WILL-CALL-NIL we can add the index concept NWW-CALLS, with the phrasal pattern "NWW CALL." For TELL-INSTRUCT-CALL-BACK-NIL we can add the index concept CALL-NWW, with the phrasal pattern "CALL NWW." Using the indexed concept parsing techniques described in Chapter 3, these new indices distinguish between the two target concepts.
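A brief, hypothetical sketch of the effect of these conjoined indices: both target concepts still share the index set {NWW CALL BACK}, but only one of them also receives the new ordered index, which is recognized only when its parts occur in the right order. The word lists and the ordering test below are crude stand-ins for the DMAP phrasal patterns.

# Hypothetical sketch: conjoined, ordered index concepts break the tie
# between two target concepts that share the index set {NWW, CALL, BACK}.

NWW_WORDS = {"i", "we", "us", "me"}          # stand-in phrasal patterns for NWW
CALL_WORDS = {"call", "ring", "phone"}       # stand-in phrasal patterns for CALL

def ordered_indices(words):
    nww = [i for i, w in enumerate(words) if w in NWW_WORDS]
    call = [i for i, w in enumerate(words) if w in CALL_WORDS]
    indices = set()
    if nww and call:
        if min(nww) < min(call):
            indices.add("NWW-CALLS")         # "NWW CALL": we will call you
        else:
            indices.add("CALL-NWW")          # "CALL NWW": you call us
    return indices

print(ordered_indices("i will call you back".split()))   # {'NWW-CALLS'}
print(ordered_indices("ring us back".split()))           # {'CALL-NWW'}
# TELL-INSTRUCT-NWW-WILL-CALL-NIL is indexed by NWW-CALLS and
# TELL-INSTRUCT-CALL-BACK-NIL by CALL-NWW, so the two no longer tie.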

Evaluation

We have discussed several options for improving the use of an indexed concept parser:

We can add additional content, in the form of new target concepts, index concepts and phrasal patterns.

We can build in contextual expectations about which target concepts are likely to appear.

We can build in pronominal disambiguation.

We can attach phrasal patterns directly to target concepts and use DMAP.

We can create discontinuous phrasal patterns that can distinguish ordering between statement elements.

In looking at the results of using indexed concept parsing in the Casper beta testing, we experimented with several of these options. In this section, we examine the gains one might expect by taking each of these options.

Methodology

From the transcripts of the beta testing sessions, we extracted every use of the type-in parser. There were 492 uses of the type-in box. To create an oracle for the evaluation (see Chapter 2), each of these uses was tagged with a set of target concepts, each of which was a reasonable parse of that statement. For example, the student input:

Student: please ring back and a system controller will call and flush the main

was tagged with the following target concepts:

TELL-INSTRUCT-CALL-BACK-NIL Please give us a call back

TELL-INSTRUCT-SEND-SYSTEMS-CONTROLLER-FLUSH-MAIN I'll send out a systems controller to flush out the main.

By tagging this input with these target concepts, we are saying that either one of these target concepts is an acceptable parse of the student input[4]. An automated testing procedure was then developed to run the parser on each student input, checking the results against the correct parses. As described in Chapter 1, each parse could be described as perfect--that is, the best result of the parser was one of the correct parses--or acceptable--that is, one of the correct parses appeared within the top n parses, where n is the acceptable set size (in this case 7).
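The automated testing procedure was essentially a loop of the following shape. The sketch below is hypothetical in its names and in one detail: it reads "average distance" as the mean rank of the first correct parse among the acceptable results, which is one plausible interpretation of that measure.

# Hypothetical sketch of the evaluation loop over the tagged transcripts.

ACCEPTABLE_SET_SIZE = 7

def evaluate(transcript, parse):
    """transcript: list of (student_input, correct_targets) pairs, the
    oracle built by hand-tagging each use of the type-in box.
    parse: a function returning a ranked list of target concept names."""
    perfect = acceptable = 0
    distances = []
    for text, correct in transcript:
        ranked = parse(text)[:ACCEPTABLE_SET_SIZE]
        hits = [i for i, t in enumerate(ranked) if t in correct]
        if hits:
            acceptable += 1
            distances.append(hits[0] + 1)      # 1 = best match
            if hits[0] == 0:
                perfect += 1
    n = len(transcript)
    return {
        "% Perfect": 100.0 * perfect / n,
        "% Acceptable": 100.0 * acceptable / n,
        "Average Distance": sum(distances) / len(distances) if distances else None,
    }

def dummy_parse(text):
    # Placeholder for the indexed concept parser under test.
    return ["TELL-INSTRUCT-CALL-BACK-NIL"]

oracle = [
    ("please ring back and a system controller will call and flush the main",
     {"TELL-INSTRUCT-CALL-BACK-NIL",
      "TELL-INSTRUCT-SEND-SYSTEMS-CONTROLLER-FLUSH-MAIN"}),
]
print(evaluate(oracle, dummy_parse))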

The parser was tested with and without additional content. Testing the parser without additional content was an attempt to duplicate the evaluation of the beta test. The parser was also tested with and without pronominal disambiguation. To add pronominal disambiguation, each customer utterance was tagged with an index concept which might be referred to. For example, the customer statement:

Customer: That's good to hear. But what needs to be done about the discolouration?

was tagged with the index concept DISCOLOURATION, which meant that pronouns in the next student utterance would count as references to DISCOLOURATION, as described above.

Further, the parser was tested both with and without contextual expectations. To add expectations, each customer utterance was tagged with a set of target concepts that were likely follow-up statements, again as described above. For example, the customer statement:

Customer: "Hello, this is Mrs. Hughes in Liverpool. I've rung up to complain about bits in my water."

was tagged with the following expectations:

ASK-CUSTOMER-ADDRESS What is your address?

ASK-TAPS-BOTH Is it in both the hot and the cold taps?

ASK-WATER-BITS-DESCRIPTION What kind of bits are in your water?

ASK-PROBLEM-DURATION How long have you had the problem?

Eight test series were then run: all combinations of with and without additional content, with and without contextual expectations, and with and without pronominal disambiguation.

Results

Table 6.2 summarizes the results of running the tests on the basic content.

Table 6.2: Results of tests without additional content

Measure                    Basic Content   Pronoun   Expectations   Pronouns & Expectations
Perfect (n)                      274          278         341               339
Acceptable (n)                   407          407         426               427
% Perfect                      55.69        56.50       69.31             68.90
% Acceptable                   82.72        82.72       86.59             86.79
% Perfect of Possible          60.22        61.10       74.95             74.51
% Acceptable of Possible       89.45        89.45       93.63             93.85
Average Distance                1.36         1.34        1.22              1.23
Table 6.3 shows the results of running the tests on the additional content.

Table 6.3: Results of tests with additional content

Measure                    Added Content   Pronoun   Expectations   Pronouns & Expectations
Perfect (n)                      332          334         373               371
Acceptable (n)                   436          437         441               442
% Perfect                      67.48        67.89       75.81             75.41
% Acceptable                   88.62        88.82       89.63             89.84
% Perfect of Possible          72.33        72.77       81.26             80.83
% Acceptable of Possible       94.99        95.21       96.08             96.30
Average Distance                1.31         1.29        1.17              1.19
Summary

In this chapter, we have examined various ways to improve indexed concept parsers: creating additional content, building in expectations, disambiguating pronouns, and using Direct Memory Access Parsing to recognize structured index concepts (and target concepts). We made these improvements experimentally to the Casper indexed concept parser, after the beta testing of the Casper system. The results indicated that adding content and building in expectations make a big difference in the Casper parser, and that pronoun disambiguation has a small effect. These changes, in combination, brought the recognition rate to nearly 96%. This meant that using DMAP to recognize structured index concepts, or to recognize CSR statements directly, would not provide much incremental benefit. One possible reason for this is that the number of CSR utterances--that is, the size of the target concept set--is only about 200. In Chapter 8, where we describe building a parser for a system that contains over 1,600 target concepts, we will see that recognizing structured indices can provide an advantage. But first, we describe the general methodology behind building an indexed concept parser for an application program.

[1.]This points out an additional value of a parsing system over a `point and click' interface: users of the application program can attempt to express things that the program can't handle. Although by definition the parser cannot make an accurate match for this type of input, the creators of the application program can use these mismatched expressions as feedback for further system development.

[2.]Alternatively, one could create an index concept set, associated with a target concept, which contains the target concept itself. The target concept would have a high information value and, when seen in the input text, would create a good match.

[3.]We need to use these discontinuous phrasal patterns with caution, though. The index concept CALL-BACK will be referenced by the sentence "Call your mother, she's back at the ranch."

[4.]The vast majority of the inputs were tagged with either one target concept or the UNKNOWN token.