Friday, December 14, 2007

Conflict Resolution in Rules

Probably the initial work on Conflict Resolution (CR) for rule-based systems (RBS) was done by Charles Forgy and John McDermott and published in the collection of papers "Pattern-Directed Inference Systems," edited by D.A. Waterman and Frederick Hayes-Roth, way back in 1978. (This was my "bible" in grad school.) Anyway, since that time several other books have come out, the best of which IMHO is "Rule-Based Programming with OPS5" by Thomas Cooper and Nancy Wogrin, published in 1988. Both are long since out of print but can still be found on the internet. Even Giarratano and Riley spend very little time on CR, since it's considered by most to be something that we thought about a long time ago and should not need to revisit.

The first one is a rather high-level book that is far more general than most of us would want to read. The second, however, is very down-to-earth and easily understood by most rulebase guys. In particular I would recommend chapter three, where the difference between the LEX and MEA strategies is explained. But, back to the original point: what exactly do we mean by CR, and why do we need it? The first book explains, in excruciating detail, WHY we need it: sensitivity of control and stability of rules. But HOW to accomplish this? Some RBS allow you to set your own strategy. Some have it pre-programmed into the system. Some don't have it at all. But, for now, let's take the Cooper and Wogrin approach, explain it, and later we'll expand on it.

When the system is first initialized, along with setting up the Rete network, the system sets up an Agenda Table in which the rules whose Left Hand Side (LHS) conditions all evaluate to TRUE are kept and adjusted between cycles. These rules are "ordered" in the following manner:

Refraction (any rule already fired is not eligible to fire again)
Recency (based on the time stamp)
Specificity (based on the complexity of the LHS)

What they did not consider was either priority or ruleset membership. So, just for argument's sake, let's add those:

Refraction (any rule already fired is not eligible to fire again)
Ruleset membership (only rules in the currently active ruleset are considered)
Priority (an explicit salience assigned by the programmer)
Recency (based on the time stamp)
Specificity (based on the complexity of the LHS)
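To make that ordering concrete, here is a minimal Python sketch of how an agenda might apply these criteria. The Activation structure and its field names are my own invention for illustration, not any particular engine's internals:

```python
from dataclasses import dataclass

@dataclass
class Activation:
    """One rule instantiation waiting on the agenda (hypothetical structure)."""
    rule_name: str
    ruleset: str         # ruleset (module) this rule belongs to
    salience: int        # programmer-assigned priority
    time_tags: tuple     # recency: time tags of the matched facts, newest first
    specificity: int     # complexity count of the LHS
    fired: bool = False  # refraction flag

def agenda_order(activations, active_ruleset):
    """Refraction and ruleset membership filter out activations entirely;
    priority, recency, and specificity then break ties, in that order."""
    eligible = [a for a in activations
                if not a.fired and a.ruleset == active_ruleset]
    return sorted(eligible,
                  key=lambda a: (a.salience, a.time_tags, a.specificity),
                  reverse=True)
```

Note that refraction and ruleset membership act as filters, while the other three only decide relative order among survivors - which is why a fired rule never re-fires no matter how high its salience.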

The hardest of all to understand is the one on Specificity. A count of +1 is given to the rule for each of the following found in the LHS:

An element Class name
A Predicate Operator with a constant (such as x <= 3)
A Predicate Operator with a "bound" variable
A disjunction such as << a b c d >>
Each predicate of disjunction within a conjunction

An example is that

{ <=> symbol << yes no maybe . >> }

counts as two in the specificity count. Normally, the CR strategy doesn't get much beyond recency. So, why do we need CR? (I think that this is where we started...) Stability and control. What I have discovered is that systems without a CR strategy do poorly in benchmark tests, while those that at least have a depth strategy do much better. In the older systems you could set the strategy yourself to see which one got the better response. AND it helped to see if you got a different answer using a different strategy. If you did, that helped GREATLY when troubleshooting - you shouldn't get a different answer if the rules are constructed properly but, as Cooper and Wogrin point out, you surely can get a different answer.


Thursday, December 13, 2007

Truth or Consequences?

At what point does one tell the truth to a client? Or to anyone, for that matter? So often we see a company motto of something like "Our Customers are Number ONE" or "Customers Come First" or ... Well, you get the idea. The problem is that the customer doesn't really want to hear the truth. The customer wants to hear that things are going to be just grand, that the project will be on time and under budget, that there will be no problems, that you will exceed all of the stated goals (even though IT goals are, at best, nebulous), that the bottom line will improve dramatically, and that everyone will get a promotion and a huge pay raise because of this wonderful project. Nirvana. Doesn't happen!

So, now that you are in the position of telling the truth or losing the order to another firm that you know is lying through their teeth, what do you do? The answer is (you knew it all along but wouldn't admit it, right?) that you walk away. That's right - walk away. If the customer wants you to lie now then the customer will lie about you later and, when the job goes belly up, you will be the one to blame. A consultant lives and dies on his/her reputation. A company is only as good as what it can do to help the customer. And if a 2nd or 3rd rate company has to lie to get the contract, then let them have it.

True, the customer WON'T call you back later - not normally. That would be to admit a mistake. But maybe they will listen to the next company or person who has to walk away rather than agree to everything and do nothing but take the money and run. If your integrity is for sale, then you lose that integrity on the first sale. And you never get it back. And you won't make as much money as the guys who lie, cheat and steal from their clients. But, in the end, you will be a much happier person.

I think that this will be the last in our line of philosophical blogs. From now on, let's deal with rule-based systems, the Rete Algorithm, neural nets, statistical analysis - hard-core stuff that we can all comment about.

Happy Chanukah
(Yeah, it's over now)


Wednesday, December 5, 2007

CLIPS and Waltz 50


I finally got around to doing a really simple task - that of just checking the latest claim by some guys that CLIPS 6.30 runs with extraordinary speed. OK, it does. I think - but it uses internal deftemplates rather than going through the laborious process of creating external objects like those of us in the "real world" have to do. My experience with Jess is that using a deftemplate that is associated with an external object almost doubles the time it takes. My suspicion is that it would double again if the deftemplate were removed from the internal code entirely. So that will be my task for this coming month - to migrate the CLIPS Waltz-50 benchmark to pure external objects, and to create the testing for WaltzDB-16, which is even harder than Waltz-50.

Meanwhile, here are the results for the test run that I did on my Mac Core 2 Duo laptop (2 GHz, 2 GB of RAM) - if you see where I made a mistake somewhere, let me know. I did compare it to the OPSJ and JRules rules for Waltz-50, and CLIPS is doing some things that you cannot do with Java - things like computing arc-tangents so quickly. Yes, Java does have arc-tan and the like in the Math class, but doing math in Java is slower than in C; Java has to work at things like that. So, after I re-write either the Java benchmark or the CLIPS benchmark so that they are both pretty much the same, I'll re-run and re-test all of these. Meanwhile, here is the test result using the code provided with CLIPS 6.30. Two other things that I would like to do are to run CLIPS against Mark Proctor's new benchmark as well as against our new 10K Data Cleansing benchmark. (I need to finish writing that one because I don't CARE if you cheat or not - just how fast you can get the answer. It's a good one for the business world if I ever finish it.)

CLIPS> (load waltz.clp)
Defining deftemplate: stage
Defining deftemplate: line
Defining deftemplate: edge
Defining deftemplate: junction
Defining defglobal: MOD-NUM
Defining deffunction: atan2
Defining deffunction: get-y
Defining deffunction: get-x
Defining deffunction: get-angle
Defining deffunction: inscribed-angle
Defining deffunction: make-3-junction
Defining defrule: begin +j+j
Defining defrule: reverse_edges +j+j+j
Defining defrule: done_reversing =j+j+j
Defining defrule: make-3_junction +j+j+j+j+j
Defining defrule: make_L =j=j=j+j+j
Defining defrule: done_detecting =j+j+j
Defining defrule: initial_boundary_junction_L +j+j+j+j+j+j
Defining defrule: initial_boundary_junction_arrow =j+j+j+j+j+j+j
Defining defrule: second_boundary_junction_L +j+j+j+j+j+j
Defining defrule: second_boundary_junction_arrow =j+j+j+j+j+j+j
Defining defrule: match_edge +j+j+j+j
Defining defrule: label_L =j+j+j+j+j
Defining defrule: label_tee_A =j+j+j+j+j
Defining defrule: label_tee_B =j=j+j+j+j
Defining defrule: label_fork-1 =j+j+j+j+j+j
Defining defrule: label_fork-2 =j=j+j+j+j+j
Defining defrule: label_fork-3 =j=j=j+j+j+j
Defining defrule: label_fork-4 =j=j+j+j+j+j
Defining defrule: label_arrow-1A =j+j+j+j+j+j
Defining defrule: label_arrow-1B =j=j=j+j+j+j
Defining defrule: label_arrow-2A =j=j+j+j+j+j
Defining defrule: label_arrow-2B =j=j=j+j+j+j
Defining defrule: label_arrow-3A =j=j+j+j+j+j
Defining defrule: label_arrow-3B =j=j=j+j+j+j
Defining defrule: label_arrow-4A =j=j+j+j+j+j
Defining defrule: label_arrow-4B =j=j=j+j+j+j
Defining defrule: label_arrow-5A =j=j+j+j+j+j
Defining defrule: label_arrow-5B =j=j=j+j+j+j
Defining defrule: done_labeling =j+j
Defining defrule: plot_remaining +j+j+j
Defining defrule: plot_boundaries =j+j+j
Defining defrule: done_plotting =j+j+j
CLIPS> (load waltz50.clp)
Defining defrule: startup +j+j
CLIPS> (watch statistics)
CLIPS> (reset)
CLIPS> (run)
14065 rules fired Run time is 1.03537487983704 seconds.
13584.4516550506 rules per second.
9097 mean number of facts (10806 maximum).
1 mean number of instances (1 maximum).
2298 mean number of activations (12616 maximum).
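For anyone curious what those get-x/get-y/get-angle deffunctions in the load listing are doing: the Waltz code packs each 2-D point into a single integer using the MOD-NUM defglobal, and the angle of an edge falls out of an atan2 over the unpacked deltas. A rough Python equivalent - assuming MOD-NUM is 100, as in the copy of waltz.clp I have; check your own copy:

```python
import math

MOD_NUM = 100  # assumed value of the MOD-NUM defglobal

def get_x(val):
    """High-order part of the packed point."""
    return val // MOD_NUM

def get_y(val):
    """Low-order part of the packed point."""
    return val % MOD_NUM

def get_angle(p1, p2):
    """Angle of the edge from p1 to p2. The CLIPS file defines its own
    atan2 deffunction; Python's math.atan2 does the same job, including
    the vertical-edge case where the x delta is zero."""
    return math.atan2(get_y(p2) - get_y(p1), get_x(p2) - get_x(p1))
```

So a point like (10, 20) travels through the fact base as the single value 1020 - one reason a fair Java comparison needs to replicate this packing rather than using full-blown point objects.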