Monday, June 16, 2008

Parallel Rulebase Systems and Homeland Security

I first wrote about this back in November 2006 (http://javarules.blogspot.com/2006/11/parallel-rulebase-systems.html), when I chatted aimlessly about Forgy and Gupta and others and what they had done. Since that time, I've been doing quite a bit of reading, all of which I will post somewhere so that everyone can get the papers and read them for themselves. Meanwhile, let's take a look at what we "think" we can do before we do it.

First, most of the processing time goes into the match process - about 85% to 90% of the total time involved. Parallelizing can get you about a 10 times improvement in the speed of that process alone. You can get improvements in the rest of the processes as well, but they are dwarfed in significance by the match process. There is one company that has done quite a bit of work for IBM and others on parallelizing processes in C and C++, RapidMind (http://www.rapidmind.net), out of the frozen north (Canada) - but they, too, can only seem to get a 10 to 1 improvement.
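As a rough sketch of what those figures imply for the engine as a whole, Amdahl's law can be applied to them. The function name below is made up for illustration, and the assumption is that everything outside the match process runs at its original speed.

```python
def overall_speedup(match_fraction, match_speedup):
    """Amdahl's law: only the match share of the run time gets faster."""
    # The non-match share (1 - match_fraction) is assumed to stay unchanged.
    return 1.0 / ((1.0 - match_fraction) + match_fraction / match_speedup)

# Figures from the text: match is 85% to 90% of run time, sped up 10 times.
for fraction in (0.85, 0.90):
    speedup = overall_speedup(fraction, 10)
    print(f"match = {fraction:.0%} of run time -> overall speedup {speedup:.2f}x")
```

Under those assumptions a 10 to 1 gain on the match process alone works out to roughly a 4 to 5 times gain overall, which is why the non-match phases eventually start to matter too.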

Two things to consider here: (1) Unless Dr. Forgy can do something that nobody else can do, we won't get much better performance on small systems using parallel rulebased systems. (2) CLIPS has shown us how to get about a 250 to 1 improvement in rulebase performance using proper indexing and proper writing of code. Gary Riley (the CLIPS guy) and Dr. Forgy will be at the October [Technical] Rules Fest in the DFW area in, well, October. See http://www.rulesfest.org for more details on that one.

What I didn't discuss was the need for speed. Aren't most rulebased systems fast enough as it is? For business applications the answer is yes. But for research, defense, and homeland security, NO! In the business world the match process is not terribly complicated BECAUSE the KE (Knowledge Engineer) who oversees the program won't allow it to become complicated, BECAUSE the KE knows that the match process is the problem child of the rulebase world.

Unfortunately, you can't avoid the many objects, many patterns, many rules matching process in the "Big Boy" applications. For example, Homeland Security (and I'm not telling things that are in any way a secret here) has between 500K and 2M rules, most of which have a small LHS (Left Hand Side) with only a few CEs (Condition Elements), and those rules have to be matched against millions of travelers at thousands of ports of entry. WOW! Even with terabytes of RAM you won't be able to process all of that in this century UNLESS you can parallelize most of it. Now this is where the match problem will really dominate.
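To make the idea concrete, here is a minimal sketch of one way such a match could be split up: when each rule has a small LHS that tests a single traveler, the travelers can be dealt out to workers and matched independently. Everything in it (the rule names, the fields, the functions) is hypothetical, it is not Rete and does no joins across facts, and threads in CPython will not give real CPU parallelism. It only shows the partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rules: a name plus a short list of condition elements (CEs).
# A CE here is just (field, expected_value); real engines allow far richer tests.
RULES = [
    ("flag-expired-visa", [("visa_status", "expired")]),
    ("flag-watchlist-port", [("on_watchlist", True), ("port", "DFW")]),
]

def match_chunk(travelers):
    """Return (rule_name, traveler_id) for every rule whose CEs all match."""
    hits = []
    for traveler in travelers:
        for name, conditions in RULES:
            if all(traveler.get(field) == value for field, value in conditions):
                hits.append((name, traveler["id"]))
    return hits

def parallel_match(travelers, workers=4):
    """Deal the travelers out to the workers and match each slice concurrently."""
    chunks = [travelers[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(match_chunk, chunks)
    # Merge the per-worker hits into one sorted list.
    return sorted(hit for chunk_hits in results for hit in chunk_hits)

travelers = [
    {"id": 1, "visa_status": "valid", "on_watchlist": False, "port": "DFW"},
    {"id": 2, "visa_status": "expired", "on_watchlist": False, "port": "JFK"},
    {"id": 3, "visa_status": "valid", "on_watchlist": True, "port": "DFW"},
]
print(parallel_match(travelers))
```

The partitioned run finds the same hits as a single serial pass over all travelers. Splitting by traveler only works this cleanly because no rule in the sketch compares one traveler against another.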

Let's look at another problem: the one that you see on all of the crime shows so often, the DNA match process. Most of the time they are matching against only a small number of DNA samples, usually fewer than a few thousand, and NOT against a national-database-sized sample. In the UK they match against their own database - one that is growing by leaps and bounds daily - but not against all of the EU. By the time they get a hit on a suspect, days or months later, the suspect has moved and left a cold trail behind or returned to his home country. (I use "his" rather than "her" because I've heard of very few female terrorists except for the poor, misguided Daughters of Islam who have been beguiled into becoming a human bomb.) In R&D or in psychology, a very large database of objects along with many, many rules (usually used in lieu of a neural net) would benefit from a parallel matching process as well.

OK - how can we get this financed? Banks, insurance companies, stock markets, none of them have that problem or they can code around the problem. R&D? No money. Government? Ah, there's the Honey Pot!! Now that we know where the Honey Pot is located, how can we get them (the government watch dogs) to open the lid for us? Simple - show them how it would work on a similar problem that they might have. So, on that thought, does anyone have a simple 2,000,000 rules that can be associated with 5,000 ports (5,000,000 if you include non-listed ports) along with 5,000,000 possible entries? Didn't think so. But Uncle Sam does. Now, if Uncle Sam would just let us play with this problem for a year or two, we could get Homeland Security down pat. But they won't. Not without so many bureaucratic layers that nothing will get done. So, I guess we'll just sit and wait for the mushroom clouds to show up on the horizon. Or next door. See you in ....

SDG
jco

Monday, June 9, 2008

Was Jess a CLIPS Spin-Off?

Here we go again - in that same 2006 article I said that Jess was a CLIPS spin-off. It wasn't and isn't and won't be. I suppose that I "ass-u-me"ed that it was derived from CLIPS because CLIPS came first and Jess uses the same defrule, deftemplate, etc., syntax that is used by CLIPS. It even uses the same file name extension that is used by CLIPS. And, sometimes, if it isn't too complex or too tool-specific, you can import a Jess file straight into CLIPS or a CLIPS file into Jess. (Or so I've been told.)

OK, I should have checked with the authors of the tools before I said that. BUT, in my own defense, if I saw a tool that used ILOG JRules code with the same file name extensions and the same syntax, I would HAVE to "assume" that it was a JRules spinoff. The same thing goes for Drools or any other tool.

So, just to set the record straight, Jess is NOT a CLIPS spinoff and is NOT derived from CLIPS!! Got it? Got it!! Can we close this case now? Two years later? Please?

SDG
jco

Open Source - Myths and Legends

What is "Open Source" software? Wikipedia gives this definition:

"Open source is a development methodology,[1] which offers practical accessibility to a product's source (goods and knowledge). Some consider open source as one of various possible design approaches, while others consider it a critical strategic element of their operations. Before open source became widely adopted, developers and producers used a variety of phrases to describe the concept; the term open source gained popularity with the rise of the Internet, which provided access to diverse production models, communication paths, and interactive communities.

The open source model of operation and decision making allows concurrent input of different agendas, approaches and priorities, and differs from the more closed, centralized models of development.[2] The principles and practices are commonly applied to the development of source code for software that is made available for public collaboration, and it is usually released as open-source software."

Lots of words, but not what some have defined as Open Source. Some have defined Open Source according to the Free Software Foundation, aka Richard Stallman. Others have used the Apache License as the defining criterion. Others, such as my editor with InfoWorld, maintain that if the software is free, the source code is free (or available for a nominal fee), and others are allowed to contribute to the core code, then the blinking stuff is, for all practical purposes, OPEN SOURCE!

Several have taken me to task for this definition: Jason Morris, Dr. Ernest Friedman-Hill (he of Jess fame), Mark Proctor (the Drools guy), and one or two others. They feel that Drools is Open Source while Jess and CLIPS are not. (Gary Riley has not weighed in on this one - not yet.) While I respect their opinion (and indeed I have to respect their opinion since they wrote this stuff) I do feel that if it walks like a duck, quacks like a duck, swims like a duck, has a bill like a duck and webbed feet like a duck, then a swan it ain't. Now we can have Mallard ducks, Brown ducks, Mottled ducks, Green ducks, but they are all ducks.

Quite some time ago, about November of 2006, InfoWorld ran an article on Rulebase Open Source Software in which I called Jess an Open Source product. It seems that this stirred up a hornet's nest, and both Dr. Friedman-Hill and Mark Proctor let me know about it the following week at the BR 2006 Forum in D.C. Their complaint (and they both agreed) was that Jess was not true Open Source. Well, again, who are the official Open Source committee members? Richard Stallman wrote an article on this (http://www.gnu.org/philosophy/open-source-misses-the-point.html) in which he argues that the term "open source" misses the point of free software, which for him is about the freedom of the user rather than about a development methodology.

Bruce Perens et al. have come up with the following "Open Source" definition and (I think - don't know for sure) they also copyrighted this slogan. (How can you copyright something that is not yours?):

Quote:
"Under the Open Source Definition, licenses must meet ten conditions in order to be considered open source licenses. Below is a copy of the definition, with unauthorized explanatory additions. There is a link to the original unmodified text below. It was taken under fair use.

1. Free Redistribution: the software can be freely given away or sold. (This was intended to expand sharing and use of the software on a legal basis.)
2. Source Code: the source code must either be included or freely obtainable. (Without source code, making changes or modifications can be impossible.)
3. Derived Works: redistribution of modifications must be allowed. (To allow legal sharing and to permit new features or repairs.)
4. Integrity of The Author's Source Code: licenses may require that modifications are redistributed only as patches.
5. No Discrimination Against Persons or Groups: no one can be locked out.
6. No Discrimination Against Fields of Endeavor: commercial users cannot be excluded.
7. Distribution of License: The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.
8. License Must Not Be Specific to a Product: the program cannot be licensed only as part of a larger distribution.
9. License Must Not Restrict Other Software: the license cannot insist that any other software it is distributed with must also be open source.
10. License Must Be Technology-Neutral: no click-wrap licenses or other medium-specific ways of accepting the license must be required."
/Quote

Under that definition, Jess is NOT Open Source. However, CLIPS probably gets in just under the wire. As do many other free, Open Source projects.

All of that being said to say this: My deepest and most humble apologies to anyone who has been offended by my suggestion that Jess is Open Source. But I'm still right - I'm just apologizing for offending. :-) In other words, I didn't do it and I promise never to do it again.

SDG
jco