Thimk: February 2014

Tuesday, February 4, 2014

Benchmarks: Who, What, Where, When and Why?

Greetings:

A BRMS (Business Rule Management System) is, after all, a rule-based system that has evolved into what we now call a Decision Manager or Business Decision Manager when applied to business systems. Over the years we have tried to establish a set of benchmarks that allow users to test various systems for speed and efficiency on different, complex problems. The two most famous tests are the Miss Manners benchmark and the Waltz benchmark. The original five OPS (Official Production System) benchmarks can be found at ftp://ftp.cs.utexas.edu/pub/ops5-benchmark-suite/ and will run on almost any platform. All major BRMS vendors have coded Miss Manners, Waltz and/or WaltzDB in their own rule language syntax. In addition, Dr. Forgy and I have written the NP-hard benchmarks for several systems.

The Miss Manners OPS benchmark originated more than 20 years ago with the OPS5 and CLIPS languages. It is a relatively simple rulebase of only eight rules that performs a depth-first search to find a solution, and the program comes with a data generator. The idea is that Miss Manners has invited 16, 32, 64 or 128 guests with various hobbies to a dinner party. She wants to seat the guests in a boy-girl-boy-girl arrangement so that each guest has someone on the left or right who shares a common hobby. Back around 1990, the Manners 128 program took 5,838.5 CPU seconds to run on a SPARC 1+. Today, thanks to massive improvements in hardware, it runs in about 1.5 seconds. The data for Miss Manners should have X guests, each with 2 or 3 hobbies chosen from 5 possible hobbies, and there must be an equal number of male and female guests so they can be seated M-F-M-F, etc.
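To make the seating problem concrete, here is a minimal Python sketch of what the Manners data generator and the depth-first search accomplish. It is not the OPS5 rulebase or the UT generator: the function names generate_guests and seat, the seeded random choice of hobbies, and the treatment of the table as a straight row (no wrap-around check between the last and first seats) are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Set, Tuple

# A guest is (name, sex, hobbies); sex is "m" or "f".
Guest = Tuple[str, str, Set[int]]


def generate_guests(n: int, num_hobbies: int = 5, seed: int = 0) -> List[Guest]:
    """Hypothetical Manners-style generator: n guests (n even), half male and
    half female, each with 2 or 3 distinct hobbies drawn from 1..num_hobbies."""
    if n % 2:
        raise ValueError("need an even number of guests")
    rng = random.Random(seed)
    guests: List[Guest] = []
    for i in range(n):
        sex = "m" if i % 2 == 0 else "f"   # alternate so the counts are equal
        k = rng.choice((2, 3))
        hobbies = set(rng.sample(range(1, num_hobbies + 1), k))
        guests.append((f"g{i}", sex, hobbies))
    return guests


def seat(guests: List[Guest]) -> Optional[List[Guest]]:
    """Depth-first search with backtracking for an M-F-M-F... row in which
    every pair of neighbours shares at least one hobby (None if impossible)."""
    n = len(guests)
    used = [False] * n
    order: List[int] = []

    def extend() -> bool:
        if len(order) == n:
            return True
        want = "m" if len(order) % 2 == 0 else "f"
        for i, (_, sex, hobbies) in enumerate(guests):
            if used[i] or sex != want:
                continue
            if order and not (guests[order[-1]][2] & hobbies):
                continue  # no common hobby with the guest to the left
            used[i] = True
            order.append(i)
            if extend():
                return True
            order.pop()       # dead end: backtrack and try the next guest
            used[i] = False
        return False

    return [guests[i] for i in order] if extend() else None
```

Because the search tries guests in input order and backtracks on dead ends, its running time depends on how the data is arranged, which is the same data sensitivity that lets the benchmark be gamed by reordering the input.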

It becomes more complex as the number of hobbies and guests grows. The original test was written to really “stress” any rulebase, but some vendors found the trick of adding a single “not” condition to one of the rules, which makes it run 15 or 20 times faster. Without that “trick” it is a very good measure of how fast a system will run on any given platform and CPU. Another “trick” exploits the fact that the benchmark is data-sensitive: rearranging the input data makes the rules run faster.

The Waltz OPS benchmark is another oldie but goodie that will really stress a rulebase system because it tests how well the rulebase does pattern matching. Consisting of 32 rules, it analyzes the lines of a two-dimensional drawing and labels them as if they were edges of a three-dimensional object. The Waltz benchmark also comes with a data generator and is much harder to cheat on than the Miss Manners benchmark. The generator is a C program that produces data for any number of regions: 12, 25, 37, 50 or even 200. For convenience, UT maintains the compiled C object library at /ops5c/lib/libops5c.a, along with a math library that is used with the benchmarks. The SPARC 1+ time for Waltz-50 was 3,831.8 CPU seconds. Today's times range from 0.2 to 1.9 seconds at worst.
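The core idea behind Waltz-style line labeling can be sketched as constraint propagation, often called Waltz filtering. Each junction starts with a set of candidate labelings (one label per incident line), and any labeling that gives a shared line a label the junction at the other end cannot match is discarded, repeating until nothing changes. This Python sketch is not the 32 OPS rules: the function name waltz_filter, the placeholder label symbols and the toy candidate sets are hypothetical, and it assumes each line's label is written relative to one fixed orientation so both ends use the same symbol. A real labeler draws its candidates from a catalog of physically possible junction types such as L, fork, arrow and T.

```python
from typing import Dict, List, Set, Tuple

Line = Tuple[str, str]       # a line named by its two end junctions
Labeling = Tuple[str, ...]   # one label per incident line, in incident order


def waltz_filter(incident: Dict[str, List[Line]],
                 candidates: Dict[str, Set[Labeling]]) -> Dict[str, Set[Labeling]]:
    """Prune junction labelings until every label on a shared line is
    supported by some remaining labeling at the line's other end."""
    domains = {j: set(labs) for j, labs in candidates.items()}
    changed = True
    while changed:
        changed = False
        for j, lines in incident.items():
            for idx, line in enumerate(lines):
                other = line[1] if line[0] == j else line[0]
                if other not in incident:
                    continue  # line leaves the drawing: nothing to check
                k = incident[other].index(line)
                supported = {lab[k] for lab in domains[other]}
                keep = {lab for lab in domains[j] if lab[idx] in supported}
                if keep != domains[j]:
                    domains[j] = keep
                    changed = True
    return domains


# Toy drawing: junctions A, B, C joined by lines AB, AC and BC.
# The label symbols and candidate sets are invented for illustration.
INCIDENT = {
    "A": [("A", "B"), ("A", "C")],
    "B": [("A", "B"), ("B", "C")],
    "C": [("A", "C"), ("B", "C")],
}
CANDIDATES = {
    "A": {("+", "+"), ("-", "<")},
    "B": {("+", "+"), (">", "-")},
    "C": {("+", "+"), ("-", "-")},
}

if __name__ == "__main__":
    print(waltz_filter(INCIDENT, CANDIDATES))
```

On the toy drawing only ("+", "+") survives at each junction. If any junction's set ends up empty, the drawing has no consistent interpretation under the given catalog.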

The WaltzDB OPS benchmark, like Waltz, labels the lines in a 2D drawing in order to interpret them as a 3D object. The difference is that WaltzDB can handle drawings with junctions of four or six lines, while Waltz handles junctions of only two or three. WaltzDB has only 35 rules, but its data sets have many, many more junctions. WaltzDB also has its own data generator, waltzydb.c, which also needs to be compiled. I have run tests using 4, 8, 12, 16 and even 200 regions; WaltzDB-200 is the most difficult of all. WaltzDB-16 took 8,033.3 CPU seconds on the old SPARC 1+, but today it takes about 0.5 seconds. WaltzDB-200 takes only about 10 to 15 seconds on most systems, and can take 2 seconds or less when running Rete-NT, the latest incarnation of the Rete Algorithm.

The A.R.P. OPS benchmark is an Aeronautical Route Planner program that plots a course across a given territory from P1 to P2 for an airplane or cruise missile. Its dataset generator asks about 40 questions and writes a file called rav-sceneXxYxZ.dat, where X, Y and Z are the 3D coordinates from the input data. A sample list of the questions is in the README file at UT. This benchmark is unique in that two files have to be loaded: “filename”.dat and “arp-rp-makes”. The best time for the A.R.P. benchmark on a SPARC 1+ with the 10x20x30 data set was 1,220.2 CPU seconds.

The Weaver OPS benchmark combines several expert systems that communicate through a common blackboard (or maybe a whiteboard today). The “practical” application for this system is VLSI (Very Large Scale Integration) chip design, something that chip manufacturers such as AMD and Intel face every day. Far more detail is provided in the README file. The best time for the Weaver benchmark on a SPARC 1+ was 1,053.7 CPU seconds.

The next two benchmarks come from a class of problems known as “NP-complete”, where NP stands for Non-deterministic Polynomial time. We started using these this year (1Q2014) because we have found Manners and/or Waltz to be either 1) easy to cheat on or 2) benchmarks that fire only one or two rules over and over. Manners is guilty on both counts. So, this year we have included both the Clique Problem and the Vertex Cover Problem for starters. Later we can expand this to other NP-complete problems.
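For readers who have not met these two problems, here is a minimal Python sketch of their decision versions solved by brute force: given a graph and a size k, CLIQUE asks for k vertices that are all pairwise connected, and VERTEX COVER asks for k vertices that touch every edge. This is not Dr. Forgy's OPS code; the function names and the edge-set representation are hypothetical. Exhaustive search over all k-vertex subsets grows combinatorially with the size of the graph, which is what makes these problems a heavier workload than firing one or two rules over and over.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Set, Tuple

Graph = Set[FrozenSet[int]]  # undirected edges on vertices 0..n-1


def make_graph(pairs: Iterable[Tuple[int, int]]) -> Graph:
    return {frozenset(p) for p in pairs}


def is_clique(nodes: Tuple[int, ...], edges: Graph) -> bool:
    # every pair of chosen vertices must be joined by an edge
    return all(frozenset((u, v)) in edges for u, v in combinations(nodes, 2))


def is_vertex_cover(nodes: Tuple[int, ...], edges: Graph) -> bool:
    # every edge must touch at least one chosen vertex
    chosen = set(nodes)
    return all(edge & chosen for edge in edges)


def find_clique(n: int, edges: Graph, k: int) -> Optional[Tuple[int, ...]]:
    """CLIQUE by exhaustive search: some k-vertex clique, or None."""
    return next((c for c in combinations(range(n), k) if is_clique(c, edges)), None)


def find_vertex_cover(n: int, edges: Graph, k: int) -> Optional[Tuple[int, ...]]:
    """VERTEX COVER by exhaustive search: some cover with exactly k vertices,
    or None. A smaller cover can always be padded with extra vertices, so for
    k <= n this also answers 'is there a cover of size at most k?'."""
    return next((c for c in combinations(range(n), k) if is_vertex_cover(c, edges)), None)


if __name__ == "__main__":
    g = make_graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)])
    print(find_clique(5, g, 3), find_vertex_cover(5, g, 2), find_vertex_cover(5, g, 3))
```

The two problems are closely linked: a set S is a vertex cover of a graph exactly when the remaining vertices form an independent set, that is, a clique in the complement graph.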

Either of these problems can be converted to Java or C syntax but, for starters, I plan on implementing them in Drools, Jess, CLIPS, Smarts, ODM and Blaze Advisor. That should be enough for comparisons this year. Dr. Forgy has kindly already provided the code for these two NP-complete problems in OPS syntax, which we should be able to convert to Java, C/C++ or C#. Or BASIC, for that matter.

That pretty much covers the 5 W's of good journalism. The HOW (5W+H) part is not the most difficult. We plan to provide the NP-complete benchmarks, along with the first three UT benchmarks, for our talk at Decision Camp 2014, to be held in San Jose this November. I hope to see many of you there since registration is, again, free thanks to Sparkling Logic and eBay.

Shalom

Yaakov

Rules of the Blog Comments

FIRST: Beginning 1 Nov 2017, this page will no longer be technical only - I will comment on anything that happens to cross my mind that particular day.
SECOND: I have always felt that Facebook was for teenagers and not for real comments and bloggers.
THIRD: Tweets are for tweens, not for real comments, since (initially) tweets could be only 140 characters. If your technical comment was only 140 characters, who cares?
FINALLY: Anyone can comment - even anonymous comments are allowed. No comment with crude, base or vulgar language will be published. No comment about rule-based systems that displays total ignorance of rule-based systems will be published - I will answer in a private email if your email is attached.
PS: All comments are encouraged because they (usually) make for a better article; I often rewrite an article several times after a few "on the mark" comments are made.