A BRMS (Business Rule Management System) is, after all, a rule-based system that has evolved into what we now call a Decision Manager or Business Decision Manager when applied to business systems. Over the years we have tried to establish a set of benchmarks that allow users to test various systems for speed and efficiency on different, complex problems. The two most famous tests are the Miss Manners test and the Waltz benchmark. The original five OPS (Official Production System) benchmarks can be found at ftp://ftp.cs.utexas.edu/pub/ops5-benchmark-suite/ and will run on almost any platform. All major BRMS vendors have written Miss Manners, Waltz and/or WaltzDB in their own language syntax. In addition, Dr. Forgy and I have written the NP-Hard benchmarks for several systems.
The Miss Manners OPS benchmark originated about 15 years ago with the OPS5 and CLIPS languages. It is a relatively simple rulebase of only eight rules that does a depth-first search to find a solution, and the program comes with a data generator. The idea is that Miss Manners has invited 16, 32, 64 or 128 guests with various hobbies to a dinner party. She wants to seat the guests in a boy-girl-boy-girl arrangement so that each guest has someone on the left or right with a common hobby.
When the suite was first timed on a SPARC 1+, the Manners 128 program took 5,838.5 cpu seconds to run. Today, thanks to massive improvements in hardware, it runs in about 1.5 seconds. The data for Miss Manners should have X guests, each with 2 or 3 hobbies drawn from a possible 5, and an equal number of men and women so that they can be seated M-F-M-F and so on.
The problem becomes more complex as the number of hobbies and the number of guests grow. The original test was written to really “stress” any rulebase, but some vendors found the trick of putting a single “not” statement in one of the rules, which makes it run 15 or 20 times faster. Without that “trick” it is a very good measure of how fast a system will run on any given platform and CPU. Another “trick” is to re-arrange the data so that the rules run faster, because the benchmark is data-sensitive.
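For anyone who has never seen Manners, here is a minimal sketch, written as plain Java rather than as rules, of the constraint the benchmark is actually solving. It is my own illustration, not the real eight-rule OPS5 program; it treats the table as a simple row of seats, and the guest names and hobbies are made up.

    import java.util.*;

    // A hypothetical sketch of the Miss Manners constraint, not the actual
    // eight-rule OPS5 benchmark: seat guests so that the sexes alternate and
    // every pair of neighbours shares at least one hobby.
    public class MannersSketch {

        record Guest(String name, char sex, Set<String> hobbies) {}

        // True if g may sit next to prev: opposite sex and a common hobby.
        static boolean compatible(Guest prev, Guest g) {
            if (prev.sex() == g.sex()) return false;
            for (String h : g.hobbies()) {
                if (prev.hobbies().contains(h)) return true;
            }
            return false;
        }

        // Depth-first search for a seating order; the table is treated as a row.
        static boolean seat(List<Guest> unseated, List<Guest> seating) {
            if (unseated.isEmpty()) return true;                 // everyone is placed
            for (int i = 0; i < unseated.size(); i++) {
                Guest g = unseated.get(i);
                if (seating.isEmpty() || compatible(seating.get(seating.size() - 1), g)) {
                    seating.add(g);
                    List<Guest> rest = new ArrayList<>(unseated);
                    rest.remove(i);
                    if (seat(rest, seating)) return true;        // found a solution
                    seating.remove(seating.size() - 1);          // backtrack
                }
            }
            return false;
        }

        public static void main(String[] args) {
            List<Guest> guests = List.of(                        // made-up guest data
                new Guest("g1", 'm', Set.of("chess", "golf")),
                new Guest("g2", 'f', Set.of("golf", "poetry")),
                new Guest("g3", 'm', Set.of("poetry", "tennis")),
                new Guest("g4", 'f', Set.of("tennis", "chess")));
            List<Guest> seating = new ArrayList<>();
            if (seat(new ArrayList<>(guests), seating)) {
                seating.forEach(g -> System.out.println(g.name() + " (" + g.sex() + ")"));
            }
        }
    }

A rule engine states the same constraints declaratively and lets the engine do the searching; that search is what took the SPARC 1+ over an hour and a half on Manners 128.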
The Waltz OPS benchmark is another oldie but a goodie that will really stress a rulebase system, because it checks how well the rulebase does pattern matching. Consisting of 32 rules, it analyzes the lines of a two-dimensional drawing and labels them as if they were edges in a three-dimensional object. The Waltz benchmark also comes with a data generator, a C program that can produce data for any number of regions: 12, 25, 37, 50 or even 200, and it is much harder to cheat with than the Miss Manners benchmark. For convenience, UT maintains the C object files at /ops5c/lib/libops5c.a, along with a math library that is used with the benchmarks. The SPARC 1+ time for Waltz-50 was 3,831.8 cpu seconds; today it takes between 0.2 and 1.9 seconds at worst.
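To make “labelling the lines” a little more concrete, here is a toy sketch in Java. The junction catalogue in it is purely illustrative, not the actual Huffman-Clowes catalogue that Waltz-style labelling uses, and the 32-rule benchmark itself is far more involved; the sketch only shows the shape of the problem: assign each line a label, subject to what each junction will allow.

    import java.util.*;

    // A toy version of line labelling: each line gets a label (convex, concave,
    // or boundary) and every junction only admits certain label combinations on
    // the lines that meet there.  The catalogue below is illustrative only; it is
    // NOT the real Huffman-Clowes catalogue.
    public class WaltzSketch {

        enum Label { CONVEX, CONCAVE, BOUNDARY }

        // A junction joins two line ids and carries its set of legal label pairs.
        record Junction(int lineA, int lineB, Set<List<Label>> legalPairs) {}

        // Enumerate every labelling of lineCount lines consistent with all junctions.
        static List<Label[]> label(int lineCount, List<Junction> junctions) {
            List<Label[]> solutions = new ArrayList<>();
            search(new Label[lineCount], 0, junctions, solutions);
            return solutions;
        }

        static void search(Label[] labels, int next, List<Junction> junctions,
                           List<Label[]> out) {
            if (next == labels.length) {
                out.add(labels.clone());
                return;
            }
            for (Label l : Label.values()) {
                labels[next] = l;
                if (consistentSoFar(labels, next, junctions)) {
                    search(labels, next + 1, junctions, out);
                }
            }
            labels[next] = null;                                 // backtrack
        }

        // Check every junction whose two lines have already been labelled.
        static boolean consistentSoFar(Label[] labels, int upTo, List<Junction> junctions) {
            for (Junction j : junctions) {
                if (j.lineA() <= upTo && j.lineB() <= upTo
                        && !j.legalPairs().contains(List.of(labels[j.lineA()], labels[j.lineB()]))) {
                    return false;
                }
            }
            return true;
        }

        public static void main(String[] args) {
            // Two lines meeting at one junction; this made-up catalogue says the
            // two lines must either both be convex or both be boundary edges.
            Junction j = new Junction(0, 1, Set.of(
                List.of(Label.CONVEX, Label.CONVEX),
                List.of(Label.BOUNDARY, Label.BOUNDARY)));
            label(2, List.of(j)).forEach(sol -> System.out.println(Arrays.toString(sol)));
        }
    }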
The WaltzDB OPS benchmark, like Waltz, labels the lines in a 2D drawing in order to configure a 3D object. The difference is that WaltzDB can handle drawings with junctions of four or six lines, while Waltz handles junctions of only two or three. WaltzDB has only 35 rules, but its data sets have many, many more junctions. WaltzDB also has its own data generator, waltzydb.c, which likewise needs to be compiled. I have run tests using 4, 8, 12, 16 and even 200 regions; WaltzDB 200 is the most difficult of all. Sixteen regions on the old SPARC 1+ took 8,033.3 cpu seconds, but today it takes about 0.5 seconds. WaltzDB 200 takes only about 10 to 15 seconds on most systems, and can take 2 seconds or less when running Rete-NT, the latest incarnation of the Rete Algorithm.
The A.R.P. OPS benchmark is an Aeronautical Route Planner program that plots a course across a given territory from P1 to P2 for an airplane or cruise missile. There is a dataset generator that asks about 40 questions and generates a file called rav-sceneXxYxZ.dat, where X, Y and Z are the 3D coordinates from the input data. There is a sample list of questions in the README file at UT. This benchmark is unique in that two files have to be loaded, “filename”.dat and “arp-rp-makes”. The best time for the A.R.P. benchmark on a SPARC 1+ with the 10x20x30 data set was 1,220.2 cpu seconds.
The Weaver OPS benchmark is a combination of several expert systems that communicate through a common blackboard, or maybe a whiteboard today. The “practical” application for this system is designing a VLSI (Very Large Scale Integration) chip, something that chip manufacturers such as AMD and Intel face every day. Far more detail is provided in the README file. The best time for the Weaver benchmark on a SPARC 1+ was 1,053.7 cpu seconds.
The next two benchmarks come from the family of problems known as “NP-Complete,” where NP stands for Non-deterministic Polynomial-time. We started using these this year (1Q2014), since we have found Manners and/or Waltz to be either 1) easy to cheat on or 2) benchmarks that fire only one or two rules over and over; Manners is guilty on both counts. So, this year we have included both the Clique Problem and the Vertex Cover Problem for starters. Later we can expand this to other NP-Complete problems.
Either of these problems can be converted to Java or C syntax, but, for starters, I plan on implementing them in Drools, Jess, CLIPS, Smarts, ODM and Blaze Advisor. That should be enough for comparisons this year. Dr. Forgy has been kind enough to provide the code for these two NP-Complete problems in OPS syntax, which we should be able to convert to Java, C/C++ or C#, or BASIC for that matter.
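To make the two problems concrete before anyone sees rule code, here is a hedged, brute-force sketch in Java. It is my own illustration, not Dr. Forgy's OPS version, and certainly not how a production-rule engine would attack the problems; the graph in it is made up.

    import java.util.*;

    // A brute-force sketch of the Clique and Vertex Cover problems on a tiny,
    // made-up graph.  This is an illustration only; it is not Dr. Forgy's OPS
    // code and not how a rule engine would approach the problems.
    public class NpSketch {

        // Edges of a small undirected graph, as pairs of vertex ids.
        static final int[][] EDGES = { {0, 1}, {0, 2}, {1, 2}, {2, 3} };
        static final int VERTICES = 4;

        // A clique: every pair of chosen vertices is connected by an edge.
        static boolean isClique(Set<Integer> vs) {
            for (int a : vs)
                for (int b : vs)
                    if (a < b && !hasEdge(a, b)) return false;
            return true;
        }

        // A vertex cover: every edge has at least one endpoint in the chosen set.
        static boolean isVertexCover(Set<Integer> vs) {
            for (int[] e : EDGES)
                if (!vs.contains(e[0]) && !vs.contains(e[1])) return false;
            return true;
        }

        static boolean hasEdge(int a, int b) {
            for (int[] e : EDGES)
                if ((e[0] == a && e[1] == b) || (e[0] == b && e[1] == a)) return true;
            return false;
        }

        public static void main(String[] args) {
            // Try every subset of vertices (2^n of them): fine for a toy graph,
            // hopeless as the graph grows, which is what makes the problems hard.
            Set<Integer> bestClique = Set.of();
            Set<Integer> bestCover = null;
            for (int mask = 0; mask < (1 << VERTICES); mask++) {
                Set<Integer> vs = new HashSet<>();
                for (int v = 0; v < VERTICES; v++)
                    if ((mask & (1 << v)) != 0) vs.add(v);
                if (isClique(vs) && vs.size() > bestClique.size()) bestClique = vs;
                if (isVertexCover(vs) && (bestCover == null || vs.size() < bestCover.size()))
                    bestCover = vs;
            }
            System.out.println("largest clique: " + bestClique);
            System.out.println("smallest vertex cover: " + bestCover);
        }
    }

The exponential blow-up in the subset enumeration is the whole point: nobody knows a polynomial-time algorithm for either problem, which is what should make these benchmarks much harder to cheat on than Manners.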
That pretty much covers the 5 W's of a good journalistic article. The HOW (5W+H) part is not the most difficult. Our plan is to provide the NP-Complete benchmarks, along with the first three UT benchmarks, for our talk at Decision Camp 2014, to be held in San Jose this November. I hope to see many of you there, since registration is, once again, free thanks to Sparkling Logic and eBay.
Shalom
Yaakov