Thursday, June 18, 2009
More on Benchmarks for 2009
Just another post to TRY to get some discussion started on benchmarks. All I seem to get from the vendors is, "Those old things? Those are not 'real-world' benchmarks!" OK, no argument here... So, does anyone have a "real-world" benchmark? No? Fine, then we will use what we have and hope for the best, since they seem to work fine for checking rule-processing performance (using almost "real-world" rules) on most engines.
Checking performance on a modeling tool, such as Corticon, Visual Rules, Rule Burst, VisiRules, or any Decision Table or Decision Tree, is absolutely NOT what checking rule-processing power is all about. [I'll probably catch some flak for that remark, but those are just my personal feelings.] If you already have the rules hard-coded, using either straight-up Java or some kind of modeling tool that produces Java code (such as sequential rules or something else along those lines), then that SHOULD run faster than a real rulebase engine that is designed from the ground up to be an inferencing engine based on whatever algorithm you like.
However, this year I would like to allow a couple of things that I have not done in the past:
(1) Allow a "warm-up" time of maybe three or four passes through all of the rules, using different data each time, and then run the rules for 10 consecutive passes using 10 different sets of data, taking the average as the benchmark time. In years past, rules did not run under EJB/J2EE or similar environments (we had Java for several years before we had J2EE/EJB) and we did not allow such things. However, with the increased overhead of carrying that in the core part of the engine, I think a warm-up should be allowed.
(2) I'm going to drop the old Miss Manners 8, 16, 32, 64, 128 and 256 versions and substitute Miss Manners 2009 - which is the ORF example for this year.
(3) The other two benchmarks from the old days are still good: Waltz-50 and WaltzDB-16.
(4) However, we are introducing a new WaltzDB-200 this year just to really get some long lead times.
(5) We will run these all on the following systems:
a1. Mac Core2Duo, 3GB RAM, OS X Leopard, 64-bit (which is FreeBSD Unix with a pretty face)
a2. Mac Dual-Quad Core, 8GB RAM, OS X Snow Leopard, 64-bit [maybe...]
b. HP Intel, 3GB RAM, dual-threaded, Windows XP, 32-bit
c. Dell Intel i7, 4-core, 8-thread, 6GB RAM, Windows Vista, 64-bit
I might try to work in some Linux if there seems to be any significant speed difference between an Intel box running Linux and one running Windows - but experience teaches that usually the Windows version runs faster. I will check it anyway just to be sure. (6) The engines that I am hoping to check will be (in alphabetical order):
a. Blaze Advisor 6.7
b. Drools Version 5.x
c. CLIPS Version 6.30b
d. Jess 7.0-p2
e. JRules Version 6.7.3 or 7.x (depends...)
f. OPSJ Version 6.0
g. OPSJ-NT Version 1.0
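To make the warm-up-and-average idea in item (1) concrete, here is a minimal timing-harness sketch. It is my own illustration, not any vendor's API: the `Consumer<Integer>` stands in for "load data set i and fire all the rules," and the pass counts (four warm-up passes, ten timed passes) are just the numbers suggested above.

```java
import java.util.function.Consumer;

/** Sketch of a warm-up-then-average benchmark harness (assumed design, not a vendor API). */
public class BenchHarness {

    /**
     * Runs {@code warmUp} untimed passes (letting JIT compilation, class loading,
     * and caches settle), then {@code timed} timed passes, each with a different
     * data-set index, and returns the average milliseconds per timed pass.
     */
    public static double averageMillis(Consumer<Integer> rulePass, int warmUp, int timed) {
        for (int i = 0; i < warmUp; i++) {
            rulePass.accept(i);              // warm-up passes: results discarded
        }
        long start = System.nanoTime();
        for (int i = 0; i < timed; i++) {
            rulePass.accept(warmUp + i);     // timed passes, each on a fresh data set
        }
        long elapsedNanos = System.nanoTime() - start;
        return (elapsedNanos / 1_000_000.0) / timed;
    }
}
```

Usage would be something like `BenchHarness.averageMillis(i -> engine.runAllRules(dataSets[i]), 4, 10)`, where `engine.runAllRules` is whatever "assert facts and fire" call the particular engine exposes.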
I will probably publish the results here, along with the previous years' performance benchmarks, as well as on the KBSC home page. The comparisons of 32-bit and 64-bit should tell us something about scalability. The comparisons across different operating systems should tell us something about scalability and portability.
One more thing: if any of the other vendors can demonstrate a suitable version of the benchmarks, I will include them - but NOT the same thing that I did a few years ago, when I allowed a "similar" version of the benchmarks to be used by a vendor that could not code straight-up IF-THEN-ELSE rules with a NOT statement in there somewhere.
I do expect cheating on the part of the vendors. Somehow, I must find a benchmark that will not allow that, so I'll probably throw in one that has lots of NOT statements in it, or something really rude like that. I know that the vendors don't really pay attention to benchmarks any more, so I'm hoping that the customers of these and other vendors will stress performance benchmarks to their suppliers as another check of good engineering. Layering GUI after GUI after Model after Model is cool EXCEPT when you forget how to perform under the pressure of millions of transactions per day that need complex, real rulebase-type analysis.