Monday, December 29, 2008

Benchmarks 2009

Greetings:

As is my wont (don't you just love Old English?), I will be running my annual "First Quarter - No Quarter Given" benchmarks beginning in January of 2009. Right now I'm limited to the existing benchmarks: Waltz-50 and WaltzDB-16 (available for almost all Rete-engine vendors) and, for some vendors, WaltzDB-200. WaltzDB-200 is a new version of WaltzDB-16 that uses 200 variations rather than 16, so that should prove interesting.

Also, the platforms will be limited to Mac OS X running on a Dual G5 with 4GB of RAM and a Core2Duo with 3GB of RAM. I have beefed up the Windows XP 32-bit (dual-threaded, single CPU) machine to 3GB of RAM just to be able to run certain software that is incompatible with either the Mac (Unix) or Windows Vista 64-bit on the i7 CPU.

IF, and let me emphasize the IF, I can get the time, I will also have benchmarks for Decision Table (only?) engines: a 10K and a 100K Telecom benchmark that will do nothing more than show off the processing power of single-row data validation. So far, the DT vendors have not been very helpful in coming up with a benchmark of their own this past year, so I pretty much label all of them a 5 in terms of performance, meaning that it's neither good nor bad; it's pretty much an unknown. Mostly that's because my editor won't allow me to give them a zero. :-)

Also, we might do some of what Peter Lin and/or Mark Proctor suggested in the way of "micro-benchmarks" that would eliminate any opportunity for cheating. If we throw in Gary Riley's Sudoku and/or the 64-Queens problem, we'll have something else that, while not actually business related, will give some indication of engine performance.

The benchmarks will be:

1. Waltz-50
2. WaltzDB-16
3. WaltzDB-200
4. 10K Telecom
5. 100K Telecom
6. MicroBenchmarks
7. Sudoku
8. 64-Queens

The classes of vendors will be:

1. Rete-based engines, internal objects (CLIPS (?), JRules, Advisor, Jess, Drools, etc.)
2. Rete-based engines, external objects (CLIPS, JRules, Advisor, Jess, Drools, etc.)
3. Compiled Java Engines (Visual Rules, OPSJ, JRules, Advisor, Drools, etc.)
4. Sequential Compiled Java Engines (Visual Rules, JRules, Advisor, Drools(?), etc.)
5. Decision Table Vendors (Corticon, Visual Rules, Haley Office System, VisiRules, etc., but could include JRules, Advisor and Drools)

Folks, that's a LOT of work for one little old Texas boy unless I can find someone independent to help AND get some help from the vendors in writing these benchmarks, which would then be checked by me and any independent helpers I can find. If you want to help (and thereby ensure that your name is placed among the other immortals of rulebase benchmarking), send me your name and we'll get you started.

Remember, to help with the overall project, you MUST be independent and NOT working for any of the vendors being tested. (You can be working on any vendor's project as long as you are being paid by the client and NOT by the vendor.) To help with the project from a vendor's point of view, all I need is the code for all of the tests in the appropriate syntax for that vendor. I (we) still have to read it and verify that nobody cheated, but that should be really helpful and will be duly noted in the tables that will be published.

Maybe, just maybe, (no agreement yet) InfoWorld or some other equally high-visibility journal will be willing to publish these benchmarks in the form of an article of some kind. Otherwise, it will be just another blog on benchmarks. :-)

SDG
jco

9 comments:

dleskov said...

Which JVM(s) are you going to use? I'd be most interested to learn how Excelsior JET performs.

James Owen said...

Greetings:

I'll be using the latest JDK that each vendor allows for its particular tool. For example, for ILOG JRules we'll use JDK 1.6 on Windows; on the Mac we'll have to use JDK 1.5 with JRules 6.7.1, the latest version that runs on the Mac. No vendors have (as yet) approved Excelsior JET, but you are, of course, welcome to run your own comparisons and let me know how they did. I'll post your results along with my own on this blog and, probably, on the ExSCG blog.

All of them are available for download except OPSJ and Blaze Advisor; you will have to contact those vendors (PST and Fair Isaac) to get an evaluation copy. Also, JRules BRS is approved to run ONLY on Windows XP, not on Vista or Win2K.

SDG
jco

woolfel said...

the old microbenchmarks I wrote to test Jamocha are up on SVN. You can find them here in CLIPS format.

http://jamocha.svn.sourceforge.net/viewvc/jamocha/benchmarks/size/

http://jamocha.svn.sourceforge.net/viewvc/jamocha/benchmarks/joins/

If I have time, I will add some examples for simple compliance rules and check them into SVN.

James Owen said...

Peter:

Thanks - it will be a couple of weeks but I definitely will go back and review them. I'm looking around for something that will be inviolate so that we can check all of the vendors, not just the Rete-based engines. If you have any ideas on those (Corticon, Visual Rules, VisiRules, etc.) let me know. All I have come up with so far is having to write out the requirements for the 10K and 100K Telecom benchmark that was used to evaluate JRules and Blaze Advisor way back in 2000. Not a bad one but it sucks if you're trying to check Rete performance.

SDG
jco

woolfel said...

with object type nodes that hash their child nodes, if the conditions test for equality (i.e., slot == "value"), performance should be close to constant, so it shouldn't suck for RETE engines. It would suck for RETE implementations that don't hash the child nodes of the object type node.

As far as I know, not all decision table rule engines are optimal. Some don't bother generating an optimized decision tree, so performance will likely vary greatly. Naive implementations that iterate over all the rules will obviously suck. For the record, I've seen commercial compliance engines do it that way and charge 500K to several million for a license. Even some commercial non-RETE rule engines implement it naively.
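Peter's point about hashing versus naive iteration can be illustrated with a minimal sketch. It assumes every condition is a single equality test against a constant; the rule names and slots below are hypothetical and do not come from any vendor's engine. The naive matcher tests every rule against each fact, while the index keyed on slot and value replaces those tests with one dictionary lookup per slot, which is roughly what hashing the children of an object type node buys a Rete engine. This is an illustration of the idea only, not how any particular engine implements it.

```python
from collections import defaultdict

# Hypothetical rules: (rule name, slot tested, constant it must equal).
RULES = [
    ("rule_active", "status", "ACTIVE"),
    ("rule_suspended", "status", "SUSPENDED"),
    ("rule_closed", "status", "CLOSED"),
    ("rule_gold", "plan", "GOLD"),
]

def linear_match(fact):
    """Naive matcher: tests every rule, so cost grows with the number of rules."""
    return [name for name, slot, value in RULES if fact.get(slot) == value]

def build_index(rules):
    """Group the equality tests by slot, then hash them by the tested value."""
    index = defaultdict(lambda: defaultdict(list))
    for name, slot, value in rules:
        index[slot][value].append(name)
    return index

INDEX = build_index(RULES)

def hashed_match(fact):
    """One dictionary lookup per slot, however many rules test that slot."""
    matches = []
    for slot, by_value in INDEX.items():
        # .get() does not insert missing keys into the defaultdict.
        matches.extend(by_value.get(fact.get(slot), []))
    return matches

fact = {"status": "SUSPENDED", "plan": "GOLD"}
print(sorted(linear_match(fact)))  # ['rule_gold', 'rule_suspended']
print(sorted(hashed_match(fact)))  # ['rule_gold', 'rule_suspended']
```

Both matchers find the same rules; the difference is that adding a thousand more `status == ...` rules makes `linear_match` a thousand tests slower while `hashed_match` still does one lookup for the `status` slot.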

James Owen said...

Peter:

The problem is that there is no chaining between rules in the examples to which I have access. If there is no chaining then the Rete algorithm isn't the solution and it becomes more of a procedural problem.

BUT, that's the example that I have. If you have a better one, I would be interested in going over it.

SDG
jco
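The distinction James draws between single-row validation and chaining can be sketched in a few lines of Python. The rows, fact names, and rules below are hypothetical and are not taken from the actual Telecom benchmark. In single-row validation each row is checked independently, so no rule ever depends on another rule's conclusion. With chaining, one rule's conclusion becomes another rule's condition. The chaining loop here is a naive fixpoint iteration, not the Rete algorithm; Rete exists precisely to avoid re-testing every rule on every cycle.

```python
def validate_rows(rows):
    """Single-pass validation: each row is checked on its own, no chaining."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("minutes", 0) < 0:
            errors.append((i, "negative minutes"))
        if not row.get("account"):
            errors.append((i, "missing account"))
    return errors

def forward_chain(facts, rules):
    """Naive forward chaining: fire rules until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)  # this conclusion may enable other rules
                changed = True
    return facts

# Hypothetical rules: the second can only fire after the first has fired.
CHAIN_RULES = [
    (frozenset({"over_limit"}), "flag_account"),
    (frozenset({"flag_account", "gold_plan"}), "notify_manager"),
]

print(validate_rows([{"account": "A1", "minutes": 10},
                     {"account": "", "minutes": -5}]))
# [(1, 'negative minutes'), (1, 'missing account')]
print(sorted(forward_chain({"over_limit", "gold_plan"}, CHAIN_RULES)))
# ['flag_account', 'gold_plan', 'notify_manager', 'over_limit']
```

A benchmark built only from `validate_rows`-style checks exercises little more than a loop, which is why it says little about Rete performance; the chained case is where an engine's match algorithm starts to matter.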

woolfel said...

the compliance benchmark I'm working on should utilize rule chaining along with fact modifications, which many decision table engines can't handle properly. I've checked in what I have so far in the same benchmark folder.

Richard Truax said...

More a question than a comment --
Is there anywhere the various benchmarks are specified? I've been Googling for hours and can only find documents from 2006 that talk about how inadequate some of them are.

dleskov said...

@Richard Truax:

SPECjvm2008 is a good one and is now free. Other SPEC Java benchmarks are commercial.

The DaCapo benchmark suite is being actively maintained.

SciMark is a Java benchmark for scientific and numerical computing.

Some more home-made benchmarks.