OK, here's my final bleat on rulebase engine benchmarks. (Right - YOU believe me, don't you?) Anyway, benchmarks cannot and should not be done by the vendor but rather by some independent party; a firm that has no vested interest in any of the outcomes other than the truth and fairness of play. If a vendor produces the benchmark, then the vendor would have to allow anyone (and by that I mean someone outside of the company, including the competitors of said vendor) to double-check those figures, even if it means supplying a time-bombed version of the software to the company or person double-checking the facts.
If the benchmarks are produced by a company other than the vendor, then that company should be totally independent of the vendor, someone at "arm's length." This means that a company that is a "partner" of the vendor could not produce those benchmarks. However, if the company simply "used" those products in the normal performance of its business, then that would not only be OK but preferred. (Someone like a consulting company that is NOT a partner with any of the rulebase vendors, for example.) However, a company that did NOT use a rulebase in the pursuit of its normal business would not normally be eligible either, since that company probably would not have the expertise on board to discriminate between tests and understand if and when one vendor tries to "cheat" on the tests. And vendors do cheat.
On the subject of cheating, all vendors should be held accountable to the same standards. For example, if one vendor is allowed to use compiled code (in any form except compiling the Java or C or C++ classes themselves) then the other vendors should be allowed to do the same thing. An example of this is compiling the rules into Java that is run with a JIT compiler. OPSJ has ONLY this method of running the OPSJ rules. With JRules, Blaze Advisor and others it is an option. Drools and Jess can be run in interpreted mode only and so have put themselves in an unfair position. Corticon and others are NOT based on the OPS format of most benchmark rules, so they sometimes have an advantage and sometimes a disadvantage. The company doing the benchmarks has to do everything possible to make sure that the playing field is level and, where it isn't, point that out to the readers. (Ergo, the reason for the pure independence of the benchmarking company.)
Also, the company producing the benchmarks should not only make the code and data readily available, but should also produce meaningful benchmarks; meaning that Miss Manners has about run its course and probably should be dropped from contention. It's nice to see the results, but the test itself is way too easy to cheat, whether in the code itself or "under the covers" with optimization aimed strictly at that benchmark. (Yes, some companies have been doing that for years and we, the poor schmucks in the industry, have just now caught on.)
OK, let's assume that my company (KBSC) ran the benchmarks. We're independent. We aren't partners with anyone, much to our financial loss. We do NOT produce any kind of variant of the rulebase in question. We showed our code. What else can we do to ensure the integrity of our results? Here are some variables that should be posted with the benchmark:
Machine RAM Total
Machine RAM available (was something else keeping the machine busy at the time?)
The command line used to run the tests
(the -Xmx, -Xms and -server flags mean something to the machine)
Operating System and version
Java, C++ or C# version used
Number of "warm up" cycles
J2EE or EJB used
(if so, which one and which version and show setup)
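As a sketch of what "post everything" might look like, here's a hypothetical shell script that captures those variables alongside a benchmark run. The file name, JVM flags, benchmark class name and warm-up count are all illustrative assumptions, not anyone's actual setup:

```shell
#!/bin/sh
# Record the benchmark environment so readers can reproduce the run.
# All names, flags and values below are illustrative placeholders.

REPORT=benchmark-env.txt

{
  echo "Date: $(date -u)"
  # Operating system and version
  echo "OS: $(uname -sr)"
  # Java version used (captures the first line of 'java -version')
  echo "Java: $(java -version 2>&1 | head -n 1)"
  # Total physical RAM (Linux only; harmless no-op elsewhere)
  grep MemTotal /proc/meminfo 2>/dev/null
  # The exact command line, including heap and JIT flags
  echo "Command: java -server -Xms512m -Xmx1024m benchmark.WaltzDB"
  echo "Warm-up cycles: 3"
} > "$REPORT"

cat "$REPORT"
```

Publishing a file like this with every result would let readers spot, at a glance, whether two vendors were actually measured under the same conditions.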
There are lots of variables in running a benchmark. Every one must be shown, because one vendor runs better on one OS and another vendor runs better on another OS. The same thing applies to the Java version used, J2EE, etc. Unfortunately, since I (personally) left the independent arena, lots of folks have jumped into the gap but few can step up to the plate and meet all of those requirements on all of the benchmarks that they produce. (OK, Steve Nunez is trying really hard but Illation is, after all, not only a partner with ILOG but produces its own knowledge software as well.)
How do we fix the independent benchmarking problem? Well, we need an independent agency. And it must be like UL or ACORD (the de facto global e-commerce standards body for insurance) but designed for testing a rulebase and evaluating a BRMS; and we need it now! So, here's a thought: What IF we (the industry) did the following? We, the industry and customers, form an independent laboratory. Companies (vendors and/or customers) with 1,000 or more employees would contribute $50K annually toward such an organization. Companies with 100 - 999 would contribute $20K annually. Individuals would contribute $100 per year for membership. This would entitle them to the results of any and all tests being run, input to the company for fairness (but they would have to agree that the sole arbiter of any dispute would be the benchmarking company) and monthly reports on what's happening. Stockholders, if you will, but non-voting stockholders.
And the company itself would deal with anything to do with BRMS, including evaluation of products, benchmarks, an annual ranking of products based on various criteria such as speed, integration with other products, technical support, initial costs, professional services effectiveness and their costs, and anything else that might impact the customer. Vendors, of course, would be expected to contribute on the same scale as those companies subscribing to the service. (OK, I took a page out of Forrester's book!!)
[Personal grousing time] The reason that I had to leave the pure independent route was that I financially could not survive and do pure research - and nobody in the BRMS market would step up to the plate and provide the wherewithal to be independent and still survive. I even tried (in cahoots with InfoWorld) to form a company that would do what I described above, but IW and the rest of the BRMS vendors couldn't see an advantage for themselves nor the readers. Well, maybe the readers of this blog can see what such an outfit could do to help the poor sod customers by showing them who's who and what's what BEFORE they try to do it themselves.
Who stole my spring?? - After a nice 20°C day yesterday, I woke up this morning to this: