GritBot: Notes from Previous Releases

Release 2.01

Modified heuristics
The heuristics that guide the search for anomalies have been refined, as have those used to weed out false positives. These changes mainly affect larger datasets containing hundreds of thousands of cases.

(Slightly) improved checking of new cases
GritBot can save its analysis so that new cases from the same application can be quickly screened for possible anomalies. Release 2.01 includes some additional tests for irregularities that did not arise in the original data.

Multi-threading
Many of the computation-intensive parts of GritBot are now multi-threaded, so GritBot can use up to four processors. As a result, GritBot 2.01 runs significantly faster on newer dual-core and quad-core computers.

Release 1.06

Release 1.06 processes larger datasets substantially faster than Release 1.05: a mid-sized application (100,000 cases, 15 attributes) should finish in about three-quarters of the time required by the previous release.

Bug fix
Release 1.05 introduced the re-use of a saved analysis to inspect new cases. Unfortunately, the results could be incorrect if the application used defined attributes with discrete values. This bug has been corrected in 1.06.

Release 1.05

Inspecting new data
The most important change in the new release is the ability to save an analysis for later use. By default, GritBot writes a sift file that collects all the checks that were made while analyzing the data. The sift file allows GritBot to apply the same checks to new data in a cases file; checking new data in this way is much faster than analyzing it ab initio.
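The Python sketch below illustrates the general idea of saving checks once and re-applying them to new cases. It does not reproduce GritBot's sift file format, which these notes do not describe. The `Check` record, its fields, the JSON serialization and the function names are all hypothetical. Each check is also simplified to a single rule: "within the subset where one attribute has a given value, another attribute should lie in a given range."

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Check:
    """Hypothetical record of one check made during the original analysis."""
    condition_attr: str    # attribute that defines the subset
    condition_value: str   # cases with this value belong to the subset
    target_attr: str       # numeric attribute examined within the subset
    low: float             # values outside [low, high] are flagged
    high: float


def serialize_checks(checks: List[Check]) -> str:
    """Save the checks as JSON text (a stand-in for writing a file)."""
    return json.dumps([asdict(c) for c in checks])


def deserialize_checks(text: str) -> List[Check]:
    """Rebuild the checks from their saved JSON text."""
    return [Check(**fields) for fields in json.loads(text)]


def screen(case: Dict[str, object], checks: List[Check]) -> List[Check]:
    """Return the saved checks that a new case fails; no new search is done."""
    failed = []
    for check in checks:
        if case.get(check.condition_attr) != check.condition_value:
            continue  # the case is not in this check's subset
        value = case.get(check.target_attr)
        if isinstance(value, (int, float)) and not check.low <= value <= check.high:
            failed.append(check)
    return failed
```

Screening a new case only evaluates each stored check once. That suggests why applying saved checks is much cheaper than repeating the search for anomalous subsets.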

The sift file can be quite large, so an option is provided to prevent GritBot from saving this information.

Option to control the number of anomalies reported
Since large datasets (hundreds of thousands of cases) can produce many possible anomalies, a new option limits how many are reported. When this option is used, GritBot still shows the total number of potential anomalies it found, but displays only the most interesting ones.
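Here is a minimal sketch of limiting the report while still counting everything found. It assumes that each possible anomaly carries a numeric interest score, which is a hypothetical stand-in for however GritBot actually ranks them. The function name and `max_reported` parameter are illustrative only.

```python
import heapq
from typing import List, Tuple


def limit_report(anomalies: List[Tuple[int, float]], max_reported: int) -> Tuple[int, List[Tuple[int, float]]]:
    """Return the total number found and the max_reported most interesting ones.

    Each anomaly is a (case_number, score) pair; a higher score is assumed
    to mean "more interesting".
    """
    top = heapq.nlargest(max_reported, anomalies, key=lambda item: item[1])
    return len(anomalies), top
```

`heapq.nlargest` avoids sorting the full list, so selecting the top few stays cheap even when a large dataset yields many candidates.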

Generating lists of possible anomalies
Another option generates a simple list of the case numbers of the possible anomalies found. The list can be used for follow-up actions such as separating the possible anomalies from the rest of the data.
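As an illustration of the follow-up step described above, the sketch below separates the flagged cases from the remainder. It assumes that case numbers are 1-based positions of the cases in the data file. That numbering convention is an assumption, as is the function name.

```python
from typing import Iterable, List, Sequence, Tuple


def split_cases(cases: Sequence[str], anomaly_numbers: Iterable[int]) -> Tuple[List[str], List[str]]:
    """Separate flagged cases from the rest.

    Assumes case numbers are 1-based positions of the cases in the data file.
    """
    flagged = set(anomaly_numbers)  # set lookup keeps the split linear in the data size
    suspect: List[str] = []
    clean: List[str] = []
    for number, case in enumerate(cases, start=1):
        (suspect if number in flagged else clean).append(case)
    return suspect, clean
```

If the list holds one case number per line (again an assumption), `[int(tok) for tok in list_text.split()]` turns it into the `anomaly_numbers` argument.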

Release 1.04

New data type
Release 1.04 adds a timestamp data type. Timestamps are read and written in the form YYYY-MM-DD HH:MM:SS using a 24-hour clock. (Recall that GritBot already has data types for times and for dates.) A timestamp is rounded to the nearest minute, and implicitly defined attributes can be used to compute functions of timestamps, such as the number of minutes between two of them.
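The following Python sketch mirrors the timestamp behaviour just described: it parses the YYYY-MM-DD HH:MM:SS form, rounds to the nearest minute, and computes the number of minutes between two timestamps. How GritBot breaks ties at exactly 30 seconds is not stated. The code assumes rounding up, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def parse_timestamp(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' (24-hour clock) and round to the nearest minute."""
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    if ts.second >= 30:  # assumed tie rule: 30 seconds or more rounds up
        ts += timedelta(minutes=1)
    return ts.replace(second=0)


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, after rounding both timestamps."""
    delta = parse_timestamp(end) - parse_timestamp(start)
    return int(delta.total_seconds() // 60)
```

For example, `minutes_between("2015-09-01 08:00:40", "2015-09-01 09:30:10")` rounds the endpoints to 08:01 and 09:30 and returns 89.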

Improved detection and screening of possible anomalies
Release 1.04 incorporates improved mechanisms for detecting potential anomalies and for filtering out those that may well be spurious. For example, Release 1.04 checks more carefully for inappropriate N/A ("not applicable") values in the data. The minimum number of cases in a subset that contains a potential anomaly has been increased to the larger of 35 cases and 0.5% of the data.
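The minimum subset size rule can be written as a one-line calculation. The notes do not say how a fractional 0.5% is rounded, so this sketch assumes rounding up. Because 0.5% is 1/200, the floor of 35 cases dominates for datasets of up to 7,000 cases.

```python
def min_subset_size(num_cases: int) -> int:
    """Larger of 35 cases and 0.5% (1/200) of the data, rounding the percentage up."""
    half_percent = -(-num_cases // 200)  # ceiling division without floating point
    return max(35, half_percent)
```

With 100,000 cases, for instance, the minimum becomes 500 cases.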

Faster checking
GritBot is now considerably faster for larger datasets containing tens of thousands of records. This has been achieved by the selective use of sampling to estimate important properties of subsets of cases.
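The notes do not describe GritBot's sampling scheme. The sketch below only shows the general technique of estimating a subset property from a simple random sample instead of every case. The choice of mean and standard deviation as the "important properties", and the `sample_size` and `seed` parameters, are illustrative assumptions.

```python
import random
import statistics
from typing import Sequence, Tuple


def estimate_mean_sd(values: Sequence[float], sample_size: int, seed: int = 0) -> Tuple[float, float]:
    """Estimate a subset's mean and standard deviation.

    Small subsets are used in full; larger ones are summarized from a simple
    random sample of sample_size cases (sample_size must be at least 2).
    """
    if len(values) <= sample_size:
        sample = list(values)
    else:
        sample = random.Random(seed).sample(values, sample_size)
    return statistics.fmean(sample), statistics.stdev(sample)
```

The cost of the estimate depends on `sample_size` rather than on the size of the subset. This is the trade-off that makes sampling attractive once subsets contain tens of thousands of records.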

Release 1.03

Selective checking
By default, GritBot examines the values of all attributes in its search for possible anomalies. Release 1.03 enables the user to indicate (in the .names file) the attributes that GritBot should examine. This can speed up checking and can also restrict the possible anomalies reported to those more likely to be of interest.

Time attributes
An attribute declared to be a `time' takes values in the form HH:MM:SS. As with dates, attributes defined by formulas can subtract one time from another to give an interval (in seconds).
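The interval described above can be illustrated with a small calculation. This sketch converts two HH:MM:SS values to seconds and subtracts them. The function names are hypothetical; in GritBot itself the subtraction would be written as a formula in the .names file.

```python
def time_to_seconds(text: str) -> int:
    """Convert an HH:MM:SS time to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_interval(start: str, end: str) -> int:
    """Interval in seconds from start to end (negative if end is earlier)."""
    return time_to_seconds(end) - time_to_seconds(start)
```

For example, `time_interval("08:30:00", "09:45:15")` gives 4515 seconds (1 hour, 15 minutes and 15 seconds).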

Faster checking
GritBot now processes large datasets much more quickly. For example, a dataset of almost a quarter of a million records with more than two hundred attributes takes Release 1.02 about 30 hours on a 500MHz PC; on the same machine Release 1.03 completes the job in four hours.

More effective checking
Some core heuristics of GritBot have been tuned to improve the detection of possible anomalies. At the same time, better `sanity checks' remove more potential false positives.

Release 1.02

More extensive filtering
GritBot finds anomalies by identifying subsets of cases in which one value for one case stands out. As a further filter, GritBot now checks that all other values for the highlighted case are compatible with the other values in the subset. The rationale is that if a case has two anomalous values, each might explain the other.
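The following sketch illustrates the further filter described above. It treats "compatible" as lying within the minimum-to-maximum range of the other cases in the subset. That range test is a hypothetical stand-in, since the notes do not say how compatibility is judged. The function name is also illustrative.

```python
from typing import Dict, List


def others_compatible(case: Dict[str, float], subset: List[Dict[str, float]], flagged_attr: str) -> bool:
    """True if every attribute except flagged_attr lies within the range seen in the rest of the subset."""
    for attr, value in case.items():
        if attr == flagged_attr:
            continue  # the value that stands out is judged separately
        seen = [other[attr] for other in subset if other is not case]
        if seen and not (min(seen) <= value <= max(seen)):
            return False  # a second unusual value: the two may explain each other
    return True
```

A case whose income stands out but whose age is typical passes the filter and would still be reported. A case that is unusual in both would be suppressed.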

Batch mode
The Windows version now includes GritBotX, a batch mode version of GritBot.

Data formats
Two extensions of the data description language have been incorporated in Release 1.02.
  • A new value `N/A' can be used when the value of an attribute is not relevant to a case.
  • Dates may now be written as either YYYY/MM/DD or YYYY-MM-DD.
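A reader for data in the formats listed above has to accept both date separators. The Python sketch below does that. Representing `N/A' as `None` is a hypothetical choice for illustration, as is the function name.

```python
from datetime import date, datetime
from typing import Optional


def parse_date(text: str) -> Optional[date]:
    """Accept YYYY/MM/DD or YYYY-MM-DD; map N/A ("not applicable") to None."""
    text = text.strip()
    if text == "N/A":
        return None  # hypothetical representation of a not-applicable value
    for fmt in ("%Y/%m/%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the other separator
    raise ValueError(f"unrecognized date: {text!r}")
```

Both `2015/09/01` and `2015-09-01` yield the same date. Other orderings, such as `01/09/2015`, are rejected.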

Improved error messages
Errors in your .data or .names files are now pinned down to a specific line number in the relevant file. Of course, if your data never contain errors, you won't notice this ...

© RULEQUEST RESEARCH 2015 Last updated September 2015
