RSAR treats datasets by removing attributes that are unnecessary for a classification task. It performs greedy feature selection using several variants of the QuickReduct algorithm. It is useful for reducing redundancy in nominally-valued (i.e. discrete) datasets, either for exploration or as a preprocessing step before training machine learning algorithms on the data. This implementation includes a wide range of optimisations that enable it to process extremely large datasets.
RSAR is available as source, Debian and RPM packages. It has been successfully compiled on GNU/Linux (x86 and SPARC) and Solaris 7 (SPARC). It does have a preference for GNU-based environments, but will happily build on others.
What Does It Do?
In layman's terms, RSAR simplifies datasets. A dataset categorises things into classes based on some of their properties (called ‘attributes’). For example, the well-known mushroom dataset contains descriptions (size, shape, colour, et cetera) of various species of mushrooms, and whether each is edible or not. An AI program trains on part of the dataset and then attempts to guess the category of other, previously unseen objects. How often it guesses correctly is the measure of its success. The vast majority of such systems suffer greatly when dealing with more data than absolutely necessary.
RSAR reads in a dataset and removes from it all attributes that are irrelevant to categorising the items. Other AI systems can then train on this data without suffering to the same extent. Since most datasets (both research and real-world ones) contain immense amounts of redundancy, using RSAR as a pre-processor can be incredibly beneficial: speed-ups of hundreds of times have been measured.
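The attribute removal described above can be sketched in a few lines. This is a minimal Python illustration of the basic QuickReduct idea, driven by the rough-set dependency degree. It is not RSAR's own code and has none of its optimisations. The function names, the tie-breaking rule and the four-row toy dataset are assumptions made for this example, and the rows are not taken from the real mushroom dataset.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Fraction of rows whose values on `attrs` determine the class unambiguously."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    # A group is consistent when all of its rows share one class label.
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def quickreduct(rows, conditional, decision):
    """Greedily add attributes until the subset is as informative as the full set."""
    target = dependency(rows, conditional, decision)
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidates = [a for a in conditional if a not in reduct]
        scores = [(dependency(rows, reduct + [a], decision), a) for a in candidates]
        # Ties go to the earliest candidate (an assumption of this sketch).
        best, chosen = max(scores, key=lambda s: s[0])
        reduct.append(chosen)
    return reduct


# Hypothetical toy data: only "odour" is needed to decide edibility.
rows = [
    {"cap": "flat", "odour": "foul", "colour": "brown", "edible": "no"},
    {"cap": "flat", "odour": "none", "colour": "brown", "edible": "yes"},
    {"cap": "bell", "odour": "foul", "colour": "white", "edible": "no"},
    {"cap": "bell", "odour": "none", "colour": "white", "edible": "yes"},
]

reduct = quickreduct(rows, ["cap", "odour", "colour"], "edible")
reduced_rows = [{k: row[k] for k in reduct + ["edible"]} for row in rows]
print(reduct)  # ['odour']
```

The greedy search returns a subset that preserves the dependency degree of the full attribute set, but it does not guarantee the smallest such subset.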
In many cases, RSAR benefits humans too: a simplified dataset helps people gauge which attributes of a problem are most important. It can also cut costs. Building an AI-based monitoring or diagnostic system is much cheaper if irrelevant quantities aren't measured at all (rather than measured by expensive sensors, then eventually discarded by the AI diagnostic system).
As such, RSAR has been designed to produce output for both human and machine, and offers options for doing so in various ways. Please refer to the bundled man page for more information.
This version of RSAR was developed on my own time for my university research work and is released under the terms of the GNU General Public License, version 2.