RSAR treats datasets by removing attributes that are unnecessary for a classification task. It performs greedy feature selection using various versions of the QuickReduct algorithm. It is useful in reducing redundancies in nominally-valued (i.e. discrete) datasets for exploration or as a preprocessing step to training machine learning algorithms on the data. This implementation includes a wide range of optimisations that enable it to process extremely large datasets.
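The greedy selection mentioned above can be illustrated with a minimal Python sketch. It follows the textbook form of QuickReduct, which repeatedly adds the attribute that most increases the rough-set dependency degree, and it is not RSAR's actual code or any of its optimised variants. The function names `dependency` and `quickreduct` and the toy mushroom-style rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Fraction of rows whose values on `attrs` determine `decision` unambiguously.

    Rows are grouped by their values on `attrs`; a group counts only if every
    row in it carries the same decision value.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        groups[key].append(row[decision])
    consistent = sum(len(g) for g in groups.values() if len(set(g)) == 1)
    return consistent / len(rows)


def quickreduct(rows, conditional, decision):
    """Greedily pick attributes until they classify as well as the full set."""
    full = dependency(rows, conditional, decision)
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < full:
        candidates = [a for a in conditional if a not in reduct]
        # Score each candidate by the dependency it yields when added.
        scored = [(dependency(rows, reduct + [a], decision), a) for a in candidates]
        best, chosen = max(scored, key=lambda pair: pair[0])
        reduct.append(chosen)
    return reduct


# Hypothetical toy data: colour alone decides edibility, so shape and size
# are redundant for this classification task.
rows = [
    {"colour": "red", "shape": "flat", "size": "small", "edible": "no"},
    {"colour": "red", "shape": "round", "size": "large", "edible": "no"},
    {"colour": "brown", "shape": "flat", "size": "small", "edible": "yes"},
    {"colour": "brown", "shape": "round", "size": "large", "edible": "yes"},
]

print(quickreduct(rows, ["colour", "shape", "size"], "edible"))
```

On this toy data the sketch keeps only `colour`. Being greedy, the procedure finds a small sufficient subset of attributes but is not guaranteed to find the smallest one.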
RSAR is available as source, Debian and RPM packages. It's been successfully compiled on GNU/Linux (x86 and SPARC) and Solaris 7 (SPARC). It does have a preference for GNU-based environments, but will happily build on others.
In layman's terms, RSAR simplifies datasets. A dataset categorises things into classes based on some of their properties (called ‘attributes’). For example, the well-known mushroom dataset contains descriptions (size, shape, colour, et cetera) of various species of mushrooms, and whether each is edible or not. An AI program trains on part of the dataset and then attempts to guess the category of other, previously unseen objects. How often it guesses correctly forms a measure of its success. The vast majority of such systems suffer greatly when dealing with more data than absolutely necessary.
RSAR reads in a dataset, and removes from it all attributes that are irrelevant in categorising the item. Other AI systems can then train on this data without suffering to the same extent. Since most datasets (both research and real-world ones) contain immense amounts of redundancy, using RSAR as a pre-processor can be incredibly beneficial: speed-ups of hundreds of times have been measured.
In many cases, RSAR can offer a benefit to humans too, since many simplifications help humans gauge what attributes of a problem are most important. It can also cut costs. Building an AI-based monitoring or diagnostic system is much cheaper if irrelevant quantities aren't measured at all (rather than measured by expensive sensors, then eventually discarded by the AI diagnostic system).
As such, RSAR has been designed to produce output for both human and machine, and offers options for doing so in various ways. Please refer to the bundled man page for more information.
This version of RSAR was developed on my own time for my university research work and is released under the terms of the GNU General Public License, version 2.