drfsc

License: MIT

PyPI version: 0.0.6

An open-source library for a distributed randomised feature selection and classification algorithm.

Authors and Contributors

Mark Chiu Chong, Aida Brankovic

Overview

drfsc is an open-source Python implementation of the Distributed Randomised Feature Selection algorithm for Classification problems (D-RFSC) [2]. Besides addressing some of the shortcomings of conventional feature selection (FS) methods, the algorithm has previously shown good performance on a range of benchmark datasets. Until now, however, no Python implementation has been available. drfsc offers an easy-to-use, parallelized, probabilistic, population-based feature selection scheme that can be adapted to a wide range of binary classification problems. It is particularly useful for large data problems where model interpretability and explainability are of high importance. It also lets users specify initial inclusion probabilities for the features, so that expert domain knowledge can be incorporated. The package provides modules for model fitting, evaluation, and visualization, and tutorial notebooks demonstrate its use.
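To make the idea of a probabilistic, population-based scheme with user-defined inclusion probabilities more concrete, here is a minimal pure-Python sketch. It does not use the drfsc API: the function names (`sample_subset`, `update_probs`, `randomised_selection`, `toy_score`), the update rule, and the toy scoring function are all assumptions made for illustration, and the details of the actual algorithm are in [1] and [2].

```python
import random


def sample_subset(probs, rng):
    """Draw one candidate model: feature j is included with probability probs[j]."""
    return [j for j, p in enumerate(probs) if rng.random() < p]


def update_probs(probs, population, scores, step=0.3, floor=0.05, ceil=0.95):
    """Nudge each inclusion probability by the difference between the mean
    score of models that contain the feature and the mean score of those
    that do not (a simplified, hypothetical update rule)."""
    new_probs = []
    for j, p in enumerate(probs):
        with_j = [s for m, s in zip(population, scores) if j in m]
        without_j = [s for m, s in zip(population, scores) if j not in m]
        if with_j and without_j:
            gain = sum(with_j) / len(with_j) - sum(without_j) / len(without_j)
            p = p + step * gain
        # Keep probabilities away from 0 and 1 so every feature can still be sampled.
        new_probs.append(min(ceil, max(floor, p)))
    return new_probs


def randomised_selection(score_fn, init_probs, n_models=50, n_iters=20, seed=0):
    """Repeatedly sample a population of feature subsets, score them,
    and update the inclusion probabilities."""
    rng = random.Random(seed)
    probs = list(init_probs)
    for _ in range(n_iters):
        population = [sample_subset(probs, rng) for _ in range(n_models)]
        scores = [score_fn(m) for m in population]
        probs = update_probs(probs, population, scores)
    return probs


def toy_score(model):
    """Stand-in for a validation metric: features 0 and 2 are informative,
    every other feature carries a small penalty."""
    informative = {0, 2}
    hits = sum(1 for j in model if j in informative)
    noise = sum(1 for j in model if j not in informative)
    return hits / 2 - 0.2 * noise


if __name__ == "__main__":
    # User-defined initial inclusion probabilities for five features.
    init = [0.5, 0.5, 0.5, 0.5, 0.5]
    print(randomised_selection(toy_score, init))
```

In a real setting `toy_score` would be replaced by fitting a classifier on the sampled features and returning a validation metric. Prior domain knowledge enters through `init_probs`, for example by starting a feature believed to be relevant at 0.8 instead of 0.5.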

Installation

The easiest way to install drfsc is from PyPI:

pip install drfsc
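After installation, one way to confirm that the package is visible to the current Python environment is to query its metadata with the standard library. The helper name `installed_version` is hypothetical; only `importlib.metadata` is used, so nothing here assumes anything about the drfsc API itself.

```python
from importlib import metadata


def installed_version(package="drfsc"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "drfsc is not installed in this environment")
```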

License

We invite anyone interested to use and modify this code under the MIT license.

Dependencies

drfsc depends on the following packages:

References

The package is based on research carried out at the Politecnico di Milano (Polytechnic University of Milan). The interested reader is referred to [2] for details of the distribution procedure, and to [1] for a more thorough mathematical overview and for experimental comparisons with alternative feature selection methods.

[1] Brankovic, A., Falsone, A., Prandini, M., Piroddi, L. (2018). A feature selection and classification algorithm based on randomized extraction of model populations.

[2] Brankovic, A., Piroddi, L. (2019). A distributed feature selection scheme with partial information sharing.

Citations

This package is developed at CSIRO’s Australian e-Health Research Centre. If you use the drfsc package in your research, we would appreciate a citation to the appropriate paper(s):

  • For general use of the drfsc package, please read/cite the original article.
  • For information on or use of the Randomised Feature Selection and Classification concept, please read/cite the original article [1].
  • For information on or use of the Distributed Feature Selection architecture with partial information sharing, please read/cite the original article [2].