drfsc
PyPI version: 0.0.6
An open-source library for a distributed randomised feature selection and classification algorithm.
Code
Authors and Contributors
Mark Chiu Chong Aida Brankovic
Overview
drfsc
is an open-source Python implementation of the Distributed Randomised Feature Selection algorithm for Classification problems (D-RFSC) [2]. Beside addressing some of the shortcomings of the conventional FS method, its good performance has previously been shown on a range of benchmark datasets. However, to date no Python implementation is available. drfsc
offers an easy to use, parallelized probabilistic population-based feature selection scheme that is flexible and can be adapted to a wide range of binary classification problems and is particularly useful for large data problems where model interpretability and model explainability is of high importance. It also allows for the specification of user-defined values of initial inclusion probabilities, hence incorporating expert domain knowledge. It provides modules for model fitting, evaluation, and visualization. Tutorial notebooks are provided to demonstrate the use of the package.
Installation
The easiest way to install is from PyPI: just use
pip install drfsc
License
We invite anyone interested to use and modify this code under a MIT license.
Dependencies
drfsc
depends on the following packages:
References
The package has been developed based on research that came out at the Polytechnical University of Milan. The interested reader is referred to [2] for details related to the distribution procedure, and to [1] for a more thorough mathematical overview and for experimental comparisons to various alternate feature selection methods.
[1] Brankovic, A., Falsone, A., Prandini, M., Piroddi, L. (2018). A feature selection and classification algorithm based on randomized extraction of model populations
[2] Brankovic, A., Piroddi, L. (2019). A distributed feature selection scheme with partial information sharing.
Citations
This package is developed in CSIRO’s Australian e-Health Research Centre. If you use drfsc
package in your research we would appreciate a citation to the appropriate paper(s):
- For general use of
drfsc
package you can read/cite original article. - For information/use of the Randomised Feature Selection and classification concept you can read/cite original article [1].
- For information/use of the Distributed Feature Selection architecture with partial information you can read/cite original article [2].