so4gp.algorithms.grad_pfs.GradPFS
- class GradPFS(data_src, min_score=0.75, target_col=None)
GradPFS is a filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns for regression tasks. This algorithm is published in:
- Parameters:
data_src (str | DataFrame) – [required] the data, given as the path to a CSV file or as a Pandas DataFrame.
min_score (float) – [optional] user-specified minimum correlation score for filtering redundant features, default=0.75.
target_col (int | None) – [optional] user-specified target column index, default=None.
>>> import pandas
>>> from so4gp.algorithms.grad_pfs import GradPFS
>>>
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> fs_obj = GradPFS(data_src=dummy_df)
>>> gp_cor = fs_obj.univariate_fs()
>>> fs_obj.generate_pdf_report(fs_type='U')
>>>
>>> # fs_obj.target_col = 2
>>> # m_fs = fs_obj.multivariate_fs()
>>> print(gp_cor)
          Age  Salary  Cars  Expenses
Age       1.0     0.6  -0.4      -1.0
Salary    0.6     1.0  -0.3      -0.6
Cars     -0.4    -0.3   1.0       0.4
Expenses -1.0    -0.6   0.4       1.0
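The printed matrix assigns every pair of features a score in [-1, 1]. How univariate_fs computes these scores is not documented on this page, so the following is only a minimal sketch, assuming a pairwise concordance count: for every pair of rows it checks whether two columns move in the same direction (concordant) or in opposite directions (discordant), ignoring pairs with a tie in either column. The tie-break rule (a positive score only when concordant pairs strictly outnumber discordant ones) is an assumption chosen so that, on the dummy data above, the sketch yields the same values as the printed gp_cor. The functions gradual_score and gradual_matrix are hypothetical and are not part of so4gp.

```python
from itertools import combinations


def gradual_score(x, y):
    """Hypothetical pairwise 'gradual correlation' between two numeric columns.

    Not the so4gp implementation; an illustrative assumption only.
    """
    n_pairs = len(x) * (len(x) - 1) // 2
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[j] - x[i], y[j] - y[i]
        if dx == 0 or dy == 0:
            continue  # a tie in either column supports neither direction
        if (dx > 0) == (dy > 0):
            concordant += 1  # both columns increase, or both decrease
        else:
            discordant += 1  # one column increases while the other decreases
    # Assumed rule: positive only if concordant pairs strictly dominate.
    if concordant > discordant:
        return concordant / n_pairs
    return -discordant / n_pairs


def gradual_matrix(columns):
    """Square score matrix (dict of dicts); the diagonal is fixed at 1.0."""
    return {
        a: {b: 1.0 if a == b else gradual_score(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Same dummy data as the example above, split into named columns.
dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
names = ['Age', 'Salary', 'Cars', 'Expenses']
columns = {name: [row[k] for row in dummy_data] for k, name in enumerate(names)}

gp_cor = gradual_matrix(columns)
for a in names:
    print(a.ljust(9), [gp_cor[a][b] for b in names])
```

The sketch visits all n(n-1)/2 row pairs for every feature pair, so it is meant for reading the output above, not as a substitute for univariate_fs.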
- __init__(data_src, min_score=0.75, target_col=None)
An algorithm based on the filter method for performing univariate and/or multivariate feature selection through gradual patterns for regression tasks (not suitable for classification tasks). The results are returned as a Pandas DataFrame.
Methods
__init__(data_src[, min_score, target_col]) – An algorithm based on the filter method for performing univariate and/or multivariate feature selection through gradual patterns for regression tasks (not suitable for classification tasks).
find_redundant_features(corr_arr, thd_score) – Identifies redundant features from their correlation scores.
find_similar(corr_set, cor_arr) – Searches a correlation matrix for a specific set of features.
generate_pdf_report([fs_type]) – Runs the GradPFS algorithm for either univariate ('U') or multivariate ('M') feature selection and generates a PDF report.
generate_table(title, data, col_width[, ...]) – Represents data in table format using the matplotlib library.
multivariate_fs([algorithm]) – Runs the multivariate GradPFS feature selection algorithm.
univariate_fs() – Runs the univariate GradPFS feature selection algorithm.
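find_redundant_features(corr_arr, thd_score) flags redundant features using a threshold score, presumably the min_score passed to the constructor (default 0.75), but the exact rule is not given on this page. A minimal sketch, assuming a greedy pass over the features in column order: a feature is marked redundant when the absolute value of its score with an already-kept feature is at least thd_score. The helper redundant_features is hypothetical and is not part of so4gp; the matrix is the gp_cor output printed in the example.

```python
def redundant_features(corr, thd_score=0.75):
    """Hypothetical greedy redundancy filter over a score matrix (dict of dicts).

    Not the so4gp find_redundant_features implementation; an illustrative assumption.
    """
    names = list(corr)
    kept, dropped = [], []
    for a in names:
        if a in dropped:
            continue  # already explained by an earlier kept feature
        kept.append(a)
        for b in names:
            # |score| is used because a strong negative score is just as redundant.
            if b not in kept and b not in dropped and abs(corr[a][b]) >= thd_score:
                dropped.append(b)
    return dropped


# Scores copied from the printed gp_cor in the example.
corr = {
    'Age':      {'Age': 1.0, 'Salary': 0.6, 'Cars': -0.4, 'Expenses': -1.0},
    'Salary':   {'Age': 0.6, 'Salary': 1.0, 'Cars': -0.3, 'Expenses': -0.6},
    'Cars':     {'Age': -0.4, 'Salary': -0.3, 'Cars': 1.0, 'Expenses': 0.4},
    'Expenses': {'Age': -1.0, 'Salary': -0.6, 'Cars': 0.4, 'Expenses': 1.0},
}

print(redundant_features(corr))       # default threshold 0.75
print(redundant_features(corr, 0.6))  # a looser threshold flags more features
```

With this greedy rule the result depends on column order: of each strongly scored pair, the feature that appears first is kept and the later one is dropped.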
Attributes
data_src (str | pd.DataFrame)
file_path (str)
thd_score (float)
target_col (int | None)
titles (list | None)
data (list | None)