so4gp.algorithms.grad_pfs.GradPFS

class GradPFS(data_src, min_score=0.75, target_col=None)

GradPFS is a filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns for regression tasks. This algorithm is published in:

Parameters:

data_src (str | DataFrame) – [required] the data, given either as a CSV file or as a pandas DataFrame.
min_score (float) – [optional] user-specified minimum correlation score for filtering redundant features, default=0.75.
target_col (int | None) – [optional] user-specified target column index, default=None.

>>> import pandas
>>> from so4gp.algorithms.grad_pfs import GradPFS
>>>
>>> # Build a small DataFrame to use as the data source
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> # Run univariate feature selection, then generate the PDF report
>>> fs_obj = GradPFS(data_src=dummy_df)
>>> gp_cor = fs_obj.univariate_fs()
>>> fs_obj.generate_pdf_report(fs_type='U')
>>>
>>> # To try multivariate feature selection, set a target column first:
>>> # fs_obj.target_col = 2
>>> # m_fs = fs_obj.multivariate_fs()
>>> print(gp_cor)
          Age  Salary  Cars  Expenses
Age       1.0     0.6  -0.4      -1.0
Salary    0.6     1.0  -0.3      -0.6
Cars     -0.4    -0.3   1.0       0.4
Expenses -1.0    -0.6   0.4       1.0
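
The scores in the matrix above can be read as signed pairwise supports of gradual patterns: the share of row pairs on which two features rise together (positive) or move in opposite directions (negative). The sketch below is only an illustration of that reading and is not taken from the so4gp source. The helper `signed_gradual_score`, the rule that ties between the two directions resolve to the negative sign, and the fixed 1.0 diagonal are all assumptions made for this example.

```python
from itertools import combinations

import pandas as pd


def signed_gradual_score(x, y):
    """Hypothetical helper: signed share of row pairs on which x and y
    move together (+) or in opposite directions (-)."""
    pairs = list(combinations(range(len(x)), 2))
    products = [(x[j] - x[i]) * (y[j] - y[i]) for i, j in pairs]
    concordant = sum(p > 0 for p in products)  # both features change the same way
    discordant = sum(p < 0 for p in products)  # features change in opposite ways
    # Assumption: when the two directions tie, report the negative pattern.
    if concordant > discordant:
        return concordant / len(pairs)
    return -discordant / len(pairs)


dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
dummy_df = pd.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])

cols = list(dummy_df.columns)
# Assumption: a feature paired with itself is fixed at 1.0.
scores = pd.DataFrame(1.0, index=cols, columns=cols)
for a, b in combinations(cols, 2):
    s = signed_gradual_score(dummy_df[a].tolist(), dummy_df[b].tolist())
    scores.loc[a, b] = scores.loc[b, a] = s

print(scores)
```

On the dummy data this reproduces the printed matrix, including -0.3 for Salary and Cars, where the two directions tie at three pairs each. Agreement on one small sample does not confirm how GradPFS computes its scores internally.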

__init__(data_src, min_score=0.75, target_col=None)

A filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns for regression tasks (it is not suitable for classification tasks). The results are returned as a pandas DataFrame.

Parameters:

data_src (str | DataFrame) – [required] the data, given either as a CSV file or as a pandas DataFrame.
min_score (float) – [optional] user-specified minimum correlation score for filtering redundant features, default=0.75.
target_col (int | None) – [optional] user-specified target column index, default=None.

Methods

`__init__`(data_src[, min_score, target_col])	A filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns for regression tasks (not suitable for classification tasks).
`find_redundant_features`(corr_arr, thd_score)	A method that identifies features that are redundant using their correlation score.
`find_similar`(corr_set, cor_arr)	A method that searches a correlation matrix for a specific set of features.
`generate_pdf_report`([fs_type])	A method that executes the GradPFS algorithm for either Univariate Feature Selection ('U') or Multivariate Feature Selection ('M') and generates a PDF report.
`generate_table`(title, data, col_width[, ...])	A method that represents data in a table format using the matplotlib library.
`multivariate_fs`([algorithm])	A method that runs the multivariate GradPFS feature selection algorithm.
`univariate_fs`()	A method that runs the univariate GradPFS feature selection algorithm.
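
As a rough illustration of threshold-based redundancy filtering, the idea behind `find_redundant_features` and `min_score`, the sketch below scans a correlation matrix for feature pairs whose score reaches a threshold. It is not the library's implementation: the helper name `redundant_pairs`, the use of the absolute score, and the `>=` comparison are assumptions made for this example. The matrix values are copied from the example output on this page.

```python
import pandas as pd

cols = ['Age', 'Salary', 'Cars', 'Expenses']
# Scores copied from the example output in this page.
gp_cor = pd.DataFrame(
    [[1.0, 0.6, -0.4, -1.0],
     [0.6, 1.0, -0.3, -0.6],
     [-0.4, -0.3, 1.0, 0.4],
     [-1.0, -0.6, 0.4, 1.0]],
    index=cols, columns=cols,
)


def redundant_pairs(corr, thd_score=0.75):
    """Hypothetical helper: list feature pairs whose absolute score
    is at least thd_score (each unordered pair reported once)."""
    names = list(corr.columns)
    found = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # upper triangle only, skipping the diagonal
            if abs(corr.loc[a, b]) >= thd_score:
                found.append((a, b))
    return found


pairs = redundant_pairs(gp_cor)
print(pairs)  # [('Age', 'Expenses')]
```

With the default threshold of 0.75, only Age and Expenses (score -1.0) are flagged here; lowering the threshold to 0.6 would also flag the pairs involving Salary.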

Attributes

`data_src`	str | pd.DataFrame
`file_path`	str
`thd_score`	float
`target_col`	int | None
`titles`	list | None
`data`	list | None