so4gp.algorithms.grad_pfs.GradPFS

class GradPFS(data_src, min_score=0.75, target_col=None)[source]

GradPFS is a filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns for regression tasks. This algorithm is published in:

Parameters:
  • data_src (str | DataFrame) – [required] the data, given either as a path to a CSV file or as a pandas DataFrame.

  • min_score (float) – [optional] user-specified minimum correlation score for filtering redundant features, default=0.75.

  • target_col (int | None) – [optional] user-specified target column index, default=None.

>>> import pandas
>>> from so4gp.algorithms.grad_pfs import GradPFS
>>>
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> fs_obj = GradPFS(data_src=dummy_df)
>>> gp_cor = fs_obj.univariate_fs()
>>> fs_obj.generate_pdf_report(fs_type='U')
>>>
>>> # fs_obj.target_col = 2
>>> # m_fs = fs_obj.multivariate_fs()
>>> print(gp_cor)
          Age  Salary  Cars  Expenses
Age       1.0     0.6  -0.4      -1.0
Salary    0.6     1.0  -0.3      -0.6
Cars     -0.4    -0.3   1.0       0.4
Expenses -1.0    -0.6   0.4       1.0
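
One way to read the matrix above is as a signed share of row pairs on which two columns move in the same or in opposite directions. The sketch below is a plain-Python illustration of that reading, not code taken from so4gp: the helper name `gradual_score`, the handling of tied values, and the rule that an even split is reported as negative are all assumptions. It happens to reproduce the off-diagonal values printed for the dummy data; the diagonal is simply 1.0 in the printed matrix and is not modelled here.

```python
from itertools import combinations


def gradual_score(xs, ys):
    """Signed share of row pairs on which two columns move together or apart.

    Hypothetical helper for illustration only; not part of so4gp.
    """
    pairs = list(combinations(range(len(xs)), 2))
    if not pairs:
        return 0.0
    concordant = discordant = 0
    for i, j in pairs:
        dx = xs[j] - xs[i]
        dy = ys[j] - ys[i]
        if dx * dy > 0:
            concordant += 1  # both columns rise or both fall
        elif dx * dy < 0:
            discordant += 1  # one rises while the other falls
        # a tie in either column counts toward neither direction
    if concordant > discordant:
        return concordant / len(pairs)
    # Assumption: an even split is reported with a negative sign.
    return -(discordant / len(pairs))


# Columns of the dummy data used in the example above.
age = [30, 35, 40, 50, 52]
salary = [3, 2, 4, 1, 7]
cars = [1, 2, 2, 1, 1]
expenses = [10, 8, 7, 6, 2]

print(gradual_score(age, salary))    # 0.6
print(gradual_score(age, expenses))  # -1.0
print(gradual_score(age, cars))      # -0.4
```

Because Age rises on every row pair while Expenses falls on every row pair, all ten pairs are discordant and the score is -1.0, which matches the Age/Expenses cell.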
__init__(data_src, min_score=0.75, target_col=None)[source]

A filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns. It is intended for regression tasks and is not suitable for classification tasks. The results are returned as a pandas DataFrame.

Parameters:
  • data_src (str | DataFrame) – [required] the data, given either as a path to a CSV file or as a pandas DataFrame.

  • min_score (float) – [optional] user-specified minimum correlation score for filtering redundant features, default=0.75.

  • target_col (int | None) – [optional] user-specified target column index, default=None.

Methods

__init__(data_src[, min_score, target_col])

A filter-based algorithm that performs univariate and/or multivariate feature selection through gradual patterns for regression tasks (not suitable for classification tasks).

find_redundant_features(corr_arr, thd_score)

A method that identifies features that are redundant using their correlation score.
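
To illustrate what filtering by a threshold score can look like, the sketch below scans the upper triangle of the correlation matrix printed in the class example and reports every feature pair whose score reaches the threshold. It is a minimal sketch, not the library's implementation: the helper `redundant_pairs` is hypothetical, and comparing the absolute value of the score against the threshold is an assumption.

```python
def redundant_pairs(corr, names, threshold):
    """Return (feature_a, feature_b, score) for pairs at or above the threshold.

    Hypothetical helper for illustration only; not part of so4gp.
    Assumption: strongly negative scores count as redundant too,
    so the absolute value of the score is compared.
    """
    found = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle, skips the diagonal
            if abs(corr[i][j]) >= threshold:
                found.append((names[i], names[j], corr[i][j]))
    return found


# Scores copied from the printed matrix in the class example.
names = ["Age", "Salary", "Cars", "Expenses"]
corr = [
    [1.0, 0.6, -0.4, -1.0],
    [0.6, 1.0, -0.3, -0.6],
    [-0.4, -0.3, 1.0, 0.4],
    [-1.0, -0.6, 0.4, 1.0],
]

pairs = redundant_pairs(corr, names, threshold=0.75)
print(pairs)  # [('Age', 'Expenses', -1.0)]
```

With the default `min_score` of 0.75, only the Age/Expenses pair qualifies under this reading; lowering the threshold to 0.6 would also flag Age/Salary and Salary/Expenses.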

find_similar(corr_set, cor_arr)

A method that searches a correlation matrix for a specific set of features.

generate_pdf_report([fs_type])

A method that executes the GradPFS algorithm for either univariate feature selection ('U') or multivariate feature selection ('M') and generates a PDF report.

generate_table(title, data, col_width[, ...])

A method that represents data in a table format using the matplotlib library.

multivariate_fs([algorithm])

A method that runs the multivariate GradPFS feature selection algorithm.

univariate_fs()

A method that runs the univariate GradPFS feature selection algorithm.

Attributes

data_src

str | pd.DataFrame

file_path

str

thd_score

float

target_col

int | None

titles

list | None

data

list | None