so4gp package

class DataGP(data_source, min_sup=0.5, eq=False)[source]

Bases: object

Parameters:
  • data_source (pd.DataFrame | str) – [required] a data source, either a file in CSV format or a Pandas DataFrame

  • min_sup (float) – [optional] minimum support threshold, the default is 0.5

  • eq (bool) – [optional] encode equal values as gradual, the default is False

add_gradual_pattern(pattern)[source]

Adds a gradual pattern to the list of gradual patterns.

Parameters:

pattern – A gradual pattern

Return type:

None

classmethod analyze_gps(data_src, min_sup, est_gps, approach='bfs')[source]

For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).

>>> import so4gp as sgp
>>> import pandas
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> columns = ['Age', 'Salary', 'Cars', 'Expenses']
>>> dummy_df = pandas.DataFrame(dummy_data, columns=columns)
>>>
>>> estimated_gps = list()
>>> # First estimated GP: {Age+, Salary-} with an estimated support of 0.5
>>> temp_gp = sgp.GP()
>>> for gi_str in ['0+', '1-']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.5
>>> estimated_gps.append(temp_gp)
>>> # Second estimated GP: {Salary+, Expenses-, Age+} with an estimated support of 0.48
>>> temp_gp = sgp.GP()
>>> for gi_str in ['1+', '3-', '0+']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.48
>>> estimated_gps.append(temp_gp)
>>> res = sgp.analyze_gps(dummy_df, min_sup=0.4, est_gps=estimated_gps, approach='bfs')
>>> print(res)
Gradual Pattern       Estimated Support    True Support  Percentage Error      Standard Deviation
['0+', '1-']                       0.5              0.4             25.0%                   0.071
['1+', '3-', '0+']                 0.48             0.6            -20.0%                   0.085
Parameters:
  • data_src (DataFrame | str) – Data source, either a file in CSV format or a Pandas DataFrame

  • min_sup (float) – Minimum support (set by user)

  • est_gps (list[GP]) – Estimated GPs

  • approach (str) – ‘bfs’ (default) or ‘dfs’

Returns:

Tabulated results

Return type:

str
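
The two statistics in the example table can be reproduced by hand. The sketch below is not the library's implementation; it assumes that the percentage error is measured relative to the true support and that the standard deviation is the sample standard deviation of the pair (estimated, true). Both assumptions are inferred from the numbers in the table above, and `support_stats` is a hypothetical helper name.

```python
from statistics import stdev


def support_stats(estimated, true):
    """Hypothetical helper: (percentage error, standard deviation) for one GP."""
    # Assumption: the error is relative to the true support.
    pct_error = (estimated - true) / true * 100.0
    # Assumption: sample standard deviation of the two support values.
    return round(pct_error, 1), round(stdev([estimated, true]), 3)


print(support_stats(0.5, 0.4))   # (25.0, 0.071)
print(support_stats(0.48, 0.6))  # (-20.0, 0.085)
```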

property attr_cols: ndarray
property attr_size: int
static clean_data(df)[source]

Cleans a data-frame (i.e., handles missing values and outliers) before the extraction of GPs.

Parameters:

df (pd.DataFrame) – data-frame

Returns:

The column titles (list) and the cleaned data (NumPy array)

Return type:

tuple[list, ndarray]

clear_gradual_patterns()[source]

Clears the list of gradual patterns.

Return type:

None

property col_count: int
property data: ndarray
property display_patterns: list
property display_patterns_as_df: DataFrame
fit_bitmap(attr_data=None)[source]

Generates bitmaps for columns with numeric objects. It stores the bitmaps in the attribute valid_bins (only those bitmaps whose computed support values are greater than or equal to the minimum support threshold).

Parameters:

attr_data (np.ndarray | None) – Stepped attribute objects

Returns:

None

Return type:

None
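
A minimal pure-Python sketch of the idea, assuming the GRAANK-style definition in which a bitmap is a pairwise comparison matrix and the support is the number of set bits divided by n(n - 1)/2. The helper names (`gradual_bitmap`, `bitmap_support`, `fit_bitmaps`) and the dictionary layout are hypothetical and do not mirror the internal structure of valid_bins.

```python
def gradual_bitmap(values, symbol):
    """Pairwise matrix: entry (i, j) is 1 when object j is larger ('+') or smaller ('-') than object i."""
    n = len(values)
    if symbol == "+":
        return [[int(values[i] < values[j]) for j in range(n)] for i in range(n)]
    return [[int(values[i] > values[j]) for j in range(n)] for i in range(n)]


def bitmap_support(bitmap):
    """Set bits divided by the number of object pairs, n(n - 1)/2."""
    n = len(bitmap)
    return sum(map(sum, bitmap)) / (n * (n - 1) / 2)


def fit_bitmaps(columns, min_sup):
    """Keep only the bitmaps whose support reaches the threshold."""
    valid = {}
    for col, values in enumerate(columns):
        for symbol in "+-":
            bitmap = gradual_bitmap(values, symbol)
            if bitmap_support(bitmap) >= min_sup:
                valid[f"{col}{symbol}"] = bitmap
    return valid


age = [30, 35, 40, 50, 52]
cars = [1, 2, 2, 1, 1]  # ties lower the support to 0.6
print(sorted(fit_bitmaps([age, cars], min_sup=0.7)))  # ['0+', '0-']
```

Tied values set no bit in either direction, which is why the Cars column falls below the 0.7 threshold in this sketch.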

fit_warpingset()[source]

Generates transaction ids (tids) for each column/feature with numeric objects. It stores the tids in the attribute valid_tids (only those tids whose computed support values are greater than or equal to the minimum support threshold).

The method decomposes the pairwise matrix of a gradual item/pattern into a warping set. Attributes that have strong correlation will produce a warping set with dense zigzag patterns when plotted as a graph. Those with weak correlation will produce a warping set with sparse zigzag patterns.

Return type:

None

static gen_gradual_warping_set(pairwise_mat, as_array=False)[source]

A method that decomposes the pairwise matrix of a gradual item/pattern into a warping set. Attributes that have strong correlation will produce a warping set with dense zigzag patterns when plotted as a graph. Those with weak correlation will produce a warping set with sparse zigzag patterns.

Parameters:
  • pairwise_mat (ndarray) – The pairwise matrix of a gradual item/pattern.

  • as_array (bool) – If True, returns the warping path as a NumPy array; otherwise, as a list of tuples.

Returns:

The warping path as an edge list (a list of tuples or a NumPy array).

Return type:

list[tuple[int, int]] | ndarray
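
As an illustration of the edge-list form, the sketch below collects the index pairs of all set bits in a pairwise matrix. It is an assumption about what "decomposing into a warping set" means here, not the library's code; `warping_edges` is a hypothetical name and the matrix is a plain list of lists rather than an ndarray.

```python
def warping_edges(pairwise_mat):
    """Hypothetical helper: list every (i, j) whose bit is set in the pairwise matrix."""
    return [
        (i, j)
        for i, row in enumerate(pairwise_mat)
        for j, bit in enumerate(row)
        if bit
    ]


mat = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
]
print(warping_edges(mat))  # [(0, 1), (0, 2), (2, 1)]
```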

generate_output_files(alg_data, target_col=None, save_to_file=True)[source]

Generates output of results (as files) for the GP mining algorithm.

Parameters:
  • alg_data (dict) – Dictionary of algorithm parameters.

  • target_col (int) – Index of the target column.

  • save_to_file (bool) – If True, saves the output to files.

property gradual_patterns: list | None
static read(data_src)[source]

Reads all the contents of a file (in CSV format) or a data-frame. Checks if its columns have numeric values. It separates its column headers (titles) from the objects.

Parameters:

data_src (pd.DataFrame | str) – A data source, either a file in CSV format or a Pandas DataFrame

Returns:

The column titles and the column objects

Return type:

tuple[list, ndarray]

remove_subsets(gi_arr, gradual_patterns=None)[source]

Remove subset GPs from the list.

Parameters:
  • gi_arr (set) – Gradual items in an array

  • gradual_patterns (list[GP] | None) – List of gradual patterns (if None, use the object’s GPs)

Returns:

List of GPs

Return type:

None

property row_count: int
static test_time(date_str)[source]

Tests if a str represents a date-time variable.

Parameters:

date_str (str) – A string

Returns:

bool (True if it is a date-time variable, False otherwise)

Return type:

None | tuple[bool, float] | tuple[bool, bool]
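
One way such a check can be written is to try a few known formats and report the first one that parses. The sketch below is an assumption, not the library's logic: the format list and the name `looks_like_datetime` are hypothetical, and the (flag, timestamp) return shape is only modelled on the return type listed above.

```python
from datetime import datetime

# Assumption: a small, fixed set of accepted formats.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y %H:%M:%S", "%d/%m/%Y")


def looks_like_datetime(text):
    """Hypothetical helper: (True, timestamp) if text parses as a date-time, else (False, False)."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
        return True, parsed.timestamp()
    return False, False


print(looks_like_datetime("2021-03-01 12:30:00")[0])  # True
print(looks_like_datetime("42"))                      # (False, False)
```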

property thd_supp: float
property time_cols: ndarray
property titles: list
property valid_bins: dict | None
property warping_set: dict[str, list] | None
class GI(attr_col, symbol)[source]

Bases: object

Parameters:
  • attr_col (int) – Column index

  • symbol (str) – Variation symbol either “+” or “-”

property as_tuple: tuple[int, str]

The Gradual Item (GI) in tuple format

property attribute_col: int

The column index of a GI

classmethod from_string(gi_str)[source]

Creates a GI from a string like ‘1+’, ‘12-’, or ‘125+’

Parameters:

gi_str (str)

Return type:

GI

static parse_gi(gi_str)[source]

Converts a stringified GI into a normal GI. The accepted format is ‘1_neg’ or ‘1_pos’.

Parameters:

gi_str (str) – A stringified GI

Returns:

GI

Return type:

GI
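
The two textual forms mentioned above (‘12-’ for from_string and ‘1_neg’ for parse_gi) can both be reduced to a (column, symbol) pair. The following is a sketch of that parsing with regular expressions; `parse_gi_text` is a hypothetical helper and the error handling is an assumption.

```python
import re


def parse_gi_text(text):
    """Hypothetical helper: turn '12-' or '12_neg' into (12, '-')."""
    match = re.fullmatch(r"(\d+)([+-])", text)
    if match:
        return int(match.group(1)), match.group(2)
    match = re.fullmatch(r"(\d+)_(pos|neg)", text)
    if match:
        return int(match.group(1)), "+" if match.group(2) == "pos" else "-"
    raise ValueError(f"not a gradual item: {text!r}")


print(parse_gi_text("125+"))   # (125, '+')
print(parse_gi_text("1_neg"))  # (1, '-')
```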

static swap_gi_symbol(gi_obj)[source]

Inverts a GI symbol to the opposite variation (i.e., from - to +, or from + to -).

Returns:

Inverted GI object

Parameters:

gi_obj (GI)

Return type:

GI

property symbol: str

The variation symbol of a GI

to_string()[source]

Returns the GI in string format.

Return type:

str

class GP[source]

Bases: object

GP (Gradual Pattern). A class that is used to create GP objects. A GP object is a set of gradual items (GI), and its quality is measured by its computed support value. For example, given a data set with 3 columns (age, salary, cars) and 10 objects, a GP may take the form {age+, salary-} with a support of 0.8. This implies that 8 out of 10 objects have the values of column ‘age’ increasing and the values of column ‘salary’ decreasing.

>>> import so4gp as sgp
>>> gradual_pattern = sgp.GP()
>>> gradual_pattern.add_gradual_item(sgp.GI(0, "+"))
>>> gradual_pattern.add_gradual_item(sgp.GI(1, "-"))
>>> gradual_pattern.support = 0.5
>>> print(f"{gradual_pattern.to_string()}: {gradual_pattern.support}")
add_gradual_item(item)[source]

Adds a gradual item (GI) into the gradual pattern (GP).

Returns:

True if gradual item is added, None otherwise

Parameters:

item (GI) – gradual item

Return type:

bool

property as_set: set[str]

Returns the gradual pattern (GP) as a set of strings, e.g. {‘1+’, ‘2-’}

property as_swapped_set: set[str]

Returns the gradual pattern (GP) with swapped variation symbols as a set of strings, e.g. {‘1-’, ‘2+’}

property avg_deviation_from_diagonal: float
check_am(gp_list, subset=True)[source]

Anti-monotonicity check. Checks if a GP is a subset or superset of an already existing GP

Parameters:
  • gp_list (list[GP] | None) – A list of existing GPs

  • subset (bool) – A check if it is a subset

Returns:

True if superset/subset, False otherwise

Return type:

bool
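
The check can be pictured with plain Python sets, where each GP is a set of GI strings. This is a sketch under stated assumptions: `is_redundant` is a hypothetical name, the comparison is non-strict (an identical pattern counts as a subset), and the meaning of the subset flag is inferred from the parameter description.

```python
def is_redundant(candidate, existing, subset=True):
    """Hypothetical helper: compare a candidate GP against the existing GPs."""
    if subset:
        # Is the candidate contained in some existing pattern?
        return any(candidate <= gp for gp in existing)
    # Otherwise: does the candidate contain some existing pattern?
    return any(candidate >= gp for gp in existing)


existing = [{"0+", "1-"}]
print(is_redundant({"0+"}, existing))                            # True
print(is_redundant({"0+", "1-", "2+"}, existing, subset=False))  # True
print(is_redundant({"2+"}, existing))                            # False
```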

compute_descriptors(warping_set, obj_count)[source]

Computes gradual warping set (GWS) descriptors for a given gradual pattern.

The descriptors are defined as follows:

  1. Density (ρ_g):

    Proportion of concordant index pairs relative to all possible pairs ρ_g = |W_g| / C(n, 2)

  2. Average Deviation from Diagonal (μ_g):

    Mean absolute distance |i - j| across all pairs in W_g.

  3. Rank Dispersion (σ_g):

    Standard deviation of |i - j|, capturing variability of index distances across all pairs in W_g.

  4. Graph Connectivity (κ_g):

    Number of connected components when W_g is interpreted as an undirected graph.

  5. Singularity Score (S_g):

    Measures concentration of index participation (node degree skewness). High values indicate dominance of certain indices.

Path-like behavior (DTW-like) is approximated when:

κ_g = 1, S_g is low, and σ_g is smooth (low variance).

Parameters:
  • warping_set (ndarray | None) – np.ndarray of shape (k, 2), containing index pairs (i, j)

  • obj_count (int) – Total number of objects (n)

Returns:

True if descriptors are computed successfully, False otherwise

Return type:

bool
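
The first four descriptors follow directly from the definitions above and can be sketched in pure Python. The code is illustrative only: `gws_descriptors` is a hypothetical helper, the population standard deviation is assumed for σ_g, and connected components are counted only over indices that appear in at least one pair. The singularity score is left out because its exact formula is not given here.

```python
from math import comb
from statistics import mean, pstdev


def gws_descriptors(warping_set, obj_count):
    """Hypothetical helper: density, mean deviation, dispersion, and component count."""
    if not warping_set or obj_count < 2:
        return None
    distances = [abs(i - j) for i, j in warping_set]

    # Union-find over the indices that occur in the warping set.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in warping_set:
        parent[find(i)] = find(j)

    return {
        "density": len(warping_set) / comb(obj_count, 2),
        "avg_deviation_from_diagonal": mean(distances),
        "rank_dispersion": pstdev(distances),  # assumption: population std
        "graph_connectivity": len({find(x) for x in list(parent)}),
    }


print(gws_descriptors([(0, 1), (0, 3), (1, 3), (2, 3)], obj_count=5))
```

For this input the density is 4 / C(5, 2) = 0.4, the mean of |i - j| is 1.75, and all four indices fall into a single connected component.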

contains_attr(gi)[source]

Checks if any gradual item (GI) in the gradual pattern (GP) is composed of the column of the given gradual item.

Returns:

True if a column exists, False otherwise

Parameters:

gi (GI) – gradual item

Return type:

bool

decompose()[source]

Breaks down all the gradual items (GIs) in the gradual pattern into columns and variation symbols and returns them as separate variables. For instance, a GP {“1+”, “3-”} will be returned as [1, 3], [1, -1], where [1, 3] is the list of attributes/features and [1, -1] are their corresponding gradual variations (1 -> ‘+’ and -1 -> ‘-’).

Returns:

Separate columns and variation symbols

Return type:

tuple[list[int], list[str]]

property density: float
get_computed_descriptors(descriptor_title)[source]

Returns the computed descriptors of the gradual pattern (GP)

Parameters:

descriptor_title – If True, returns a dictionary with column names as keys and descriptors as values

Returns:

List of descriptors

Return type:

list[str] | list[dict]

property gradual_items: list[GI]
property graph_connectivity: int
is_duplicate(valid_gps, invalid_gps=None)[source]

Checks if a pattern is in the list of winner GPs or loser GPs

Parameters:
  • valid_gps (list[GP] | None) – list of GPs

  • invalid_gps (list[GP]) – list of GPs

Returns:

True if the pattern is in either list, False otherwise

Return type:

bool

static perform_and(bin_data_1, bin_data_2, dim)[source]

Perform logical AND operation on two bitmaps.

Parameters:
  • bin_data_1 – The first bitmap

  • bin_data_2 – The second bitmap

  • dim

Return type:

PairwiseMatrix
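
To make the AND step concrete, the sketch below combines the bitmaps of two gradual items and derives the support of the resulting pattern. It assumes the GRAANK-style support (set bits divided by dim(dim - 1)/2) and uses hypothetical helpers on plain lists; it does not call the library. The data are the Age and Salary columns from the analyze_gps example.

```python
def gradual_bitmap(values, symbol):
    """Pairwise matrix: entry (i, j) is 1 when object j is larger ('+') or smaller ('-') than object i."""
    n = len(values)
    if symbol == "+":
        return [[int(values[i] < values[j]) for j in range(n)] for i in range(n)]
    return [[int(values[i] > values[j]) for j in range(n)] for i in range(n)]


def bitmap_and(bin_1, bin_2):
    """Element-wise logical AND of two equally sized bitmaps."""
    return [[a & b for a, b in zip(row_1, row_2)] for row_1, row_2 in zip(bin_1, bin_2)]


def support(bitmap, dim):
    """Assumption: support = set bits / (dim * (dim - 1) / 2)."""
    return sum(map(sum, bitmap)) / (dim * (dim - 1) / 2)


age = [30, 35, 40, 50, 52]
salary = [3, 2, 4, 1, 7]
joint = bitmap_and(gradual_bitmap(age, "+"), gradual_bitmap(salary, "-"))
print(support(joint, len(age)))  # 0.4
```

Four of the ten object pairs have Age increasing while Salary decreases, which agrees with the true support of 0.4 reported for ['0+', '1-'] in the analyze_gps example.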

print(columns, descriptor_title=False)[source]

A method that returns patterns with actual column names

Parameters:
  • columns (list[str]) – Column names

  • descriptor_title (bool) – If True, returns a dictionary with column names as keys and descriptors as values

Returns:

GP with actual column names

Return type:

tuple[str, list[str] | list[dict]]

property rank_dispersion: float
property singularity_score: float
property support: float
static swap_gp_symbols(gp_obj)[source]

Swaps the variation symbols of all the gradual items (GIs) in a gradual pattern (GP)

Parameters:

gp_obj (GP)

Return type:

GP
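
On the string form of a pattern, swapping amounts to flipping the trailing symbol of every item. A small sketch, using a hypothetical `swap_symbols` helper on a set of GI strings rather than on GP objects:

```python
def swap_symbols(gp_items):
    """Hypothetical helper: flip '+' and '-' on every GI string in the set."""
    flip = {"+": "-", "-": "+"}
    return {item[:-1] + flip[item[-1]] for item in gp_items}


print(swap_symbols({"1+", "2-"}) == {"1-", "2+"})  # True
```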

to_string()[source]

Returns the GP in string format.

Return type:

list[str]

validate_graank(d_gp)[source]

Validates a candidate gradual pattern (GP) based on support computation. A GP is invalid if its support value is less than the minimum support threshold set by the user. It uses a breadth-first approach to compute support.

Parameters:

d_gp (so4gp.DataGP) – DataGP object

Returns:

A valid GP or an empty GP

Return type:

GP

validate_tree(d_gp)[source]

Validates a candidate gradual pattern (GP) based on support computation. A GP is invalid if its support value is less than the minimum support threshold set by the user. It applies a depth-first (FP-Growth) approach to compute support.

Parameters:

d_gp (so4gp.DataGP) – DataGP object

Returns:

A valid GP or an empty GP

class PairwiseMatrix(bin_mat: numpy.ndarray, support: float)[source]

Bases: object

Parameters:
  • bin_mat (ndarray)

  • support (float)

bin_mat: ndarray
support: float
class TGP[source]

Bases: GP

A class that inherits an existing GP class to create Temporal GP objects. A TGP is a gradual pattern with a time-delay. It has a target gradual item (which is created from a user-defined attribute), and it is used as the anchor for mining patterns from a dataset. The class has the following attributes:

target_gradual_item: the gradual item on which the pattern is based.

temporal_gradual_items: gradual items which occur after specific time delays.

>>> import so4gp as sgp
>>> t_gp = sgp.TGP()
>>> t_gp.target_gradual_item = sgp.GI(1, "+")
>>> t_gp.add_temporal_gradual_item(sgp.GI(2, "-"), sgp.TimeDelay(7200, 0.8))
>>> t_gp.to_string()
class TemporalGI(gradual_item: so4gp.gradual_patterns.GI, time_delay: so4gp.gradual_patterns.TimeDelay)[source]

Bases: object

Parameters:
  • gradual_item (GI)

  • time_delay (TimeDelay)

gradual_item: GI
time_delay: TimeDelay
add_temporal_gradual_item(item, time_delay)[source]

Adds a fuzzy temporal gradual item (fTGI) into the fuzzy temporal gradual pattern (fTGP).

Parameters:
  • item (so4gp.GI) – gradual item

  • time_delay (TimeDelay) – time delay

Return type:

None

is_similar_to(ftgp)[source]

Checks if two fuzzy temporal gradual patterns are similar.

Parameters:

ftgp – Fuzzy temporal gradual pattern to compare with.

Returns:

True if the patterns are similar, False otherwise.

Return type:

bool

print(columns, descriptor_title=False)[source]

A method that returns a fuzzy temporal gradual pattern (TGP) with actual column names

Parameters:
  • columns (list[str]) – Column names

  • descriptor_title (bool) – If True, prints the descriptor title

Returns:

TGP with actual column names

Return type:

tuple[str, list[str] | list[dict]]

property target_gradual_item: GI | None
property temporal_gradual_items: list[TemporalGI]
to_string()[source]

Returns the Temporal-GP in string format as a list.

Return type:

list

class TimeDelay(tstamp=0, supp=0)[source]

Bases: object

Parameters:
  • tstamp (float) – The time-delay value as a timestamp.

  • supp (float) – The true value of the time-delay value.

property formatted_time: dict
property sign: str
property support: float
property timestamp: float
to_string()[source]

Returns the formatted time-delay as a string.

Returns:

The time-delay as a string.

Return type:

str

property valid: bool
analyze_gps(data_src, min_sup, est_gps, approach='bfs')

For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).

>>> import so4gp as sgp
>>> import pandas
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> columns = ['Age', 'Salary', 'Cars', 'Expenses']
>>> dummy_df = pandas.DataFrame(dummy_data, columns=columns)
>>>
>>> estimated_gps = list()
>>> # First estimated GP: {Age+, Salary-} with an estimated support of 0.5
>>> temp_gp = sgp.GP()
>>> for gi_str in ['0+', '1-']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.5
>>> estimated_gps.append(temp_gp)
>>> # Second estimated GP: {Salary+, Expenses-, Age+} with an estimated support of 0.48
>>> temp_gp = sgp.GP()
>>> for gi_str in ['1+', '3-', '0+']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.48
>>> estimated_gps.append(temp_gp)
>>> res = sgp.analyze_gps(dummy_df, min_sup=0.4, est_gps=estimated_gps, approach='bfs')
>>> print(res)
Gradual Pattern       Estimated Support    True Support  Percentage Error      Standard Deviation
['0+', '1-']                       0.5              0.4             25.0%                   0.071
['1+', '3-', '0+']                 0.48             0.6            -20.0%                   0.085
Parameters:
  • data_src (DataFrame | str) – Data source, either a file in CSV format or a Pandas DataFrame

  • min_sup (float) – Minimum support (set by user)

  • est_gps (list[GP]) – Estimated GPs

  • approach (str) – ‘bfs’ (default) or ‘dfs’

Returns:

Tabulated results

Return type:

str

get_num_cores()[source]

Finds the count of CPU cores in a computer or a SLURM supercomputer.

Returns:

Number of CPU cores (int)

Return type:

int

get_slurm_cores()[source]

Tests the computer to see if it is a SLURM environment, then gets the number of CPU cores.

Returns:

Count of CPUs (int) or False

Return type:

int | bool
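
A rough sketch of how the two helpers can fit together, assuming the SLURM allocation is read from the SLURM_CPUS_PER_TASK environment variable and the local machine is queried through os.cpu_count(). Which variable the library actually reads is not stated here, so this is an assumption; the function names and the optional env argument are hypothetical.

```python
import os


def slurm_cores(env=None):
    """Hypothetical helper: CPU count from a SLURM allocation, or False outside SLURM."""
    env = os.environ if env is None else env
    value = env.get("SLURM_CPUS_PER_TASK")  # assumption: this variable is the one consulted
    if value is None:
        return False
    try:
        return int(value)
    except ValueError:
        return False


def num_cores(env=None):
    """Hypothetical helper: prefer the SLURM allocation, else fall back to the local CPU count."""
    cores = slurm_cores(env)
    return cores if cores else (os.cpu_count() or 1)


print(slurm_cores({"SLURM_CPUS_PER_TASK": "8"}))  # 8
print(slurm_cores({}))                            # False
```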