so4gp package¶

class DataGP(data_source, min_sup=0.5, eq=False)[source]¶

Bases: object

Parameters:

data_source (pd.DataFrame | str) – [required] a data source, it can either be a ‘file in csv format’ or a ‘Pandas DataFrame’
min_sup (float) – [optional] minimum support threshold, the default is 0.5
eq (bool) – [optional] encode equal values as gradual, the default is False

add_gradual_pattern(pattern)[source]¶

Adds a gradual pattern to the list of gradual patterns.

Parameters:: pattern – A gradual pattern
Return type:: None

classmethod analyze_gps(data_src, min_sup, est_gps, approach='bfs')[source]¶

For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).

>>> import so4gp as sgp
>>> import pandas
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> columns = ['Age', 'Salary', 'Cars', 'Expenses']
>>> dummy_df = pandas.DataFrame(dummy_data, columns=columns)
>>>
>>> estimated_gps = list()
>>> temp_gp = sgp.GP()
>>> for gi_str in ['0+', '1-']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.5
>>> estimated_gps.append(temp_gp)
>>> temp_gp = sgp.GP()
>>> for gi_str in ['1+', '3-', '0+']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.48
>>> estimated_gps.append(temp_gp)
>>> res = sgp.analyze_gps(dummy_df, min_sup=0.4, est_gps=estimated_gps, approach='bfs')
>>> print(res)
Gradual Pattern       Estimated Support    True Support  Percentage Error      Standard Deviation
['0+', '1-']                       0.5              0.4             25.0%                   0.071
['1+', '3-', '0+']                 0.48             0.6            -20.0%                   0.085

Parameters:

data_src (DataFrame | str) – Data set file
min_sup (float) – Minimum support (set by user)
est_gps (list[GP]) – Estimated GPs
approach (str) – ‘bfs’ (default) or ‘dfs’

Returns:

Tabulated results

Return type:

str
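
The "True Support" column in the example above can be reproduced by hand. The following is a minimal pure-Python sketch, assuming support is the fraction of object pairs that can be ordered so that every gradual item holds (the pairwise, GRAANK-style definition). The helper name gp_support is hypothetical and is not part of so4gp.

```python
from itertools import combinations


def gp_support(rows, items):
    """Fraction of object pairs that respect every gradual item.

    rows  -- list of records (one list of numbers per object)
    items -- list of (column_index, symbol) tuples; symbol is '+' or '-'
    (Hypothetical helper for illustration; not the library's implementation.)
    """
    n = len(rows)
    concordant = 0
    for a, b in combinations(range(n), 2):
        # A pair counts once if one of its two orderings makes every '+'
        # column strictly increase and every '-' column strictly decrease.
        for lo, hi in ((a, b), (b, a)):
            if all(
                rows[lo][col] < rows[hi][col] if sym == "+"
                else rows[lo][col] > rows[hi][col]
                for col, sym in items
            ):
                concordant += 1
                break
    return concordant / (n * (n - 1) / 2)


data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
print(gp_support(data, [(0, "+"), (1, "-")]))            # 0.4
print(gp_support(data, [(1, "+"), (3, "-"), (0, "+")]))  # 0.6
```

Both values agree with the True Support column of the tabulated output.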

property attr_cols: ndarray¶

property attr_size: int¶

static clean_data(df)[source]¶

Cleans a data-frame (i.e., missing values, outliers) before extraction of GPs

Parameters:: df (pd.DataFrame) – data-frame
Returns:: list (column titles), numpy (cleaned data)
Return type:: tuple[list, ndarray]

clear_gradual_patterns()[source]¶

Clears the list of gradual patterns.

Return type:: None

property col_count: int¶

property data: ndarray¶

property display_patterns: list¶

property display_patterns_as_df: DataFrame¶

fit_bitmap(attr_data=None)[source]¶

Generates bitmaps for columns with numeric objects. It stores the bitmaps in attribute valid_bins (those bitmaps whose computed support values are greater than or equal to the minimum support threshold value).

Parameters:: attr_data (np.ndarray | None) – Stepped attribute objects
Returns:: void
Return type:: None
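
As a rough illustration of the idea, the sketch below builds an n x n 0/1 matrix per column and variation symbol and keeps only those whose support reaches the threshold. It is written in pure Python under stated assumptions: the function names, the dictionary keys ('0+', '0-', ...) and the support formula (sum of bits divided by n(n-1)/2) are illustrative choices, not the library's code.

```python
def pairwise_bitmap(column, symbol):
    """Entry [i][j] is 1 when object j follows object i under the variation
    ('+' means the value increases, '-' means it decreases)."""
    if symbol == "+":
        return [[1 if vj > vi else 0 for vj in column] for vi in column]
    return [[1 if vj < vi else 0 for vj in column] for vi in column]


def bitmap_support(bitmap):
    # Number of set bits relative to the number of unordered object pairs.
    n = len(bitmap)
    return sum(map(sum, bitmap)) / (n * (n - 1) / 2)


def fit_bitmaps(columns, min_sup):
    """Keep only the bitmaps whose support is >= min_sup (assumed rule)."""
    valid = {}
    for idx, col in enumerate(columns):
        for sym in "+-":
            bitmap = pairwise_bitmap(col, sym)
            if bitmap_support(bitmap) >= min_sup:
                valid[f"{idx}{sym}"] = bitmap
    return valid


age = [30, 35, 40, 50, 52]   # all values distinct: support 1.0 either way
cars = [1, 2, 2, 1, 1]       # ties reduce the support to 0.6
valid = fit_bitmaps([age, cars], min_sup=0.7)
print(sorted(valid))  # ['0+', '0-']
```

Tied values never set a bit in this sketch, which is why the column with repeated values falls below the threshold.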

fit_warpingset()[source]¶

Generates transaction ids (tids) for each column/feature with numeric objects. It stores the tids in attribute valid_tids (those tids whose computed support values are greater than or equal to the minimum support threshold value).

The method decomposes the pairwise matrix of a gradual item/pattern into a warping set. Attributes that have strong correlation will produce a warping set with dense zigzag patterns when plotted as a graph. Those with weak correlation will produce a warping set with sparse zigzag patterns.

Return type:: None

static gen_gradual_warping_set(pairwise_mat, as_array=False)[source]¶

A method that decomposes the pairwise matrix of a gradual item/pattern into a warping set. Attributes that have strong correlation will produce a warping set with dense zigzag patterns when plotted as a graph. Those with weak correlation will produce a warping set with sparse zigzag patterns.

Parameters:

pairwise_mat (ndarray) – The pairwise matrix of a gradual item/pattern.
as_array (bool) – If True, returns the warping path as a numpy array else as a list of tuples.

Returns:

The warping path as an edge list: a list of tuples, or a numpy array if as_array is True.

Return type:

list[tuple[int, int]] | ndarray
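
One simple reading of this decomposition is that the warping set is the list of index pairs (i, j) whose bit is set in the pairwise matrix. The sketch below assumes exactly that; the function name is hypothetical, and the library may order, filter or convert the pairs differently.

```python
def warping_set_from_bitmap(pairwise_mat):
    """Return the (i, j) index pairs whose entry in the matrix is non-zero.

    Assumption: the warping set is just the edge list of the pairwise matrix.
    """
    return [
        (i, j)
        for i, row in enumerate(pairwise_mat)
        for j, bit in enumerate(row)
        if bit
    ]


mat = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
print(warping_set_from_bitmap(mat))  # [(0, 1), (2, 1)]
```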

generate_output_files(alg_data, target_col=None, save_to_file=True)[source]¶

Generates output of results (as files) for the GP mining algorithm.

Parameters:

alg_data (dict) – Dictionary of algorithm parameters.
target_col (int) – Index of the target column.
save_to_file (bool) – If True, saves the output to files.

property gradual_patterns: list | None¶

static read(data_src)[source]¶

Reads all the contents of a file (in CSV format) or a data-frame. Checks if its columns have numeric values. It separates its column headers (titles) from the objects.

Parameters:: data_src (pd.DataFrame | str) – A data source, it can either be a ‘file in csv format’ or a ‘Pandas DataFrame’
Returns:: The title, column objects
Return type:: tuple[list, ndarray]
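
The steps described above (split the header row from the objects, then check which columns are numeric) can be sketched with the standard library alone. This is an illustrative stand-in that reads CSV text from a string; read_table and its return shape are assumptions and do not mirror the actual read method.

```python
import csv
import io


def _is_number(text):
    """True if the cell can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_table(csv_text):
    """Split CSV text into titles, objects, and indices of numeric columns."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    titles, objects = rows[0], rows[1:]
    numeric_cols = [
        c for c in range(len(titles))
        if all(_is_number(row[c]) for row in objects)
    ]
    return titles, objects, numeric_cols


sample = "Age,Name,Salary\n30,Ann,3\n35,Bob,2\n"
titles, objects, numeric_cols = read_table(sample)
print(titles)        # ['Age', 'Name', 'Salary']
print(numeric_cols)  # [0, 2]
```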

remove_subsets(gi_arr, gradual_patterns=None)[source]¶

Remove subset GPs from the list.

Parameters:

gi_arr (set) – Gradual items in an array
gradual_patterns (list[GP] | None) – List of gradual patterns (if None, use the object’s GPs)

Returns:

List of GPs

Return type:

None

property row_count: int¶

static test_time(date_str)[source]¶

Tests if a str represents a date-time variable.

Parameters:: date_str (str) – A string
Returns:: bool (True if it is a date-time variable, False otherwise)
Return type:: None | tuple[bool, float] | tuple[bool, bool]
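
A date-time test of this kind can be sketched by trying a few formats in turn. The list of formats and the function name below are assumptions made for illustration; the formats that test_time actually accepts are not stated here.

```python
from datetime import datetime

# Assumed formats, chosen only for this sketch.
ASSUMED_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y")


def looks_like_datetime(date_str):
    """Return (True, timestamp) if the string parses, else (False, False)."""
    for fmt in ASSUMED_FORMATS:
        try:
            parsed = datetime.strptime(date_str, fmt)
        except ValueError:
            continue
        # Naive datetimes are interpreted in the local time zone.
        return True, parsed.timestamp()
    return False, False


ok, stamp = looks_like_datetime("2021-03-04 10:30:00")
print(ok)                              # True
print(looks_like_datetime("salary"))   # (False, False)
```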

property thd_supp: float¶

property time_cols: ndarray¶

property titles: list¶

property valid_bins: dict | None¶

property warping_set: dict[str, list] | None¶

class GI(attr_col, symbol)[source]¶

Bases: object

Parameters:

attr_col (int) – Column index
symbol (str) – Variation symbol either “+” or “-”

property as_tuple: tuple[int, str]¶: The Gradual Item (GI) in tuple format

property attribute_col: int¶: The column index of a GI

classmethod from_string(gi_str)[source]¶

Creates a GI from a string like ‘1+’, ‘12-’, or ‘125+’

Parameters:: gi_str (str)
Return type:: GI

static parse_gi(gi_str)[source]¶

Converts a stringified GI into a normal GI. The accepted format is ‘1_neg’ or ‘1_pos’.

Parameters:: gi_str (str) – A stringified GI
Returns:: GI
Return type:: GI
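
The two string formats described for from_string and parse_gi can be parsed with a pair of regular expressions. The sketch below returns plain (column, symbol) tuples rather than GI objects, and both function names are hypothetical.

```python
import re


def gi_from_string(gi_str):
    """Parse strings such as '1+', '12-' or '125+' into (column, symbol)."""
    match = re.fullmatch(r"(\d+)([+-])", gi_str)
    if match is None:
        raise ValueError(f"not a gradual item: {gi_str!r}")
    return int(match.group(1)), match.group(2)


def gi_from_stringified(gi_str):
    """Parse strings such as '1_neg' or '1_pos' into (column, symbol)."""
    match = re.fullmatch(r"(\d+)_(pos|neg)", gi_str)
    if match is None:
        raise ValueError(f"not a stringified gradual item: {gi_str!r}")
    return int(match.group(1)), "+" if match.group(2) == "pos" else "-"


print(gi_from_string("125+"))        # (125, '+')
print(gi_from_stringified("1_neg"))  # (1, '-')
```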

static swap_gi_symbol(gi_obj)[source]¶

Inverts a GI symbol to the opposite variation (i.e., from - to +, or from + to -) and returns the inverted GI object.

Parameters:: gi_obj (GI)
Return type:: GI

property symbol: str¶: The variation symbol of a GI

to_string()[source]¶

Returns the GI in string format.

Return type:: str

class GP[source]¶

Bases: object

GP (Gradual Pattern). A class that is used to create GP objects. A GP object is a set of gradual items (GIs), and its quality is measured by its computed support value. For example, given a data set with 3 columns (age, salary, cars) and 10 objects, a GP may take the form {age+, salary-} with a support of 0.8. This implies that 8 out of 10 objects have the values of column ‘age’ increasing and the values of column ‘salary’ decreasing.

>>> import so4gp as sgp
>>> gradual_pattern = sgp.GP()
>>> gradual_pattern.add_gradual_item(sgp.GI(0, "+"))
>>> gradual_pattern.add_gradual_item(sgp.GI(1, "-"))
>>> gradual_pattern.support = 0.5
>>> print(f"{gradual_pattern.to_string()}: {gradual_pattern.support}")

add_gradual_item(item)[source]¶

Adds a gradual item (GI) to the gradual pattern (GP).

Returns:: True if gradual item is added, None otherwise
Parameters:: item (GI)
Return type:: bool

property as_set: set[str]¶

Returns the gradual pattern (GP) as a set of strings, e.g. {‘1+’, ‘2-’}.

property as_swapped_set: set[str]¶

Returns the gradual pattern (GP) as a set of strings with the variation symbols swapped, e.g. {‘1-’, ‘2+’}.

property avg_deviation_from_diagonal: float¶

check_am(gp_list, subset=True)[source]¶

Anti-monotonicity check. Checks if a GP is a subset or superset of an already existing GP

Parameters:

gp_list (list[GP] | None) – A list of existing GPs
subset (bool) – A check if it is a subset

Returns:

True if superset/subset, False otherwise

Return type:

bool
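
The subset/superset test can be pictured with ordinary Python sets of GI strings. This sketch assumes that subset=True asks whether the candidate is a subset of some existing pattern and subset=False asks whether it is a superset of one; that reading of the flag, and the function name, are assumptions.

```python
def related_to_existing(candidate, existing, subset=True):
    """Check a candidate pattern against a list of existing patterns.

    candidate -- iterable of GI strings, e.g. {'0+', '1-'}
    existing  -- list of such iterables
    """
    cand = set(candidate)
    for pattern in existing:
        other = set(pattern)
        if subset and cand.issubset(other):
            return True
        if not subset and cand.issuperset(other):
            return True
    return False


known = [{"0+", "1-", "2+"}]
print(related_to_existing({"0+", "1-"}, known))                # True
print(related_to_existing({"0+", "1-"}, known, subset=False))  # False
```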

compute_descriptors(warping_set, obj_count)[source]¶

Computes gradual warping set (GWS) descriptors for a given gradual pattern.

The descriptors are defined as follows:

Density (ρ_g): Proportion of concordant index pairs relative to all possible pairs, ρ_g = |W_g| / C(n, 2).
Average Deviation from Diagonal (μ_g): Mean absolute distance |i - j| across all pairs in W_g.
Rank Dispersion (σ_g): Standard deviation of |i - j|, capturing variability of index distances across all pairs in W_g.
Graph Connectivity (κ_g): Number of connected components when W_g is interpreted as an undirected graph.
Singularity Score (S_g): Measures concentration of index participation (node degree skewness). High values indicate dominance of certain indices.

Path-like behavior (DTW-like) is approximated when κ_g = 1, S_g is low, and σ_g is smooth (low variance).

Parameters:

warping_set (ndarray | None) – np.ndarray of shape (k, 2), containing index pairs (i, j)
obj_count (int) – Total number of objects (n)

Returns:

True if descriptors are computed successfully, False otherwise

Return type:

bool
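
Four of these descriptors follow directly from their definitions and can be sketched in pure Python. Several choices below are assumptions: the population standard deviation is used for σ_g, connected components are counted only over indices that appear in W_g, and the singularity score is left out because no formula is given for it. The function name is hypothetical.

```python
from math import comb
from statistics import mean, pstdev


def describe_warping_set(warping_set, obj_count):
    """Compute density, mean |i - j|, its dispersion, and component count."""
    if not warping_set:
        return None
    distances = [abs(i - j) for i, j in warping_set]

    # Union-find over the indices that occur in the warping set.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in warping_set:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
    components = len({find(x) for x in list(parent)})

    return {
        "density": len(warping_set) / comb(obj_count, 2),
        "avg_deviation_from_diagonal": mean(distances),
        "rank_dispersion": pstdev(distances),
        "graph_connectivity": components,
    }


result = describe_warping_set([(0, 1), (0, 3), (1, 3), (2, 3)], obj_count=5)
print(result["density"])                      # 0.4
print(result["avg_deviation_from_diagonal"])  # 1.75
print(result["graph_connectivity"])           # 1
```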

contains_attr(gi)[source]¶

Checks if any gradual item (GI) in the gradual pattern (GP) is composed of the same column as the given gradual item.

Returns:: True if a column exists, False otherwise
Parameters:: gi (GI)
Return type:: bool

decompose()[source]¶

Breaks down all the gradual items (GIs) in the gradual pattern into columns and variation symbols and returns them as separate variables. For instance, a GP {“1+”, “3-”} will be returned as [1, 3], [1, -1], where [1, 3] is the list of attributes/features and [1, -1] are their corresponding gradual variations (1 -> ‘+’ and -1 -> ‘-’).

Returns:: Separate columns and variation symbols
Return type:: tuple[list[int], list[str]]
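
The example in the description can be mirrored with a few lines of Python operating on GI strings instead of GI objects. The function below is an illustrative sketch, not the method itself, and it encodes variations as the integers 1 and -1 as in the description.

```python
def decompose(gi_strings):
    """Split GI strings such as '1+' into column indices and variations."""
    columns, variations = [], []
    for item in gi_strings:
        columns.append(int(item[:-1]))                  # leading digits
        variations.append(1 if item[-1] == "+" else -1)  # trailing symbol
    return columns, variations


print(decompose(["1+", "3-"]))  # ([1, 3], [1, -1])
```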

property density: float¶

get_computed_descriptors(descriptor_title)[source]¶

Returns the computed descriptors of the gradual pattern (GP)

Parameters:: descriptor_title – If True, returns a dictionary with column names as keys and descriptors as values
Returns:: List of descriptors
Return type:: list[str] | list[dict]

property gradual_items: list[GI]¶

property graph_connectivity: int¶

is_duplicate(valid_gps, invalid_gps=None)[source]¶

Checks if a pattern is in the list of winner GPs or loser GPs

Parameters:

valid_gps (list[GP] | None) – list of GPs
invalid_gps (list[GP]) – list of GPs

Returns:

True if the pattern is in either list, False otherwise

Return type:

bool

static perform_and(bin_data_1, bin_data_2, dim)[source]¶

Perform logical AND operation on two bitmaps.

Parameters:

bin_data_1 (PairwiseMatrix | None) – Bitmap 1
bin_data_2 (PairwiseMatrix | None) – Bitmap 2
dim (int) – dimension of the bitmaps

Return type:

PairwiseMatrix
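
A logical AND of two bitmaps is an element-wise operation followed by a fresh support computation. The sketch below uses nested lists and a small dataclass named Bitmap as a hypothetical stand-in for PairwiseMatrix. The support formula (set bits divided by dim(dim-1)/2) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Bitmap:
    """Hypothetical stand-in for PairwiseMatrix (bin_mat plus support)."""
    bin_mat: list
    support: float


def and_bitmaps(first, second, dim):
    """Element-wise AND of two dim x dim bitmaps, with recomputed support."""
    merged = [
        [x & y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(first.bin_mat, second.bin_mat)
    ]
    support = sum(map(sum, merged)) / (dim * (dim - 1) / 2)
    return Bitmap(merged, support)


a = Bitmap([[0, 1, 1], [0, 0, 1], [0, 0, 0]], 1.0)
b = Bitmap([[0, 1, 0], [0, 0, 0], [0, 1, 0]], 2 / 3)
combined = and_bitmaps(a, b, dim=3)
print(combined.bin_mat)  # [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
```

Only the pair (0, 1) is set in both inputs, so one of the three possible pairs survives.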

print(columns, descriptor_title=False)[source]¶

A method that returns patterns with actual column names

Parameters:

columns (list[str]) – Column names
descriptor_title (bool) – If True, returns a dictionary with column names as keys and descriptors as values

Returns:

GP with actual column names

Return type:

tuple[str, list[str] | list[dict]]

property rank_dispersion: float¶

property singularity_score: float¶

property support: float¶

static swap_gp_symbols(gp_obj)[source]¶

Swaps the variation symbols of all the gradual items (GIs) in a gradual pattern (GP)

Parameters:: gp_obj (GP)
Return type:: GP

to_string()[source]¶

Returns the GP in string format (a list of GI strings).

Return type:: list[str]

validate_graank(d_gp)[source]¶

Validates a candidate gradual pattern (GP) based on support computation. A GP is invalid if its support value is less than the minimum support threshold set by the user. It uses a breadth-first approach to compute support.

Parameters:: d_gp (so4gp.DataGP) – DataGP object
Returns:: A valid GP or an empty GP
Return type:: GP

validate_tree(d_gp)[source]¶

Validates a candidate gradual pattern (GP) based on support computation. A GP is invalid if its support value is less than the minimum support threshold set by the user. It applies a depth-first (FP-Growth) approach to compute support.

Parameters:: d_gp (so4gp.DataGP) – DataGP object
Returns:: A valid GP or an empty GP

class PairwiseMatrix(bin_mat: numpy.ndarray, support: float)[source]¶

Bases: object

Parameters:

bin_mat (ndarray)
support (float)

bin_mat: ndarray¶

support: float¶

class TGP[source]¶

Bases: GP

A class that inherits an existing GP class to create Temporal GP objects. A TGP is a gradual pattern with a time-delay. It has a target gradual item (which is created from a user-defined attribute), and it is used as the anchor for mining patterns from a dataset. The class has the following attributes:

target_gradual_item: the gradual item on which the pattern is based.

temporal_gradual_items: gradual items which occur after specific time delays.

>>> import so4gp as sgp
>>> t_gp = sgp.TGP()
>>> t_gp.target_gradual_item = sgp.GI(1, "+")
>>> t_gp.add_temporal_gradual_item(sgp.GI(2, "-"), sgp.TimeDelay(7200, 0.8))
>>> t_gp.to_string()

class TemporalGI(gradual_item: so4gp.gradual_patterns.GI, time_delay: so4gp.gradual_patterns.TimeDelay)[source]¶

Bases: object

Parameters:

gradual_item (GI)
time_delay (TimeDelay)

gradual_item: GI¶

time_delay: TimeDelay¶

add_temporal_gradual_item(item, time_delay)[source]¶

Adds a fuzzy temporal gradual item (fTGI) to the fuzzy temporal gradual pattern (fTGP).

Parameters:

time_delay (TimeDelay) – time delay
item (GI)

Returns:

void

is_similar_to(ftgp)[source]¶

Checks if two fuzzy temporal gradual patterns are similar.

Parameters:: ftgp – Fuzzy temporal gradual pattern to compare with.
Returns:: True if the patterns are similar, False otherwise.
Return type:: bool

print(columns, descriptor_title=False)[source]¶

A method that returns a fuzzy temporal gradual pattern (TGP) with actual column names

Parameters:

columns (list[str]) – Column names
descriptor_title (bool) – If True, prints the descriptor title

Returns:

TGP with actual column names

Return type:

tuple[str, list[str] | list[dict]]

property target_gradual_item: GI | None¶

property temporal_gradual_items: list[TemporalGI]¶

to_string()[source]¶

Returns the Temporal-GP in string format as a list.

Return type:: list

class TimeDelay(tstamp=0, supp=0)[source]¶

Bases: object

Parameters:

tstamp (float) – The time-delay value as a timestamp.
supp (float) – The support value of the time-delay.

property formatted_time: dict¶

property sign: str¶

property support: float¶

property timestamp: float¶

to_string()[source]¶

Returns the formatted time-delay as a string.

Returns:: The time-delay as a string.
Return type:: str

property valid: bool¶

analyze_gps(data_src, min_sup, est_gps, approach='bfs')¶

For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).

>>> import so4gp as sgp
>>> import pandas
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> columns = ['Age', 'Salary', 'Cars', 'Expenses']
>>> dummy_df = pandas.DataFrame(dummy_data, columns=columns)
>>>
>>> estimated_gps = list()
>>> temp_gp = sgp.GP()
>>> for gi_str in ['0+', '1-']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.5
>>> estimated_gps.append(temp_gp)
>>> temp_gp = sgp.GP()
>>> for gi_str in ['1+', '3-', '0+']:
...     temp_gp.add_gradual_item(sgp.GI.from_string(gi_str))
>>> temp_gp.support = 0.48
>>> estimated_gps.append(temp_gp)
>>> res = sgp.analyze_gps(dummy_df, min_sup=0.4, est_gps=estimated_gps, approach='bfs')
>>> print(res)
Gradual Pattern       Estimated Support    True Support  Percentage Error      Standard Deviation
['0+', '1-']                       0.5              0.4             25.0%                   0.071
['1+', '3-', '0+']                 0.48             0.6            -20.0%                   0.085

Parameters:

data_src (DataFrame | str) – Data set file
min_sup (float) – Minimum support (set by user)
est_gps (list[GP]) – Estimated GPs
approach (str) – ‘bfs’ (default) or ‘dfs’

Returns:

Tabulated results

Return type:

str
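
The last two columns of the tabulated output are consistent with a relative error of the estimate against the true support and a sample standard deviation of the two values. The sketch below reproduces the figures in the example under that assumption; the formulas are inferred from the sample table and the function name is hypothetical.

```python
from statistics import stdev


def error_stats(estimated, true):
    """Percentage error of the estimate and sample std. dev. of both values."""
    pct_error = (estimated - true) / true * 100
    return round(pct_error, 1), round(stdev([estimated, true]), 3)


print(error_stats(0.5, 0.4))   # (25.0, 0.071)
print(error_stats(0.48, 0.6))  # (-20.0, 0.085)
```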

get_num_cores()[source]¶

Finds the count of CPU cores in a computer or a SLURM supercomputer.

Returns:: Number of CPU cores
Return type:: int

get_slurm_cores()[source]¶

Tests whether the computer is a SLURM environment, then gets the number of CPU cores.

Returns:: Count of CPUs, or False
Return type:: int | bool
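
A core count that prefers SLURM's allocation and falls back to the local machine can be sketched with the standard library. The sketch assumes the allocation is exposed through the SLURM_CPUS_PER_TASK or SLURM_CPUS_ON_NODE environment variables; which variables so4gp actually inspects is not stated here, and both function names are hypothetical.

```python
import os


def slurm_cores():
    """Return the CPU count SLURM reports, or False outside SLURM."""
    for var in ("SLURM_CPUS_PER_TASK", "SLURM_CPUS_ON_NODE"):
        value = os.environ.get(var, "")
        if value.isdigit():
            return int(value)
    return False


def num_cores():
    """Prefer the SLURM allocation, else the local CPU count (at least 1)."""
    return slurm_cores() or os.cpu_count() or 1


print(num_cores())
```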