so4gp.algorithms.tgrad_ami.TGradAMI

class TGradAMI(*args, min_error=0.0001, **kwargs)
Parameters:
  • args – [required] data source path or Pandas DataFrame, [optional] minimum-support, [optional] eq

  • kwargs – [required] target-column or attribute or feature, [optional] minimum representativity

  • min_error (float) – [optional] minimum Mutual Information error margin.

>>> from so4gp.algorithms import TGradAMI
>>> import pandas
>>>
>>> dummy_data = [["2021-03", 30, 3, 1, 10], ["2021-04", 35, 2, 2, 8], ["2021-05", 40, 4, 2, 7], ["2021-06", 50, 1, 1, 6], ["2021-07", 52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Date', 'Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> mine_obj = TGradAMI(dummy_df, min_sup=0.5, target_col=1, min_rep=0.5, min_error=0.1)
>>> result_dict = mine_obj.discover_tgp(use_clustering=True, eval_mode=False)
>>>
>>> # print(result['Patterns'])
>>> print(result_dict)
__init__(*args, min_error=0.0001, **kwargs)

Algorithm for estimating time-lag using Average Mutual Information (AMI) and KMeans clustering, extended to mine gradual patterns. The average mutual information I(X; Y) measures the amount of “information” that the random variables X and Y provide about one another.
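For reference, the standard definition of mutual information for discrete random variables is given below, where p(x, y) is the joint distribution and p(x), p(y) are the marginals. This is the textbook form only; this page does not state which estimator the library uses internally.

```latex
I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \, \log \frac{p(x, y)}{p(x)\, p(y)}
```

I(X; Y) is never negative, and it is zero exactly when X and Y are independent.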

This algorithm extends the work published in: https://ieeexplore.ieee.org/abstract/document/8858883. TGradAMI improves on the classical TGrad algorithm to extract more accurate temporal gradual patterns. It first computes the Mutual Information (MI) of the original dataset with respect to the target-column, which captures the actual relationship between the variables. It then computes the MI of the transformed dataset for every possible time-delay and selects as the best time-delay the one whose MI is almost identical to that of the original dataset. Instead of a min-representativity value, the algorithm relies on the error-margin between MIs.
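The selection rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the so4gp implementation: the helpers `mutual_information`, `shift_by_step` and `best_time_delay` are hypothetical, the data are assumed to be discrete, and the transformation is assumed to pair row i of the target with row i + step of the other attribute.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def shift_by_step(target, feature, step):
    """Pair target[i] with feature[i + step] (assumed transformation)."""
    return target[: len(target) - step], feature[step:]


def best_time_delay(target, feature, max_step, min_error=0.0001):
    """Return (step, error) for the step whose MI is closest to the
    original MI, or None if no step falls within min_error."""
    mi_original = mutual_information(target, feature)
    best = None
    for step in range(1, max_step + 1):
        t, f = shift_by_step(target, feature, step)
        error = abs(mutual_information(t, f) - mi_original)
        if error <= min_error and (best is None or error < best[1]):
            best = (step, error)
    return best


# Toy data (hypothetical): an alternating signal with period 2.
target = [0, 1, 0, 1, 0, 1, 0, 1]
feature = [0, 1, 0, 1, 0, 1, 0, 1]
print(best_time_delay(target, feature, max_step=3, min_error=0.001))
```

Because the toy signal repeats every two rows, a step of 2 reproduces the original MI, while steps 1 and 3 change the marginal counts enough to exceed the chosen error margin.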


Methods

__init__(*args[, min_error])

Algorithm for estimating time-lag using Average Mutual Information (AMI) and KMeans clustering, extended to mine gradual patterns.

add_gradual_pattern(pattern)

Adds a gradual pattern to the list of gradual patterns.

analyze_gps(data_src, min_sup, est_gps[, ...])

For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).

build_mf_w_clusters(time_data)

A method that builds the boundaries of a fuzzy Triangular membership function (MF) using Singular Value Decomposition (to estimate the number of centers) and the KMeans algorithm (to group time data according to the identified centers).

clean_data(df)

Cleans a data-frame (i.e., handles missing values and outliers) before extraction of GPs.

clear_gradual_patterns()

Clears the list of gradual patterns.

discover([ignore_support, apriori_level, ...])

Uses the Apriori algorithm to find gradual pattern (GP) candidates.

discover_tgp([use_clustering, ...])

A method that applies the mutual information concept, clustering, and a hill-climbing algorithm to find the best data transformation that maintains MI and to estimate the best time-delay value of the mined Fuzzy Temporal Gradual Patterns (FTGPs).

find_best_mutual_info()

A method that computes the mutual information I(X; Y) of the original dataset and all the transformed datasets w.r.t.

fit_bitmap([attr_data])

Generates bitmaps for columns with numeric objects.

fit_warpingset()

Generates transaction ids (tids) for each column/feature with numeric objects.

gather_delayed_data(optimal_dict, max_step)

A method that combines attribute data with different data transformations and computes the corresponding time-delay values for each attribute.

gen_gradual_warping_set(pairwise_mat[, as_array])

A method that decomposes the pairwise matrix of a gradual item/pattern into a warping set.

generate_output_files(alg_data[, ...])

Generates output of results (as files) for the GP mining algorithm.

get_fuzzy_time_lag(bin_data, time_data[, ...])

A method that uses a fuzzy membership function to select the most accurate time-delay value.

get_time_diffs(step)

A method that computes the difference between 2 timestamps separated by a specific transformation step.

get_timestamp(time_str)

A method that computes the corresponding timestamp from a DateTime string.

read(data_src)

Reads all the contents of a file (in CSV format) or a data-frame.

remove_subsets(gi_arr[, gradual_patterns])

Removes subset GPs from the list.

test_time(date_str)

Tests if a str represents a date-time variable.

transform_and_mine(step[, return_patterns])

A method that (1) transforms data according to a step value, and (2) mines the transformed data for FTGPs.
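To make the idea of a transformation step concrete, the sketch below computes the time difference between each row and the row `step` positions later, which is the kind of quantity get_time_diffs is described as producing. It is a standalone illustration, not the class's method: `time_diffs_for_step` is a hypothetical helper, and the "%Y-%m" date format and the choice of days as the unit are assumptions based on the example data on this page.

```python
from datetime import datetime


def time_diffs_for_step(dates, step, fmt="%Y-%m"):
    """Return the gap in days between dates[i] and dates[i + step].

    Hypothetical helper: the date format and the unit (days) are
    assumptions made for this illustration.
    """
    stamps = [datetime.strptime(d, fmt) for d in dates]
    # zip stops at the shorter sequence, so the last `step` rows are dropped.
    return [
        (later - earlier).total_seconds() / 86400
        for earlier, later in zip(stamps, stamps[step:])
    ]


dates = ["2021-03", "2021-04", "2021-05", "2021-06", "2021-07"]
print(time_diffs_for_step(dates, step=1))  # [31.0, 30.0, 31.0, 30.0]
```

A larger step yields fewer, longer gaps, since each row is compared with a row further ahead and the trailing rows have no partner.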

Attributes

attr_cols

attr_size

col_count

data

display_patterns

display_patterns_as_df

error_margin

feature_cols

full_attr_data

gradual_patterns

max_step

mi_error

min_rep

row_count

target_col

thd_supp

time_cols

titles

valid_bins

warping_set