so4gp.algorithms.tgrad_ami.TGradAMI

class TGradAMI(*args, min_error=0.0001, **kwargs)

Parameters:

args – [required] data source path or Pandas DataFrame, [optional] minimum-support, [optional] eq
kwargs – [required] target-column or attribute or feature, [optional] minimum representativity
min_error (float) – [optional] minimum Mutual Information error margin.

>>> from so4gp.algorithms import TGradAMI
>>> import pandas
>>>
>>> dummy_data = [["2021-03", 30, 3, 1, 10], ["2021-04", 35, 2, 2, 8], ["2021-05", 40, 4, 2, 7], ["2021-06", 50, 1, 1, 6], ["2021-07", 52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Date', 'Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> mine_obj = TGradAMI(dummy_df, min_sup=0.5, target_col=1, min_rep=0.5, min_error=0.1)
>>> result_dict = mine_obj.discover_tgp(use_clustering=True, eval_mode=False)
>>>
>>> # print(result_dict['Patterns'])
>>> print(result_dict)

__init__(*args, min_error=0.0001, **kwargs)

Algorithm for estimating time-lag using Average Mutual Information (AMI) and KMeans clustering which is extended to mining gradual patterns. The average mutual information I(X; Y) measures the amount of “information” that the random variables X and Y provide about one another.
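As an illustration of the quantity I(X; Y) described above, the following is a minimal sketch that estimates mutual information for two discrete sequences from their empirical frequencies, using the standard definition I(X; Y) = Σ p(x, y) log(p(x, y) / (p(x) p(y))). The function name `mutual_information` is hypothetical and is not part of so4gp; the library may well estimate MI differently (for example, with an estimator suited to continuous values).

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Estimate I(X; Y) in nats from two equal-length sequences of discrete values.

    Hypothetical helper for illustration only; not part of the so4gp API.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each observed (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), with probabilities taken from counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi
```

When one sequence fully determines the other, the estimate equals the entropy of that sequence; when the two are empirically independent, it is zero.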

This algorithm extends the work published in: https://ieeexplore.ieee.org/abstract/document/8858883. TGradAMI improves on the classical TGrad algorithm to extract more accurate temporal gradual patterns. It computes the Mutual Information (MI) of the original dataset with respect to the target column to capture the actual relationship between the variables. It then computes the MI for every possible time-delay; if a transformed dataset has an MI almost identical to that of the original dataset, that time-delay is selected as the best one. Instead of a minimum-representativity value, the algorithm relies on the error margin between the MIs.
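The selection rule described above can be sketched roughly as follows. This is a simplified illustration, not the library's implementation: it assumes discrete values, a single feature column, and a transformation that pairs row i of the target with row i + step of the feature. The names `mutual_information` and `best_time_delay` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical I(X; Y) in nats for discrete sequences (illustrative helper)."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def best_time_delay(target, feature, max_step, min_error):
    """Return (step, error) for the shift whose MI is closest to the unshifted MI.

    Returns None when no shift has an MI error within min_error.
    Assumption: a shift of `step` pairs target[i] with feature[i + step].
    """
    base_mi = mutual_information(target, feature)
    best = None
    for step in range(1, max_step + 1):
        # Transform the data: drop the last `step` target rows and the first `step` feature rows.
        shifted_mi = mutual_information(target[:-step], feature[step:])
        error = abs(base_mi - shifted_mi)
        # Keep the step only if it is within the error margin and beats the current best.
        if error <= min_error and (best is None or error < best[1]):
            best = (step, error)
    return best
```

For example, with an alternating series such as `[0, 1, 0, 1, 0, 1, 0, 1]` used as both target and feature, a shift of 2 rows reproduces the original MI exactly, so that step is selected for any non-negative `min_error`.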


Methods

`__init__`(*args[, min_error])	Algorithm for estimating time-lag using Average Mutual Information (AMI) and KMeans clustering which is extended to mining gradual patterns.
`add_gradual_pattern`(pattern)	Adds a gradual pattern to the list of gradual patterns.
`analyze_gps`(data_src, min_sup, est_gps[, ...])	For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).
`build_mf_w_clusters`(time_data)	A method that builds the boundaries of a fuzzy Triangular membership function (MF) using Singular Value Decomposition (to estimate the number of centers) and KMeans algorithm to group time data according to the identified centers.
`clean_data`(df)	Cleans a data-frame (i.e., missing values, outliers) before extraction of GPs
`clear_gradual_patterns`()	Clears the list of gradual patterns.
`discover`([ignore_support, apriori_level, ...])	Uses apriori algorithm to find gradual pattern (GP) candidates.
`discover_tgp`([use_clustering, ...])	A method that applies mutual information concept, clustering, and hill-climbing algorithm to find the best data transformation that maintains MI and estimate the best time-delay value of the mined Fuzzy Temporal Gradual Patterns (FTGPs).
`find_best_mutual_info`()	A method that computes the mutual information I(X; Y) of the original dataset and all the transformed datasets w.r.t.
`fit_bitmap`([attr_data])	Generates bitmaps for columns with numeric objects.
`fit_warpingset`()	Generates transaction ids (tids) for each column/feature with numeric objects.
`gather_delayed_data`(optimal_dict, max_step)	A method that combines attribute data with different data transformations and computes the corresponding time-delay values for each attribute.
`gen_gradual_warping_set`(pairwise_mat[, as_array])	A method that decomposes the pairwise matrix of a gradual item/pattern into a warping set.
`generate_output_files`(alg_data[, ...])	Generates output of results (as files) for the GP mining algorithm.
`get_fuzzy_time_lag`(bin_data, time_data[, ...])	A method that uses a fuzzy membership function to select the most accurate time-delay value.
`get_time_diffs`(step)	A method that computes the difference between 2 timestamps separated by a specific transformation step.
`get_timestamp`(time_str)	A method that computes the corresponding timestamp from a DateTime string.
`read`(data_src)	Reads all the contents of a file (in CSV format) or a data-frame.
`remove_subsets`(gi_arr[, gradual_patterns])	Remove subset GPs from the list.
`test_time`(date_str)	Tests if a str represents a date-time variable.
`transform_and_mine`(step[, return_patterns])	A method that: (1) transforms data according to a step value and, (2) mines the transformed data for FTGPs.
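To make the `get_timestamp` and `get_time_diffs` entries above more concrete, here is a small standalone sketch of the same idea: converting date strings to timestamps and taking the difference between rows separated by a given step. The functions `to_timestamp` and `time_diffs`, the `fmt` parameter, and the choice to interpret dates as UTC are all assumptions made for this illustration; the actual methods operate on the object's own data and may parse dates differently.

```python
from datetime import datetime, timezone


def to_timestamp(time_str, fmt="%Y-%m"):
    """Parse a date string and return its POSIX timestamp, treating it as UTC.

    Hypothetical helper; the format string is an assumption for this example.
    """
    return datetime.strptime(time_str, fmt).replace(tzinfo=timezone.utc).timestamp()


def time_diffs(time_strs, step, fmt="%Y-%m"):
    """Return the differences in seconds between row i and row i + step."""
    stamps = [to_timestamp(s, fmt) for s in time_strs]
    # zip stops at the shorter sequence, so the last `step` rows have no partner.
    return [later - earlier for earlier, later in zip(stamps, stamps[step:])]


dates = ["2021-03", "2021-04", "2021-05", "2021-06", "2021-07"]
one_step = time_diffs(dates, step=1)  # gaps between consecutive months
two_step = time_diffs(dates, step=2)  # gaps between rows two months apart
```

A larger step yields fewer differences, because the last `step` rows have no later row to pair with.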

Attributes

`attr_cols`
`attr_size`
`col_count`
`data`
`display_patterns`
`display_patterns_as_df`
`error_margin`
`feature_cols`
`full_attr_data`
`gradual_patterns`
`max_step`
`mi_error`
`min_rep`
`row_count`
`target_col`
`thd_supp`
`time_cols`
`titles`
`valid_bins`
`warping_set`