so4gp.algorithms.tgrad_ami.TGradAMI
- class TGradAMI(*args, min_error=0.0001, **kwargs)
- Parameters:
args – [required] data source (path to a CSV file, or a Pandas DataFrame), [optional] minimum support, [optional] eq
kwargs – [required] target column (attribute or feature), [optional] minimum representativity
min_error (float) – [optional] minimum Mutual Information error margin.
>>> from so4gp.algorithms import TGradAMI
>>> import pandas
>>>
>>> dummy_data = [["2021-03", 30, 3, 1, 10], ["2021-04", 35, 2, 2, 8], ["2021-05", 40, 4, 2, 7], ["2021-06", 50, 1, 1, 6], ["2021-07", 52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Date', 'Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> mine_obj = TGradAMI(dummy_df, min_sup=0.5, target_col=1, min_rep=0.5, min_error=0.1)
>>> result_dict = mine_obj.discover_tgp(use_clustering=True, eval_mode=False)
>>>
>>> # print(result_dict['Patterns'])
>>> print(result_dict)
- __init__(*args, min_error=0.0001, **kwargs)
An algorithm for estimating time lags using Average Mutual Information (AMI) and KMeans clustering, extended to mine gradual patterns. The average mutual information I(X; Y) measures the amount of "information" that the random variables X and Y provide about each other.
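For discrete random variables, I(X; Y) = Σ over (x, y) of p(x, y) · log(p(x, y) / (p(x) · p(y))). As a minimal sketch of this definition, the function below estimates I(X; Y) from paired samples using empirical frequencies (a plug-in estimator). The function name is hypothetical, and this page does not state which MI estimator TGradAMI uses internally.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y), in nats, from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of X
    py = Counter(ys)              # marginal counts of Y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y))
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Identical variables share all their information: I(X; X) = H(X) = log 2 here.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # ≈ 0.693
# Independent variables share none.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```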
This algorithm extends the work published in https://ieeexplore.ieee.org/abstract/document/8858883. TGradAMI improves on the classical TGrad algorithm to extract more accurate temporal gradual patterns. It first computes the Mutual Information (MI) of the original dataset with respect to the target column, to capture the actual relationship between variables. It then computes the MI for every possible time delay; if a transformed dataset has an MI almost identical to that of the original dataset, the algorithm selects that delay as the best time delay. Instead of a minimum-representativity value, the algorithm relies on the error margin between the MIs (see min_error).
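The time-delay selection described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the helper names (mi_profile, best_time_delay) and the tolerance parameter are hypothetical; row i of the target is assumed to be paired with row i + step of each feature; the error is taken as the mean absolute MI difference across features; and values are assumed to be discrete (continuous data would need binning first).

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y), in nats, from paired discrete samples."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mi_profile(columns, target, step):
    """MI of each feature column w.r.t. the target after a shift of `step` rows.

    Assumption: target row i is paired with feature row i + step.
    step = 0 reproduces the original (untransformed) dataset.
    """
    n = len(target) - step
    return [mutual_information(target[:n], col[step:step + n]) for col in columns]


def best_time_delay(columns, target, max_step, tolerance):
    """Return (step, error) whose MI profile is closest to the original, or None."""
    if not 0 < max_step < len(target):
        raise ValueError("max_step must be between 1 and len(target) - 1")
    original = mi_profile(columns, target, 0)
    best = None
    for step in range(1, max_step + 1):
        shifted = mi_profile(columns, target, step)
        # Mean absolute MI difference between original and transformed data.
        error = sum(abs(a - b) for a, b in zip(original, shifted)) / len(columns)
        if best is None or error < best[1]:
            best = (step, error)
    # Accept the best step only if its MI error stays within the tolerance.
    return best if best[1] <= tolerance else None


target = [0, 1, 0, 1, 0, 1, 0, 1]
feature = [0, 1, 0, 1, 0, 1, 0, 1]
print(best_time_delay([feature], target, max_step=2, tolerance=0.1))  # (2, 0.0)
```

For this period-2 series, a shift of two rows reproduces the original pairing exactly, so its MI error is 0, while a shift of one row loses a little information and is rejected in favour of step 2.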
Methods
__init__(*args[, min_error]) – An algorithm for estimating time lags using Average Mutual Information (AMI) and KMeans clustering, extended to mine gradual patterns.
add_gradual_pattern(pattern) – Adds a gradual pattern to the list of gradual patterns.
analyze_gps(data_src, min_sup, est_gps[, ...]) – For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error and standard deviation).
build_mf_w_clusters(time_data) – Builds the boundaries of a fuzzy triangular membership function (MF), using Singular Value Decomposition (SVD) to estimate the number of centers and the KMeans algorithm to group time data around the identified centers.
clean_data(df) – Cleans a data frame (i.e., handles missing values and outliers) before GPs are extracted.
clear_gradual_patterns() – Clears the list of gradual patterns.
discover([ignore_support, apriori_level, ...]) – Uses the Apriori algorithm to find gradual pattern (GP) candidates.
discover_tgp([use_clustering, ...]) – Applies mutual information, clustering, and a hill-climbing algorithm to find the data transformation that best preserves MI, and estimates the best time-delay value of the mined Fuzzy Temporal Gradual Patterns (FTGPs).
find_best_mutual_info() – Computes the mutual information I(X; Y) of the original dataset and all the transformed datasets w.r.t.
fit_bitmap([attr_data]) – Generates bitmaps for columns with numeric objects.
fit_warpingset() – Generates transaction IDs (tids) for each column/feature with numeric objects.
gather_delayed_data(optimal_dict, max_step) – Combines attribute data with different data transformations and computes the corresponding time-delay value for each attribute.
gen_gradual_warping_set(pairwise_mat[, as_array]) – Decomposes the pairwise matrix of a gradual item/pattern into a warping set.
generate_output_files(alg_data[, ...]) – Generates the results of the GP mining algorithm as output files.
get_fuzzy_time_lag(bin_data, time_data[, ...]) – Uses a fuzzy membership function to select the most accurate time-delay value.
get_time_diffs(step) – Computes the difference between two timestamps separated by a specific transformation step.
get_timestamp(time_str) – Computes the corresponding timestamp from a date-time string.
read(data_src) – Reads all the contents of a CSV file or a data frame.
remove_subsets(gi_arr[, gradual_patterns]) – Removes subset GPs from the list.
test_time(date_str) – Tests whether a string represents a date-time value.
transform_and_mine(step[, return_patterns]) – (1) Transforms data according to a step value and (2) mines the transformed data for FTGPs.
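The time-lag steps behind get_time_diffs, get_timestamp, and get_fuzzy_time_lag can be illustrated with the sketch below. It is hypothetical throughout: the helper names are illustrative, the dates follow the "%Y-%m" format of the class example, and the candidate triangle boundaries are supplied by hand instead of being derived with SVD and KMeans as build_mf_w_clusters does. The peak of the triangle with the highest mean membership is reported as the time lag.

```python
from datetime import datetime


def time_diffs_days(date_strs, step, fmt="%Y-%m"):
    """Days between timestamps that are `step` rows apart (hypothetical helper)."""
    stamps = [datetime.strptime(s, fmt) for s in date_strs]
    # Subtracting naive datetimes gives a timedelta with no time-zone/DST effects.
    return [(stamps[i + step] - stamps[i]).total_seconds() / 86400
            for i in range(len(stamps) - step)]


def triangular_mf(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b (a <= b <= c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)


def fuzzy_time_lag(diffs, candidate_mfs):
    """Pick the triangle with the highest mean membership over `diffs`.

    Returns (peak, mean_membership); the peak is taken as the time lag.
    Assumption: candidate_mfs are (a, b, c) boundaries supplied by the caller,
    standing in for the boundaries that build_mf_w_clusters would derive.
    """
    scored = []
    for a, b, c in candidate_mfs:
        score = sum(triangular_mf(d, a, b, c) for d in diffs) / len(diffs)
        scored.append((score, b))
    best_score, best_peak = max(scored)
    return best_peak, best_score


dates = ["2021-03", "2021-04", "2021-05", "2021-06", "2021-07"]
diffs = time_diffs_days(dates, step=1)  # [31.0, 30.0, 31.0, 30.0]
print(fuzzy_time_lag(diffs, [(28, 30.5, 33), (55, 61, 67)]))  # peak 30.5, score ≈ 0.8
```

Averaging memberships rewards the triangle that covers most of the observed time differences, so a one-row step in this monthly data maps to a lag of roughly one month.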
Attributes
attr_cols
attr_size
col_count
data
display_patterns
display_patterns_as_df
error_margin
feature_cols
full_attr_data
gradual_patterns
max_step
mi_error
min_rep
row_count
target_col
thd_supp
time_cols
titles
valid_bins
warping_set