so4gp.algorithms.graank_aco.AntGRAANK
- class AntGRAANK(*args, max_iter=1, e_factor=0.5, **kwargs)
- Parameters:
args – [required] data source (file path or Pandas DataFrame), [optional] minimum-support, [optional] eq
max_iter (int) – [optional] maximum number of iterations, default is 1
e_factor (float) – [optional] evaporation factor, default is 0.5
>>> from so4gp.algorithms import AntGRAANK
>>> import pandas
>>>
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> mine_obj = AntGRAANK(data_source=dummy_df, min_sup=0.5, max_iter=3, e_factor=0.5)
>>> result_json = mine_obj.discover()
>>> # print(result['Patterns'])
>>> print(result_json)
{"Algorithm": "ACO-GRAANK", "Best Patterns": [[["Expenses-", "Age+"], 1.0]], "Invalid Count": 1, "Iterations":3}
- __init__(*args, max_iter=1, e_factor=0.5, **kwargs)
Extract gradual patterns (GPs) from a numeric data source using the Ant Colony Optimization approach (proposed in a published paper by Dickson Owuor). A GP is a set of gradual items (GIs), and its quality is measured by its computed support value. For example, given a data set with 3 columns (age, salary, cars) and 10 objects, a GP may take the form {age+, salary-} with a support of 0.8. This implies that 8 out of 10 objects have the values of column ‘age’ increasing and the values of column ‘salary’ decreasing.
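As an illustration of how a support value can be computed, the sketch below assumes the pairwise definition commonly used for gradual patterns: support is the fraction of object pairs that respect every gradual item in the pattern. The function name `pairwise_support` and its arguments are hypothetical and are not part of the so4gp API; the library's internal computation may differ.

```python
import numpy as np

def pairwise_support(columns, directions):
    """Fraction of object pairs that respect every gradual item.

    columns    -- list of equal-length numeric sequences, one per gradual item
    directions -- list of '+' (increasing) or '-' (decreasing), one per column
    """
    n = len(columns[0])
    # concordant[i, j] stays True only if the pair (i, j) respects all items
    concordant = np.ones((n, n), dtype=bool)
    for values, direction in zip(columns, directions):
        v = np.asarray(values, dtype=float)
        if direction == "+":
            concordant &= v[:, None] < v[None, :]   # value rises from i to j
        else:
            concordant &= v[:, None] > v[None, :]   # value falls from i to j
    # Normalise by the number of unordered object pairs
    return concordant.sum() / (n * (n - 1) / 2)

# Columns taken from the dummy data used in the class example
age = [30, 35, 40, 50, 52]
salary = [3, 2, 4, 1, 7]
expenses = [10, 8, 7, 6, 2]

sup_age_exp = pairwise_support([age, expenses], ["+", "-"])  # 1.0
sup_age_sal = pairwise_support([age, salary], ["+", "+"])    # 0.6
```

Under this assumed definition, {Age+, Expenses-} reaches a support of 1.0 on the dummy data, which agrees with the pattern reported in the class example.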
In this approach, it is assumed that every column can be converted into a gradual item (GI). If the GI is valid (i.e., its computed support is greater than the minimum support threshold), then it is either increasing or decreasing (+ or -); otherwise it is irrelevant (x). Therefore, a pheromone matrix is built using the number of columns and the possible variations (increasing, decreasing, irrelevant) or (+, -, x). The algorithm starts by randomly generating GP candidates using the pheromone matrix. Each candidate is validated by confirming that its computed support is greater than or equal to the minimum support threshold. The valid GPs are then used to update the pheromone levels so that better candidates are generated in later iterations.
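The loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the so4gp implementation: the names `candidate_support` and `aco_sketch`, the `n_ants` and `seed` parameters, the deposit of one pheromone unit per valid item, and the evaporation rule `pheromones *= (1 - e_factor)` are all assumptions made for the example.

```python
import numpy as np

SYMBOLS = ("+", "-", "x")  # increasing, decreasing, irrelevant

def candidate_support(data, candidate):
    """Fraction of object pairs that respect every non-irrelevant item."""
    n = data.shape[0]
    concordant = np.ones((n, n), dtype=bool)
    for col, sym in enumerate(candidate):
        v = data[:, col]
        if sym == "+":
            concordant &= v[:, None] < v[None, :]
        elif sym == "-":
            concordant &= v[:, None] > v[None, :]
    return concordant.sum() / (n * (n - 1) / 2)

def aco_sketch(data, min_sup=0.5, max_iter=3, e_factor=0.5, n_ants=20, seed=0):
    rng = np.random.default_rng(seed)
    n_cols = data.shape[1]
    # One row per column, one pheromone level per variation (+, -, x)
    pheromones = np.ones((n_cols, len(SYMBOLS)))
    valid = {}
    for _ in range(max_iter):
        # Turn pheromone levels into per-column selection probabilities
        probs = pheromones / pheromones.sum(axis=1, keepdims=True)
        for _ in range(n_ants):
            choice = [rng.choice(len(SYMBOLS), p=probs[c]) for c in range(n_cols)]
            candidate = tuple(SYMBOLS[k] for k in choice)
            # A pattern needs at least two gradual items to be meaningful
            if sum(sym != "x" for sym in candidate) < 2:
                continue
            sup = candidate_support(data, candidate)
            if sup >= min_sup:
                valid[candidate] = sup
                # Reinforce the choices that produced a valid pattern
                for c, k in enumerate(choice):
                    pheromones[c, k] += 1.0
        # Assumed evaporation rule: decay every level by e_factor
        pheromones *= (1.0 - e_factor)
    return valid

dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
data = np.array(dummy_data, dtype=float)
patterns = aco_sketch(data, min_sup=0.5, max_iter=3, e_factor=0.5)
```

Each key of `patterns` holds one symbol per column, in the order Age, Salary, Cars, Expenses. Because candidates are sampled at random, the set of patterns found depends on the seed and on the number of ants and iterations.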
Methods
__init__(*args[, max_iter, e_factor]) – Extract gradual patterns (GPs) from a numeric data source using the Ant Colony Optimization approach (proposed in a published paper by Dickson Owuor).
add_gradual_pattern(pattern) – Adds a gradual pattern to the list of gradual patterns.
analyze_gps(data_src, min_sup, est_gps[, ...]) – For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).
clean_data(df) – Cleans a data-frame (i.e., missing values, outliers) before extraction of GPs.
clear_gradual_patterns() – Clears the list of gradual patterns.
discover() – Applies ant-colony optimization algorithm and uses pheromone levels to find GP candidates.
fit_bitmap([attr_data]) – Generates bitmaps for columns with numeric objects.
fit_warpingset() – Generates transaction ids (tids) for each column/feature with numeric objects.
gen_gradual_warping_set(pairwise_mat[, as_array]) – A method that decomposes the pairwise matrix of a gradual item/pattern into a warping set.
generate_output_files(alg_data[, ...]) – Generates output of results (as files) for the GP mining algorithm.
read(data_src) – Reads all the contents of a file (in CSV format) or a data-frame.
remove_subsets(gi_arr[, gradual_patterns]) – Remove subset GPs from the list.
test_time(date_str) – Tests if a str represents a date-time variable.
Attributes
attr_cols
attr_size
col_count
data
display_patterns
display_patterns_as_df
gradual_patterns
row_count
thd_supp
time_cols
titles
valid_bins
warping_set