so4gp.algorithms.cluster_gp.ClusterGP
- class ClusterGP(*args, e_prob=0.5, max_iter=10, **kwargs)
- Parameters:
args – [required] data source (path to a CSV file or a Pandas DataFrame), [optional] minimum-support, [optional] eq
e_prob (float) – [optional] erasure probability, the default is 0.5
max_iter (int) – [optional] maximum iteration for score vector estimation, the default is 10
>>> import pandas
>>> from so4gp.algorithms import ClusterGP
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> mine_obj = ClusterGP(data_source=dummy_df, min_sup=0.5, max_iter=3, e_prob=0.5)
>>> result_json = mine_obj.discover()
>>> print(result_json)
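The `discover()` method is described as grouping gradual items by the similarity of their net-win vectors. The sketch below illustrates one plausible reading of that idea in plain Python and does not use so4gp itself. The helper `net_win_vector` and its win/loss scoring rule are assumptions made for illustration, not the library's implementation: each row scores +1 for every other row it "beats" in the chosen direction and -1 for every row that beats it.

```python
def net_win_vector(values, increasing=True):
    """Hypothetical helper: for each row, count wins minus losses
    against every other row, in the given direction."""
    sign = 1 if increasing else -1
    scores = []
    for i, vi in enumerate(values):
        score = 0
        for j, vj in enumerate(values):
            if i == j:
                continue
            diff = sign * (vi - vj)
            if diff > 0:
                score += 1   # row i wins against row j
            elif diff < 0:
                score -= 1   # row i loses against row j
        scores.append(score)
    return scores


# Columns taken from the dummy data in the example above.
age = [30, 35, 40, 50, 52]
salary = [3, 2, 4, 1, 7]
expenses = [10, 8, 7, 6, 2]

age_up = net_win_vector(age, increasing=True)
salary_up = net_win_vector(salary, increasing=True)
expenses_down = net_win_vector(expenses, increasing=False)

print(age_up)
print(salary_up)
print(expenses_down)
```

Under this assumed scoring, "Age increasing" and "Expenses decreasing" produce identical vectors, which is the kind of similarity a clustering step could use to place the two gradual items in the same group. "Salary increasing" yields a different vector.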
- __init__(*args, e_prob=0.5, max_iter=10, **kwargs)
CluDataGP stands for Clustering DataGP. It is a class that inherits the DataGP class and creates data-gp objects for the clustering approach, adding the parameters required for clustering gradual items. The classical data-gp object stores all the parameters required by GP algorithms to extract gradual patterns (GPs). It takes a numeric file (in CSV format) as input and converts it into an object whose attributes are used by algorithms to extract GPs.
- Parameters:
args – [required] data source (path to a CSV file or a Pandas DataFrame), [optional] minimum-support, [optional] eq
e_prob (float) – [optional] erasure probability, the default is 0.5
max_iter (int) – [optional] maximum iteration for score vector estimation, the default is 10
Methods
__init__(*args[, e_prob, max_iter]) – CluDataGP stands for Clustering DataGP.
add_gradual_pattern(pattern) – Adds a gradual pattern to the list of gradual patterns.
analyze_gps(data_src, min_sup, est_gps[, ...]) – For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error and standard deviation).
clean_data(df) – Cleans a data-frame (i.e., missing values, outliers) before extraction of GPs.
clear_gradual_patterns() – Clears the list of gradual patterns.
discover() – Applies spectral clustering to determine which gradual items belong to the same group based on the similarity of net-win vectors.
fit_bitmap([attr_data]) – Generates bitmaps for columns with numeric objects.
fit_warpingset() – Generates transaction ids (tids) for each column/feature with numeric objects.
gen_gradual_warping_set(pairwise_mat[, as_array]) – Decomposes the pairwise matrix of a gradual item/pattern into a warping set.
generate_output_files(alg_data[, ...]) – Generates output of results (as files) for the GP mining algorithm.
read(data_src) – Reads all the contents of a file (in CSV format) or a data-frame.
remove_subsets(gi_arr[, gradual_patterns]) – Removes subset GPs from the list.
test_time(date_str) – Tests if a str represents a date-time variable.
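The `analyze_gps` entry mentions computing the true support of an estimated GP. As a rough illustration, the sketch below computes support with the pairwise-concordance definition commonly used for gradual patterns: the fraction of row pairs that respect every gradual item in the pattern, out of n(n-1)/2 pairs. The function `gradual_support` and its `(column_index, direction)` pattern encoding are hypothetical, and it is an assumption that the library's computation matches this definition.

```python
from itertools import combinations


def gradual_support(rows, pattern):
    """Hypothetical helper: fraction of row pairs that are concordant
    with every (column_index, '+' or '-') item in `pattern`."""
    n = len(rows)
    total_pairs = n * (n - 1) // 2

    def respects(lo, hi):
        # True if moving from row `lo` to row `hi` follows every item.
        return all(
            hi[col] > lo[col] if direction == '+' else hi[col] < lo[col]
            for col, direction in pattern
        )

    concordant = sum(
        1 for a, b in combinations(rows, 2)
        if respects(a, b) or respects(b, a)
    )
    return concordant / total_pairs


# Same dummy data as the class example: Age, Salary, Cars, Expenses.
dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7],
              [50, 1, 1, 6], [52, 7, 1, 2]]

sup_age_expenses = gradual_support(dummy_data, [(0, '+'), (3, '-')])
sup_age_salary = gradual_support(dummy_data, [(0, '+'), (1, '+')])

print(sup_age_expenses)  # 1.0
print(sup_age_salary)    # 0.6
```

Under this definition, a pattern would be kept when its support is at least the minimum-support threshold (0.5 in the class example). The values printed here come from the sketch alone and are not output of the library.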
Attributes
attr_cols
attr_size
col_count
data
display_patterns
display_patterns_as_df
gradual_patterns
row_count
thd_supp
time_cols
titles
valid_bins
warping_set