so4gp.algorithms.cluster_gp.ClusterGP

class ClusterGP(*args, e_prob=0.5, max_iter=10, **kwargs)
Parameters:
  • args – [required] data source (a file path or a Pandas DataFrame), [optional] minimum-support, [optional] eq

  • e_prob (float) – [optional] erasure probability, the default is 0.5

  • max_iter (int) – [optional] maximum iteration for score vector estimation, the default is 10

>>> import pandas
>>> from so4gp.algorithms import ClusterGP
>>> dummy_data = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]
>>> dummy_df = pandas.DataFrame(dummy_data, columns=['Age', 'Salary', 'Cars', 'Expenses'])
>>>
>>> mine_obj = ClusterGP(data_source=dummy_df, min_sup=0.5, max_iter=3, e_prob=0.5)
>>> result_json = mine_obj.discover()
>>> print(result_json)
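
The e_prob parameter is described only as an "erasure probability". As a rough intuition, and not a description of ClusterGP's internal sampling code, an erasure probability can be pictured as the chance that a pairwise comparison between two rows is dropped before further processing. The helper name sample_pairs and the seed argument below are hypothetical and exist only for this illustration.

```python
import random
from itertools import combinations


def sample_pairs(n_rows, e_prob=0.5, seed=0):
    """Keep each row pair (i, j), i < j, with probability 1 - e_prob.

    Hypothetical helper: it illustrates the idea of erasure only and is
    not taken from the so4gp source code.
    """
    rng = random.Random(seed)
    all_pairs = list(combinations(range(n_rows), 2))
    # A pair is erased when the random draw falls below e_prob.
    kept = [pair for pair in all_pairs if rng.random() >= e_prob]
    return all_pairs, kept


all_pairs, kept = sample_pairs(n_rows=5, e_prob=0.5)
print(len(all_pairs))  # 5 rows give 10 unordered pairs
print(kept)            # the subset of pairs that survived erasure
```

Under this reading, a larger e_prob leaves fewer pairs to compare, trading accuracy for speed.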
__init__(*args, e_prob=0.5, max_iter=10, **kwargs)

CluDataGP stands for Clustering DataGP. It is a class that inherits from the DataGP class to create data-gp objects for the clustering approach, and it adds the parameters required for clustering gradual items. The classical data-gp object stores all the parameters that GP algorithms need in order to extract gradual patterns (GPs). It takes a numeric data source (a file in CSV format or a Pandas DataFrame) as input and converts it into an object whose attributes are used by algorithms to extract GPs.
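
To make the role of minimum-support concrete, the following minimal sketch computes the support of a gradual pattern as the fraction of row pairs that respect every gradual item in it, which is the usual pair-counting definition. It is written in plain Python and does not call so4gp; the function name gp_support and the (column index, direction) encoding of a pattern are assumptions made for this illustration.

```python
from itertools import permutations

# Same rows as the example data: Age, Salary, Cars, Expenses
rows = [[30, 3, 1, 10], [35, 2, 2, 8], [40, 4, 2, 7], [50, 1, 1, 6], [52, 7, 1, 2]]


def gp_support(rows, pattern):
    """Fraction of row pairs that respect every (column, direction) item.

    Hypothetical helper: '+' means the column increases from row i to
    row j, '-' means it decreases.
    """
    n = len(rows)
    concordant = 0
    for i, j in permutations(range(n), 2):
        if all(
            rows[i][col] < rows[j][col] if sign == '+' else rows[i][col] > rows[j][col]
            for col, sign in pattern
        ):
            concordant += 1
    # At most one ordering of each unordered pair can be concordant,
    # so the count is normalised by n * (n - 1) / 2.
    return concordant / (n * (n - 1) / 2)


print(gp_support(rows, [(0, '+'), (3, '-')]))  # Age+ with Expenses-: 1.0
print(gp_support(rows, [(0, '+'), (1, '+')]))  # Age+ with Salary+: 0.6
```

With a minimum support of 0.5, both patterns above would count as frequent under this definition.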


Methods

__init__(*args[, e_prob, max_iter])

CluDataGP stands for Clustering DataGP.

add_gradual_pattern(pattern)

Adds a gradual pattern to the list of gradual patterns.

analyze_gps(data_src, min_sup, est_gps[, ...])

For each estimated GP, computes its true support using the GRAANK approach and returns the statistics (% error, and standard deviation).

clean_data(df)

Cleans a data-frame (e.g., handles missing values and outliers) before GPs are extracted.

clear_gradual_patterns()

Clears the list of gradual patterns.

discover()

Applies spectral clustering to determine which gradual items belong to the same group based on the similarity of net-win vectors.

fit_bitmap([attr_data])

Generates bitmaps for columns with numeric objects.

fit_warpingset()

Generates transaction ids (tids) for each column/feature with numeric objects.

gen_gradual_warping_set(pairwise_mat[, as_array])

A method that decomposes the pairwise matrix of a gradual item/pattern into a warping set.

generate_output_files(alg_data[, ...])

Generates output of results (as files) for the GP mining algorithm.

read(data_src)

Reads all the contents of a file (in CSV format) or a data-frame.

remove_subsets(gi_arr[, gradual_patterns])

Removes subset GPs from the list.

test_time(date_str)

Tests whether a string represents a date-time value.
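
The description of discover() refers to the similarity of net-win vectors. The sketch below shows one simple way to picture such a vector, assuming a sign-based encoding in which each row pair contributes +1, -1 or 0 for a gradual item. The function names net_win_vector and agreement are hypothetical, and the clustering step itself is not reproduced; this is an illustration of the idea rather than the library's implementation.

```python
from itertools import combinations


def net_win_vector(values, direction):
    """One entry per row pair (i, j), i < j.

    +1 if the pair agrees with the gradual item, -1 if it disagrees,
    0 on ties. Assumed encoding, for illustration only.
    """
    sign = 1 if direction == '+' else -1
    vector = []
    for i, j in combinations(range(len(values)), 2):
        diff = values[j] - values[i]
        vector.append(sign * ((diff > 0) - (diff < 0)))
    return vector


def agreement(u, v):
    """Fraction of pairs on which two vectors carry the same entry."""
    return sum(a == b for a, b in zip(u, v)) / len(u)


age = [30, 35, 40, 50, 52]
salary = [3, 2, 4, 1, 7]
expenses = [10, 8, 7, 6, 2]

age_up = net_win_vector(age, '+')
expenses_down = net_win_vector(expenses, '-')
salary_up = net_win_vector(salary, '+')

print(agreement(age_up, expenses_down))  # 1.0
print(agreement(age_up, salary_up))      # 0.6
```

Gradual items whose vectors agree on most pairs, such as Age+ and Expenses- here, are natural candidates for the same cluster.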

Attributes

attr_cols

attr_size

col_count

data

display_patterns

display_patterns_as_df

gradual_patterns

row_count

thd_supp

time_cols

titles

valid_bins

warping_set