iteraa.iaa
==========

.. py:module:: iteraa.iaa

.. autoapi-nested-parse::

   Created on Sun Jan 27 13:39:27 2019
   Modified on Mon Jan 15 13:15:29 2024 by Jonathan Yik Chang Ting

   @original author: Benyamin Motevalli

   This class is developed based on "Archetypal Analysis" by Adele Cutler and
   Leo Breiman, Technometrics, November 1994, Vol.36, No.4, pp. 338-347


Classes
-------

.. autoapisummary::

   iteraa.iaa.ArchetypalAnalysis


Module Contents
---------------

.. py:class:: ArchetypalAnalysis(nArchetypes=2, iterative=False, nSubsets=10, shuffle=True, onlyZ=False, subsetsSampleIdxs=[], tolerance=0.001, maxIter=200, randomState=RANDOM_STATE, C=0.0001, initialize='furthestSum', redundancyTry=30, robust=False, computeXtX=False, stepsFISTA=3, stepsAS=50, randominit=False, numThreads=-1, verbose=False)

   Parameters:

   nArchetypes
      Defines the number of archetypes.

   nDim
      Number of features (dimensions).

   nData
      Number of data points in the dataset.

   X
      Dimension: nDim x nData.
      Array of data points. It is the transpose of the input data.

   archetypes
      Dimension: nDim x nArchetypes.
      Array of archetypes. Each column represents an archetype.

   alfa
      Dimension: nArchetypes x nData.
      Column i defines the weight coefficients of the archetypes used to
      approximate data point i: ``Xi = sum_k([alfa]ki x Zk)``.

   beta
      Dimension: nData x nArchetypes.
      Column k defines the weight coefficients of the data points from which
      archetype k is constructed.

   tolerance
      Defines when to stop the optimization.

   maxIter
      Defines the maximum number of iterations.

   randomState
      Defines the random seed number for initialization. No effect if
      "furthestSum" is selected.

   C
      Constraint coefficient that ensures the alfas and the betas each sum
      to 1. C is considered to be the inverse of M^2 in the original paper.

   initialize
      Defines the initialization method used to guess the initial archetypes:

      1. furthestSum (default): the idea and code are taken from
         https://github.com/ulfaslak/py_pcha and the original author is
         Ulf Aslak Jensen.
      2. random: randomly selects archetypes in the feature space. The points
         can be any point in the space.
      3. randomIdx: randomly selects archetypes from points in the dataset.


   .. py:attribute:: nArchetypes
      :value: 2


   .. py:attribute:: nDim
      :value: None


   .. py:attribute:: nData
      :value: None


   .. py:attribute:: iterative
      :value: False


   .. py:attribute:: nSubsets
      :value: 10


   .. py:attribute:: shuffle
      :value: True


   .. py:attribute:: onlyZ
      :value: False


   .. py:attribute:: tolerance
      :value: 0.001


   .. py:attribute:: C
      :value: 0.0001


   .. py:attribute:: randomState
      :value: 42


   .. py:attribute:: archetypes
      :value: []


   .. py:attribute:: alfa
      :value: []


   .. py:attribute:: beta
      :value: []


   .. py:attribute:: explainedVariance_
      :value: []


   .. py:attribute:: RSS_
      :value: None


   .. py:attribute:: RSS0_
      :value: None


   .. py:attribute:: RSSi_
      :value: []


   .. py:attribute:: initialize
      :value: ''


   .. py:attribute:: closeMatch


   .. py:attribute:: maxIter
      :value: 200


   .. py:attribute:: redundancyTry
      :value: 30


   .. py:attribute:: robust
      :value: False


   .. py:attribute:: computeXtX
      :value: False


   .. py:attribute:: stepsFISTA
      :value: 3


   .. py:attribute:: stepsAS
      :value: 50


   .. py:attribute:: randominit
      :value: False


   .. py:attribute:: numThreads
      :value: -1


   .. py:attribute:: subsetsZs
      :value: []


   .. py:attribute:: subsetsSampleIdxs
      :value: []


   .. py:attribute:: runTime
      :value: 0.0


   .. py:attribute:: verbose
      :value: False


   .. py:method:: fit(X)


   .. py:method:: fitClassical(X)


   .. py:method:: fit_transform(X)


   .. py:method:: transform(Xnew)


   .. py:method:: _initializeArchetypes()


   .. py:method:: _randomInitialize()


   .. py:method:: _furthestSumInitialize()


   .. py:method:: _randomIdxInitialize()


   .. py:method:: __optimizeAlfaForTransform(Xnew, nData)

      This function obtains the alfa values for new data points after the
      fitting is done and the archetypes are determined. With these alfas, the
      new data points can be approximated in terms of the archetypes.

      NOTE: Xnew dimension is nData x nDim.
      Here, the original data is passed instead of its transpose.


   .. py:method:: _optimizeAlfa()

      - ``self.archetypes``: has a shape of nDim x nArchetypes.
      - ``self.alfai``: has a shape of nArchetypes x 1.
      - ``xi``: has a shape of nDim x 1.

      The problem to minimize is the residual of the approximation
      ``xi = self.archetypes x self.alfai``.


   .. py:method:: _optimizeBeta()


   .. py:method:: _findNewArchetype(k)

      In some circumstances, the summation of the alfas for an archetype k
      becomes zero, which means archetype k is redundant. This function finds a
      new candidate from the data set to replace archetype k.


   .. py:method:: _returnVbarL(l)


   .. py:method:: _rankArchetypes()

      This function ranks the archetypes. To do this, each data point is
      approximated using just one of the archetypes. The quality of the
      approximation is then measured by the explained variance, and the
      archetypes are sorted by their explained variance scores.

      Note that, unlike in PCA, the individual explained variance scores do not
      sum to the explained variance calculated when all archetypes are
      considered.


   .. py:method:: plotSimplex(alfa, archIDs=[0, 1, 2], plotArgs={}, gridOn=True, showLabel=True, labelAll=False, figSize=(3, 3), dpi=DPI, gridLineWidth=0.1, color='#303F9F', alpha=0.8, markerSize=20, figNamePrefix='')

      ``# groupColor = None, color = None, marker = None, size = None``

      groupColor
         Dimension: nData x 1.
         Description: Contains the category of each data point.


   .. py:method:: parallelPlot(lstFeat, dfColor, featIDs=[0, 1, 2], archIDs=[0, 1, 2], sampIDs=[0, 1, 2], linewidth='0.3', archColor='k', figSize=(15, 5), dpi=DPI, figNamePrefix='')

      Based on source: http://benalexkeen.com/parallel-coordinates-in-matplotlib/

      lstFeat
         List of features.

      dfColor
         A dataframe of colors corresponding to each data point.


   .. py:method:: _extractArchetypeProfiles()

      This function extracts the profile of each archetype. Each value in each
      dimension of archetypeProfile shows the portion of data that is covered
      by that archetype in that specific direction.

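
      Read literally, this description matches an empirical cumulative
      distribution evaluated at each archetype coordinate. The formula below is
      a sketch under that assumption. The symbols :math:`P`, :math:`X` and
      :math:`Z` are illustrative, and the actual computation is not shown on
      this page:

      .. math::

         P_{jk} = \frac{1}{n_{\mathrm{Data}}} \sum_{i=1}^{n_{\mathrm{Data}}}
                  \mathbf{1}\left[ X_{ji} \le Z_{jk} \right]

      Here :math:`X` is the nDim x nData data array and :math:`Z` is the
      nDim x nArchetypes array of archetypes. Under this reading, a value of
      :math:`P_{jk} = 0.9` would mean that 90% of the data points have a value
      of feature :math:`j` no larger than that of archetype :math:`k`.
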
   .. py:method:: plotProfile(allFeatNames=None, featIDs=None, archIDs=[0, 1], figSize=(14, 5), dpi=DPI, figNamePrefix='')

      This function plots the profile of the archetypes.

      allFeatNames
         Optional input. List of all feature names.

      featIDs
         Optional input. List of names of features of interest.


   .. py:method:: plotRadarProfile(allFeatNames=None, featIDs=[0], archIDs=[0, 1], fillAlpha=0.2, linewidth=1, ncol=1, sepArchs=False, showLabel=True, labelAll=False, showName=False, closeFig=False, showLegend=True, figSize=(6, 6), dpi=DPI, title=None, figNamePrefix='')


   .. py:method:: _extractCloseMatch()


   .. py:method:: plotCloseMatch(archIDs=[0, 1], archSpaceIDs=[0, 1], sepSamps=False, showLabel=True, labelAll=False, showLegend=False, figSize=(6, 6), dpi=DPI, title=None, figNamePrefix='')

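
The shape conventions documented for ``X``, ``archetypes``, ``alfa`` and ``beta`` can be illustrated with a small NumPy sketch. It does not call ``ArchetypalAnalysis`` and is not the library's fitting routine. The random weights, the helper ``randomColumnStochastic`` and all variable names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
nDim, nData, nArchetypes = 2, 50, 3

# X follows the documented convention: nDim x nData (transpose of the input).
X = rng.normal(size=(nDim, nData))


def randomColumnStochastic(nRows, nCols, rng):
    """Hypothetical helper: non-negative matrix whose columns each sum to 1."""
    W = rng.random((nRows, nCols))
    return W / W.sum(axis=0, keepdims=True)


# beta: nData x nArchetypes. Column k mixes data points into archetype k.
beta = randomColumnStochastic(nData, nArchetypes, rng)

# archetypes (Z): nDim x nArchetypes, convex combinations of the data points.
Z = X @ beta

# alfa: nArchetypes x nData. Column i mixes archetypes to approximate point i.
alfa = randomColumnStochastic(nArchetypes, nData, rng)

# Xi = sum_k([alfa]ki x Zk), written for all data points at once.
XApprox = Z @ alfa

# Residual sum of squares of the approximation (random weights, so not minimal).
rss = float(np.sum((X - XApprox) ** 2))

print("Z shape:", Z.shape)
print("XApprox shape:", XApprox.shape)
print("RSS:", rss)
```

Because both weight matrices are non-negative with columns summing to 1, every archetype and every approximated point stays inside the range spanned by the data. A fitted model would choose ``alfa`` and ``beta`` to reduce the residual instead of drawing them at random.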