iteraa.iaa
==========

.. py:module:: iteraa.iaa

.. autoapi-nested-parse::

   Created on Sun Jan 27 13:39:27 2019
   Modified on Mon Jan 15 13:15:29 2024 by Jonathan Yik Chang Ting

   @original author: Benyamin Motevalli

   This class is developed based on "Archetypal Analysis" by Adele Cutler and Leo
   Breiman, Technometrics, November 1994, Vol. 36, No. 4, pp. 338-347.


Classes
-------

.. autoapisummary::

   iteraa.iaa.ArchetypalAnalysis


Module Contents
---------------

.. py:class:: ArchetypalAnalysis(nArchetypes=2, iterative=False, nSubsets=10, shuffle=True, onlyZ=False, subsetsSampleIdxs=[], tolerance=0.001, maxIter=200, randomState=RANDOM_STATE, C=0.0001, initialize='furthestSum', redundancyTry=30, robust=False, computeXtX=False, stepsFISTA=3, stepsAS=50, randominit=False, numThreads=-1, verbose=False)

   Parameters:
   -----------

   nArchetypes:   Defines the number of archetypes.
   nDim:          Number of features (Dimensions).
   nData:         Number of data points in the dataset.

   X:
       Dimension:  nDim x nData

                   Array of data points. It is the transpose of input data.

   archetypes:

       Dimension:  nDim x nArchetypes

                   Array of archetypes. Each column represents an archetype.

   alfa:
       Dimension:  nArchetypes x nData

                   Column i holds the weight coefficients applied to the
                   archetypes to approximate data point i:
                   Xi ≈ sum_k(alfa[k, i] x Zk)

   beta:
       Dimension:  nData x nArchetypes

                   Column k holds the weight coefficients applied to the data
                   points to construct archetype k.
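
   As a rough illustration of how these shapes fit together, the NumPy sketch
   below builds archetypes from data and then reconstructs the data from the
   archetypes. The weights are random rather than fitted, and the variable
   names only mirror the attributes listed above, so this shows the matrix
   layout, not the optimization performed by the class.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)
      nDim, nData, nArchetypes = 3, 8, 2

      # X follows the layout described above: one column per data point.
      X = rng.normal(size=(nDim, nData))

      # beta: nData x nArchetypes, each column non-negative and summing to 1.
      beta = rng.random(size=(nData, nArchetypes))
      beta /= beta.sum(axis=0, keepdims=True)

      # Archetypes are convex combinations of the data points.
      archetypes = X @ beta                     # nDim x nArchetypes

      # alfa: nArchetypes x nData, each column non-negative and summing to 1.
      alfa = rng.random(size=(nArchetypes, nData))
      alfa /= alfa.sum(axis=0, keepdims=True)

      # Each data point is approximated by a convex combination of archetypes.
      X_approx = archetypes @ alfa              # nDim x nData
      rss = float(np.sum((X - X_approx) ** 2))  # residual sum of squares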

   tolerance:      Defines when to stop optimization.

   maxIter:       Defines the maximum number of iterations.

   randomState:   Defines the random seed number for initialization. No effect if "furthestSum" is selected.

   C:              A constraint coefficient that ensures the alfa's and
                   beta's each sum to 1. C is considered to be the inverse
                   of M^2 in the original paper.
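
   One way to realize such a soft sum-to-one constraint, following the penalty
   idea in the Cutler and Breiman paper, is to append a row of M to the
   archetype matrix and the value M to the data point before solving the
   least-squares problem. The sketch below assumes M = 1 / sqrt(C) and, for
   brevity, uses plain least squares without the non-negativity constraint.
   How the class applies C internally is not documented here, so this is an
   illustration only.

   .. code-block:: python

      import numpy as np

      C = 0.0001
      M = 1.0 / np.sqrt(C)   # assumption: C is read as 1 / M^2

      Z = np.array([[0.0, 2.0],
                    [0.0, 1.0],
                    [2.0, 0.0]])       # nDim x nArchetypes (made-up values)
      x = np.array([1.0, 2.0, 0.5])   # one data point

      # Plain least squares: nothing ties the weights to sum to 1.
      a_free, *_ = np.linalg.lstsq(Z, x, rcond=None)

      # Penalized version: the extra row adds M^2 * (1 - sum(a))^2 to the loss.
      Z_aug = np.vstack([Z, M * np.ones((1, Z.shape[1]))])
      x_aug = np.append(x, M)
      a_pen, *_ = np.linalg.lstsq(Z_aug, x_aug, rcond=None)

   With a large M (a small C), the penalized weights sum to 1 much more closely
   than the unpenalized ones.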

   initialize:     Defines the initialization method to guess initial archetypes:

                       1. furthestSum (default): The idea and code are taken from https://github.com/ulfaslak/py_pcha, originally written by Ulf Aslak Jensen.
                       2. random: Randomly selects archetypes in the feature space. The selected points can be anywhere in that space.
                       3. randomIdx: Randomly selects archetypes from the points in the dataset.
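
   The difference between the two random options can be sketched as follows.
   The sampling details are assumptions made for illustration (in particular,
   drawing "random" points uniformly inside the per-feature range of the data)
   and may not match what the class does internally. furthestSum is not
   reproduced here.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(42)
      nDim, nData, nArchetypes = 2, 50, 3
      X = rng.normal(size=(nDim, nData))      # columns are data points

      # "random": any point in feature space. Assumption: each coordinate is
      # drawn uniformly between the per-feature minimum and maximum.
      lo = X.min(axis=1, keepdims=True)
      hi = X.max(axis=1, keepdims=True)
      Z_random = lo + (hi - lo) * rng.random(size=(nDim, nArchetypes))

      # "randomIdx": pick distinct existing data points as starting archetypes.
      idxs = rng.choice(nData, size=nArchetypes, replace=False)
      Z_randomIdx = X[:, idxs]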


   .. py:attribute:: nArchetypes
      :value: 2


   .. py:attribute:: nDim
      :value: None


   .. py:attribute:: nData
      :value: None


   .. py:attribute:: iterative
      :value: False


   .. py:attribute:: nSubsets
      :value: 10


   .. py:attribute:: shuffle
      :value: True


   .. py:attribute:: onlyZ
      :value: False


   .. py:attribute:: tolerance
      :value: 0.001


   .. py:attribute:: C
      :value: 0.0001


   .. py:attribute:: randomState
      :value: 42


   .. py:attribute:: archetypes
      :value: []


   .. py:attribute:: alfa
      :value: []


   .. py:attribute:: beta
      :value: []


   .. py:attribute:: explainedVariance_
      :value: []


   .. py:attribute:: RSS_
      :value: None


   .. py:attribute:: RSS0_
      :value: None


   .. py:attribute:: RSSi_
      :value: []


   .. py:attribute:: initialize
      :value: ''


   .. py:attribute:: closeMatch


   .. py:attribute:: maxIter
      :value: 200


   .. py:attribute:: redundancyTry
      :value: 30


   .. py:attribute:: robust
      :value: False


   .. py:attribute:: computeXtX
      :value: False


   .. py:attribute:: stepsFISTA
      :value: 3


   .. py:attribute:: stepsAS
      :value: 50


   .. py:attribute:: randominit
      :value: False


   .. py:attribute:: numThreads
      :value: -1


   .. py:attribute:: subsetsZs
      :value: []


   .. py:attribute:: subsetsSampleIdxs
      :value: []


   .. py:attribute:: runTime
      :value: 0.0


   .. py:attribute:: verbose
      :value: False


   .. py:method:: fit(X)


   .. py:method:: fitClassical(X)


   .. py:method:: fit_transform(X)


   .. py:method:: transform(Xnew)


   .. py:method:: _initializeArchetypes()


   .. py:method:: _randomInitialize()


   .. py:method:: _furthestSumInitialize()


   .. py:method:: _randomIdxInitialize()


   .. py:method:: __optimizeAlfaForTransform(Xnew, nData)

      This function obtains the alfa values for new data points after
      fitting is done and the archetypes are determined.

      With the alfas, the new data points can be approximated in terms of
      the archetypes.

      NOTE: Xnew has dimension nData x nDim. Here, the data is passed in its
      original layout rather than transposed.
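
      A minimal sketch of the idea, assuming fixed archetypes and using
      projected gradient descent onto the probability simplex as a stand-in
      solver (not necessarily the solver used by this class). The helper
      names ``project_to_simplex`` and ``alfa_for_new_points`` are
      hypothetical.

      .. code-block:: python

         import numpy as np


         def project_to_simplex(V):
             """Project each column of V onto the probability simplex."""
             k, n = V.shape
             U = -np.sort(-V, axis=0)             # columns sorted descending
             css = np.cumsum(U, axis=0) - 1.0
             ranks = np.arange(1, k + 1).reshape(-1, 1)
             cond = U - css / ranks > 0
             rho = cond.sum(axis=0) - 1           # last row where cond holds
             theta = css[rho, np.arange(n)] / (rho + 1)
             return np.maximum(V - theta, 0.0)


         def alfa_for_new_points(Xnew, archetypes, n_iter=500):
             """Xnew: nData x nDim, archetypes: nDim x nArchetypes."""
             Xt = Xnew.T                          # switch to nDim x nData
             k = archetypes.shape[1]
             alfa = np.full((k, Xt.shape[1]), 1.0 / k)   # uniform start
             # Step size 1 / L, with L the largest eigenvalue of Z^T Z.
             step = 1.0 / np.linalg.norm(archetypes.T @ archetypes, 2)
             for _ in range(n_iter):
                 grad = archetypes.T @ (archetypes @ alfa - Xt)
                 alfa = project_to_simplex(alfa - step * grad)
             return alfa


         archetypes = np.array([[0.0, 1.0, 0.0],
                                [0.0, 0.0, 1.0]])  # triangle corners as columns
         Xnew = np.array([[0.2, 0.3],
                          [1.0, 0.0]])             # two new points, one per row
         alfa_new = alfa_for_new_points(Xnew, archetypes)

      Each column of ``alfa_new`` is non-negative and sums to 1, and
      ``archetypes @ alfa_new`` reproduces the new points because both lie
      inside the triangle spanned by the archetypes.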


   .. py:method:: _optimizeAlfa()

      self.archetypes: has a shape of nDim x nArchetypes
      self.alfai:      has a shape of nArchetypes x 1.
      xi:              has a shape of nDim x 1.

      The problem is to find the alfai that minimizes the error of the
      approximation:

          xi ≈ self.archetypes x self.alfai
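
      In the idealized formulation of Cutler and Breiman, this is a
      constrained least-squares problem. Here Z stands for self.archetypes
      and the index k runs over the archetypes:

      .. math::

         \min_{\alpha_i} \; \lVert x_i - Z \alpha_i \rVert_2^2
         \quad \text{subject to} \quad
         \alpha_{ki} \ge 0, \;\; \sum_{k} \alpha_{ki} = 1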


   .. py:method:: _optimizeBeta()


   .. py:method:: _findNewArchetype(k)

      In some circumstances, the sum of the alfa's for an archetype k becomes zero.
      That means archetype k is redundant. This function aims to find a new candidate
      from the data set to replace archetype k.
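
      The redundancy check itself can be sketched in a few lines. The
      replacement rule shown here (taking the data point with the largest
      residual) is a hypothetical choice for illustration; the criterion this
      method actually uses is not documented above.

      .. code-block:: python

         import numpy as np

         alfa = np.array([[0.6, 0.0, 1.0, 0.3],
                          [0.0, 0.0, 0.0, 0.0],   # archetype 1 is never used
                          [0.4, 1.0, 0.0, 0.7]])  # nArchetypes x nData

         # An archetype is redundant when its weights sum to (numerically) zero.
         usage = alfa.sum(axis=1)
         redundant = np.flatnonzero(np.isclose(usage, 0.0))

         # Made-up data and archetypes, used only to pick a replacement.
         X = np.array([[0.0, 1.0, 0.0, 2.0],
                       [0.0, 0.0, 1.0, 2.0]])     # nDim x nData
         Z = np.array([[0.0, 5.0, 1.0],
                       [1.0, 5.0, 0.0]])          # nDim x nArchetypes

         # Hypothetical rule: replace with the worst-approximated data point.
         residual = np.sum((X - Z @ alfa) ** 2, axis=0)
         candidate = int(np.argmax(residual))
         Z[:, redundant[0]] = X[:, candidate]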


   .. py:method:: _returnVbarL(l)


   .. py:method:: _rankArchetypes()

      This function ranks the archetypes. To do this, each data point is
      approximated using just one of the archetypes. Then, we check how good
      the approximation is by calculating the explained variance, and sort
      the archetypes by their explained variance scores. Note that, unlike
      in PCA, the sum of the individual explained variance scores will not
      equal the explained variance calculated when all archetypes are
      considered.
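
      The ranking idea can be sketched as below. Two details are assumptions
      made for this example: the single-archetype approximation is taken as
      archetype k scaled by its own row of alfa, and explained variance is
      computed as 1 - RSS / (sum of squares of X). The class may define
      either differently.

      .. code-block:: python

         import numpy as np


         def explained_variance(X, X_approx):
             # Assumed definition: 1 - RSS / (sum of squares of X).
             return 1.0 - np.sum((X - X_approx) ** 2) / np.sum(X ** 2)


         Z = np.array([[1.0, 1.0],
                       [0.0, 1.0]])              # nDim x nArchetypes
         alfa = np.array([[0.9, 0.5, 0.2],
                          [0.1, 0.5, 0.8]])      # nArchetypes x nData
         X = Z @ alfa                            # data the archetypes fit exactly

         # Score each archetype on its own.
         scores = np.array([
             explained_variance(X, np.outer(Z[:, k], alfa[k, :]))
             for k in range(Z.shape[1])
         ])
         ranking = np.argsort(scores)[::-1]      # best-scoring archetype first
         overall = explained_variance(X, Z @ alfa)

      In this example the individual scores do not add up to ``overall``,
      which matches the note about PCA above.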


   .. py:method:: plotSimplex(alfa, archIDs=[0, 1, 2], plotArgs={}, gridOn=True, showLabel=True, labelAll=False, figSize=(3, 3), dpi=DPI, gridLineWidth=0.1, color='#303F9F', alpha=0.8, markerSize=20, figNamePrefix='')

      # groupColor = None, color = None, marker = None, size = None
      groupColor:

          Dimension:      nData x 1

          Description:    Contains the category of data point.


   .. py:method:: parallelPlot(lstFeat, dfColor, featIDs=[0, 1, 2], archIDs=[0, 1, 2], sampIDs=[0, 1, 2], linewidth='0.3', archColor='k', figSize=(15, 5), dpi=DPI, figNamePrefix='')

      Based on source: http://benalexkeen.com/parallel-coordinates-in-matplotlib/

      lstFeat:
                  List of features.

      dfColor:
                  A dataframe of colors, one for each data point.


   .. py:method:: _extractArchetypeProfiles()

      This function extracts the profile of each archetype. Each value in
      each dimension of archetypeProfile shows the portion of data that
      is covered by that archetype in that specific direction.
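
      One plausible reading of this description is an empirical cumulative
      fraction: for each feature, the share of data points whose value does
      not exceed the archetype's value. The sketch below assumes that
      interpretation, and the arrays are made-up values rather than output
      of the class.

      .. code-block:: python

         import numpy as np

         X = np.array([[1.0, 2.0, 3.0, 4.0],
                       [10.0, 20.0, 30.0, 40.0]])   # nDim x nData
         archetypes = np.array([[1.0, 3.5],
                                [40.0, 20.0]])      # nDim x nArchetypes

         # profile[d, k]: fraction of data points whose feature d is less
         # than or equal to the value of archetype k in feature d.
         profile = (X[:, None, :] <= archetypes[:, :, None]).mean(axis=2)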


   .. py:method:: plotProfile(allFeatNames=None, featIDs=None, archIDs=[0, 1], figSize=(14, 5), dpi=DPI, figNamePrefix='')

      This function plots the profile of the archetypes.

      allFeatNames:
          Optional input. List of all feature names.
      featIDs:
          Optional input. List of names of features of interest.


   .. py:method:: plotRadarProfile(allFeatNames=None, featIDs=[0], archIDs=[0, 1], fillAlpha=0.2, linewidth=1, ncol=1, sepArchs=False, showLabel=True, labelAll=False, showName=False, closeFig=False, showLegend=True, figSize=(6, 6), dpi=DPI, title=None, figNamePrefix='')


   .. py:method:: _extractCloseMatch()


   .. py:method:: plotCloseMatch(archIDs=[0, 1], archSpaceIDs=[0, 1], sepSamps=False, showLabel=True, labelAll=False, showLegend=False, figSize=(6, 6), dpi=DPI, title=None, figNamePrefix='')