iteraa.piaa

Functions

subsetSplit(X, nSubsets, dataName[, ...])

Split data into subsets.

submitAAjobs(nArchetypes, dataName[, splitKeyword, ...])

Submit individual archetypal analysis runs as jobs to a job scheduler.

runAA(fName, nArchetypes[, outputsPicklesPath, ...])

Execute archetypal analysis.

fitPIAA(X, nArchetypes, numSubset, dataName[, ...])

Combine results from individual archetypal analysis runs to obtain the final archetypes.

Module Contents

iteraa.piaa.subsetSplit(X, nSubsets, dataName, subsetsSampleIdxs=[], subsetsPicklesPath=SUBSETS_PICKLES_PATH, postfixStr='', shuffle=True, randomState=RANDOM_STATE, verbose=False)

Split data into subsets.

Parameters:
  • X (numpy.ndarray) – Whole data set.

  • nSubsets (int) – Number of subsets.

  • subsetsSampleIdxs (list[int]) – Identifiers for subset samples.

  • dataName (str) – Name of dataset.

  • subsetsPicklesPath (str) – Path to directory containing data subset pickle files.

  • postfixStr (str) – Postfix name.

  • shuffle (bool) – Whether to shuffle data.

  • randomState (int) – Random seed.

  • verbose (bool) – Whether information is printed.

Returns:

runTime – Duration of execution.

Return type:

float
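
The following is a minimal sketch of the kind of operation described above, not the implementation used by iteraa.piaa. It shuffles sample indices with a seeded generator, splits them into roughly equal subsets, pickles each subset, and returns the elapsed time. The helper name split_into_subsets, the file naming scheme, the dictionary layout inside each pickle, and the assumption that samples are stored as rows of X are all hypothetical.

```python
import os
import pickle
import tempfile
import time

import numpy as np


def split_into_subsets(X, n_subsets, out_dir, data_name="data", shuffle=True, seed=0):
    """Hypothetical helper: pickle n_subsets row subsets of X, return (paths, seconds)."""
    start = time.perf_counter()
    idxs = np.arange(len(X))
    if shuffle:
        # A seeded generator keeps the split reproducible.
        np.random.default_rng(seed).shuffle(idxs)
    paths = []
    # array_split tolerates sample counts that do not divide evenly.
    for i, subset_idxs in enumerate(np.array_split(idxs, n_subsets)):
        path = os.path.join(out_dir, f"{data_name}_subset{i}.pkl")
        with open(path, "wb") as f:
            # Store the indices too, so each row can be traced back to X.
            pickle.dump({"idxs": subset_idxs, "X": X[subset_idxs]}, f)
        paths.append(path)
    return paths, time.perf_counter() - start


# Toy data: 10 samples (rows) with 2 features each.
X = np.arange(20, dtype=float).reshape(10, 2)
with tempfile.TemporaryDirectory() as tmp:
    paths, run_time = split_into_subsets(X, n_subsets=3, out_dir=tmp)
    sizes = []
    for p in paths:
        with open(p, "rb") as f:
            sizes.append(len(pickle.load(f)["X"]))
```

Keeping the original indices alongside each subset is a design choice made for this sketch; it mirrors the role that subsetsSampleIdxs appears to play in the signature, but the source does not describe the pickle contents.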

iteraa.piaa.submitAAjobs(nArchetypes, dataName, splitKeyword='data', postfixStr='', jobscriptsDirPath=JOBSCRIPTS_DIR_PATH, subsetsPicklesPath=SUBSETS_PICKLES_PATH, outputsPicklesPath=OUTPUTS_PICKLES_PATH, AAscriptPath=AA_SCRIPT_PATH, project='q27', queue='normal', numCPUs=48, wallTime='00:05:00', mem=5, jobFS=1, email='Jonathan.Ting@anu.edu.au', verbose=False)

Submit individual archetypal analysis runs as jobs to a job scheduler.

Parameters:
  • nArchetypes (int) – Number of archetypes.

  • dataName (str) – Name of dataset.

  • splitKeyword (str) – Keyword for data subsets.

  • postfixStr (str) – Postfix name.

  • jobscriptsDirPath (str) – Path to directory containing jobscript files.

  • subsetsPicklesPath (str) – Path to directory containing data subset pickle files.

  • outputsPicklesPath (str) – Path to directory containing data output pickle files.

  • AAscriptPath (str) – Path to script to run archetypal analysis.

  • project (str) – Project code passed to the job scheduler.

  • queue (str) – Name of the scheduler queue to submit to.

  • numCPUs (int) – Number of CPUs requested for each job.

  • wallTime (str) – Wall-time limit requested for each job, in HH:MM:SS format.

  • mem (int) – Memory requested for each job.

  • jobFS (int) – Job file system space requested for each job.

  • email (str) – Email address for job notifications.

  • verbose (bool) – Whether information is printed.

Return type:

None
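
To illustrate how the scheduler-related arguments above might map onto a jobscript, here is a hedged sketch that only builds the script text and submits nothing. The source does not name the scheduler; PBS-style directives are an assumption suggested by the project, queue and jobFS arguments. The helper build_jobscript, the GB units, the placeholder email address and the example command line are all hypothetical.

```python
def build_jobscript(command, project="q27", queue="normal", num_cpus=48,
                    wall_time="00:05:00", mem_gb=5, jobfs_gb=1, email=None):
    """Hypothetical helper: return PBS-style jobscript text for one command."""
    lines = [
        "#!/bin/bash",
        f"#PBS -P {project}",  # project to charge (assumed PBS-style directive)
        f"#PBS -q {queue}",    # queue name
        # Resource request; GB units are an assumption, not stated in the source.
        f"#PBS -l ncpus={num_cpus},walltime={wall_time},"
        f"mem={mem_gb}GB,jobfs={jobfs_gb}GB",
    ]
    if email:
        lines.append(f"#PBS -M {email}")  # notification address
    lines.append(command)
    return "\n".join(lines) + "\n"


# The script name and its arguments below are placeholders.
script = build_jobscript("python3 run_aa.py subset0.pkl 5", email="user@example.com")
```

One script per data subset could then be written to the jobscripts directory and handed to the scheduler's submit command; how submitAAjobs actually does this is not described in the source.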

iteraa.piaa.runAA(fName, nArchetypes, outputsPicklesPath=OUTPUTS_PICKLES_PATH, splitKeyword='data', robust=False, tolerance=0.001, computeXtX=False, stepsFISTA=3, stepsAS=50, randominit=False, numThreads=-1, onlyZ=False)

Execute archetypal analysis.

Parameters:
  • fName (str) – Path to pickle file containing data.

  • nArchetypes (int) – Number of archetypes.

  • outputsPicklesPath (str) – Path to directory containing output pickle files.

  • splitKeyword (str) – Keyword for data subsets.

  • robust (bool) – Whether to use robust archetypal analysis.

  • tolerance (float) – Tolerance.

  • computeXtX (bool) – Whether to compute XtX.

  • stepsFISTA (int) – Number of FISTA steps.

  • stepsAS (int) – Number of active-set steps.

  • randominit (bool) – Whether to initialise randomly.

  • numThreads (int) – Number of threads for algorithm execution.

  • onlyZ (bool) – Whether to stop early by returning only Z matrix.

Return type:

None

iteraa.piaa.fitPIAA(X, nArchetypes, numSubset, dataName, outputsPicklesPath=OUTPUTS_PICKLES_PATH, postfixStr='', shuffle=True, robust=False, onlyZ=False, C=0.0001, tolerance=0.001, computeXtX=False, stepsFISTA=3, stepsAS=50, randominit=False, randomState=RANDOM_STATE, numThreads=-1, splitRunTime=0.0, verbose=True)

Combine results from individual archetypal analysis runs to obtain the final archetypes.

Parameters:
  • X (numpy.ndarray) – Whole data set.

  • nArchetypes (int) – Number of archetypes.

  • numSubset (int) – Number of subsets.

  • dataName (str) – Name of dataset.

  • outputsPicklesPath (str) – Path to directory containing output pickle files.

  • postfixStr (str) – Postfix name.

  • shuffle (bool) – Whether to shuffle data.

  • robust (bool) – Whether to use robust archetypal analysis.

  • onlyZ (bool) – Whether to stop early by returning only Z matrix.

  • C (float) –

  • tolerance (float) – Tolerance.

  • computeXtX (bool) – Whether to compute XtX.

  • stepsFISTA (int) – Number of FISTA steps.

  • stepsAS (int) – Number of active-set steps.

  • randominit (bool) – Whether to initialise randomly.

  • randomState (int) – Random seed.

  • numThreads (int) – Number of threads for algorithm execution.

  • splitRunTime (float) – Execution duration for splitting the data into subsets, in seconds.

  • verbose (bool) – Whether information is printed.

Returns:

AA – Object with fitted results.

Return type:

ArchetypalAnalysis object
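
The sketch below shows one way the functions on this page might be chained, using only the positional arguments and the splitRunTime keyword from the signatures above. The ordering (split, submit, then combine once the jobs have finished) is an assumption inferred from the function descriptions, not a documented workflow. The wrapper names split_and_submit and combine are hypothetical, and the module is passed in as an argument so the sketch does not depend on how iteraa is installed.

```python
def split_and_submit(piaa, X, nArchetypes, nSubsets, dataName):
    """Hypothetical stage 1: split the data, then submit one AA job per subset."""
    # subsetSplit is documented to return the duration of the split.
    splitRunTime = piaa.subsetSplit(X, nSubsets, dataName)
    # submitAAjobs is documented to return None.
    piaa.submitAAjobs(nArchetypes, dataName)
    return splitRunTime


def combine(piaa, X, nArchetypes, nSubsets, dataName, splitRunTime):
    """Hypothetical stage 2: call only after the submitted jobs have finished."""
    # The third positional argument of fitPIAA is numSubset.
    return piaa.fitPIAA(X, nArchetypes, nSubsets, dataName,
                        splitRunTime=splitRunTime)
```

With the package available, piaa would be the module imported via `from iteraa import piaa`. The two stages are kept separate because the source does not describe any mechanism for waiting on the scheduler between submission and combination.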