iteraa.piaa
Functions
| Function | Description |
| --- | --- |
| subsetSplit | Split data into subsets. |
| submitAAjobs | Submit jobs that run individual archetypal analyses to a job scheduler. |
| runAA | Execute archetypal analysis. |
| fitPIAA | Combine results from individual archetypal analysis runs to obtain the final archetypes. |
Module Contents
- iteraa.piaa.subsetSplit(X, nSubsets, dataName, subsetsSampleIdxs=[], subsetsPicklesPath=SUBSETS_PICKLES_PATH, postfixStr='', shuffle=True, randomState=RANDOM_STATE, verbose=False)
Split data into subsets.
- Parameters:
X (numpy.ndarray) – Whole data set.
nSubsets (int) – Number of subsets.
dataName (str) – Name of dataset.
subsetsSampleIdxs (list[int]) – Identifiers for subset samples.
subsetsPicklesPath (str) – Path to directory containing data subset pickle files.
postfixStr (str) – Postfix name.
shuffle (bool) – Whether to shuffle data.
randomState (int) – Random seed.
verbose (bool) – Whether information is printed.
- Returns:
runTime – Duration of execution (seconds).
- Return type:
float
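The idea of splitting with an optional shuffle and a fixed seed can be illustrated with plain NumPy. This is a minimal sketch, not the implementation of `subsetSplit`: the helper name `split_indices`, the seed value, and the assumption that samples are stored along rows are all hypothetical, and the pickling step is left out.

```python
import numpy as np


def split_indices(n_samples, n_subsets, shuffle=True, random_state=42):
    """Partition the indices 0..n_samples-1 into n_subsets nearly equal groups.

    Hypothetical helper for illustration only; not part of iteraa.
    """
    idxs = np.arange(n_samples)
    if shuffle:
        # A seeded generator makes the shuffle reproducible.
        rng = np.random.default_rng(random_state)
        rng.shuffle(idxs)
    # array_split tolerates n_samples not being divisible by n_subsets.
    return np.array_split(idxs, n_subsets)


# Toy data: 10 samples with 2 features each (rows as samples is an assumption).
X = np.arange(20, dtype=float).reshape(10, 2)
subsets_idxs = split_indices(X.shape[0], n_subsets=3)
subsets = [X[idxs] for idxs in subsets_idxs]
```

Every sample lands in exactly one subset, and reusing the same seed reproduces the same partition.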
- iteraa.piaa.submitAAjobs(nArchetypes, dataName, splitKeyword='data', postfixStr='', jobscriptsDirPath=JOBSCRIPTS_DIR_PATH, subsetsPicklesPath=SUBSETS_PICKLES_PATH, outputsPicklesPath=OUTPUTS_PICKLES_PATH, AAscriptPath=AA_SCRIPT_PATH, project='q27', queue='normal', numCPUs=48, wallTime='00:05:00', mem=5, jobFS=1, email='Jonathan.Ting@anu.edu.au', verbose=False)
Submit jobs that run individual archetypal analyses to a job scheduler.
- Parameters:
nArchetypes (int) – Number of archetypes.
dataName (str) – Name of dataset.
splitKeyword (str) – Keyword for data subsets.
postfixStr (str) – Postfix name.
jobscriptsDirPath (str) – Path to directory containing jobscript files.
subsetsPicklesPath (str) – Path to directory containing data subset pickle files.
outputsPicklesPath (str) – Path to directory containing output pickle files.
AAscriptPath (str) – Path to script to run archetypal analysis.
project (str) – Project code passed to the job scheduler.
queue (str) – Name of the scheduler queue.
numCPUs (int) – Number of CPUs requested per job.
wallTime (str) – Wall time limit per job, in HH:MM:SS format.
mem (int) – Memory requested per job.
jobFS (int) – Job file system space requested per job.
email (str) – Email address for job notifications.
verbose (bool) – Whether information is printed.
- Return type:
None
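To make the resource parameters concrete, the sketch below assembles a jobscript as a string. It assumes a PBS-style scheduler and gigabyte units for `mem` and `jobFS`; neither is stated in this reference. The helper `build_jobscript`, the placeholder email address, and the example command are hypothetical, and the real jobscripts written by `submitAAjobs` may look different.

```python
def build_jobscript(command, project="q27", queue="normal", num_cpus=48,
                    wall_time="00:05:00", mem=5, job_fs=1,
                    email="user@example.com"):
    """Return the text of a PBS-style jobscript (hypothetical helper).

    The directive syntax and the GB units are assumptions for illustration.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -P {project}",            # project to charge
        f"#PBS -q {queue}",              # queue name
        f"#PBS -l ncpus={num_cpus}",     # CPU count
        f"#PBS -l walltime={wall_time}", # wall time limit
        f"#PBS -l mem={mem}GB",          # memory (unit assumed)
        f"#PBS -l jobfs={job_fs}GB",     # local disk (site-specific, unit assumed)
        f"#PBS -M {email}",              # notification address
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical command; the actual script path comes from AAscriptPath.
script = build_jobscript("python run_aa.py data_0.pkl 5")
```

One jobscript per data subset would then be written to the jobscripts directory and handed to the scheduler.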
- iteraa.piaa.runAA(fName, nArchetypes, outputsPicklesPath=OUTPUTS_PICKLES_PATH, splitKeyword='data', robust=False, tolerance=0.001, computeXtX=False, stepsFISTA=3, stepsAS=50, randominit=False, numThreads=-1, onlyZ=False)
Execute archetypal analysis.
- Parameters:
fName (str) – Path to pickle file containing data.
nArchetypes (int) – Number of archetypes.
outputsPicklesPath (str) – Path to directory containing output pickle files.
splitKeyword (str) – Keyword for data subsets.
robust (bool) – Whether to use robust archetypal analysis.
tolerance (float) – Tolerance.
computeXtX (bool) – Whether to compute XtX.
stepsFISTA (int) – Number of FISTA steps.
stepsAS (int) – Number of active subset steps.
randominit (bool) – Whether to initialise randomly.
numThreads (int) – Number of threads for algorithm execution.
onlyZ (bool) – Whether to stop early by returning only Z matrix.
- Return type:
None
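Since `runAA` takes a pickle path as input and returns `None`, its results presumably reach the caller through the output pickle directory. The sketch below shows that read, compute, write pattern with the standard library only. The helper `run_on_pickle`, the `output_` file naming, and the use of `sum` as a stand-in for the analysis are all hypothetical; unlike `runAA`, the helper returns the output path for convenience.

```python
import os
import pickle
import tempfile


def run_on_pickle(f_name, analyse, outputs_dir):
    """Load data from a pickle, apply `analyse`, and pickle the result.

    Hypothetical helper; the output file naming is an assumption.
    """
    with open(f_name, "rb") as f:
        data = pickle.load(f)
    result = analyse(data)
    out_path = os.path.join(outputs_dir, "output_" + os.path.basename(f_name))
    with open(out_path, "wb") as f:
        pickle.dump(result, f)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    # Write a toy input pickle, process it, then read the result back.
    in_path = os.path.join(tmp, "data_0.pkl")
    with open(in_path, "wb") as f:
        pickle.dump([1.0, 2.0, 3.0], f)
    out_path = run_on_pickle(in_path, analyse=sum, outputs_dir=tmp)
    with open(out_path, "rb") as f:
        demo_result = pickle.load(f)
```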
- iteraa.piaa.fitPIAA(X, nArchetypes, numSubset, dataName, outputsPicklesPath=OUTPUTS_PICKLES_PATH, postfixStr='', shuffle=True, robust=False, onlyZ=False, C=0.0001, tolerance=0.001, computeXtX=False, stepsFISTA=3, stepsAS=50, randominit=False, randomState=RANDOM_STATE, numThreads=-1, splitRunTime=0.0, verbose=True)
Combine results from individual archetypal analysis runs to obtain the final archetypes.
- Parameters:
X (numpy.ndarray) – Whole data set.
nArchetypes (int) – Number of archetypes.
numSubset (int) – Number of subsets.
dataName (str) – Name of dataset.
outputsPicklesPath (str) – Path to directory containing output pickle files.
postfixStr (str) – Postfix name.
shuffle (bool) – Whether to shuffle data.
robust (bool) – Whether to use robust archetypal analysis.
onlyZ (bool) – Whether to stop early by returning only Z matrix.
C (float) –
tolerance (float) – Tolerance.
computeXtX (bool) – Whether to compute XtX.
stepsFISTA (int) – Number of FISTA steps.
stepsAS (int) – Number of active subset steps.
randominit (bool) – Whether to initialise randomly.
randomState (int) – Random seed.
numThreads (int) – Number of threads for algorithm execution.
splitRunTime (float) – Execution duration for splitting the data into subsets (seconds).
verbose (bool) – Whether information is printed.
- Returns:
AA – Object with fitted results.
- Return type:
ArchetypalAnalysis object
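The signatures above suggest a three-stage workflow: split the data, submit one job per subset, then combine the outputs, with the split duration passed on through `splitRunTime`. The sketch below encodes that ordering. The ordering is inferred from the parameter descriptions rather than stated in this reference, and the wrapper `run_piaa_workflow` is hypothetical. The module is passed in as an argument so that the sketch runs without the package or a scheduler; a stand-in object with stub functions is used here.

```python
import types


def run_piaa_workflow(piaa, X, n_archetypes, n_subsets, data_name):
    """Call the three stages in the assumed order (hypothetical wrapper)."""
    # Stage 1: split the data; the returned duration is reused in stage 3.
    split_time = piaa.subsetSplit(X, n_subsets, data_name)
    # Stage 2: submit one archetypal analysis job per subset.
    piaa.submitAAjobs(n_archetypes, data_name)
    # A real run would need to wait here until all scheduler jobs finish.
    # Stage 3: combine the per-subset outputs into the final archetypes.
    return piaa.fitPIAA(X, n_archetypes, n_subsets, data_name,
                        splitRunTime=split_time)


# Stand-in for the real module: stubs that only record the call order.
calls = []
fake_piaa = types.SimpleNamespace(
    subsetSplit=lambda X, nSubsets, dataName: calls.append("subsetSplit") or 0.5,
    submitAAjobs=lambda nArchetypes, dataName: calls.append("submitAAjobs"),
    fitPIAA=lambda X, nArchetypes, numSubset, dataName, splitRunTime=0.0: (
        calls.append("fitPIAA") or {"splitRunTime": splitRunTime}
    ),
)
result = run_piaa_workflow(fake_piaa, X=[[0.0]], n_archetypes=2,
                           n_subsets=4, data_name="demo")
```

With the package installed, `iteraa.piaa` itself could be passed as `piaa`. Note that the subset count is named `nSubsets` in `subsetSplit` but `numSubset` in `fitPIAA`, so positional arguments avoid mixing the two up.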