OnlineCA.jl (Julia API)
Correspondence Analysis (CA)
OnlineCA.ca — Function
ca(;input, outdir, dim, noversamples, niter, chunksize)
Out-of-core Correspondence Analysis (CA) for dense data.
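Uses a Halko-style randomized SVD on the implicit standardized residual matrix S = D_r^{-1/2} (P - r c') D_c^{-1/2}, where P = X/n is the correspondence matrix (X is the data matrix and n its grand total), r and c are the row and column sums of P (the row and column masses), and D_r = diag(r), D_c = diag(c).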
The matrix S is never explicitly formed. Instead, matrix-vector products Sv and S'u are computed by streaming through the data.
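A minimal in-memory sketch of the streaming idea, assuming a dense integer matrix X that is simply sliced into row chunks. The helper names row_chunks, ca_margins, streamed_Sv and streamed_Stu are hypothetical and not part of OnlineCA. Each row chunk yields its own slice of S v, while S' u is a sum of per-chunk contributions, so in an out-of-core setting only one chunk would need to be in memory once the masses r and c are known. The chunksize = 0 convention from the argument list is mirrored in row_chunks.

```julia
using LinearAlgebra

# Hypothetical helper: split 1:N into row ranges of length `chunksize`;
# chunksize = 0 means "all rows at once".
row_chunks(N, chunksize) =
    chunksize == 0 ? [1:N] : [lo:min(lo + chunksize - 1, N) for lo in 1:chunksize:N]

# Grand total n, row masses r and column masses c of P = X / n.
function ca_margins(X::AbstractMatrix)
    n = sum(X)
    r = vec(sum(X; dims=2)) ./ n
    c = vec(sum(X; dims=1)) ./ n
    return n, r, c
end

# S * v: every row chunk produces its own slice of the result.
function streamed_Sv(X, n, r, c, v; chunksize=0)
    w = v ./ sqrt.(c)                  # D_c^{-1/2} v
    cw = dot(c, w)                     # scalar c' w for the rank-one term
    out = zeros(size(X, 1))
    for rows in row_chunks(size(X, 1), chunksize)
        Pw = (X[rows, :] ./ n) * w     # P[rows, :] * w
        out[rows] .= (Pw .- r[rows] .* cw) ./ sqrt.(r[rows])
    end
    return out
end

# S' * u: P' z and r' z are sums over rows, so they accumulate chunk by chunk.
function streamed_Stu(X, n, r, c, u; chunksize=0)
    z = u ./ sqrt.(r)                  # D_r^{-1/2} u
    Ptz = zeros(size(X, 2))
    rz = 0.0
    for rows in row_chunks(size(X, 1), chunksize)
        Ptz .+= (X[rows, :] ./ n)' * z[rows]
        rz += dot(r[rows], z[rows])
    end
    return (Ptz .- c .* rz) ./ sqrt.(c)
end

# Compare with the explicitly formed S on a small table.
X = [10 2 3; 4 8 1; 2 3 9; 5 5 5]
n, r, c = ca_margins(X)
S = Diagonal(1 ./ sqrt.(r)) * (X ./ n .- r * c') * Diagonal(1 ./ sqrt.(c))
v = [1.0, -2.0, 0.5]
u = [0.3, -1.0, 2.0, 0.7]
println(streamed_Sv(X, n, r, c, v; chunksize=3) ≈ S * v)
println(streamed_Stu(X, n, r, c, u; chunksize=3) ≈ S' * u)
```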
Input Arguments
- input: Julia binary file generated by the OnlinePCA.csv2bin function.
- outdir: Directory in which to save the results.
- dim: Number of CA dimensions to compute.
- noversamples: Number of oversamples for the randomized SVD.
- niter: Number of power iterations for the randomized SVD.
- chunksize: Number of rows read at once (0 = all rows at once).
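The roles of dim, noversamples and niter can be illustrated with a generic Halko-style randomized SVD that touches S only through the products S * Ω and S' * Q. This is a sketch under the assumption that the package follows the textbook scheme (a Gaussian test matrix of width dim + noversamples, with QR re-orthonormalization between power iterations); the function name randomized_svd and its callback interface are hypothetical.

```julia
using LinearAlgebra, Random

# Randomized SVD using only two callbacks:
#   Sop(Ω)  -> S  * Ω   (N × l)
#   Stop(Q) -> S' * Q   (M × l)
function randomized_svd(Sop, Stop, M, dim; noversamples=5, niter=3,
                        rng=Random.default_rng())
    l = dim + noversamples                 # sketch width including oversampling
    Ω = randn(rng, M, l)                   # Gaussian test matrix
    Q = Matrix(qr(Sop(Ω)).Q)               # thin orthonormal basis for range(S Ω)
    for _ in 1:niter                       # power iterations, re-orthonormalized each step
        Z = Matrix(qr(Stop(Q)).Q)
        Q = Matrix(qr(Sop(Z)).Q)
    end
    B = Matrix(Stop(Q)')                   # B = Q' S, a small l × M matrix
    Fb = svd(B)
    U = Q * Fb.U                           # lift left singular vectors back to N rows
    return U[:, 1:dim], Fb.S[1:dim], Fb.V[:, 1:dim]
end

rng = MersenneTwister(1)
S = randn(rng, 30, 5) * randn(rng, 5, 20)  # rank-5 stand-in for the residual matrix
U, σ, V = randomized_svd(Ω -> S * Ω, Q -> S' * Q, size(S, 2), 3;
                         noversamples=5, niter=2, rng=rng)
println(σ)
println(svdvals(S)[1:3])
```

Because the toy matrix has rank 5 and the sketch width is 8, the leading singular values should match svdvals up to rounding; on real data, larger noversamples and niter trade extra passes over the file for accuracy.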
Output Arguments
- F: Row coordinates (N × dim)
- G: Column coordinates (M × dim)
- σ: Singular values (dim,)
- Inertia: Inertia explained by each dimension (dim,)
- TotalInertia: Total inertia (scalar)
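Under the standard CA definitions (an assumption, since the exact scaling used by ca is not stated here), F and G are principal coordinates obtained from the SVD S = U Σ V' as F = D_r^{-1/2} U Σ and G = D_c^{-1/2} V Σ. The principal inertia of dimension k is σ_k², and the total inertia is the sum of squared entries of S, which equals Pearson's chi-squared statistic divided by n. The list above does not say whether Inertia holds σ² or σ² / TotalInertia, so the sketch computes both on a toy table with a full SVD.

```julia
using LinearAlgebra

X = [10 2 3; 4 8 1; 2 3 9; 5 5 5]
n = sum(X)
r = vec(sum(X; dims=2)) ./ n               # row masses
c = vec(sum(X; dims=1)) ./ n               # column masses
S = Diagonal(1 ./ sqrt.(r)) * (X ./ n .- r * c') * Diagonal(1 ./ sqrt.(c))

dim = 2
Fs = svd(S)
σ = Fs.S[1:dim]
F = Diagonal(1 ./ sqrt.(r)) * Fs.U[:, 1:dim] * Diagonal(σ)   # principal row coordinates (N × dim)
G = Diagonal(1 ./ sqrt.(c)) * Fs.V[:, 1:dim] * Diagonal(σ)   # principal column coordinates (M × dim)

inertia = σ .^ 2                           # principal inertia per dimension
total_inertia = sum(abs2, S)               # sum of squared residuals = sum of all σ²

# Total inertia equals Pearson's chi-squared statistic divided by n.
E = n .* (r * c')                          # expected counts under independence
chi2 = sum((X .- E) .^ 2 ./ E)
println(total_inertia ≈ chi2 / n)
println(inertia ./ total_inertia)          # share of total inertia per dimension
```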
Sparse Correspondence Analysis (SparseCA)
OnlineCA.sparse_ca — Function
sparse_ca(;input, outdir, dim, noversamples, niter, chunksize)
Out-of-core Correspondence Analysis (CA) for sparse data (MatrixMarket format).
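For sparse input, the centering term r c' in S is dense, so materializing P - r c' would destroy sparsity. A common workaround, shown here as a hedged sketch, is to multiply the sparse P by the vector and then subtract the rank-one correction r (c' w). The name sparse_Sv is hypothetical, and unlike the package it processes the whole matrix at once rather than in row chunks.

```julia
using LinearAlgebra, SparseArrays

function sparse_Sv(X::SparseMatrixCSC, v::AbstractVector)
    n = sum(X)                                  # grand total
    r = (X * ones(size(X, 2))) ./ n             # row masses (dense vector)
    c = (X' * ones(size(X, 1))) ./ n            # column masses (dense vector)
    w = v ./ sqrt.(c)                           # D_c^{-1/2} v
    Pw = (X * w) ./ n                           # sparse product, P is never densified
    return (Pw .- r .* dot(c, w)) ./ sqrt.(r)   # subtract rank-one term, apply D_r^{-1/2}
end

X = sparse([1, 2, 3, 4, 1, 3], [1, 2, 3, 1, 3, 2], [3.0, 5.0, 2.0, 1.0, 4.0, 6.0], 4, 3)
v = [1.0, -1.0, 2.0]

# Dense reference: form S explicitly (only feasible for small examples).
Xd = Matrix(X)
r = vec(sum(Xd; dims=2)) ./ sum(Xd)
c = vec(sum(Xd; dims=1)) ./ sum(Xd)
S = Diagonal(1 ./ sqrt.(r)) * (Xd ./ sum(Xd) .- r * c') * Diagonal(1 ./ sqrt.(c))
println(sparse_Sv(X, v) ≈ S * v)
```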
Input Arguments
- input: Julia binary file generated by the OnlinePCA.mm2bin function.
- outdir: Directory in which to save the results.
- dim: Number of CA dimensions to compute.
- noversamples: Number of oversamples for the randomized SVD.
- niter: Number of power iterations for the randomized SVD.
- chunksize: Number of rows read at once (0 = all rows at once).
Output Arguments
- F: Row coordinates (N × dim)
- G: Column coordinates (M × dim)
- σ: Singular values (dim,)
- Inertia: Inertia explained by each dimension (dim,)
- TotalInertia: Total inertia (scalar)
BinCOO Correspondence Analysis (BinCOOCA)
OnlineCA.bincoo_ca — Function
bincoo_ca(;input, outdir, dim, noversamples, niter, chunksize)
Out-of-core Correspondence Analysis (CA) for binary COO (BinCOO) data.
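Because every nonzero in BinCOO data is 1 (per the categorical_to_bincoo description, one "i j" line per nonzero), the masses r and c needed for S reduce to per-row and per-column counts of lines. The sketch below accumulates them in one streaming pass over the text form, assuming 1-based indices and known dimensions N and M. The function bincoo_masses is hypothetical, and the binary file produced by OnlineNMF.bincoo2bin is not parsed here.

```julia
# Row and column masses of a 0/1 matrix given as "i j" lines, one per nonzero;
# every listed entry is assumed to equal 1.
function bincoo_masses(io::IO, N::Integer, M::Integer)
    rowcount = zeros(Int, N)
    colcount = zeros(Int, M)
    n = 0                                   # grand total = number of nonzeros
    for line in eachline(io)
        isempty(strip(line)) && continue    # skip blank lines
        i, j = parse.(Int, split(line))
        rowcount[i] += 1
        colcount[j] += 1
        n += 1
    end
    return n, rowcount ./ n, colcount ./ n
end

io = IOBuffer("1 1\n1 3\n2 2\n3 1\n3 2\n3 3\n")
n, r, c = bincoo_masses(io, 3, 3)
println((n, r, c))
```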
Input Arguments
- input: Julia binary file generated by the OnlineNMF.bincoo2bin function.
- outdir: Directory in which to save the results.
- dim: Number of CA dimensions to compute.
- noversamples: Number of oversamples for the randomized SVD.
- niter: Number of power iterations for the randomized SVD.
- chunksize: Number of rows read at once (0 = all rows at once).
Output Arguments
- F: Row coordinates (N × dim)
- G: Column coordinates (M × dim)
- σ: Singular values (dim,)
- Inertia: Inertia explained by each dimension (dim,)
- TotalInertia: Total inertia (scalar)
Multiple Correspondence Analysis (MCA)
OnlineCA.mca — Function
mca(table::AbstractMatrix{<:Integer};
dim=3, correction=:none, var_names=nothing,
noversamples=5, niter=3, chunksize=0,
rng=default_rng(), seed=nothing, T=Float64,
outdir=nothing, keep_indicator_file::Bool=false)
In-memory MCA. table is an N × q matrix in which entry (i, j) is the positive integer category code of variable j for observation i. Internally, the function:
- materializes the indicator matrix to a temporary BinCOO + Zstandard file on disk,
- runs bincoo_ca on it,
- wraps the result with MCA metadata (variables, categories, var_of_category, q, correction) and the optionally corrected eigenvalues (inertia_adjusted, total_inertia_adjusted).
The temporary file is removed on return unless keep_indicator_file is set to true, in which case its path is included in the result for reuse or inspection.
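Since mca runs bincoo_ca on the indicator matrix, a well-known property of CA applies: for q variables with J categories in total (all of them observed), the total inertia of the indicator matrix is J/q - 1, whatever the data. This inflation is the usual motivation for adjusted MCA eigenvalues. The sketch below checks the identity on a toy table; indicator_matrix and total_inertia are hypothetical helpers, and category codes are assumed to run from 1 to the maximum code of each variable.

```julia
using LinearAlgebra

# Dense indicator (complete disjunctive) matrix of an N × q table of positive
# integer codes; variable j is assumed to use codes 1:maximum(table[:, j]).
function indicator_matrix(table::AbstractMatrix{<:Integer})
    N, q = size(table)
    K = [maximum(table[:, j]) for j in 1:q]   # number of categories per variable
    offsets = cumsum([0; K[1:end-1]])         # columns before each variable's block
    Z = zeros(Int, N, sum(K))
    for i in 1:N, j in 1:q
        Z[i, offsets[j] + table[i, j]] = 1
    end
    return Z
end

# Total inertia of a CA table: sum of squared standardized residuals.
function total_inertia(X)
    n = sum(X)
    r = vec(sum(X; dims=2)) ./ n
    c = vec(sum(X; dims=1)) ./ n
    return sum(abs2, (X ./ n .- r * c') ./ sqrt.(r * c'))
end

table = [1 2 1; 2 1 3; 1 1 2; 3 2 1; 2 2 3]
Z = indicator_matrix(table)
J, q = size(Z, 2), size(table, 2)
println(total_inertia(Z), " vs ", J / q - 1)
```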
OnlineCA.categorical_to_bincoo — Function
categorical_to_bincoo(table, bincoofile) -> meta
Write the indicator matrix of table to bincoofile in BinCOO text format (one "i j" line per nonzero). table is N × q, and its entries are positive integer category codes. Returns a NamedTuple of metadata that mca uses to label categories.
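The column layout of the indicator matrix is not spelled out above. One plausible scheme, shown here purely as an assumption, gives variable j a contiguous block of columns after the blocks of variables 1 to j-1, so code k of variable j lands in column offset_j + k. write_indicator_coo is a hypothetical stand-in for categorical_to_bincoo that writes 1-based "i j" lines to any IO and returns a small NamedTuple of labeling metadata.

```julia
# Hypothetical illustration: one "i j" line per nonzero of the indicator matrix.
function write_indicator_coo(io::IO, table::AbstractMatrix{<:Integer})
    N, q = size(table)
    K = [maximum(table[:, j]) for j in 1:q]   # categories per variable (codes 1:K[j])
    offsets = cumsum([0; K[1:end-1]])         # columns before each variable's block
    for i in 1:N, j in 1:q
        println(io, i, " ", offsets[j] + table[i, j])
    end
    # Which variable each indicator column belongs to, for labeling.
    var_of_category = reduce(vcat, [fill(j, K[j]) for j in 1:q])
    return (q = q, ncategories = sum(K), var_of_category = var_of_category)
end

table = [1 2; 2 1; 1 1]
buf = IOBuffer()
meta = write_indicator_coo(buf, table)
print(String(take!(buf)))
```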
Supplementary projection
OnlineCA.project_rows — Function
project_rows(result, X_new)
Project supplementary rows into the CA space spanned by result. Each row of X_new is one supplementary point with the same number of columns as the training data. Returns an n_sup × dim matrix of principal row coordinates.
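Supplementary points are usually placed with the CA transition formula: a new row's principal coordinates are its row profile (counts divided by the row total) multiplied by G Σ^{-1}. Assuming project_rows follows this standard formula, a sketch is below; project_rows_sketch is hypothetical. Projecting the training rows themselves recovers F, which makes a convenient sanity check.

```julia
using LinearAlgebra

# Fit a small CA with a full SVD to obtain principal coordinates F, G and σ.
X = [10 2 3 1; 4 8 1 2; 2 3 9 4; 5 5 5 5; 1 6 2 7]
n = sum(X)
r = vec(sum(X; dims=2)) ./ n
c = vec(sum(X; dims=1)) ./ n
S = Diagonal(1 ./ sqrt.(r)) * (X ./ n .- r * c') * Diagonal(1 ./ sqrt.(c))
dim = 2
Fs = svd(S)
σ = Fs.S[1:dim]
F = Diagonal(1 ./ sqrt.(r)) * Fs.U[:, 1:dim] * Diagonal(σ)
G = Diagonal(1 ./ sqrt.(c)) * Fs.V[:, 1:dim] * Diagonal(σ)

# Transition formula: row profiles times G Σ^{-1}.
project_rows_sketch(X_new, G, σ) = (X_new ./ sum(X_new; dims=2)) * G * Diagonal(1 ./ σ)

X_sup = [3 1 4 2; 0 5 5 0]
println(project_rows_sketch(X_sup, G, σ))   # 2 × dim
```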
OnlineCA.project_columns — Function
project_columns(result, Y_new)
Project supplementary columns. Each column of Y_new is one supplementary point with the same number of rows as the training data. Returns an n_sup × dim matrix of principal column coordinates.
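The column case is symmetric: a supplementary column's principal coordinates are its column profile multiplied by F Σ^{-1}. This is again a sketch under the assumption that project_columns uses the standard transition formula; project_columns_sketch is hypothetical.

```julia
using LinearAlgebra

X = [10 2 3 1; 4 8 1 2; 2 3 9 4; 5 5 5 5; 1 6 2 7]
n = sum(X)
r = vec(sum(X; dims=2)) ./ n
c = vec(sum(X; dims=1)) ./ n
S = Diagonal(1 ./ sqrt.(r)) * (X ./ n .- r * c') * Diagonal(1 ./ sqrt.(c))
dim = 2
Fs = svd(S)
σ = Fs.S[1:dim]
F = Diagonal(1 ./ sqrt.(r)) * Fs.U[:, 1:dim] * Diagonal(σ)
G = Diagonal(1 ./ sqrt.(c)) * Fs.V[:, 1:dim] * Diagonal(σ)

# Transition formula: column profiles (transposed to one row per point) times F Σ^{-1}.
project_columns_sketch(Y_new, F, σ) = (Y_new ./ sum(Y_new; dims=1))' * F * Diagonal(1 ./ σ)

Y_sup = [1 0; 2 1; 0 4; 3 3; 4 0]           # two supplementary columns (N = 5 rows)
println(project_columns_sketch(Y_sup, F, σ))  # 2 × dim
```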