OnlineCA.jl (Julia API)

Correspondence Analysis (CA)

OnlineCA.ca (Function)
ca(;input, outdir, dim, noversamples, niter, chunksize)

Out-of-core Correspondence Analysis (CA) for dense data.

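Uses Halko-style randomized SVD on the implicit standardized residual matrix S = Dr^{-1/2} (P - rp cp') Dc^{-1/2}. Here P = X/n is the correspondence matrix (n is the grand total of X), rp and cp are the row and column masses (the row and column sums of P), and Dr = diag(rp), Dc = diag(cp).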

The matrix S is never explicitly formed. Instead, matrix-vector products Sv and S'u are computed by streaming through the data.
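As an illustration of the implicit product, the sketch below computes S v one block of rows at a time and compares it with an explicitly formed S on a tiny matrix. It relies on the identity S v = Dr^{-1/2} P Dc^{-1/2} v - sqrt(rp) (sqrt(cp)' v). The function names and the in-memory matrix are assumptions made for this example; the package itself streams from a binary file, and this is not its implementation.

```julia
using LinearAlgebra

# Explicit S, for checking only (the out-of-core method avoids forming this).
function explicit_residuals(X::AbstractMatrix)
    P = X ./ sum(X)
    r = vec(sum(P, dims=2))      # row masses
    c = vec(sum(P, dims=1))      # column masses
    return (P .- r * c') ./ sqrt.(r) ./ sqrt.(c')
end

# S * v computed block by block; only `chunksize` rows of X are used at a time.
# Hypothetical helper: here chunksize must be >= 1.
function implicit_Sv(X::AbstractMatrix, v::AbstractVector; chunksize::Int=2)
    N, M = size(X)
    # Pass 1: accumulate the grand total and the margins.
    n = 0.0
    rowsum = zeros(N)
    colsum = zeros(M)
    for lo in 1:chunksize:N
        rows = lo:min(lo + chunksize - 1, N)
        block = X[rows, :]
        rowsum[rows] .= vec(sum(block, dims=2))
        colsum .+= vec(sum(block, dims=1))
        n += sum(block)
    end
    r = rowsum ./ n
    c = colsum ./ n
    # Pass 2: apply the scaled data term, then subtract the rank-one term.
    w = v ./ sqrt.(c)
    shift = dot(sqrt.(c), v)
    out = zeros(N)
    for lo in 1:chunksize:N
        rows = lo:min(lo + chunksize - 1, N)
        block = X[rows, :]
        out[rows] .= (block * w) ./ n ./ sqrt.(r[rows]) .- sqrt.(r[rows]) .* shift
    end
    return out
end

X = [10.0 2 3; 4 8 1; 2 3 9; 5 5 5]
v = [1.0, -2.0, 0.5]
Sv = implicit_Sv(X, v; chunksize=2)
```

The product S'u follows the same pattern with the roles of rows and columns exchanged.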

Input Arguments

  • input : Julia binary file generated by the OnlinePCA.csv2bin function.
  • outdir : Directory in which to save the results.
  • dim : Number of CA dimensions to compute.
  • noversamples : Number of oversampling dimensions for randomized SVD.
  • niter : Number of power iterations for randomized SVD.
  • chunksize : Number of rows to read at once (0 = all rows at once).
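To show how dim, noversamples, and niter typically interact, here is a minimal Halko-style randomized SVD on an in-memory matrix. The function randomized_svd is a hypothetical helper written for this example, not the package's routine: it sketches the range with dim + noversamples random vectors, refines it with niter power iterations, and keeps the leading dim components.

```julia
using LinearAlgebra, Random

function randomized_svd(A::AbstractMatrix; dim::Int=3, noversamples::Int=5,
                        niter::Int=3, rng=Random.default_rng())
    M = size(A, 2)
    # Sketch width: target rank plus oversampling, capped by the matrix size.
    k = min(dim + noversamples, minimum(size(A)))
    # Orthonormal basis for the range of A applied to a random test matrix.
    Q = Matrix(qr(A * randn(rng, M, k)).Q)
    # Power iterations, re-orthonormalized at each step for stability.
    for _ in 1:niter
        Z = Matrix(qr(A' * Q).Q)
        Q = Matrix(qr(A * Z).Q)
    end
    # Exact SVD of the small projected matrix, then lift back.
    B = Q' * A
    U, σ, V = svd(B)
    return (Q * U)[:, 1:dim], σ[1:dim], V[:, 1:dim]
end

rng = MersenneTwister(1)
A = randn(rng, 50, 3) * randn(rng, 3, 20)   # exactly rank 3
U, σ, V = randomized_svd(A; dim=3, noversamples=5, niter=3, rng=rng)
```

Larger noversamples and niter generally improve accuracy when the singular values decay slowly, at the cost of more passes over the data.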

Output Arguments

  • F : Row coordinates (N × dim)
  • G : Column coordinates (M × dim)
  • σ : Singular values (dim,)
  • Inertia : Inertia explained by each dimension (dim,)
  • TotalInertia : Total inertia (scalar)
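For orientation, these quantities can be derived from a full SVD of S on a small in-memory matrix, as in the sketch below. It assumes that F and G are principal coordinates and that the per-dimension inertia is the squared singular value; whether the package reports raw inertia or a proportion is not stated here, so treat both as assumptions. The helper small_ca is hypothetical.

```julia
using LinearAlgebra

function small_ca(X::AbstractMatrix; dim::Int=2)
    P = X ./ sum(X)
    r = vec(sum(P, dims=2))                       # row masses
    c = vec(sum(P, dims=1))                       # column masses
    S = (P .- r * c') ./ sqrt.(r) ./ sqrt.(c')    # standardized residuals
    U, σ, V = svd(S)
    σ = σ[1:dim]
    F = (U[:, 1:dim] ./ sqrt.(r)) .* σ'           # principal row coordinates (assumed)
    G = (V[:, 1:dim] ./ sqrt.(c)) .* σ'           # principal column coordinates (assumed)
    inertia = σ .^ 2                              # inertia per dimension (assumed definition)
    total_inertia = sum(abs2, S)                  # squared Frobenius norm of S
    return (; F, G, σ, inertia, total_inertia)
end

X = [10.0 2 3; 4 8 1; 2 3 9; 5 5 5]
res = small_ca(X; dim=2)
```

For a 4 × 3 table S has rank at most 2, so the two inertias here add up to the total inertia.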

Sparse Correspondence Analysis (SparseCA)

OnlineCA.sparse_ca (Function)
sparse_ca(;input, outdir, dim, noversamples, niter, chunksize)

Out-of-core Correspondence Analysis (CA) for sparse data (MatrixMarket format).

Input Arguments

  • input : Julia binary file generated by the OnlinePCA.mm2bin function.
  • outdir : Directory in which to save the results.
  • dim : Number of CA dimensions to compute.
  • noversamples : Number of oversampling dimensions for randomized SVD.
  • niter : Number of power iterations for randomized SVD.
  • chunksize : Number of rows to read at once (0 = all rows at once).

Output Arguments

  • F : Row coordinates (N × dim)
  • G : Column coordinates (M × dim)
  • σ : Singular values (dim,)
  • Inertia : Inertia explained by each dimension (dim,)
  • TotalInertia : Total inertia (scalar)
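With sparse input the same implicit products can be written so that only the nonzeros of X are used. The sketch below computes the transposed product S'u for a SparseMatrixCSC held in memory, using S'u = Dc^{-1/2} P' Dr^{-1/2} u - sqrt(cp) (sqrt(rp)' u). The helper sparse_Stu is hypothetical, and the package reads its own binary format rather than an in-memory sparse matrix.

```julia
using LinearAlgebra, SparseArrays

function sparse_Stu(X::SparseMatrixCSC, u::AbstractVector)
    n = sum(X)
    r = vec(sum(X, dims=2)) ./ n     # row masses
    c = vec(sum(X, dims=1)) ./ n     # column masses
    w = u ./ sqrt.(r)
    # Sparse term uses only the nonzeros; the rank-one term is a dense correction.
    return (X' * w) ./ n ./ sqrt.(c) .- sqrt.(c) .* dot(sqrt.(r), u)
end

X = sparse([1, 1, 2, 3, 3, 4], [1, 3, 2, 1, 2, 3],
           [5.0, 1.0, 4.0, 2.0, 3.0, 6.0], 4, 3)
u = [1.0, 0.0, -1.0, 2.0]
Stu = sparse_Stu(X, u)
```

The dense residual matrix P - rp cp' is never built, which is what keeps memory proportional to the number of nonzeros.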

BinCOO Correspondence Analysis (BinCOOCA)

OnlineCA.bincoo_ca (Function)
bincoo_ca(;input, outdir, dim, noversamples, niter, chunksize)

Out-of-core Correspondence Analysis (CA) for binary COO data.

Input Arguments

  • input : Julia binary file generated by the OnlineNMF.bincoo2bin function.
  • outdir : Directory in which to save the results.
  • dim : Number of CA dimensions to compute.
  • noversamples : Number of oversampling dimensions for randomized SVD.
  • niter : Number of power iterations for randomized SVD.
  • chunksize : Number of rows to read at once (0 = all rows at once).

Output Arguments

  • F : Row coordinates (N × dim)
  • G : Column coordinates (M × dim)
  • σ : Singular values (dim,)
  • Inertia : Inertia explained by each dimension (dim,)
  • TotalInertia : Total inertia (scalar)
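For binary COO data every stored (i, j) pair stands for an entry equal to 1, so margins and products reduce to counting and accumulating over the pairs. The sketch below illustrates this for S v. It assumes 1-based indices and no duplicate pairs, and bincoo_Sv is a hypothetical helper rather than the package's code.

```julia
using LinearAlgebra

function bincoo_Sv(pairs, N::Int, M::Int, v::AbstractVector)
    # Pass 1: margins are just counts of nonzeros per row and per column.
    rowcount = zeros(N)
    colcount = zeros(M)
    for (i, j) in pairs
        rowcount[i] += 1
        colcount[j] += 1
    end
    n = length(pairs)                # grand total equals the number of nonzeros
    r = rowcount ./ n
    c = colcount ./ n
    # Pass 2: accumulate X * (Dc^{-1/2} v) one nonzero at a time.
    w = v ./ sqrt.(c)
    acc = zeros(N)
    for (i, j) in pairs
        acc[i] += w[j]
    end
    return acc ./ n ./ sqrt.(r) .- sqrt.(r) .* dot(sqrt.(c), v)
end

pairs = [(1, 1), (1, 2), (2, 2), (3, 1), (3, 3), (4, 3)]
v = [0.5, -1.0, 2.0]
Sv = bincoo_Sv(pairs, 4, 3, v)
```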

Multiple Correspondence Analysis (MCA)

OnlineCA.mca (Function)
mca(table::AbstractMatrix{<:Integer};
    dim=3, correction=:none, var_names=nothing,
    noversamples=5, niter=3, chunksize=0,
    rng=default_rng(), seed=nothing, T=Float64,
    outdir=nothing, keep_indicator_file::Bool=false)

In-memory MCA. table is an N × q matrix whose entry (i, j) is the positive integer category code of variable j for observation i. Internally, the function:

  1. materializes the indicator matrix as a temporary BinCOO + Zstandard file on disk,
  2. runs bincoo_ca on it,
  3. wraps the result with MCA metadata (variables, categories, var_of_category, q, correction) and the optionally corrected eigenvalues (inertia_adjusted, total_inertia_adjusted).

The temporary file is removed on return unless keep_indicator_file is set to true (in which case the path is included in the result for reuse / inspection).
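The link between MCA and CA can be checked on a toy table: MCA is CA applied to the indicator matrix, whose total inertia equals J/q - 1 when all J categories are observed. The sketch below builds a dense indicator matrix in memory for illustration only (the function described above writes it to disk instead), and both helper names are hypothetical. It assumes category codes run from 1 to the maximum observed code in each column.

```julia
using LinearAlgebra

function indicator_matrix(table::AbstractMatrix{<:Integer})
    N, q = size(table)
    ncat = [maximum(table[:, j]) for j in 1:q]   # categories per variable
    offset = cumsum(ncat) .- ncat                # columns preceding each variable
    Z = zeros(N, sum(ncat))
    for j in 1:q, i in 1:N
        Z[i, offset[j] + table[i, j]] = 1.0      # one 1 per observation and variable
    end
    return Z
end

function total_inertia(Z::AbstractMatrix)
    P = Z ./ sum(Z)
    r = vec(sum(P, dims=2))
    c = vec(sum(P, dims=1))
    return sum(abs2, (P .- r * c') ./ sqrt.(r) ./ sqrt.(c'))
end

table = [1 1; 2 1; 1 2; 3 2]     # N = 4 observations, q = 2 variables, J = 5 categories
Z = indicator_matrix(table)
ti = total_inertia(Z)
```

Here J = 5 and q = 2, so the total inertia is 5/2 - 1 = 1.5.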

OnlineCA.categorical_to_bincoo (Function)
categorical_to_bincoo(table, bincoofile) -> meta

Write the indicator matrix of table to bincoofile in BinCOO text format (one i j line per nonzero). table is N × q; entries are positive integer category codes. Returns a NamedTuple of metadata used by mca to label categories.
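A rough sketch of the "one i j line per nonzero" layout, written to an in-memory buffer instead of a file. The helper name, the 1-based indices, the column ordering (variables laid out side by side), and the metadata field names are all assumptions made for illustration; the actual function may differ on each of these points.

```julia
function write_indicator_pairs(io::IO, table::AbstractMatrix{<:Integer})
    N, q = size(table)
    ncat = [maximum(table[:, j]) for j in 1:q]   # categories per variable
    offset = cumsum(ncat) .- ncat                # columns preceding each variable
    for i in 1:N, j in 1:q
        # Row index, then the indicator column of the observed category.
        println(io, i, " ", offset[j] + table[i, j])
    end
    # Illustrative metadata: which variable each indicator column belongs to.
    var_of_category = reduce(vcat, [fill(j, ncat[j]) for j in 1:q])
    return (; N, q, ncat, var_of_category)
end

table = [1 1; 2 1; 1 2; 3 2]
io = IOBuffer()
meta = write_indicator_pairs(io, table)
lines = split(strip(String(take!(io))), '\n')
```

Each observation contributes exactly q lines, so the output has N × q lines in total.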


Supplementary projection

OnlineCA.project_rows (Function)
project_rows(result, X_new)

Project supplementary rows into the CA space spanned by result. Each row of X_new is one supplementary point with the same column count as the training data. Returns an n_sup × dim matrix of principal row coordinates.
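Supplementary rows are conventionally placed with the CA transition formula: convert each new row to a profile, multiply by the column coordinates, and divide by the singular values. The sketch below applies that formula to a small in-memory fit. Whether project_rows uses exactly this computation is an assumption, and ca_fit and project_rows_sketch are hypothetical helpers.

```julia
using LinearAlgebra

function ca_fit(X::AbstractMatrix; dim::Int=2)
    P = X ./ sum(X)
    r = vec(sum(P, dims=2))
    c = vec(sum(P, dims=1))
    U, σ, V = svd((P .- r * c') ./ sqrt.(r) ./ sqrt.(c'))
    σ = σ[1:dim]
    F = (U[:, 1:dim] ./ sqrt.(r)) .* σ'    # principal row coordinates
    G = (V[:, 1:dim] ./ sqrt.(c)) .* σ'    # principal column coordinates
    return (; F, G, σ)
end

# Transition formula: row profiles times standard column coordinates (G / σ).
function project_rows_sketch(fit, X_new::AbstractMatrix)
    profiles = X_new ./ sum(X_new, dims=2)
    return (profiles * fit.G) ./ fit.σ'
end

X = [10.0 2 3; 4 8 1; 2 3 9; 5 5 5]
fit = ca_fit(X; dim=2)
f_new = project_rows_sketch(fit, [3.0 1 6])   # one supplementary row
```

A useful sanity check is that projecting the training rows themselves reproduces F.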

OnlineCA.project_columns (Function)
project_columns(result, Y_new)

Project supplementary columns. Each column of Y_new is one supplementary point with the same row count as the training data. Returns an n_sup × dim matrix of principal column coordinates.
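The shape convention is the point to watch: Y_new holds one supplementary point per column (N × n_sup), while the result holds one per row (n_sup × dim). The sketch below uses the standard transition formula with column profiles and row coordinates. It is an assumption that project_columns computes the same thing, and both helpers are hypothetical.

```julia
using LinearAlgebra

function ca_fit_for_columns(X::AbstractMatrix; dim::Int=2)
    P = X ./ sum(X)
    r = vec(sum(P, dims=2))
    c = vec(sum(P, dims=1))
    U, σ, V = svd((P .- r * c') ./ sqrt.(r) ./ sqrt.(c'))
    σ = σ[1:dim]
    F = (U[:, 1:dim] ./ sqrt.(r)) .* σ'    # principal row coordinates
    G = (V[:, 1:dim] ./ sqrt.(c)) .* σ'    # principal column coordinates
    return (; F, G, σ)
end

# Column profiles (each column divided by its sum), transposed so that
# every supplementary column becomes one row of the output.
function project_columns_sketch(fit, Y_new::AbstractMatrix)
    profiles = Y_new ./ sum(Y_new, dims=1)
    return (profiles' * fit.F) ./ fit.σ'
end

X = [10.0 2 3; 4 8 1; 2 3 9; 5 5 5]
colfit = ca_fit_for_columns(X; dim=2)
Y_new = reshape([1.0, 7.0, 2.0, 4.0], 4, 1)    # one supplementary column, N = 4 rows
g_new = project_columns_sketch(colfit, Y_new)
```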
