Overview
A common challenge in integrative analysis of high-dimensional data sets is subsequent biological interpretation. Anansi addresses this challenge by only considering pairwise associations that are known to occur a priori. In order to achieve this, we need to provide this relational information to anansi in the form of a (bi)adjacency matrix.
This vignette is divided by three sections:
Getting started: The full model
After determining which pairs of features should be investigated,
anansi
fits several linear models for which it returns
summary statistics. The user has control over these models through the
formula
argument in anansi().
Setup
We will use the same example as in the
vignette on adjacency matrices. There, we also explain for more
about adjacency matrices, AnansiWeb
class and using
weaveWeb()
. Please note that the data used in this example
have been generated with statistical associations spiked in for this
vignette.
# Setup AnansiWeb with some dummy data
library(anansi)
web <- krebsDemoWeb()
Formula syntax
For each of the feature-pairs that should be assessed, anansi fits similarly structured linear models. Anansi supports complex linear models as well as longitudinal models using the R formula syntax. The most basic model in anansi is:
## y ~ x
where y
refers to is the feature from
tableY
and x
is the corresponding feature in
tableX
.
Besides estimating the asociation between y
and
x
, anansi also allows for the inclusion of covariates that
could influence this association. With full model, we
mean the total influence of x
, including all of its
interaction terms, on y
. For example, if our input formula
was:
## y ~ x * (group_ab + score_a)
R rewrites this as follows:
## y ~ x + group_ab + score_a + x:group_ab + x:score_a
The variables that constitute the ‘full’ effect of x would be:
x + x:group_ab + x:score_a
.
Formula in anansi()
When providing a formula argument in the main anansi()
function, the y
and x
variables should not be
mentioned, they are already implied. Rather, only additional covariates
that could interact with x
should be provided in the
formula
argument.
out <- anansi(web, formula = ~group_ab + score_a, groups = "group_ab")
## Fitting least-squares for following model:
## ~ x + group_ab + score_a + x:group_ab + x:score_a
## Running correlations for the following groups:
## a, b
Note that anansi()
will tell us how the rewritten model
looks if we don’t set verbose
to FALSE.
The full model simply estimates the total association between
y
and x
, making interpretation
straightforward. We can imagine these associations to be positive or
negative.

Figure 1. Both citrate and cis-aconitate ~ aconitase dehydrogenase, sequentially.
Differential association
In order to assess differences in associations based on one or more variables (such as phenotype or treatment), we make use of the emergent and disjointed association paradigm introduced in the context of proportionality and apply it outside of the simplex.
Disjointed associations
Disjointed associations describe the situation where there is a
detectable association in all cases, but the quality of that
association, the slope, changes in a manner explained
by a term. We estimate this as interaction terms with x
. In
our example we have two such terms, namely x:group_ab
and
x:score_a
.
Disjointed associations: Categorical variables
The figure below shows how such a disjointed association might look. We can imagine some sort of a pharmacological treatment that alters enzymatic conversion rates to the point where association between the enzyme and the metabolite flips.

isocitrate ~ isocitrate dehydrogenase by category (treatment).
Disjointed associations: Continuous variables
Similarly, we could imagine that this flip is not binary or otherwise step-wise, but rather the result of a continuous variable.

ketoglutarate ~ ketoglutarate dehydrogenase stratified by a third continuous variable.
Equivalent stats::lm()
model
We can use mapply()
on an AnansiWeb
object
to run a function on the observations for each pair of features. We can
use this to confirm the statistical output of anansi()
against the stats
package:
score_lm <- mapply(web, FUN = function(x, y) {
anova(lm(y ~ x * (group_ab + score_a), data = metadata(web)))$`F value`[5L]
})
group_lm <- mapply(web, FUN = function(x, y) {
anova(lm(y ~ x * (score_a + group_ab), data = metadata(web)))$`F value`[5L]
})
all.equal(score_lm, out$disjointed_score_a_f.values)
## [1] "names for target but not for current"
all.equal(group_lm, out$disjointed_group_ab_f.values)
## [1] "names for target but not for current"
We run two separate mapply()
calls for the two
interaction terms. Because of the way anova()
calculates
sum-of-squares (type-I), the order of variables matters for
anova()
.
Emergent association
Emergent associations describe the situation where the
strength of an association is explained by a term
interacting with x. We again estimate this as interaction terms with
x
. In our example we have two such terms, namely
x:group_ab
and x:score_a
.
Emergent associations: Categorical variables
The figure below shows how such a emergent association might look. We can imagine some sort of a pharmacological treatment that can completely shut down a biochemical pathway. As a result, the members of that pathway, which would show a consistently strong association under physiological conditions, could go out of sync.

succinyl-CoA ~ succinyl-CoA synthetase by categorical variable (Treatment).
Emergent associations: Continuous variables
Similarly, we could imagine situations where this loss in association strength happens gradually, rather than in a binary manner. For example, perhaps there is a strong association between a metabolite and an enzyme that metabolizes it under physiological conditions, but this association weakens along with some health index or environmental variable.

succinate ~ succinate dehydrogenase stratified by a third continuous variable.
Repeated measures
Random slopes through Error()
Similar to stats::aov()
, you can wrap a variable in
Error()
for anansi()
to treat it as a stratum,
allowing for random intercepts. This is useful when dealing with
multiple measurements per subject, commonly due to a longitudinal
design.
outErr <- anansi(web, formula = ~group_ab + score_a + Error(sample_id), groups = "group_ab")
## Fitting least-squares for following model:
## ~ x + group_ab + score_a + x:group_ab + x:score_a
## with 'sample_id' as random intercept.
## Running correlations for the following groups:
## a, b
Note that anansi()
acknowledges the random
intercept.
compare to aov( Error() )
.
We can use aov()
to see how Error()
works.
R
does not use orthogonal contrasts and uses type-I
sum-of-squares calculations by default. In other words, the order of
covariates matters. A variable wrapped in Error()
gets
moved to the front of the line, so the rest of the model could be
understood as conditioned on the Error()
-wrapped term.
Here is the model with Error()
##
## Error: sample_id
## Df Sum Sq Mean Sq F value Pr(>F)
## fumarase 1 1.40 1.397 1.335 0.251
## fumarase:repeated 3 4.73 1.576 1.506 0.218
## Residuals 95 99.42 1.046
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## fumarase 1 1.17 1.1742 1.404 0.237
## repeated 3 0.79 0.2623 0.314 0.816
## fumarase:repeated 3 2.38 0.7938 0.949 0.417
## Residuals 293 245.07 0.8364
And the equivalent model with sample_id
moved to the
front instead:.
## Df Sum Sq Mean Sq F value Pr(>F)
## sample_id 99 105.54 1.0661 1.275 0.0631 .
## fumarase 1 1.17 1.1742 1.404 0.2370
## repeated 3 0.79 0.2623 0.314 0.8155
## fumarase:repeated 3 2.38 0.7938 0.949 0.4172
## Residuals 293 245.07 0.8364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# See ?stats::aov for more.
For every term except for sample_id
, the results are
equivalent.
Session info
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.5.2 patchwork_1.3.0 anansi_0.7.0 BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] SummarizedExperiment_1.39.0 gtable_0.3.6
## [3] xfun_0.52 bslib_0.9.0
## [5] htmlwidgets_1.6.4 Biobase_2.69.0
## [7] lattice_0.22-7 vctrs_0.6.5
## [9] tools_4.5.0 generics_0.1.3
## [11] stats4_4.5.0 tibble_3.2.1
## [13] pkgconfig_2.0.3 BiocBaseUtils_1.11.0
## [15] Matrix_1.7-3 desc_1.4.3
## [17] S4Vectors_0.47.0 lifecycle_1.0.4
## [19] GenomeInfoDbData_1.2.14 farver_2.1.2
## [21] compiler_4.5.0 textshaping_1.0.0
## [23] munsell_0.5.1 ggforce_0.4.2
## [25] GenomeInfoDb_1.45.0 htmltools_0.5.8.1
## [27] sass_0.4.10 yaml_2.3.10
## [29] pkgdown_2.1.1 pillar_1.10.2
## [31] crayon_1.5.3 jquerylib_0.1.4
## [33] MASS_7.3-65 DelayedArray_0.35.1
## [35] cachem_1.1.0 abind_1.4-8
## [37] tidyselect_1.2.1 digest_0.6.37
## [39] dplyr_1.1.4 bookdown_0.43
## [41] labeling_0.4.3 forcats_1.0.0
## [43] polyclip_1.10-7 fastmap_1.2.0
## [45] grid_4.5.0 colorspace_2.1-1
## [47] cli_3.6.4 SparseArray_1.9.0
## [49] magrittr_2.0.3 S4Arrays_1.9.0
## [51] withr_3.0.2 UCSC.utils_1.5.0
## [53] scales_1.3.0 rmarkdown_2.29
## [55] XVector_0.49.0 httr_1.4.7
## [57] matrixStats_1.5.0 igraph_2.1.4
## [59] ragg_1.4.0 evaluate_1.0.3
## [61] knitr_1.50 GenomicRanges_1.61.0
## [63] IRanges_2.43.0 MultiAssayExperiment_1.35.0
## [65] rlang_1.1.6 Rcpp_1.0.14
## [67] glue_1.8.0 tweenr_2.0.3
## [69] BiocManager_1.30.25 formatR_1.14
## [71] BiocGenerics_0.55.0 jsonlite_2.0.0
## [73] R6_2.6.1 MatrixGenerics_1.21.0
## [75] systemfonts_1.2.2 fs_1.6.6