Introduction
The anansi
package computes and compares the association
between the features of two ’omics datasets that are known to interact
based on a database such as KEGG. Studies including both functional
microbiome and metabolomics data are becoming more common. Often, it
would be helpful to integrate both datasets in order to see if they
corroborate each others patterns. However, all-vs-all association
analyses are imprecise and likely to yield spurious associations. This
package takes a knowledge-based approach to constrain association search
space, only considering metabolite-function interactions that have been
recorded in a pathway database. In addition, it provides a framework to
assess differential associations.
While anansi
is geared towards metabolite-function
interactions in the context of host-microbe interactions, it is
perfectly capable of handling any other pair of data sets where some
features interact canonically.
Installation
Install the development version from GitHub with
remotes
:
install.packages("remotes")
remotes::install_github("thomazbastiaanssen/anansi")
Data preparation
The main anansi
function expects data in the
AnansiWeb
class format, which contains three tables: The
first two tables, tableY
and tableX
are the
feature tables of your two data modalities (e.g., metabolites, genes,
microbes, …). Both tables should have columns as features and rows as
samples.
The third table should be a binary adjacency matrix with the column
names of tableY
as rows and the column names of
tableX
as columns. An example of such an adjacency matrix
comes with the anansi
package and is referred to as a
dictionary (because it is used to look up which metabolites interact
with which functions).
Though most of the examples use metabolites and functions,
anansi
is able to handle any type of big/’omics data, as
long as there is a dictionary available. Because of this, anansi uses
the type-naive nomenclature tableY
and tableX
.
The Y and X refer to the position these measurements will have in the
linear modeling framework:
\[ Y \sim X \times{covariates} \]
A note on functional microbiome data
Two common questions in the host-microbiome field are “Who’s there?”
and “What are they doing?”. Techniques like 16S rRNA sequencing and
shotgun metagenomics sequencing are most commonly used to answer the
first question. The second question can be a bit more tricky, as it
often requires functional inference software to address them. For 16S
sequencing, algorithms like PICRUSt2 and Piphillin can be used to infer
function. For shotgun metagenomics, HUMANn3 in the bioBakery suite can
be used. All of these algorithms can produce functional count data in
terms of KEGG Orthologues (KOs) organised as tables, which can be
directly passed to anansi
as follows:
Weave a web🕸️
The weaveWeb()
function can be used to parse the tables
t1
and t2
that we prepared above into an
AnansiWeb
object. The AnansiWeb
format is a
necessary input file for the main anansi
workflow. It
allows anansi
to keep track of which features from the two
input data sets should be considered as pairs.
anansi
provides a pre-built map between ko, cpd and ec
feature annotations from the KEGG database, but users can also provide
their own custom maps. See
vignette on adjacency matrices.
# Create AnansiWeb object from tables
web <- weaveWeb(cpd ~ ko, tableY = t1, tableX = t2, link = kegg_link())
## Dropped features in tableX: 2476 remain.
Run anansi🕷️
The main workspider is called anansi
just like the
package. Generally, three main arguments should be specified:
-
web
: anAnansiWeb
object, such as the one we generated in the above step. -
formula
: a linking formula. For instance, to assess differential associations between treatments, we use the formula~ Legend
. -
metadata
: adata.frame
with additional information about the data set. Its rows correspond to samples and its columns to variables that can be used informula
as c.
# Run anansi on AnansiWeb object
anansi_out <- anansi(web = web, formula = ~Legend, metadata = FMT_metadata, adjust.method = "BH",
verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
## Aged yFMT, Aged oFMT, Young yFMT
Reporting output📝
By default, anansi
outputs a table in the
wide
format. For general reporting, we recommend sticking
to the more legible table
format by specifying the
return.format
argument. Let’s take a look at the structure
of the output table:
## feature_Y feature_X All_r.values All_t.values All_p.values
## 1 C00186 K00016 -0.08985529 0.5260699 0.60225456
## 2 C00186 K00101 0.28661046 1.7443940 0.09012569
## 3 C00041 K00259 -0.21708011 1.2967052 0.20346437
## 4 C00064 K00265 -0.25811198 1.5578255 0.12853530
## 5 C00064 K00266 -0.05235448 0.3056957 0.76170027
# Let's take a look at the column names
colnames(anansi_out)
## [1] "feature_Y" "feature_X"
## [3] "All_r.values" "All_t.values"
## [5] "All_p.values" "Aged yFMT_r.values"
## [7] "Aged yFMT_t.values" "Aged yFMT_p.values"
## [9] "Aged oFMT_r.values" "Aged oFMT_t.values"
## [11] "Aged oFMT_p.values" "Young yFMT_r.values"
## [13] "Young yFMT_t.values" "Young yFMT_p.values"
## [15] "full_r.squared" "full_f.values"
## [17] "full_p.values" "disjointed_Legend_r.squared"
## [19] "disjointed_Legend_f.values" "disjointed_Legend_p.values"
## [21] "emergent_Legend_r.squared" "emergent_Legend_f.values"
## [23] "emergent_Legend_p.values" "All_q.values"
## [25] "Aged yFMT_q.values" "Aged oFMT_q.values"
## [27] "Young yFMT_q.values" "full_q.values"
## [29] "disjointed_Legend_q.values" "emergent_Legend_q.values"
Output statistics
We can see that the output of anansi()
is structured so
that there is one feature pair per row, marked by the first two columns.
For each of those feature pairs, the table contains outcome parameters
from several models. In order, we have pooled correlations, group-wise
correlations, full model, disjointed and emergent associations. For each
of these models we report four parameters. For correlations we provide
the Pearson’s coefficient (\(\rho\))
and the \(t\)-statistic, whereas we
provide their squared counterparts for the linear regression models:
\(R^2\) and the \(F\) statistic. For all models, we
additionally provide p-values and adjusted p-values. For more on this,
see
the vignette on differential associations.
Association | Method | Statistics | Significance |
---|---|---|---|
Pooled | Correlation | \(\rho\), \(t\) | p, adjusted p |
Group-wise | Correlation | \(\rho\), \(t\) | p, adjusted p |
Full | Linear regression | \(R^2\), \(F\) | p, adjusted p |
Disjointed | Linear regression | \(R^2\), \(F\) | p, adjusted p |
Emergent | Linear regression | \(R^2\), \(F\) | p, adjusted p |
Plot the results
The long format can be helpful to plug the data into
ggplot2
. Here, we recreate the style of the figure from the
FMT Aging study.
# Use tidyr to wrangle the correlation r-values to a single column
library(tidyr)
anansiLong <- anansi_out |>
pivot_longer(starts_with("All") | contains("FMT")) |>
separate_wider_delim(
name,
delim = "_", names = c("cor_group", "param")
) |>
pivot_wider(names_from = param, values_from = value)
# Now it's ready to be plugged into ggplot2, though let's clean up a bit more.
# Only consider interactions where the entire model fits well enough.
anansiLong <- anansiLong[anansiLong$full_q.values < 0.2, ]
ggplot(anansiLong) +
# Define aesthetics
aes(
x = r.values,
y = feature_X,
fill = cor_group,
alpha = disjointed_Legend_p.values < 0.05
) +
# Make a vertical dashed red line at x = 0
geom_vline(xintercept = 0, linetype = "dashed", colour = "red") +
# Points show raw correlation coefficients
geom_point(shape = 21, size = 3) +
# facet per compound
ggforce::facet_col(~feature_Y, space = "free", scales = "free_y") +
# fix the scales, labels, theme and other layout
scale_y_discrete(limits = rev, position = "right") +
scale_alpha_manual(
values = c("TRUE" = 1, "FALSE" = 1 / 3),
"Disjointed association\np < 0.05"
) +
scale_fill_manual(
values = c(
"Young yFMT" = "#2166ac",
"Aged oFMT" = "#b2182b",
"Aged yFMT" = "#ef8a62",
"All" = "gray"
),
breaks = c("Young yFMT", "Aged oFMT", "Aged yFMT", "All"),
name = "Treatment"
) +
theme_bw() +
ylab("") +
xlab("Pearson's \u03c1")
Plotting multiple feature pairs at once with patchwork
We can also also use the AnansiWeb
object to pull up
specific pairs of features. The patchwork
provides a wonderful interface to combine plots. We can use
getFeaturePairs()
to extract the relevant data from our
AnansiWeb
object in a convenient list format:
library(patchwork)
# Get a list containing data.frames for each feature pair.
feature_pairs <- getFeaturePairs(web)
# For each, prepare a ggplot object with those features on x & y axes.
plot_list <- lapply(
feature_pairs,
FUN = function(pair_i) {
ggplot(pair_i) +
aes(
y = .data[[colnames(pair_i)[1L]]],
x = .data[[colnames(pair_i)[2L]]]
) +
# We'll use facet_grid for labels instead of labs().
facet_grid(colnames(pair_i)[1L] ~ colnames(pair_i)[2L])
}
)
# Feed the first six of our ggplots to patchwork, using wrap_plots()
wrap_plots(plot_list[seq_len(6L)], guides = "collect") &
# Continue as with an ordinary ggplot, but use & instead of +
geom_point(
shape = 21, show.legend = FALSE,
aes(fill = FMT_metadata$Legend)
) &
geom_smooth(
method = "lm", se = FALSE,
aes(colour = FMT_metadata$Legend),
show.legend = FALSE
) &
# Appearance
scale_fill_manual(
aesthetics = c("colour", "fill"),
values = c(
"Young yFMT" = "#2166ac",
"Aged oFMT" = "#b2182b",
"Aged yFMT" = "#ef8a62",
"All" = "gray"
)
) &
theme_bw() &
labs(x = NULL, y = NULL)
These types of visualizations of feature pairs work best when the amount
of feature pairs to be plotted is modest.
Bioconductor pipeline
anansi
also supports the Bioconductor class
MultiAssayExperiment
(MAE) as an alternative to
AnansiWeb
. The latter can be converted to a MAE with the
function asMAE
.
library(MultiAssayExperiment)
# Store sample metadata into metadata slot of web object
metadata(web) <- data.frame(FMT_metadata, row.names = FMT_metadata$Sample_ID)
# Convert web to mae
mae <- asMAE(web)
The MAE object can be passed to the weaveWeb
function,
which returns the same output, ready for the anansi
function.
web2 <- weaveWeb(mae, tableY = "cpd", tableX = "ko", link = kegg_link())
out <- anansi(web = web2, formula = ~Legend, adjust.method = "BH", verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
## Aged yFMT, Aged oFMT, Young yFMT
Finally, the output of anansi
can be visualised with
plotAnansi
by specifying the type of association to
use.
# Filter associations by q value
out <- out[out$full_q.values < 0.2, ]
# Visualize disjointed associations
plotAnansi(out, association.type = "disjointed", model.var = "Legend", signif.threshold = 0.05,
fill_by = "group")
Session info
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MultiAssayExperiment_1.35.3 SummarizedExperiment_1.39.1
## [3] Biobase_2.69.0 GenomicRanges_1.61.1
## [5] Seqinfo_0.99.1 IRanges_2.43.0
## [7] S4Vectors_0.47.0 BiocGenerics_0.55.0
## [9] generics_0.1.4 MatrixGenerics_1.21.0
## [11] matrixStats_1.5.0 patchwork_1.3.1
## [13] tidyr_1.3.1 ggforce_0.5.0
## [15] ggplot2_3.5.2 anansi_0.7.0
## [17] BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 viridisLite_0.4.2
## [3] dplyr_1.1.4 farver_2.1.2
## [5] viridis_0.6.5 ggraph_2.2.1
## [7] fastmap_1.2.0 SingleCellExperiment_1.31.1
## [9] tweenr_2.0.3 digest_0.6.37
## [11] lifecycle_1.0.4 magrittr_2.0.3
## [13] compiler_4.5.1 rlang_1.1.6
## [15] sass_0.4.10 tools_4.5.1
## [17] igraph_2.1.4 yaml_2.3.10
## [19] knitr_1.50 S4Arrays_1.9.1
## [21] labeling_0.4.3 graphlayouts_1.2.2
## [23] htmlwidgets_1.6.4 DelayedArray_0.35.2
## [25] RColorBrewer_1.1-3 abind_1.4-8
## [27] withr_3.0.2 purrr_1.0.4
## [29] desc_1.4.3 grid_4.5.1
## [31] polyclip_1.10-7 scales_1.4.0
## [33] MASS_7.3-65 cli_3.6.5
## [35] rmarkdown_2.29 crayon_1.5.3
## [37] ragg_1.4.0 BiocBaseUtils_1.11.0
## [39] cachem_1.1.0 stringr_1.5.1
## [41] splines_4.5.1 BiocManager_1.30.26
## [43] formatR_1.14 XVector_0.49.0
## [45] vctrs_0.6.5 Matrix_1.7-3
## [47] jsonlite_2.0.0 bookdown_0.43
## [49] ggrepel_0.9.6 systemfonts_1.2.3
## [51] jquerylib_0.1.4 glue_1.8.0
## [53] pkgdown_2.1.3 stringi_1.8.7
## [55] gtable_0.3.6 tibble_3.3.0
## [57] pillar_1.10.2 htmltools_0.5.8.1
## [59] R6_2.6.1 textshaping_1.0.1
## [61] tidygraph_1.3.1 evaluate_1.0.4
## [63] lattice_0.22-7 memoise_2.0.1
## [65] bslib_0.9.0 Rcpp_1.1.0
## [67] nlme_3.1-168 gridExtra_2.3
## [69] SparseArray_1.9.0 mgcv_1.9-3
## [71] xfun_0.52 fs_1.6.6
## [73] forcats_1.0.0 pkgconfig_2.0.3