Introduction
The anansi
package computes and compares the association
between the features of two ’omics datasets that are known to interact
based on a database such as KEGG. Studies including both microbiome and
metabolomics data are becoming more common. Often, it would be helpful
to integrate both datasets in order to see if they corroborate each
others patterns. All vs all association is imprecise and likely to yield
spurious associations. This package takes a knowledge-based approach to
constrain association search space, only considering metabolite-function
interactions that have been recorded in a pathway database. This package
also provides a framework to assess differential association.
While anansi
is geared towards metabolite-function
interactions in the context of host-microbe interactions, it is
perfectly capable of handling any other pair of datasets where some
features interact canonically. A very early version of
anansi
was used to generate “Extended Data Fig. 7” in that
paper.
Data preparation
The main anansi
function expects data in the
AnansiWeb
class format, which contains three tables: The
first two tables, tableY
and tableX
are the
feature tables of your two data modalities (e.g., metabolites, genes,
microbes, …). Both tables should have columns as features and rows as
samples.
The third table should be a binary adjacency matrix with the column
names of tableY
as rows and the column names of
tableX
as columns. Such an adjacency matrix is provided in
the anansi
library and is referred to as a dictionary
(because you use it to look up which metabolites interact with which
functions).
Though most of the examples use metabolites and functions,
anansi
is able to handle any type of big/’omics data, as
long as there is a dictionary available. Because of this, anansi uses
the type-naive nomenclature tableY
and tableX
.
The Y and X refer to the position these measurements will have in the
linear modeling framework:
\(Y \sim X \times{covariates}\)
A note on functional microbiome data
Two common questions in the host-microbiome field are “Who’s there?”
and “What are they doing?”. Techniques like 16S sequencing and shotgun
metagenomics sequencing are most commonly used to answer the first
question. The second question can be a bit more tricky - often we’ll
need functional inference software to address them. For 16S sequencing,
algorithms like PICRUSt2 and Piphillin can be used to infer function.
For shotgun metagenomics, HUMANn3 in the bioBakery suite can be
used.
All of these algorithms can produce functional count data in terms of
KEGG Orthologues (KOs). These tables can be directly plugged in to
anansi
.
Weave a web🕸️
The weaveWeb()
function can be used to parse the tables
that we prepared above into an AnansiWeb
object. The
AnansiWeb
format is a necessary input file for the main
anansi
workflow. It allows anansi
to keep
track of which features from the two input data sets should should be
considered as pairs.
With anansi
, we provide a pre-built map between ko, cpd
and ec features of the KEGG database, but users can provide their own
maps. See
vignette on adjacency matrices.
## Dropped features in tableX: 2476 remain.
Run anansi🕷️
The main workspider in this package is called anansi
.
Generally, you want to give it three arguments. First, there’s
web
, which is an AnansiWeb
object, such as the
one we generated in the above step. Second, there’s
formula
, which should be a formula. For instance, to assess
differential associations between treatments, we use the formula
~Legend
, provided we have a column with that name in our
metadata
object, the Third argument.
anansi_out <- anansi(web = web, formula = ~Legend, metadata = FMT_metadata, adjust.method = "BH",
verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
## Aged yFMT, Aged oFMT, Young yFMT
Reporting output📝
anansi
returns a wide format table as an output. For
general reporting, we recommend sticking to the table format as it’s the
most legible. Let’s take a look at the structure of the output
table:
## feature_Y feature_X All_r.values All_t.values All_p.values
## 1 C00186 K00016 -0.08985529 0.5260699 0.60225456
## 2 C00186 K00101 0.28661046 1.7443940 0.09012569
## 3 C00041 K00259 -0.21708011 1.2967052 0.20346437
## 4 C00064 K00265 -0.25811198 1.5578255 0.12853530
## 5 C00064 K00266 -0.05235448 0.3056957 0.76170027
# Let's take a look at the column names
colnames(anansi_out)
## [1] "feature_Y" "feature_X"
## [3] "All_r.values" "All_t.values"
## [5] "All_p.values" "Aged yFMT_r.values"
## [7] "Aged yFMT_t.values" "Aged yFMT_p.values"
## [9] "Aged oFMT_r.values" "Aged oFMT_t.values"
## [11] "Aged oFMT_p.values" "Young yFMT_r.values"
## [13] "Young yFMT_t.values" "Young yFMT_p.values"
## [15] "full_r.squared" "full_f.values"
## [17] "full_p.values" "disjointed_Legend_r.squared"
## [19] "disjointed_Legend_f.values" "disjointed_Legend_p.values"
## [21] "emergent_Legend_r.squared" "emergent_Legend_f.values"
## [23] "emergent_Legend_p.values" "All_q.values"
## [25] "Aged yFMT_q.values" "Aged oFMT_q.values"
## [27] "Young yFMT_q.values" "full_q.values"
## [29] "disjointed_Legend_q.values" "emergent_Legend_q.values"
Output statistics
We can see that the output of anansi()
is structured so
that there is one feature pair per row, marked by the first two columns.
For each of those feature pairs, the table contains outcome parameters
from several models. In order, we have pooled correlations, group-wise
correlations, full model, disjointed and emergent associations. For each
of these models we report four parameters. For correlations we provide
\(\rho\) - Pearson’s coefficient - and
the \(t\)-statistic, whereas we provide
their squared counterparts for the other models: \(R^2\) and the \(F\) statistic. For all models, we
additionally provide p-values and adjusted p-values. For more on this,
see
the vignette on differential associations.
Plot the results
The long format can be helpful to plug the data into
ggplot2
. Here, we recreate the style of the figure from the
FMT Aging study.
# Use tidyr to wrangle the correlation r-values to a single column
library(tidyr)
anansiLong <- anansi_out |>
pivot_longer(starts_with("All") | contains("FMT")) |>
separate_wider_delim(
name,
delim = "_", names = c("cor_group", "param")
) |>
pivot_wider(names_from = param, values_from = value)
# Now it's ready to be plugged into ggplot2, though let's clean up a bit more.
# Only consider interactions where the entire model fits well enough.
anansiLong <- anansiLong[anansiLong$full_q.values < 0.2, ]
ggplot(anansiLong) +
# Define aesthetics
aes(
x = r.values,
y = feature_X,
fill = cor_group,
alpha = disjointed_Legend_p.values < 0.05
) +
# Make a vertical dashed red line at x = 0
geom_vline(xintercept = 0, linetype = "dashed", colour = "red") +
# Points show raw correlation coefficients
geom_point(shape = 21, size = 3) +
# facet per compound
ggforce::facet_col(~feature_Y, space = "free", scales = "free_y") +
# fix the scales, labels, theme and other layout
scale_y_discrete(limits = rev, position = "right") +
scale_alpha_manual(
values = c("TRUE" = 1, "FALSE" = 1 / 3),
"Disjointed association\np < 0.05"
) +
scale_fill_manual(
values = c(
"Young yFMT" = "#2166ac",
"Aged oFMT" = "#b2182b",
"Aged yFMT" = "#ef8a62",
"All" = "gray"
),
breaks = c("Young yFMT", "Aged oFMT", "Aged yFMT", "All"),
name = "Treatment"
) +
theme_bw() +
ylab("") +
xlab("Pearson's \u03c1")
Plotting multiple feature pairs at once with
patchwork
We can also also use the AnansiWeb
object to pull up
specific pairs of features. The patchwork
provides a wonderful interface to combine plots. We can use
getFeaturePairs()
to extract the relevant data from our
AnansiWeb
object in a convenient list format:
library(patchwork)
# Get a list containing data.frames for each feature pair.
feature_pairs <- getFeaturePairs(web)
# For each, prepare a ggplot object with those features on x & y axes.
plot_list <- lapply(
feature_pairs,
FUN = function(pair_i) {
ggplot(pair_i) +
aes(
y = .data[[colnames(pair_i)[1L]]],
x = .data[[colnames(pair_i)[2L]]]
) +
# We'll use facet_grid for labels instead of labs().
facet_grid(colnames(pair_i)[1L] ~ colnames(pair_i)[2L])
}
)
# Feed the first six of our ggplots to patchwork, using wrap_plots()
wrap_plots(plot_list[seq_len(6L)], guides = "collect") &
# Continue as with an ordinary ggplot, but use & instead of +
geom_point(
shape = 21, show.legend = FALSE,
aes(fill = FMT_metadata$Legend)
) &
geom_smooth(
method = "lm", se = FALSE,
aes(colour = FMT_metadata$Legend),
show.legend = FALSE
) &
# Appearance
scale_fill_manual(
aesthetics = c("colour", "fill"),
values = c(
"Young yFMT" = "#2166ac",
"Aged oFMT" = "#b2182b",
"Aged yFMT" = "#ef8a62",
"All" = "gray"
)
) &
theme_bw() &
labs(x = NULL, y = NULL)
These types of visualizations of feature pairs work best when the amount
of feature pairs to be plotted is modest.
Bioconductor pipeline
anansi also supports the Bioconductor class MultiAssayExperiment
(MAE) as an alternative to AnansiWeb. The latter can be converted to a
MAE with the function asMAE
.
library(MultiAssayExperiment)
# Store sample metadata into metadata slot of web object
metadata(web) <- data.frame(FMT_metadata, row.names = FMT_metadata$Sample_ID)
# Convert web to mae
mae <- asMAE(web)
The MAE object can be passed to the getAnansi
function,
which returns the same output as anansi
.
out <- getAnansi(x = mae, tableY = "cpd", tableX = "ko", formula = ~Legend, adjust.method = "BH",
verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
## Aged yFMT, Aged oFMT, Young yFMT
Finally, the output of anansi can be visualized with
plotAnansi
by specifying the type of association to
use.
# Filter associations by q value
out <- out[out$full_q.values < 0.2, ]
# Visualize disjointed associations
plotAnansi(out, association.type = "disjointed", model.var = "Legend", signif.threshold = 0.05,
fill_by = "group")
Session info
## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MultiAssayExperiment_1.35.0 SummarizedExperiment_1.39.0
## [3] Biobase_2.69.0 GenomicRanges_1.61.0
## [5] GenomeInfoDb_1.45.0 IRanges_2.43.0
## [7] S4Vectors_0.47.0 BiocGenerics_0.55.0
## [9] generics_0.1.3 MatrixGenerics_1.21.0
## [11] matrixStats_1.5.0 patchwork_1.3.0
## [13] tidyr_1.3.1 ggforce_0.4.2
## [15] ggplot2_3.5.2 anansi_0.7.0
## [17] BiocStyle_2.37.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_2.0.0 magrittr_2.0.3
## [3] estimability_1.5.1 ggbeeswarm_0.7.2
## [5] farver_2.1.2 rmarkdown_2.29
## [7] fs_1.6.6 ragg_1.4.0
## [9] vctrs_0.6.5 DelayedMatrixStats_1.31.0
## [11] htmltools_0.5.8.1 S4Arrays_1.9.0
## [13] forcats_1.0.0 BiocBaseUtils_1.11.0
## [15] BiocNeighbors_2.3.0 cellranger_1.1.0
## [17] SparseArray_1.9.0 parallelly_1.43.0
## [19] sass_0.4.10 bslib_0.9.0
## [21] htmlwidgets_1.6.4 desc_1.4.3
## [23] plyr_1.8.9 DECIPHER_3.5.0
## [25] emmeans_1.11.0 cachem_1.1.0
## [27] igraph_2.1.4 lifecycle_1.0.4
## [29] pkgconfig_2.0.3 rsvd_1.0.5
## [31] Matrix_1.7-3 R6_2.6.1
## [33] fastmap_1.2.0 GenomeInfoDbData_1.2.14
## [35] digest_0.6.37 colorspace_2.1-1
## [37] ggnewscale_0.5.1 scater_1.37.0
## [39] irlba_2.3.5.1 textshaping_1.0.0
## [41] vegan_2.6-10 beachmat_2.25.0
## [43] labeling_0.4.3 httr_1.4.7
## [45] polyclip_1.10-7 TreeSummarizedExperiment_2.17.0
## [47] abind_1.4-8 mgcv_1.9-3
## [49] compiler_4.5.0 withr_3.0.2
## [51] BiocParallel_1.43.0 viridis_0.6.5
## [53] DBI_1.2.3 MASS_7.3-65
## [55] DelayedArray_0.35.1 bluster_1.19.0
## [57] permute_0.9-7 tools_4.5.0
## [59] vipor_0.4.7 beeswarm_0.4.0
## [61] ape_5.8-1 glue_1.8.0
## [63] nlme_3.1-168 gridtext_0.1.5
## [65] grid_4.5.0 mia_1.17.0
## [67] cluster_2.1.8.1 reshape2_1.4.4
## [69] gtable_0.3.6 tzdb_0.5.0
## [71] fillpattern_1.0.2 hms_1.1.3
## [73] xml2_1.3.8 BiocSingular_1.25.0
## [75] ScaledMatrix_1.17.0 XVector_0.49.0
## [77] ggrepel_0.9.6 pillar_1.10.2
## [79] stringr_1.5.1 yulab.utils_0.2.0
## [81] splines_4.5.0 ggtext_0.1.2
## [83] dplyr_1.1.4 tweenr_2.0.3
## [85] treeio_1.33.0 lattice_0.22-7
## [87] tidyselect_1.2.1 DirichletMultinomial_1.51.0
## [89] SingleCellExperiment_1.31.0 Biostrings_2.77.0
## [91] scuttle_1.19.0 knitr_1.50
## [93] gridExtra_2.3 bookdown_0.43
## [95] xfun_0.52 rbiom_2.2.0
## [97] stringi_1.8.7 UCSC.utils_1.5.0
## [99] lazyeval_0.2.2 yaml_2.3.10
## [101] evaluate_1.0.3 codetools_0.2-20
## [103] tibble_3.2.1 BiocManager_1.30.25
## [105] cli_3.6.4 xtable_1.8-4
## [107] systemfonts_1.2.2 munsell_0.5.1
## [109] jquerylib_0.1.4 readxl_1.4.5
## [111] Rcpp_1.0.14 parallel_4.5.0
## [113] readr_2.1.5 pkgdown_2.1.1
## [115] sparseMatrixStats_1.21.0 slam_0.1-55
## [117] decontam_1.29.0 viridisLite_0.4.2
## [119] mvtnorm_1.3-3 tidytree_0.4.6
## [121] scales_1.3.0 purrr_1.0.4
## [123] crayon_1.5.3 rlang_1.1.6
## [125] formatR_1.14