Skip to contents

Introduction

The anansi package computes and compares the association between the features of two ’omics datasets that are known to interact based on a database such as KEGG. Studies including both microbiome and metabolomics data are becoming more common. Often, it would be helpful to integrate both datasets in order to see if they corroborate each others patterns. All vs all association is imprecise and likely to yield spurious associations. This package takes a knowledge-based approach to constrain association search space, only considering metabolite-function interactions that have been recorded in a pathway database. This package also provides a framework to assess differential association.

While anansi is geared towards metabolite-function interactions in the context of host-microbe interactions, it is perfectly capable of handling any other pair of datasets where some features interact canonically. A very early version of anansi was used to generate “Extended Data Fig. 7” in that paper.

Installation

# install and load anansi devtools::install_github('thomazbastiaanssen/anansi')
library(anansi)

# load ggplot2 and ggforce to plot results
library(ggplot2)
library(ggforce)

# load example data + metadata from FMT Aging study
data(FMT_data)

Data preparation

The main anansi function expects data in the AnansiWeb class format, which contains three tables: The first two tables, tableY and tableX are the feature tables of your two data modalities (e.g., metabolites, genes, microbes, …). Both tables should have columns as features and rows as samples.

The third table should be a binary adjacency matrix with the column names of tableY as rows and the column names of tableX as columns. Such an adjacency matrix is provided in the anansi library and is referred to as a dictionary (because you use it to look up which metabolites interact with which functions).

Though most of the examples use metabolites and functions, anansi is able to handle any type of big/’omics data, as long as there is a dictionary available. Because of this, anansi uses the type-naive nomenclature tableY and tableX. The Y and X refer to the position these measurements will have in the linear modeling framework:

\(Y \sim X \times{covariates}\)

A note on functional microbiome data

Two common questions in the host-microbiome field are “Who’s there?” and “What are they doing?”. Techniques like 16S sequencing and shotgun metagenomics sequencing are most commonly used to answer the first question. The second question can be a bit more tricky - often we’ll need functional inference software to address them. For 16S sequencing, algorithms like PICRUSt2 and Piphillin can be used to infer function. For shotgun metagenomics, HUMANn3 in the bioBakery suite can be used.
All of these algorithms can produce functional count data in terms of KEGG Orthologues (KOs). These tables can be directly plugged in to anansi.

# anansi expects samples to be rows and features to be columns.
t1 <- t(FMT_metab)
t2 <- t(FMT_KOs)

Weave a web🕸️

The weaveWeb() function can be used to parse the tables that we prepared above into an AnansiWeb object. The AnansiWeb format is a necessary input file for the main anansi workflow. It allows anansi to keep track of which features from the two input data sets should should be considered as pairs.

With anansi, we provide a pre-built map between ko, cpd and ec features of the KEGG database, but users can provide their own maps. See vignette on adjacency matrices.

web <- weaveWeb(cpd ~ ko, tableY = t1, tableX = t2, link = kegg_link())
## Dropped features in tableX: 2476 remain.

Run anansi🕷️

The main workspider in this package is called anansi. Generally, you want to give it three arguments. First, there’s web, which is an AnansiWeb object, such as the one we generated in the above step. Second, there’s formula, which should be a formula. For instance, to assess differential associations between treatments, we use the formula ~Legend, provided we have a column with that name in our metadata object, the Third argument.

anansi_out <- anansi(web = web, formula = ~Legend, metadata = FMT_metadata, adjust.method = "BH",
    verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
##  Aged yFMT, Aged oFMT, Young yFMT

Reporting output📝

anansi returns a wide format table as an output. For general reporting, we recommend sticking to the table format as it’s the most legible. Let’s take a look at the structure of the output table:

# General structure of anansi() output
head(anansi_out, c(5, 5))
##   feature_Y feature_X All_r.values All_t.values All_p.values
## 1    C00186    K00016  -0.08985529    0.5260699   0.60225456
## 2    C00186    K00101   0.28661046    1.7443940   0.09012569
## 3    C00041    K00259  -0.21708011    1.2967052   0.20346437
## 4    C00064    K00265  -0.25811198    1.5578255   0.12853530
## 5    C00064    K00266  -0.05235448    0.3056957   0.76170027
# Let's take a look at the column names
colnames(anansi_out)
##  [1] "feature_Y"                   "feature_X"                  
##  [3] "All_r.values"                "All_t.values"               
##  [5] "All_p.values"                "Aged yFMT_r.values"         
##  [7] "Aged yFMT_t.values"          "Aged yFMT_p.values"         
##  [9] "Aged oFMT_r.values"          "Aged oFMT_t.values"         
## [11] "Aged oFMT_p.values"          "Young yFMT_r.values"        
## [13] "Young yFMT_t.values"         "Young yFMT_p.values"        
## [15] "full_r.squared"              "full_f.values"              
## [17] "full_p.values"               "disjointed_Legend_r.squared"
## [19] "disjointed_Legend_f.values"  "disjointed_Legend_p.values" 
## [21] "emergent_Legend_r.squared"   "emergent_Legend_f.values"   
## [23] "emergent_Legend_p.values"    "All_q.values"               
## [25] "Aged yFMT_q.values"          "Aged oFMT_q.values"         
## [27] "Young yFMT_q.values"         "full_q.values"              
## [29] "disjointed_Legend_q.values"  "emergent_Legend_q.values"

Output statistics

We can see that the output of anansi() is structured so that there is one feature pair per row, marked by the first two columns. For each of those feature pairs, the table contains outcome parameters from several models. In order, we have pooled correlations, group-wise correlations, full model, disjointed and emergent associations. For each of these models we report four parameters. For correlations we provide \(\rho\) - Pearson’s coefficient - and the \(t\)-statistic, whereas we provide their squared counterparts for the other models: \(R^2\) and the \(F\) statistic. For all models, we additionally provide p-values and adjusted p-values. For more on this, see the vignette on differential associations.

Plot the results

The long format can be helpful to plug the data into ggplot2. Here, we recreate the style of the figure from the FMT Aging study.

# Use tidyr to wrangle the correlation r-values to a single column
library(tidyr)

anansiLong <- anansi_out |>
    pivot_longer(starts_with("All") | contains("FMT")) |>
    separate_wider_delim(
        name,
        delim = "_", names = c("cor_group", "param")
    ) |>
    pivot_wider(names_from = param, values_from = value)

# Now it's ready to be plugged into ggplot2, though let's clean up a bit more.
# Only consider interactions where the entire model fits well enough.
anansiLong <- anansiLong[anansiLong$full_q.values < 0.2, ]

ggplot(anansiLong) +

    # Define aesthetics
    aes(
        x = r.values, 
        y = feature_X,
        fill = cor_group, 
        alpha = disjointed_Legend_p.values < 0.05
        ) +

    # Make a vertical dashed red line at x = 0
    geom_vline(xintercept = 0, linetype = "dashed", colour = "red") +

    # Points show  raw correlation coefficients
    geom_point(shape = 21, size = 3) +

    # facet per compound
    ggforce::facet_col(~feature_Y, space = "free", scales = "free_y") +

    # fix the scales, labels, theme and other layout
    scale_y_discrete(limits = rev, position = "right") +
    scale_alpha_manual(
        values = c("TRUE" = 1, "FALSE" = 1 / 3),
        "Disjointed association\np < 0.05"
    ) +
    scale_fill_manual(
        values = c(
            "Young yFMT" = "#2166ac",
            "Aged oFMT" = "#b2182b",
            "Aged yFMT" = "#ef8a62",
            "All" = "gray"
        ),
        breaks = c("Young yFMT", "Aged oFMT", "Aged yFMT", "All"),
        name = "Treatment"
    ) +
    theme_bw() +
    ylab("") +
    xlab("Pearson's \u03c1")

Differential association plot, with facets separating metabolites
and metabolites on the y-axis. x-axis depicts pearson's correlation
coefficient.

Plotting multiple feature pairs at once with patchwork

We can also also use the AnansiWeb object to pull up specific pairs of features. The patchwork provides a wonderful interface to combine plots. We can use getFeaturePairs() to extract the relevant data from our AnansiWeb object in a convenient list format:

library(patchwork)

# Get a list containing data.frames for each feature pair.
feature_pairs <- getFeaturePairs(web)

# For each, prepare a ggplot object with those features on x & y axes.
plot_list <- lapply(
    feature_pairs,
    FUN = function(pair_i) {
        ggplot(pair_i) +
            aes(
                y = .data[[colnames(pair_i)[1L]]],
                x = .data[[colnames(pair_i)[2L]]]
            ) +
            # We'll use facet_grid for labels instead of labs().
            facet_grid(colnames(pair_i)[1L] ~ colnames(pair_i)[2L])
    }
)

# Feed the first six of our ggplots to patchwork, using wrap_plots()
wrap_plots(plot_list[seq_len(6L)], guides = "collect") &
    # Continue as with an ordinary ggplot, but use & instead of +
    geom_point(
        shape = 21, show.legend = FALSE,
        aes(fill = FMT_metadata$Legend)
    ) &
    geom_smooth(
        method = "lm", se = FALSE,
        aes(colour = FMT_metadata$Legend),
        show.legend = FALSE
    ) &

    # Appearance
    scale_fill_manual(
        aesthetics = c("colour", "fill"),
        values = c(
            "Young yFMT" = "#2166ac",
            "Aged oFMT" = "#b2182b",
            "Aged yFMT" = "#ef8a62",
            "All" = "gray"
        )
    ) &
    theme_bw() &
    labs(x = NULL, y = NULL)

These types of visualizations of feature pairs work best when the amount of feature pairs to be plotted is modest.

Note on using patchwork with anansi:

Especially for large numers of featire pairs, it’s much more efficient to add shared elements to the patchwork using the & operator, rather than adding them to the initial ggplot() calls in the lapply() loop.

Bioconductor pipeline

anansi also supports the Bioconductor class MultiAssayExperiment (MAE) as an alternative to AnansiWeb. The latter can be converted to a MAE with the function asMAE.

library(MultiAssayExperiment)

# Store sample metadata into metadata slot of web object
metadata(web) <- data.frame(FMT_metadata, row.names = FMT_metadata$Sample_ID)

# Convert web to mae
mae <- asMAE(web)

The MAE object can be passed to the getAnansi function, which returns the same output as anansi.

out <- getAnansi(x = mae, tableY = "cpd", tableX = "ko", formula = ~Legend, adjust.method = "BH",
    verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
##  Aged yFMT, Aged oFMT, Young yFMT

Finally, the output of anansi can be visualized with plotAnansi by specifying the type of association to use.

# Filter associations by q value
out <- out[out$full_q.values < 0.2, ]

# Visualize disjointed associations
plotAnansi(out, association.type = "disjointed", model.var = "Legend", signif.threshold = 0.05,
    fill_by = "group")

Session info

## R version 4.5.0 (2025-04-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MultiAssayExperiment_1.35.0 SummarizedExperiment_1.39.0
##  [3] Biobase_2.69.0              GenomicRanges_1.61.0       
##  [5] GenomeInfoDb_1.45.0         IRanges_2.43.0             
##  [7] S4Vectors_0.47.0            BiocGenerics_0.55.0        
##  [9] generics_0.1.3              MatrixGenerics_1.21.0      
## [11] matrixStats_1.5.0           patchwork_1.3.0            
## [13] tidyr_1.3.1                 ggforce_0.4.2              
## [15] ggplot2_3.5.2               anansi_0.7.0               
## [17] BiocStyle_2.37.0           
## 
## loaded via a namespace (and not attached):
##   [1] jsonlite_2.0.0                  magrittr_2.0.3                 
##   [3] estimability_1.5.1              ggbeeswarm_0.7.2               
##   [5] farver_2.1.2                    rmarkdown_2.29                 
##   [7] fs_1.6.6                        ragg_1.4.0                     
##   [9] vctrs_0.6.5                     DelayedMatrixStats_1.31.0      
##  [11] htmltools_0.5.8.1               S4Arrays_1.9.0                 
##  [13] forcats_1.0.0                   BiocBaseUtils_1.11.0           
##  [15] BiocNeighbors_2.3.0             cellranger_1.1.0               
##  [17] SparseArray_1.9.0               parallelly_1.43.0              
##  [19] sass_0.4.10                     bslib_0.9.0                    
##  [21] htmlwidgets_1.6.4               desc_1.4.3                     
##  [23] plyr_1.8.9                      DECIPHER_3.5.0                 
##  [25] emmeans_1.11.0                  cachem_1.1.0                   
##  [27] igraph_2.1.4                    lifecycle_1.0.4                
##  [29] pkgconfig_2.0.3                 rsvd_1.0.5                     
##  [31] Matrix_1.7-3                    R6_2.6.1                       
##  [33] fastmap_1.2.0                   GenomeInfoDbData_1.2.14        
##  [35] digest_0.6.37                   colorspace_2.1-1               
##  [37] ggnewscale_0.5.1                scater_1.37.0                  
##  [39] irlba_2.3.5.1                   textshaping_1.0.0              
##  [41] vegan_2.6-10                    beachmat_2.25.0                
##  [43] labeling_0.4.3                  httr_1.4.7                     
##  [45] polyclip_1.10-7                 TreeSummarizedExperiment_2.17.0
##  [47] abind_1.4-8                     mgcv_1.9-3                     
##  [49] compiler_4.5.0                  withr_3.0.2                    
##  [51] BiocParallel_1.43.0             viridis_0.6.5                  
##  [53] DBI_1.2.3                       MASS_7.3-65                    
##  [55] DelayedArray_0.35.1             bluster_1.19.0                 
##  [57] permute_0.9-7                   tools_4.5.0                    
##  [59] vipor_0.4.7                     beeswarm_0.4.0                 
##  [61] ape_5.8-1                       glue_1.8.0                     
##  [63] nlme_3.1-168                    gridtext_0.1.5                 
##  [65] grid_4.5.0                      mia_1.17.0                     
##  [67] cluster_2.1.8.1                 reshape2_1.4.4                 
##  [69] gtable_0.3.6                    tzdb_0.5.0                     
##  [71] fillpattern_1.0.2               hms_1.1.3                      
##  [73] xml2_1.3.8                      BiocSingular_1.25.0            
##  [75] ScaledMatrix_1.17.0             XVector_0.49.0                 
##  [77] ggrepel_0.9.6                   pillar_1.10.2                  
##  [79] stringr_1.5.1                   yulab.utils_0.2.0              
##  [81] splines_4.5.0                   ggtext_0.1.2                   
##  [83] dplyr_1.1.4                     tweenr_2.0.3                   
##  [85] treeio_1.33.0                   lattice_0.22-7                 
##  [87] tidyselect_1.2.1                DirichletMultinomial_1.51.0    
##  [89] SingleCellExperiment_1.31.0     Biostrings_2.77.0              
##  [91] scuttle_1.19.0                  knitr_1.50                     
##  [93] gridExtra_2.3                   bookdown_0.43                  
##  [95] xfun_0.52                       rbiom_2.2.0                    
##  [97] stringi_1.8.7                   UCSC.utils_1.5.0               
##  [99] lazyeval_0.2.2                  yaml_2.3.10                    
## [101] evaluate_1.0.3                  codetools_0.2-20               
## [103] tibble_3.2.1                    BiocManager_1.30.25            
## [105] cli_3.6.4                       xtable_1.8-4                   
## [107] systemfonts_1.2.2               munsell_0.5.1                  
## [109] jquerylib_0.1.4                 readxl_1.4.5                   
## [111] Rcpp_1.0.14                     parallel_4.5.0                 
## [113] readr_2.1.5                     pkgdown_2.1.1                  
## [115] sparseMatrixStats_1.21.0        slam_0.1-55                    
## [117] decontam_1.29.0                 viridisLite_0.4.2              
## [119] mvtnorm_1.3-3                   tidytree_0.4.6                 
## [121] scales_1.3.0                    purrr_1.0.4                    
## [123] crayon_1.5.3                    rlang_1.1.6                    
## [125] formatR_1.14