Skip to contents

Introduction

The anansi package computes and compares the association between the features of two ’omics datasets that are known to interact based on a database such as KEGG. Studies including both functional microbiome and metabolomics data are becoming more common. Often, it would be helpful to integrate both datasets in order to see if they corroborate each others patterns. However, all-vs-all association analyses are imprecise and likely to yield spurious associations. This package takes a knowledge-based approach to constrain association search space, only considering metabolite-function interactions that have been recorded in a pathway database. In addition, it provides a framework to assess differential associations.

While anansi is geared towards metabolite-function interactions in the context of host-microbe interactions, it is perfectly capable of handling any other pair of data sets where some features interact canonically.

Installation

Install the development version from GitHub with remotes:

install.packages("remotes")
remotes::install_github("thomazbastiaanssen/anansi")

Setup

# load anansi
library(anansi)

# load ggplot2 and ggforce to plot results
library(ggplot2)
library(ggforce)

# load example data + metadata from FMT Aging study
data(FMT_data)

Data preparation

The main anansi function expects data in the AnansiWeb class format, which contains three tables: The first two tables, tableY and tableX are the feature tables of your two data modalities (e.g., metabolites, genes, microbes, …). Both tables should have columns as features and rows as samples.

The third table should be a binary adjacency matrix with the column names of tableY as rows and the column names of tableX as columns. An example of such an adjacency matrix comes with the anansi package and is referred to as a dictionary (because it is used to look up which metabolites interact with which functions).

Though most of the examples use metabolites and functions, anansi is able to handle any type of big/’omics data, as long as there is a dictionary available. Because of this, anansi uses the type-naive nomenclature tableY and tableX. The Y and X refer to the position these measurements will have in the linear modeling framework:

\[ Y \sim X \times{covariates} \]

A note on functional microbiome data

Two common questions in the host-microbiome field are “Who’s there?” and “What are they doing?”. Techniques like 16S rRNA sequencing and shotgun metagenomics sequencing are most commonly used to answer the first question. The second question can be a bit more tricky, as it often requires functional inference software to address them. For 16S sequencing, algorithms like PICRUSt2 and Piphillin can be used to infer function. For shotgun metagenomics, HUMANn3 in the bioBakery suite can be used. All of these algorithms can produce functional count data in terms of KEGG Orthologues (KOs) organised as tables, which can be directly passed to anansi as follows:

# anansi expects samples to be rows and features to be columns.
t1 <- t(FMT_metab)
t2 <- t(FMT_KOs)

Weave a web🕸️

The weaveWeb() function can be used to parse the tables t1 and t2 that we prepared above into an AnansiWeb object. The AnansiWeb format is a necessary input file for the main anansi workflow. It allows anansi to keep track of which features from the two input data sets should be considered as pairs.

anansi provides a pre-built map between ko, cpd and ec feature annotations from the KEGG database, but users can also provide their own custom maps. See vignette on adjacency matrices.

# Create AnansiWeb object from tables
web <- weaveWeb(cpd ~ ko, tableY = t1, tableX = t2, link = kegg_link())
## Dropped features in tableX: 2476 remain.

Run anansi🕷️

The main workspider is called anansi just like the package. Generally, three main arguments should be specified:

  1. web: an AnansiWeb object, such as the one we generated in the above step.
  2. formula: a linking formula. For instance, to assess differential associations between treatments, we use the formula ~ Legend.
  3. metadata: a data.frame with additional information about the data set. Its rows correspond to samples and its columns to variables that can be used in formula as c.
# Run anansi on AnansiWeb object
anansi_out <- anansi(web = web, formula = ~Legend, metadata = FMT_metadata, adjust.method = "BH",
    verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
##  Aged yFMT, Aged oFMT, Young yFMT

Reporting output📝

By default, anansi outputs a table in the wide format. For general reporting, we recommend sticking to the more legible table format by specifying the return.format argument. Let’s take a look at the structure of the output table:

# General structure of anansi() output
head(anansi_out, c(5, 5))
##   feature_Y feature_X All_r.values All_t.values All_p.values
## 1    C00186    K00016  -0.08985529    0.5260699   0.60225456
## 2    C00186    K00101   0.28661046    1.7443940   0.09012569
## 3    C00041    K00259  -0.21708011    1.2967052   0.20346437
## 4    C00064    K00265  -0.25811198    1.5578255   0.12853530
## 5    C00064    K00266  -0.05235448    0.3056957   0.76170027
# Let's take a look at the column names
colnames(anansi_out)
##  [1] "feature_Y"                   "feature_X"                  
##  [3] "All_r.values"                "All_t.values"               
##  [5] "All_p.values"                "Aged yFMT_r.values"         
##  [7] "Aged yFMT_t.values"          "Aged yFMT_p.values"         
##  [9] "Aged oFMT_r.values"          "Aged oFMT_t.values"         
## [11] "Aged oFMT_p.values"          "Young yFMT_r.values"        
## [13] "Young yFMT_t.values"         "Young yFMT_p.values"        
## [15] "full_r.squared"              "full_f.values"              
## [17] "full_p.values"               "disjointed_Legend_r.squared"
## [19] "disjointed_Legend_f.values"  "disjointed_Legend_p.values" 
## [21] "emergent_Legend_r.squared"   "emergent_Legend_f.values"   
## [23] "emergent_Legend_p.values"    "All_q.values"               
## [25] "Aged yFMT_q.values"          "Aged oFMT_q.values"         
## [27] "Young yFMT_q.values"         "full_q.values"              
## [29] "disjointed_Legend_q.values"  "emergent_Legend_q.values"

Output statistics

We can see that the output of anansi() is structured so that there is one feature pair per row, marked by the first two columns. For each of those feature pairs, the table contains outcome parameters from several models. In order, we have pooled correlations, group-wise correlations, full model, disjointed and emergent associations. For each of these models we report four parameters. For correlations we provide the Pearson’s coefficient (\(\rho\)) and the \(t\)-statistic, whereas we provide their squared counterparts for the linear regression models: \(R^2\) and the \(F\) statistic. For all models, we additionally provide p-values and adjusted p-values. For more on this, see the vignette on differential associations.

Association Method Statistics Significance
Pooled Correlation \(\rho\), \(t\) p, adjusted p
Group-wise Correlation \(\rho\), \(t\) p, adjusted p
Full Linear regression \(R^2\), \(F\) p, adjusted p
Disjointed Linear regression \(R^2\), \(F\) p, adjusted p
Emergent Linear regression \(R^2\), \(F\) p, adjusted p

Plot the results

The long format can be helpful to plug the data into ggplot2. Here, we recreate the style of the figure from the FMT Aging study.

# Use tidyr to wrangle the correlation r-values to a single column
library(tidyr)

anansiLong <- anansi_out |>
    pivot_longer(starts_with("All") | contains("FMT")) |>
    separate_wider_delim(
        name,
        delim = "_", names = c("cor_group", "param")
    ) |>
    pivot_wider(names_from = param, values_from = value)

# Now it's ready to be plugged into ggplot2, though let's clean up a bit more.
# Only consider interactions where the entire model fits well enough.
anansiLong <- anansiLong[anansiLong$full_q.values < 0.2, ]

ggplot(anansiLong) +

    # Define aesthetics
    aes(
        x = r.values, 
        y = feature_X,
        fill = cor_group, 
        alpha = disjointed_Legend_p.values < 0.05
        ) +

    # Make a vertical dashed red line at x = 0
    geom_vline(xintercept = 0, linetype = "dashed", colour = "red") +

    # Points show  raw correlation coefficients
    geom_point(shape = 21, size = 3) +

    # facet per compound
    ggforce::facet_col(~feature_Y, space = "free", scales = "free_y") +

    # fix the scales, labels, theme and other layout
    scale_y_discrete(limits = rev, position = "right") +
    scale_alpha_manual(
        values = c("TRUE" = 1, "FALSE" = 1 / 3),
        "Disjointed association\np < 0.05"
    ) +
    scale_fill_manual(
        values = c(
            "Young yFMT" = "#2166ac",
            "Aged oFMT" = "#b2182b",
            "Aged yFMT" = "#ef8a62",
            "All" = "gray"
        ),
        breaks = c("Young yFMT", "Aged oFMT", "Aged yFMT", "All"),
        name = "Treatment"
    ) +
    theme_bw() +
    ylab("") +
    xlab("Pearson's \u03c1")

Differential association plot, with facets separating metabolites
and metabolites on the y-axis. x-axis depicts pearson's correlation
coefficient.

Plotting multiple feature pairs at once with patchwork

We can also also use the AnansiWeb object to pull up specific pairs of features. The patchwork provides a wonderful interface to combine plots. We can use getFeaturePairs() to extract the relevant data from our AnansiWeb object in a convenient list format:

library(patchwork)

# Get a list containing data.frames for each feature pair.
feature_pairs <- getFeaturePairs(web)

# For each, prepare a ggplot object with those features on x & y axes.
plot_list <- lapply(
    feature_pairs,
    FUN = function(pair_i) {
        ggplot(pair_i) +
            aes(
                y = .data[[colnames(pair_i)[1L]]],
                x = .data[[colnames(pair_i)[2L]]]
            ) +
            # We'll use facet_grid for labels instead of labs().
            facet_grid(colnames(pair_i)[1L] ~ colnames(pair_i)[2L])
    }
)

# Feed the first six of our ggplots to patchwork, using wrap_plots()
wrap_plots(plot_list[seq_len(6L)], guides = "collect") &
    # Continue as with an ordinary ggplot, but use & instead of +
    geom_point(
        shape = 21, show.legend = FALSE,
        aes(fill = FMT_metadata$Legend)
    ) &
    geom_smooth(
        method = "lm", se = FALSE,
        aes(colour = FMT_metadata$Legend),
        show.legend = FALSE
    ) &

    # Appearance
    scale_fill_manual(
        aesthetics = c("colour", "fill"),
        values = c(
            "Young yFMT" = "#2166ac",
            "Aged oFMT" = "#b2182b",
            "Aged yFMT" = "#ef8a62",
            "All" = "gray"
        )
    ) &
    theme_bw() &
    labs(x = NULL, y = NULL)

These types of visualizations of feature pairs work best when the amount of feature pairs to be plotted is modest.

Note on using patchwork with anansi:

Especially for large numbers of feature pairs, it’s much more efficient to add shared elements to the patchwork using the & operator, rather than adding them to the initial ggplot() calls in the lapply() loop.

Bioconductor pipeline

anansi also supports the Bioconductor class MultiAssayExperiment (MAE) as an alternative to AnansiWeb. The latter can be converted to a MAE with the function asMAE.

library(MultiAssayExperiment)

# Store sample metadata into metadata slot of web object
metadata(web) <- data.frame(FMT_metadata, row.names = FMT_metadata$Sample_ID)

# Convert web to mae
mae <- asMAE(web)

The MAE object can be passed to the weaveWeb function, which returns the same output, ready for the anansi function.

web2 <- weaveWeb(mae, tableY = "cpd", tableX = "ko", link = kegg_link())

out <- anansi(web = web2, formula = ~Legend, adjust.method = "BH", verbose = TRUE)
## Fitting least-squares for following model:
## ~ x + Legend + x:Legend
## Running correlations for the following groups:
##  Aged yFMT, Aged oFMT, Young yFMT

Finally, the output of anansi can be visualised with plotAnansi by specifying the type of association to use.

# Filter associations by q value
out <- out[out$full_q.values < 0.2, ]

# Visualize disjointed associations
plotAnansi(out, association.type = "disjointed", model.var = "Legend", signif.threshold = 0.05,
    fill_by = "group")

Session info

## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] MultiAssayExperiment_1.35.3 SummarizedExperiment_1.39.1
##  [3] Biobase_2.69.0              GenomicRanges_1.61.1       
##  [5] Seqinfo_0.99.1              IRanges_2.43.0             
##  [7] S4Vectors_0.47.0            BiocGenerics_0.55.0        
##  [9] generics_0.1.4              MatrixGenerics_1.21.0      
## [11] matrixStats_1.5.0           patchwork_1.3.1            
## [13] tidyr_1.3.1                 ggforce_0.5.0              
## [15] ggplot2_3.5.2               anansi_0.7.0               
## [17] BiocStyle_2.37.0           
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1            viridisLite_0.4.2          
##  [3] dplyr_1.1.4                 farver_2.1.2               
##  [5] viridis_0.6.5               ggraph_2.2.1               
##  [7] fastmap_1.2.0               SingleCellExperiment_1.31.1
##  [9] tweenr_2.0.3                digest_0.6.37              
## [11] lifecycle_1.0.4             magrittr_2.0.3             
## [13] compiler_4.5.1              rlang_1.1.6                
## [15] sass_0.4.10                 tools_4.5.1                
## [17] igraph_2.1.4                yaml_2.3.10                
## [19] knitr_1.50                  S4Arrays_1.9.1             
## [21] labeling_0.4.3              graphlayouts_1.2.2         
## [23] htmlwidgets_1.6.4           DelayedArray_0.35.2        
## [25] RColorBrewer_1.1-3          abind_1.4-8                
## [27] withr_3.0.2                 purrr_1.0.4                
## [29] desc_1.4.3                  grid_4.5.1                 
## [31] polyclip_1.10-7             scales_1.4.0               
## [33] MASS_7.3-65                 cli_3.6.5                  
## [35] rmarkdown_2.29              crayon_1.5.3               
## [37] ragg_1.4.0                  BiocBaseUtils_1.11.0       
## [39] cachem_1.1.0                stringr_1.5.1              
## [41] splines_4.5.1               BiocManager_1.30.26        
## [43] formatR_1.14                XVector_0.49.0             
## [45] vctrs_0.6.5                 Matrix_1.7-3               
## [47] jsonlite_2.0.0              bookdown_0.43              
## [49] ggrepel_0.9.6               systemfonts_1.2.3          
## [51] jquerylib_0.1.4             glue_1.8.0                 
## [53] pkgdown_2.1.3               stringi_1.8.7              
## [55] gtable_0.3.6                tibble_3.3.0               
## [57] pillar_1.10.2               htmltools_0.5.8.1          
## [59] R6_2.6.1                    textshaping_1.0.1          
## [61] tidygraph_1.3.1             evaluate_1.0.4             
## [63] lattice_0.22-7              memoise_2.0.1              
## [65] bslib_0.9.0                 Rcpp_1.1.0                 
## [67] nlme_3.1-168                gridExtra_2.3              
## [69] SparseArray_1.9.0           mgcv_1.9-3                 
## [71] xfun_0.52                   fs_1.6.6                   
## [73] forcats_1.0.0               pkgconfig_2.0.3