Co-occurrence analysis

Input: a CSV file with species as rows and sites as columns (Species × Sites, values 0/1)

C-score null model:

Fixed rows + columns: Preserves both species frequencies AND site richness. Recommended for most analyses.

Pairwise tests

Community-level C-score analysis

C-score results

Observed C-score

Expected C-score

Standardized effect (SES)

P-value (2-tail)

P-value (lower)

P-value (upper)

Top checkerboard pairs (highest CU)

Species 1	Species 2	CU	r_i	r_j	Co-occur

Top co-occurring pairs (lowest CU)

Species 1	Species 2	CU	r_i	r_j	Co-occur

About this tool

This tool performs two types of analyses on species co-occurrence data:

Pairwise tests - Test individual species pairs for association, using either an exact hypergeometric test or a permutation test
Community-level C-score analysis - Measures the overall checkerboard pattern by aggregating pairwise information across all species pairs

Pairwise tests: Two methods available

Method 1: Exact Hypergeometric Test (Recommended)

The exact hypergeometric test calculates the exact probability of observing the co-occurrence pattern without simulation.

How it works:

Given two species with \(r_i\) and \(r_j\) occurrences across \(m\) sites, the probability of observing exactly \(k\) co-occurrences under the null hypothesis (the two species are distributed across sites independently of each other) follows the hypergeometric distribution:

\[P(X = k) = \frac{\binom{r_i}{k} \binom{m - r_i}{r_j - k}}{\binom{m}{r_j}}\]

where \(\binom{n}{k}\) is the binomial coefficient "\(n\) choose \(k\)".
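As a quick check of the formula, take the hypothetical values \(m = 10\), \(r_i = 4\), \(r_j = 5\) and \(k = 2\) (illustrative numbers, not taken from any data set):

\[P(X = 2) = \frac{\binom{4}{2} \binom{6}{3}}{\binom{10}{5}} = \frac{6 \times 20}{252} \approx 0.476\]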

P-values explained:

P-value (Lower tail): \(P(X \leq k_{obs})\) = Probability of observing this many or fewer co-occurrences. Low values (< 0.05) indicate avoidance.
P-value (Upper tail): \(P(X \geq k_{obs})\) = Probability of observing this many or more co-occurrences. Low values (< 0.05) indicate attraction/co-occurrence.
P-value (Two-tailed): \(2 \times \min(P_{lower}, P_{upper})\) = Tests for any significant deviation from random expectation (either too many or too few co-occurrences). Because both tails include \(k_{obs}\), this value can exceed 1 and is conventionally capped at 1.
Significance: A two-tailed p-value below 0.05 is conventionally treated as statistically significant.
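The three p-values can be sketched in a few lines of Python. This is a minimal illustration of the formulas above, not the tool's actual code: the function names are hypothetical, and capping the two-tailed value at 1 is an assumption.

```python
from math import comb

def hypergeom_pmf(k, m, r_i, r_j):
    """P(X = k): probability of exactly k co-occurrences across m sites."""
    # Outside the feasible range the probability is zero.
    if k < max(0, r_i + r_j - m) or k > min(r_i, r_j):
        return 0.0
    return comb(r_i, k) * comb(m - r_i, r_j - k) / comb(m, r_j)

def pairwise_p_values(k_obs, m, r_i, r_j):
    """Return (p_lower, p_upper, p_two) for the observed co-occurrence count."""
    k_min = max(0, r_i + r_j - m)
    k_max = min(r_i, r_j)
    p_lower = sum(hypergeom_pmf(k, m, r_i, r_j) for k in range(k_min, k_obs + 1))
    p_upper = sum(hypergeom_pmf(k, m, r_i, r_j) for k in range(k_obs, k_max + 1))
    # Assumption: cap at 1, since doubling the smaller tail can exceed 1.
    p_two = min(1.0, 2 * min(p_lower, p_upper))
    return p_lower, p_upper, p_two

# Hypothetical pair: 4 and 5 occurrences over 10 sites, never found together.
print(pairwise_p_values(0, 10, 4, 5))
```

In this example the lower-tail p-value is small, which points towards avoidance rather than attraction.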

Method 2: Permutation Test

A permutation test is a non-parametric statistical method that doesn't assume any particular distribution. It tests hypotheses by randomly rearranging (permuting) the data many times and comparing the observed pattern to the distribution of permuted patterns.

Step 1: Calculate observed co-occurrence

Count how many sites both species occupy together:

\[k_{obs} = \sum_{s=1}^{m} \mathbb{1}(\text{species}_i[s] = 1 \text{ AND } \text{species}_j[s] = 1)\]

Step 2: Generate permutations

Randomly shuffle each species' presence/absence pattern across sites independently. This breaks any real association while maintaining each species' total number of occurrences.

Step 3: Calculate permuted co-occurrences

For each permutation, count co-occurrences again. This builds a distribution of what we'd expect by chance.
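Steps 1 to 3 can be sketched as follows. This is an illustrative Python version under stated assumptions: each species is a list of 0/1 values over the same sites, and the function names, the default of 999 permutations, and the `seed` parameter are hypothetical choices rather than the tool's settings.

```python
import random

def co_occurrences(species_i, species_j):
    """Step 1: count sites where both species are present."""
    return sum(1 for x, y in zip(species_i, species_j) if x == 1 and y == 1)

def null_co_occurrences(species_i, species_j, n_perm=999, seed=None):
    """Steps 2-3: shuffle both vectors independently and recount each time."""
    rng = random.Random(seed)
    a, b = list(species_i), list(species_j)  # copies, so inputs stay untouched
    null = []
    for _ in range(n_perm):
        rng.shuffle(a)  # shuffling keeps the number of occurrences fixed
        rng.shuffle(b)
        null.append(co_occurrences(a, b))
    return null

species_i = [1, 1, 0, 0, 1, 0]
species_j = [1, 0, 0, 1, 1, 0]
k_obs = co_occurrences(species_i, species_j)
null = null_co_occurrences(species_i, species_j, n_perm=999, seed=1)
print(k_obs, min(null), max(null))
```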

Step 4: Compute statistics

\[\text{Expected} = \frac{1}{N}\sum_{p=1}^{N} k_p\]

\[\text{StdDev} = \sqrt{\frac{1}{N}\sum_{p=1}^{N}(k_p - \text{Expected})^2}\]

\[\text{Effect Size} = \frac{k_{obs} - \text{Expected}}{\text{StdDev}}\]

Step 5: Calculate p-values

Lower tail p-value: Proportion of permutations with co-occurrences ≤ observed

\[P_{lower} = \frac{\#\{k_p \leq k_{obs}\} + 1}{N + 1}\]

Upper tail p-value: Proportion of permutations with co-occurrences ≥ observed

\[P_{upper} = \frac{\#\{k_p \geq k_{obs}\} + 1}{N + 1}\]

Two-tailed p-value: Proportion of permutations as extreme or more extreme than observed

\[P_{two} = \frac{\#\{|k_p - \text{Expected}| \geq |k_{obs} - \text{Expected}|\} + 1}{N + 1}\]
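The formulas in Steps 4 and 5 translate directly into code. The sketch below assumes the permuted counts \(k_p\) are already available as a list; the function name and the returned dictionary keys are hypothetical, and returning NaN for the effect size when the null distribution has zero spread is an assumption.

```python
from math import sqrt

def permutation_summary(k_obs, null):
    """Summarize an observed count against a list of permuted counts."""
    n = len(null)
    expected = sum(null) / n
    # Population standard deviation (divides by N), as in the formula above.
    std_dev = sqrt(sum((k - expected) ** 2 for k in null) / n)
    # Assumption: report NaN when every permutation gives the same count.
    effect = (k_obs - expected) / std_dev if std_dev > 0 else float("nan")
    deviation = abs(k_obs - expected)
    return {
        "expected": expected,
        "std_dev": std_dev,
        "effect_size": effect,
        "p_lower": (sum(k <= k_obs for k in null) + 1) / (n + 1),
        "p_upper": (sum(k >= k_obs for k in null) + 1) / (n + 1),
        "p_two": (sum(abs(k - expected) >= deviation for k in null) + 1) / (n + 1),
    }

# Tiny hypothetical null distribution, for illustration only.
print(permutation_summary(4, [0, 1, 1, 2, 2, 2, 3, 3, 4]))
```

The "+ 1" in each numerator and denominator counts the observed value as one member of the null distribution, so a p-value can never be exactly zero.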

Step 6: Interpret results

Low p-value (< 0.05): The pattern is unlikely due to chance alone
Positive effect size: Species co-occur more than expected (co-occurrence/attraction)
Negative effect size: Species co-occur less than expected (avoidance)

C-score (checkerboard score)

The C-score is a community-level metric that quantifies the average "checkerboard pattern" across all species pairs in a community. A C-score higher than expected by chance indicates that species tend to avoid each other; a C-score lower than expected indicates that they tend to co-occur.

Key insight: The C-score aggregates information from all pairwise comparisons into a single community-level metric.

Formula

For each species pair \((i, j)\), the checkerboard unit (CU) is:

\[CU_{ij} = (r_i - S_{ij})(r_j - S_{ij})\]

where:

\(r_i\) = number of sites where species \(i\) occurs
\(r_j\) = number of sites where species \(j\) occurs
\(S_{ij}\) = number of sites where both species occur together

The C-score is the mean of all CU values across all species pairs:

\[C = \frac{1}{\binom{n}{2}}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} CU_{ij}\]

where \(n\) is the total number of species.
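As an illustration, the C-score can be computed from a species × sites matrix as below. This is a minimal Python sketch that assumes the matrix is a list of 0/1 rows with at least two species; the function name `c_score` is hypothetical.

```python
from itertools import combinations

def c_score(matrix):
    """Mean checkerboard units over all species pairs (rows = species)."""
    row_totals = [sum(row) for row in matrix]  # r_i for each species
    cu_values = []
    for i, j in combinations(range(len(matrix)), 2):
        # S_ij: number of sites where both species are present
        shared = sum(x & y for x, y in zip(matrix[i], matrix[j]))
        cu_values.append((row_totals[i] - shared) * (row_totals[j] - shared))
    return sum(cu_values) / len(cu_values)

# Hypothetical 3 species x 4 sites matrix.
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
]
print(c_score(matrix))  # CU values are 4, 1 and 1, so the mean is 2.0
```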

How expected C-score is calculated

The expected C-score represents what you'd expect by chance using the selected null model:

Randomize the matrix using the selected algorithm (preserving row totals, or both row and column totals)
Calculate the C-score on this randomized matrix
Repeat N times to build a null distribution
Expected C-score = mean of all N randomized C-scores
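The four steps above can be sketched as a short loop. The Python example below is an assumption-laden illustration: it uses the simpler fixed-rows randomization (each species' row shuffled independently), accepts any other randomizer through the hypothetical `randomize` parameter, and all function names are invented for this sketch.

```python
import random
from itertools import combinations

def c_score(matrix):
    """Mean checkerboard units over all species pairs (rows = species)."""
    totals = [sum(row) for row in matrix]
    cus = []
    for i, j in combinations(range(len(matrix)), 2):
        shared = sum(x & y for x, y in zip(matrix[i], matrix[j]))
        cus.append((totals[i] - shared) * (totals[j] - shared))
    return sum(cus) / len(cus)

def shuffle_rows(matrix, rng):
    """Fixed-rows randomization: returns a new matrix with each row shuffled."""
    return [rng.sample(row, len(row)) for row in matrix]

def null_c_scores(matrix, randomize=shuffle_rows, n_iter=999, seed=None):
    """Randomize, score, and repeat n_iter times to build the null distribution."""
    rng = random.Random(seed)
    return [c_score(randomize(matrix, rng)) for _ in range(n_iter)]

matrix = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
]
null = null_c_scores(matrix, n_iter=999, seed=42)
expected = sum(null) / len(null)  # expected C-score = mean of the null scores
print(c_score(matrix), expected)
```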

Interpretation

The tool uses Standardized Effect Size (SES) to interpret results:

\[\text{SES} = \frac{C_{obs} - \bar{C}_{null}}{\sigma_{null}}\]

where \(C_{obs}\) is the observed C-score, \(\bar{C}_{null}\) is the mean of the null distribution, and \(\sigma_{null}\) is the standard deviation of the null distribution.

SES > 2: Strong checkerboard pattern (species avoid each other)
0 < SES ≤ 2: Weak checkerboard pattern
-2 ≤ SES < 0: Weak aggregation pattern (species co-occur)
SES < -2: Strong aggregation pattern
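A short Python sketch of the SES calculation and the thresholds listed above follows. The function names are hypothetical, and using the population standard deviation (`statistics.pstdev`) for \(\sigma_{null}\) is an assumption, since the text does not say which estimator the tool uses.

```python
from statistics import mean, pstdev

def standardized_effect_size(c_obs, null_scores):
    """SES = (observed - mean of null) / standard deviation of null."""
    sigma = pstdev(null_scores)  # assumption: population standard deviation
    if sigma == 0:
        return float("nan")  # undefined when the null distribution has no spread
    return (c_obs - mean(null_scores)) / sigma

def describe_ses(ses):
    """Map an SES value onto the interpretation bands listed above."""
    if ses > 2:
        return "strong checkerboard"
    if ses > 0:
        return "weak checkerboard"
    if ses < -2:
        return "strong aggregation"
    if ses < 0:
        return "weak aggregation"
    return "no deviation"  # ses == 0, or NaN when SES is undefined

# Hypothetical values, for illustration only.
ses = standardized_effect_size(4.0, [1.0, 2.0, 3.0])
print(ses, describe_ses(ses))
```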

Null models for C-score

This tool offers two null models for the C-score permutation test:

Fixed rows + columns (recommended)

The "Fixed-Fixed" null model:

Row sums ARE conserved: Each species maintains its total number of occurrences (species frequencies preserved)
Column sums ARE conserved: Site richness (number of species per site) is also preserved after shuffling
Algorithm: Uses the curveball algorithm - repeatedly selects two rows (species) at random and trades, between them, the presences at sites where only one of the two species occurs
Use when: You want to control for both species rarity AND site heterogeneity. This is the most conservative and most widely used null model for the C-score.
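The trade step described above can be sketched in Python. This is a simplified illustration of a curveball-style randomization, not the tool's implementation: the function names, the default of 1000 trades, and the `seed` parameter are assumptions, and the matrix is assumed to be a list of 0/1 rows with at least two species.

```python
import random

def curveball_trade(matrix, rng):
    """One trade, in place: redistribute unshared presences between two rows."""
    i, j = rng.sample(range(len(matrix)), 2)
    pairs = list(zip(matrix[i], matrix[j]))
    only_i = [s for s, (x, y) in enumerate(pairs) if x == 1 and y == 0]
    only_j = [s for s, (x, y) in enumerate(pairs) if x == 0 and y == 1]
    if not only_i or not only_j:
        return  # nothing to trade for this pair
    pool = only_i + only_j
    rng.shuffle(pool)
    # Species i keeps the same number of unshared sites, so row sums are fixed;
    # each pooled site still holds exactly one of the two, so column sums are fixed.
    for s in pool[:len(only_i)]:
        matrix[i][s], matrix[j][s] = 1, 0
    for s in pool[len(only_i):]:
        matrix[i][s], matrix[j][s] = 0, 1

def curveball(matrix, n_trades=1000, seed=None):
    """Return a randomized copy with row and column sums preserved."""
    rng = random.Random(seed)
    result = [list(row) for row in matrix]
    for _ in range(n_trades):
        curveball_trade(result, rng)
    return result

matrix = [
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
]
randomized = curveball(matrix, n_trades=200, seed=7)
print([sum(r) for r in randomized], [sum(c) for c in zip(*randomized)])
```

Printing the row and column sums before and after randomization is a simple way to confirm that both sets of marginal totals are unchanged.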

Fixed rows only

The "Fixed-Equiprobable" null model:

Row sums ARE conserved: Each species maintains its total number of occurrences
Column sums are NOT conserved: Site richness can vary randomly after shuffling
Algorithm: Shuffles each row (species) independently
Use when: Site heterogeneity is not important for your analysis, or with very sparse matrices

References

Stone, L., & Roberts, A. (1990). The checkerboard score and species distributions. Oecologia, 85(1), 74-79.
Gotelli, N. J. (2000). Null model analysis of species co-occurrence patterns. Ecology, 81(9), 2606-2621.