Co-occurrence analysis
Format: Species × Sites (rows × columns, values 0/1)
About this tool
This tool performs two types of analyses on species co-occurrence data:
- Pairwise tests - Tests individual species pairs for association using either an exact hypergeometric test or a permutation test
- Community-level C-score analysis - Measures overall checkerboard patterns (aggregates pairwise information)
Pairwise tests: Two methods available
Method 1: Exact Hypergeometric Test (Recommended)
The exact hypergeometric test calculates the exact probability of observing the co-occurrence pattern without simulation.
How it works:
Given two species with \(r_i\) and \(r_j\) occurrences across \(m\) sites, let \(X\) be the number of sites they share. Under the null hypothesis that the two species are distributed across sites independently of each other, the probability of observing exactly \(k\) co-occurrences follows the hypergeometric distribution:
\[P(X = k) = \frac{\binom{r_i}{k} \binom{m - r_i}{r_j - k}}{\binom{m}{r_j}}\]
where \(\binom{n}{k}\) is the binomial coefficient "\(n\) choose \(k\)".
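The formula can be evaluated directly with integer binomial coefficients. The following is a minimal sketch, not the tool's own code; the function name `hypergeom_pmf` is hypothetical and it assumes \(r_i \leq m\) and \(r_j \leq m\).

```python
from math import comb

def hypergeom_pmf(k: int, r_i: int, r_j: int, m: int) -> float:
    """P(X = k): probability of exactly k shared sites under the null."""
    # Outside the feasible range of co-occurrences the probability is zero.
    if k < max(0, r_i + r_j - m) or k > min(r_i, r_j):
        return 0.0
    return comb(r_i, k) * comb(m - r_i, r_j - k) / comb(m, r_j)

# Two species occupying 3 and 2 of 5 sites: probabilities for k = 0, 1, 2
print([hypergeom_pmf(k, 3, 2, 5) for k in range(3)])
```

With 3 and 2 occurrences across 5 sites, the probabilities for 0, 1 and 2 shared sites are 1/10, 6/10 and 3/10, which sum to 1.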
P-values explained:
- P-value (Lower tail): \(P(X \leq k_{obs})\) = Probability of observing this many or fewer co-occurrences. Low values (< 0.05) indicate avoidance.
- P-value (Upper tail): \(P(X \geq k_{obs})\) = Probability of observing this many or more co-occurrences. Low values (< 0.05) indicate attraction/co-occurrence.
- P-value (Two-tailed): \(2 \times \min(P_{lower}, P_{upper})\) = Tests for any significant deviation from random expectation (either too many or too few co-occurrences). Because co-occurrence counts are discrete, this product can exceed 1 and should then be read as 1.
- Significance: If the two-tailed p-value < 0.05, the pattern is treated as statistically significant at the 0.05 level.
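The three p-values can be obtained by summing the hypergeometric probabilities over the relevant tail. This is a sketch under stated assumptions: the function name `hypergeom_pvalues` is hypothetical, and capping the two-tailed value at 1 is an assumption added here so the result stays a valid probability.

```python
from math import comb

def hypergeom_pvalues(k_obs: int, r_i: int, r_j: int, m: int):
    """Return (p_lower, p_upper, p_two) for k_obs shared sites."""
    k_min = max(0, r_i + r_j - m)   # fewest possible co-occurrences
    k_max = min(r_i, r_j)           # most possible co-occurrences
    total = comb(m, r_j)
    pmf = {k: comb(r_i, k) * comb(m - r_i, r_j - k) / total
           for k in range(k_min, k_max + 1)}
    p_lower = sum(p for k, p in pmf.items() if k <= k_obs)
    p_upper = sum(p for k, p in pmf.items() if k >= k_obs)
    # Assumption: cap at 1, since doubling the smaller tail can exceed 1
    p_two = min(1.0, 2 * min(p_lower, p_upper))
    return p_lower, p_upper, p_two

# Two species, each at 5 of 10 sites, never found together
p_lower, p_upper, p_two = hypergeom_pvalues(0, 5, 5, 10)
print(p_lower, p_upper, p_two)
```

In this example the lower tail is 1/252, so the pair would be flagged as avoidance at the 0.05 level.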
Method 2: Permutation Test
A permutation test is a non-parametric statistical method that doesn't assume any particular distribution. It tests hypotheses by randomly rearranging (permuting) the data many times and comparing the observed pattern to the distribution of permuted patterns.
Step 1: Calculate observed co-occurrence
Count how many sites both species occupy together:
\[k_{obs} = \sum_{s=1}^{m} \mathbb{1}(\text{species}_i[s] = 1 \text{ AND } \text{species}_j[s] = 1)\]
Step 2: Generate permutations
Randomly shuffle each species' presence/absence pattern across sites independently. This breaks any real association while maintaining each species' total number of occurrences.
Step 3: Calculate permuted co-occurrences
For each permutation, count co-occurrences again. This builds a distribution of what we'd expect by chance.
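Steps 1 to 3 can be sketched as follows. This is an illustration only: the function names, the default of 999 permutations, and the optional `seed` parameter are assumptions, not settings taken from the tool.

```python
import random

def cooccurrence(a, b):
    """Number of sites where both species are present (both values are 1)."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)

def permuted_cooccurrences(species_i, species_j, n_perm=999, seed=None):
    """Co-occurrence counts after independently shuffling both species."""
    rng = random.Random(seed)
    a, b = list(species_i), list(species_j)  # copies; inputs stay untouched
    counts = []
    for _ in range(n_perm):
        rng.shuffle(a)  # shuffling keeps each species' number of occurrences
        rng.shuffle(b)
        counts.append(cooccurrence(a, b))
    return counts

species_i = [1, 1, 1, 0, 0, 0]
species_j = [1, 1, 0, 0, 0, 1]
k_obs = cooccurrence(species_i, species_j)
k_perm = permuted_cooccurrences(species_i, species_j, n_perm=200, seed=1)
print(k_obs, min(k_perm), max(k_perm))
```

The list `k_perm` is the null distribution against which the observed count is compared in the following steps.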
Step 4: Compute statistics
\[\text{Expected} = \frac{1}{N}\sum_{p=1}^{N} k_p\]
\[\text{StdDev} = \sqrt{\frac{1}{N}\sum_{p=1}^{N}(k_p - \text{Expected})^2}\]
\[\text{Effect Size} = \frac{k_{obs} - \text{Expected}}{\text{StdDev}}\]
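The three statistics above translate directly into code. In this sketch the function name `permutation_summary` is hypothetical, and returning NaN when every permutation gives the same count (standard deviation of 0) is an assumption; the formulas above do not say how that case is handled.

```python
from math import sqrt

def permutation_summary(k_obs, k_perm):
    """Return (expected, std_dev, effect_size) from permuted counts."""
    n = len(k_perm)
    expected = sum(k_perm) / n
    # Population standard deviation (divides by N, as in the formula above)
    std_dev = sqrt(sum((k - expected) ** 2 for k in k_perm) / n)
    # Assumption: the effect size is undefined when the null has no spread
    effect = (k_obs - expected) / std_dev if std_dev > 0 else float("nan")
    return expected, std_dev, effect

expected, std_dev, effect = permutation_summary(4, [1, 2, 3, 2])
print(expected, std_dev, effect)
```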
Step 5: Calculate p-values
Lower tail p-value: Share of permutations with co-occurrences ≤ observed. The +1 in the numerator and denominator counts the observed data as one possible arrangement, so the p-value is never exactly 0.
\[P_{lower} = \frac{\#\{k_p \leq k_{obs}\} + 1}{N + 1}\]
Upper tail p-value: Share of permutations with co-occurrences ≥ observed, with the same +1 correction
\[P_{upper} = \frac{\#\{k_p \geq k_{obs}\} + 1}{N + 1}\]
Two-tailed p-value: Share of permutations whose count deviates from the expected value at least as much as the observed count, in either direction
\[P_{two} = \frac{\#\{|k_p - \text{Expected}| \geq |k_{obs} - \text{Expected}|\} + 1}{N + 1}\]
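The p-value formulas amount to counting permutations. A minimal sketch, with the hypothetical function name `permutation_pvalues` and a list of permuted counts `k_perm` as input:

```python
def permutation_pvalues(k_obs, k_perm):
    """Return (p_lower, p_upper, p_two) using the (count + 1) / (N + 1) rule."""
    n = len(k_perm)
    expected = sum(k_perm) / n
    p_lower = (sum(1 for k in k_perm if k <= k_obs) + 1) / (n + 1)
    p_upper = (sum(1 for k in k_perm if k >= k_obs) + 1) / (n + 1)
    # Two-tailed: deviations from the expected value, in either direction
    dev_obs = abs(k_obs - expected)
    extreme = sum(1 for k in k_perm if abs(k - expected) >= dev_obs)
    p_two = (extreme + 1) / (n + 1)
    return p_lower, p_upper, p_two

print(permutation_pvalues(4, [1, 2, 3, 2]))
```

With only four permutations the smallest attainable p-value is 1/5, which illustrates why a large number of permutations is needed before small p-values can be resolved.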
Step 6: Interpret results
- Low p-value (< 0.05): The pattern is unlikely to be due to chance alone
- Positive effect size: Species co-occur more than expected (co-occurrence/attraction)
- Negative effect size: Species co-occur less than expected (avoidance)
C-score (checkerboard score)
The C-score is a community-level metric that quantifies the average "checkerboard pattern" across all species pairs in a community. It measures whether species tend to avoid each other (high C-score) or co-occur (low C-score) more than expected by chance.
Key insight: The C-score aggregates information from all pairwise comparisons into a single community-level metric.
Formula
For each species pair \((i, j)\), the checkerboard unit (CU) is:
\[CU_{ij} = (r_i - S_{ij})(r_j - S_{ij})\]
where:
- \(r_i\) = number of sites where species \(i\) occurs
- \(r_j\) = number of sites where species \(j\) occurs
- \(S_{ij}\) = number of sites where both species occur together
The C-score is the mean of all CU values across all species pairs:
\[C = \frac{1}{\binom{n}{2}}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} CU_{ij}\]
where \(n\) is the total number of species.
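As an illustration of the two formulas, the sketch below computes checkerboard units and the C-score for a small matrix. It assumes the input is a list of rows (one per species) of 0/1 values; the function names are hypothetical.

```python
from itertools import combinations

def checkerboard_units(row_i, row_j):
    """CU_ij = (r_i - S_ij) * (r_j - S_ij) for two species' 0/1 rows."""
    s_ij = sum(a & b for a, b in zip(row_i, row_j))  # shared sites
    return (sum(row_i) - s_ij) * (sum(row_j) - s_ij)

def c_score(matrix):
    """Mean CU over all species pairs; rows = species, columns = sites."""
    units = [checkerboard_units(a, b) for a, b in combinations(matrix, 2)]
    if not units:
        raise ValueError("at least two species are required")
    return sum(units) / len(units)

matrix = [
    [1, 1, 0, 0],  # species A
    [0, 0, 1, 1],  # species B
    [1, 0, 1, 0],  # species C
]
print(c_score(matrix))
```

Here A and B never share a site (CU = 4), while A-C and B-C each share one site (CU = 1), so the C-score is (4 + 1 + 1) / 3 = 2.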
How expected C-score is calculated
The expected C-score is the value expected by chance under the selected null model. It is estimated as follows:
- Randomize the matrix using the selected algorithm (preserving row totals, or both row and column totals)
- Calculate the C-score on this randomized matrix
- Repeat N times to build a null distribution
- Expected C-score = mean of all N randomized C-scores
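The procedure above can be sketched as a short loop. This example assumes the simpler fixed-rows randomization (each species' row shuffled independently); the function names, the default of 999 iterations, and the `seed` parameter are illustrative assumptions rather than the tool's actual settings.

```python
import random
from itertools import combinations

def c_score(matrix):
    """Mean checkerboard units over all species pairs (rows = species)."""
    units = []
    for a, b in combinations(matrix, 2):
        s = sum(x & y for x, y in zip(a, b))       # shared sites S_ij
        units.append((sum(a) - s) * (sum(b) - s))  # CU_ij
    return sum(units) / len(units)

def shuffle_rows(matrix, rng):
    """Fixed-rows randomization: permute each row's values across sites."""
    return [rng.sample(row, len(row)) for row in matrix]

def expected_c_score(matrix, n_iter=999, seed=None):
    """Return (mean of null C-scores, list of null C-scores)."""
    rng = random.Random(seed)
    null = [c_score(shuffle_rows(matrix, rng)) for _ in range(n_iter)]
    return sum(null) / n_iter, null

matrix = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
expected, null = expected_c_score(matrix, n_iter=200, seed=0)
print(c_score(matrix), expected)
```

The returned list of null C-scores is kept alongside the mean because the standard deviation of the same distribution is needed for interpretation.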
Interpretation
The tool uses Standardized Effect Size (SES) to interpret results:
\[\text{SES} = \frac{C_{obs} - \bar{C}_{null}}{\sigma_{null}}\]
where \(C_{obs}\) is the observed C-score, \(\bar{C}_{null}\) is the mean of the null distribution, and \(\sigma_{null}\) is the standard deviation of the null distribution.
- SES > 2: Strong checkerboard pattern (species avoid each other)
- 0 < SES ≤ 2: Weak checkerboard pattern
- -2 ≤ SES < 0: Aggregation pattern (species co-occur)
- SES < -2: Strong aggregation pattern
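A small sketch of the SES calculation and the thresholds above. The function names are hypothetical; using the population standard deviation and returning NaN when the null distribution has no spread are assumptions, since the text does not specify either.

```python
from statistics import mean, pstdev

def standardized_effect_size(c_obs, null_scores):
    """SES = (C_obs - mean(null)) / sd(null)."""
    sd = pstdev(null_scores)  # assumption: population standard deviation
    if sd == 0:
        return float("nan")   # assumption: undefined without spread
    return (c_obs - mean(null_scores)) / sd

def interpret_ses(ses):
    """Map an SES value to the categories listed above."""
    if ses > 2:
        return "strong checkerboard"
    if ses > 0:
        return "weak checkerboard"
    if ses < -2:
        return "strong aggregation"
    if ses < 0:
        return "aggregation"
    return "no deviation (SES is 0 or undefined)"

ses = standardized_effect_size(4.0, [1.0, 2.0, 3.0])
print(ses, interpret_ses(ses))
```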
Null models for C-score
This tool offers two null models for the C-score permutation test:
Fixed rows + columns (recommended)
The "Fixed-Fixed" null model:
- Row sums ARE conserved: Each species maintains its total number of occurrences (species frequencies preserved)
- Column sums ARE conserved: Site richness (number of species per site) is also preserved after shuffling
- Algorithm: Uses the curveball algorithm - repeatedly selects two rows (species) at random and redistributes between them the presences at sites occupied by only one of the two, which leaves every row and column sum unchanged
- Use when: You want to control for both species rarity AND site heterogeneity. This is the most conservative and widely used null model for C-score.
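One way to sketch a curveball-style randomization is shown below. It is an illustration under stated assumptions, not the tool's implementation: the function name, the default number of trades (`n_trades=1000`), and the `seed` parameter are all hypothetical choices.

```python
import random

def curveball(matrix, n_trades=1000, seed=None):
    """Randomize a 0/1 matrix while keeping all row and column sums."""
    rng = random.Random(seed)
    n_cols = len(matrix[0])
    # Represent each row (species) as the set of sites it occupies
    rows = [{c for c, v in enumerate(row) if v == 1} for row in matrix]
    for _ in range(n_trades):
        i, j = rng.sample(range(len(rows)), 2)   # two distinct species
        shared = rows[i] & rows[j]
        only_i = rows[i] - shared
        only_j = rows[j] - shared
        # Pool the sites held by exactly one of the two species and
        # deal them back out, keeping each species' count unchanged
        pool = sorted(only_i | only_j)
        rng.shuffle(pool)
        rows[i] = shared | set(pool[:len(only_i)])
        rows[j] = shared | set(pool[len(only_i):])
    return [[1 if c in r else 0 for c in range(n_cols)] for r in rows]

matrix = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 1, 1]]
randomized = curveball(matrix, n_trades=500, seed=3)
print(randomized)
```

Each traded site stays occupied by exactly one of the two selected species, which is why site richness is preserved along with species frequencies.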
Fixed rows only
The "Fixed-Equiprobable" null model:
- Row sums ARE conserved: Each species maintains its total number of occurrences
- Column sums are NOT conserved: Site richness can vary randomly after shuffling
- Algorithm: Shuffles each row (species) independently
- Use when: Site heterogeneity is not important for your analysis, or the matrix is very sparse
References
- Stone, L., & Roberts, A. (1990). The checkerboard score and species distributions. Oecologia, 85(1), 74-79.
- Gotelli, N. J. (2000). Null model analysis of species co-occurrence patterns. Ecology, 81(9), 2606-2621.