
Co-occurrence

Topics
1. Introduction to Co-occurrence
2. Data Format
3. Defaults
4. Options
5. Output
6. Caveats
7. Co-occurrence Tutorial
8. Literature Cited

1. Introduction to Co-occurrence
The co-occurrence module lets you test for non-random patterns of species co-occurrence in a presence-absence matrix. For example, suppose a pair of species in an archipelago compete with one another and never occur on the same island. Islands support one species, or the other, but not both. Diamond (1975) identified this as a checkerboard distribution, and argued that the presence of many checkerboard pairs in a community is evidence of deterministic assembly rules. Connor and Simberloff (1979) were among the first to rigorously test such patterns against a null hypothesis of random community assembly. The publication of this pair of papers ignited an acrimonious debate in community ecology that has lasted over 20 years, with no end in sight (see reviews in Wiens 1989 and Gotelli and Graves 1996). To many scientists, this debate is synonymous with null model analysis. The controversy has been about the statistical behavior of particular null model algorithms, as well as more general philosophical criticisms of null models and hypothesis testing in community ecology. It would be impossible in this help file to review the debate, which resembles a Hieronymus Bosch painting in its complexity and strangeness. Given the importance of the controversy, it is essential that you familiarize yourself at least with the literature that reviews these clashes (Harvey et al. 1983, Wiens 1989, Gotelli and Graves 1996), as well as some of the original papers.

EcoSim's co-occurrence module incorporates important statistical advances that have occurred in null model analysis since Connor and Simberloff's (1979) initial publication (e.g., Schluter 1984, Wilson 1987, Manly 1995, Stone and Roberts 1990, 1992, Gotelli et al. 1997, Gotelli 2000). The complex statistical issues in the analysis of co-occurrence data are reflected in the wide range of options available in EcoSim for co-occurrence analysis, and the lengthy tutorial and help text necessary to explain these analyses. Even within the standard module, there are 4 co-occurrence metrics and 9 simulation procedures for a total of 36 simulation variations! Moreover, the user can make decisions about the treatment of degenerate matrices (those with empty rows or empty columns) and the inclusion or exclusion of empty sites. Finally, EcoSim provides you with a powerful "user-defined" option that allows you to incorporate relative weightings for different species and different sites. These weights allow you to incorporate independent data on site quality and species dispersal potential for very sophisticated and powerful analyses.

However, this freedom also gives you enough rope to effectively hang yourself. Although all of the simulation procedures are logical and reasonable, some of the variations are quite prone to Type I error (incorrectly rejecting the null hypothesis of random assembly). If you use these procedures carelessly with your data, you will make some serious errors.

Why, then, have we even provided you with all of these options? We believe it is important to explore patterns in your data by systematically altering the assumptions of your model (just as an experimental ecologist systematically alters treatment conditions). We reject the "one size fits all" mentality that reduces statistical analysis to a cookbook method (something that is very easy to achieve with sophisticated computer programs). EcoSim's palette of models, including some inappropriate ones, helps you to understand the patterns in your data by allowing you to systematically alter the model's assumptions.

Elsewhere (Gotelli 2000), we have evaluated EcoSim's universe of null models by assessing their performance with random and structured data sets. These analyses reveal the relative risk of Type I and Type II errors. This companion paper should ideally be consulted as you analyze and interpret your data. This help file will summarize the essential results of those analyses and advise you on how to set up and interpret your model, depending on the kind of data you have and the assumptions you make about species and sites. In some cases, however, the requirements of your model may be impossible to meet with the data matrix you are analyzing. If EcoSim cannot create a random assemblage after 3000 attempts, it will stop and ask you to redesign your model.

We recommend that you first work through the tutorial example, which uses two data sets on West Indian finches and Virginia ants, to learn about the major features of the program. Then you can study the rest of this help file to learn about all of the features that are available for null model analysis.

2. Data Format
The input for co-occurrence analyses is a presence-absence matrix. Each row represents a different species and each column represents a different site. A "1" indicates that a species is present at a particular site, and a "0" indicates that a species is absent from a particular site. EcoSim will accept empty rows (species that do not occur in any of the sites) and empty columns (sites that do not contain any species), and this module offers you several options for how empty rows and columns are treated in the simulation. As in all EcoSim modules, the first column is reserved for species names, and the first row is reserved for site names. In the co-occurrence module, EcoSim treats any positive real number as a "1", allowing you to conveniently analyze an abundance matrix without having to recode your data. EcoSim will also run co-occurrence data sets with negative numbers or non-numeric characters, but the results will be meaningless. Garbage in, garbage out, so make sure you have edited and proofed your data set before analysis!
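If you want to check your data before loading it, the recoding rule described above can be sketched in a few lines of Python. This is only an illustration, not EcoSim's own code: the function name `to_presence_absence` and the example abundances are hypothetical, and, unlike EcoSim (which runs anyway), this sketch is written to stop on negative or non-numeric entries so that they are caught before analysis.

```python
def to_presence_absence(rows):
    """Recode abundances as 0/1: any positive number becomes 1, zero stays 0.

    Hypothetical helper. Negative or non-numeric entries raise an error
    instead of being silently analyzed.
    """
    result = []
    for row in rows:
        recoded = []
        for value in row:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value < 0:
                raise ValueError(f"unexpected entry in data matrix: {value!r}")
            recoded.append(1 if value > 0 else 0)
        result.append(recoded)
    return result


# Rows are species, columns are sites (names are left out of this sketch).
abundances = [
    [12, 0, 3.5],
    [0, 0, 1],
    [0, 0, 0],   # an empty row: a species that occurs at none of the sites
]
presence_absence = to_presence_absence(abundances)
print(presence_absence)  # [[1, 0, 1], [0, 0, 1], [0, 0, 0]]
```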

3. Defaults
EcoSim generates 1000 random matrices as the default. The default co-occurrence index is Stone and Roberts' (1990) C-score. The default randomization algorithm maintains fixed sums for row and column constraints. Thus, each matrix generated has the same row and column totals as the original matrix (Connor and Simberloff 1979). As described in the tutorial, this algorithm has good Type I properties (low chance of falsely rejecting the null hypothesis when it is true), and it also has good power for detecting non-random patterns in noisy data sets. The degenerate matrix option is set to "retain", which means that any random matrices that have row or column sums of zero will be retained. However, degenerate matrices will not be generated by fixed-fixed constraints, unless there are empty rows and columns in the original matrix.

4. Options
Indices
EcoSim offers you four different co-occurrence indices.

1) Stone and Roberts' (1990) C-score
This is EcoSim's default index. The C-score measures the average number of "checkerboard units" between all possible pairs of species. A checkerboard unit is any submatrix of the form:

1 0        0 1
0 1   or   1 0

The number of checkerboard units (CU) for each species pair is calculated as:

CU = (r_i - S)(r_j - S)

where S is the number of shared sites (sites containing both species) and r_i and r_j are the row totals for species i and j. The C-score is the average of all possible checkerboard pairs, calculated for species that occur at least once in the matrix. In a competitively structured community, the C-score should be significantly larger than expected by chance.

2) The number of checkerboard species pairs
This index follows directly from Diamond's (1975) assembly rules analysis. For this index, EcoSim scans the rows of the matrix and tabulates the number of species pairs that never co-occur in any site. In a competitively structured community, there should be more checkerboard pairs of species than expected by chance.

3) The number of species combinations
For this index, EcoSim scans the columns of the presence-absence matrix and keeps track of the number of unique species combinations that are represented in different sites. For an assemblage of n species, there are 2^n possible species combinations, including the combination of no species being present (Pielou and Pielou 1968). In most real matrices, the number of sites (= columns) is usually substantially less than 2^n, which places an upper bound on the number of species combinations that can be found in both the observed and the simulated matrices. In a competitively structured community, there should be fewer species combinations than expected by chance.

4) The variance ratio
The V-ratio was first proposed by Robson (1972) and popularized by Schluter (1984), who recommended it as an index of species co-occurrence. It is calculated as the ratio of the variance of the column sums to the sum of the row variances. For a presence-absence matrix, this is the ratio of the variance in species richness to the sum of the variance in species occurrence. If the species are distributed independently and the sites are equiprobable, the expected value of the ratio is 1.0. If there is strong negative covariance between species pairs, the variance ratio will be < 1.0, and if there is positive covariance between species pairs, the variance ratio will be > 1.0. Unlike the other three indices, the variance ratio does not actually depend on the species co-occurrence patterns, but is determined solely by the marginal totals of the matrix. For this reason, it cannot be tested with EcoSim's default algorithm, which maintains observed marginal totals. The variance ratio is best thought of as an index of variability in species richness per site. If niche limitation constrains the number of coexisting species, the variance in species richness among sites will be small relative to the null model. In a competitively structured community, the observed variance ratio should be significantly smaller than expected by chance.
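The four indices can be sketched in plain Python as follows. This is an illustrative sketch rather than EcoSim's implementation. It assumes the number of checkerboard units for a species pair is (r_i - S)(r_j - S), the form given by Stone and Roberts (1990), that empty rows are dropped before any index is calculated, and that variances are population variances (dividing by the number of sites). All function names and the small example matrix are hypothetical, and matrices with fewer than two occupied rows or with zero row variance are not handled.

```python
from itertools import combinations


def occupied_rows(matrix):
    """Drop species (rows) that occur at no site."""
    return [row for row in matrix if sum(row) > 0]


def checkerboard_units(row_i, row_j):
    """CU = (r_i - S)(r_j - S), with S the number of shared sites."""
    shared = sum(a * b for a, b in zip(row_i, row_j))
    return (sum(row_i) - shared) * (sum(row_j) - shared)


def c_score(matrix):
    """Average CU over all pairs of occupied rows."""
    pairs = list(combinations(occupied_rows(matrix), 2))
    return sum(checkerboard_units(a, b) for a, b in pairs) / len(pairs)


def checkerboard_pairs(matrix):
    """Number of species pairs that never share a site."""
    return sum(
        1
        for a, b in combinations(occupied_rows(matrix), 2)
        if not any(x and y for x, y in zip(a, b))
    )


def species_combinations(matrix):
    """Number of distinct columns (an empty site counts as one combination)."""
    return len(set(zip(*occupied_rows(matrix))))


def population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def v_ratio(matrix):
    """Variance of column sums divided by the sum of the row variances."""
    rows = occupied_rows(matrix)
    column_sums = [sum(col) for col in zip(*rows)]
    return population_variance(column_sums) / sum(population_variance(r) for r in rows)


# Three species (rows) by four sites (columns).
example = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
print(c_score(example))               # 2.0
print(checkerboard_pairs(example))    # 1
print(species_combinations(example))  # 4
print(v_ratio(example))
```

In the example, the first two species form a perfect checkerboard (CU = 4), and each of the other two pairs shares one site (CU = 1), so the C-score is (4 + 1 + 1) / 3 = 2.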

Row Constraints
EcoSim provides 4 options for row constraints:

1) Equiprobable
With this option, each row, or species, is equally likely to be represented. Setting the row total to equiprobable eliminates observed differences in the commonness and rarity of species from the null assemblages.

2) Fixed sum
With this option, the observed row totals are maintained in the simulation. In other words, the number of occurrences of each species in the null communities is the same as in the original data set. This is EcoSim's default constraint.

3) Proportional
With this option, the rows are filled randomly, but the probability of a particular row being "hit" is proportional to its row total. With this option, the row totals are not fixed, but on average, the rank order of species in the null assemblages based on row totals will match the rank order of species in the original matrix.

4) User-defined
If you select this option, an edit window will pop up. The first column of the edit window lists the species, and the second column gives weights for each species. If you retain the default weights for each species of 1.0, the simulation will behave as though you had checked the "equiprobable" option, because each species has the same weight, and therefore the same chance of being selected. Instead, you will want to assign non-negative real numbers for each species. The relative sizes of these weights determine the probability of occurrence for each species.

EcoSim's default for these options is to maintain fixed row sums. Unless you have some independent measurements for species weights, we suggest you stay with the default option of fixed row sums. For most algorithms, using equiprobable or proportional row sums inflates the probability of a Type I error (falsely rejecting the null hypothesis when it is true). The tutorial provides more details on selecting row constraints in your simulation.

Column Constraints
EcoSim provides 4 options for column constraints:

1) Equiprobable
With this option, each column, or site, is equally likely to be represented. Setting the column total to equiprobable eliminates observed differences in species richness of sites from the null assemblages.

2) Fixed sum
With this option, the observed column totals are maintained in the simulation. In other words, the number of species in each site in the null communities is the same as in the original data set. This is EcoSim's default constraint.

3) Proportional
With this option, the columns are filled randomly, but the probability of a particular column being "hit" is proportional to its column total. With this option, the column totals are not fixed, but on average, the rank order of sites based on species richness in the null assemblages will match the rank order of sites in the original matrix.

4) User-defined
If you select this option, an edit window will pop up. The first column of the edit window lists the sites, and the second column gives weights for each site. If you retain the default weights for each site of 1.0, the simulation will behave as though you had checked the "equiprobable" option, because each site has the same weight, and therefore the same chance of being selected. Instead, you will want to assign non-negative real numbers for each site. The relative sizes of these weights determine the probability of a species occurring in a particular site.

EcoSim's default for these options is to maintain fixed column sums. Unless you have some independent measurements for site weights, we suggest you stay with the default option of fixed column sums. The tutorial provides more details on selecting column constraints in your simulation.
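To make the idea of a row or column being "hit" in proportion to a weight more concrete, here is a minimal Python sketch of one way to fill a matrix when neither margin is fixed. It is an assumption made for illustration, not a description of EcoSim's internal algorithm: each cell is given a weight equal to (row weight) x (column weight), and occurrences are placed one at a time into cells that are still empty. Passing the observed totals as weights mimics the "proportional" option, passing all 1.0s mimics "equiprobable", and any other non-negative numbers mimic "user-defined". The function name `fill_weighted` is hypothetical.

```python
import random


def fill_weighted(row_weights, col_weights, n_occurrences, rng):
    """Place n_occurrences 1s into distinct cells, each cell being chosen with
    probability proportional to row weight * column weight.

    Assumed scheme for illustration only. random.choices raises ValueError
    if more occurrences are requested than there are cells with positive weight.
    """
    n_rows, n_cols = len(row_weights), len(col_weights)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [row_weights[i] * col_weights[j] for i, j in cells]
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(n_occurrences):
        k = rng.choices(range(len(cells)), weights=weights)[0]
        i, j = cells[k]
        matrix[i][j] = 1
        weights[k] = 0.0  # a filled cell cannot be chosen again
    return matrix


rng = random.Random(10)
row_totals = [2, 2, 2]      # occurrences per species in a hypothetical observed matrix
col_totals = [2, 1, 2, 1]   # species per site in the same matrix

proportional = fill_weighted(row_totals, col_totals, 6, rng)
equiprobable = fill_weighted([1.0] * 3, [1.0] * 4, 6, rng)
for row in proportional:
    print(row)
```

Neither the row totals nor the column totals of the result are fixed. Only the total number of occurrences is preserved, which is why such models can produce empty rows or columns.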

Degenerate Matrices
Degenerate matrices are those that contain "missing species" or "empty sites". In other words, a degenerate matrix has at least one row total that equals zero ("missing species") or at least one column total that equals zero ("empty site"). EcoSim accepts degenerate matrices as input, because you may have found sites with no species, or started with a species list that was expected for a set of sites and discovered that some of the species were missing. These missing species and sites are important in the analysis of co-occurrence data, but unfortunately, they are often not reported in the ecological literature.

EcoSim may also create degenerate matrices when it simulates null communities according to the constraints set by the user. In fact, only the default combination of "fixed row sums" and "fixed column sums" is guaranteed to produce a set of non-degenerate matrices (provided the original matrix itself has no empty rows or columns). All of the other options may create degenerate matrices, depending on the structure of the original matrix. In general, the more "filled" the matrix is with 1s, the less likely it is that degenerate matrices will be created. EcoSim provides you with 4 options for dealing with these degenerate matrices, once they are created in a simulation:

1) Retain
This option keeps all matrices that meet the row and column constraints that were chosen by the user. This is EcoSim's default option, and is also the fastest option.

2) Discard
This option discards any degenerate matrix it creates, and tries again with the same algorithm to create a matrix that is not degenerate. If EcoSim cannot do this after 3000 tries on a matrix, it gives up and aborts the simulation.

3) Split
This option keeps all of the matrices that are created by EcoSim, but sorts them into degenerate and non-degenerate matrices. In the output, the observed co-occurrence index is compared to the non-degenerate matrices in the "index" tab, and to the degenerate matrices in the "degenerate index" tab. One of the tabs may be empty (and often is) if the simulation created matrices that were all degenerate or all non-degenerate. This option is a useful probe for checking whether the results change when using degenerate matrices (usually they don't). If the two histograms appear similar, you may want to re-run the analysis using the "retain" option.

4) Fix
For this option, EcoSim repairs any degenerate matrix by randomly transferring one of the cell occurrences from an occupied column (or row) to an empty column (or row). If EcoSim is unable to fix the matrix after 3000 tries, it gives up and aborts the simulation. Although this option might seem to be worthwhile, we have found that it sometimes generates spurious significance (Type I error) and we don't recommend you use it.

Although the inclusion of degenerate matrices can sometimes lead to lower significance levels, it usually doesn't make that much of a difference, so we recommend that you use the "retain" option, although you should certainly investigate the properties of the degenerate matrices with the "split" option.

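The degeneracy check and the "discard" option can be sketched as follows. This is a hedged illustration rather than EcoSim's code: `random_fill` is a hypothetical stand-in for whichever null model is in use (here, a simple equiprobable placement of occurrences), and only the limit of 3000 tries is taken from the description of the "discard" option.

```python
import random

MAX_TRIES = 3000  # the retry limit described for the "discard" option


def is_degenerate(matrix):
    """True if any row total or any column total equals zero."""
    empty_row = any(sum(row) == 0 for row in matrix)
    empty_column = any(sum(col) == 0 for col in zip(*matrix))
    return empty_row or empty_column


def random_fill(n_rows, n_cols, n_occurrences, rng):
    """Hypothetical null model: scatter the occurrences over distinct cells."""
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for cell in rng.sample(range(n_rows * n_cols), n_occurrences):
        matrix[cell // n_cols][cell % n_cols] = 1
    return matrix


def simulate_discarding(n_rows, n_cols, n_occurrences, rng, max_tries=MAX_TRIES):
    """Keep drawing matrices until one is non-degenerate, or give up."""
    for _ in range(max_tries):
        candidate = random_fill(n_rows, n_cols, n_occurrences, rng)
        if not is_degenerate(candidate):
            return candidate
    raise RuntimeError(f"no non-degenerate matrix after {max_tries} tries")


rng = random.Random(1)
matrix = simulate_discarding(3, 4, 8, rng)
for row in matrix:
    print(row)
```

A sparsely filled matrix makes the retry loop work much harder. With only 2 occurrences in a 3 x 4 matrix, every candidate has an empty row, so the function always gives up, which parallels the situation in which EcoSim aborts the simulation.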
Empty Sites
If you select the V-ratio or the number of species combinations as your co-occurrence index, you will notice that a new option box appears in the lower right-hand corner of the "preferences" tab. This option asks you whether to include or exclude empty sites. For these two indices, you must make a decision about whether to include empty sites (= columns with a total of zero) in the calculation of the index. Whatever choice you make determines how the index is calculated for both the observed and the simulated matrices. Because the same calculation is made for both the observed and simulated matrices, the choice will rarely affect the tail probability that is calculated by EcoSim. However, this choice will affect the size of the index.

For the number of species combinations, including empty sites will effectively increase the number of species combinations by 1, because empty sites are now scored and counted as a unique species combination. The effect of including empty sites on the variance ratio is more complex. Both the numerator (variance of the column sums) and denominator (sum of the row variances) of the variance ratio are affected by empty sites. However, the increase in the numerator is usually much greater than the increase in the denominator, because the variance of rows is simply a binomial variance of 1s and 0s that will not be greatly affected by another zero. In contrast, the column sums represent species richness totals for each site, and the variance of these totals will be greatly inflated by including even a single empty column. So, the observed (and simulated) variance ratios are usually much larger when empty sites are included. EcoSim's default option is to retain the empty sites, although in practice this choice should not have much of an effect on the tail probability.

You may be wondering why the "empty sites" choice does not appear when you choose the C-score or the number of checkerboard pairs as your co-occurrence index. The reason is that these two indices do not change when empty sites are excluded (try some simple pencil-and-paper calculations to convince yourself of this).

A second, more important question is why EcoSim doesn't provide a similar option for handling "missing species". In other words, why doesn't EcoSim allow you to specify whether or not missing species should be included in the calculation of the index? The answer is that EcoSim always deletes missing species from observed and simulated matrices before it calculates any co-occurrence index. For the V-ratio and the number of species combinations, empty rows do not affect the index. In contrast, empty rows would greatly change the C-score and the number of checkerboard pairs if they were included. However, it makes no sense to score a perfect checkerboard (or calculate a C-score) for a species that never occurs in the archipelago. These indices are trying to quantify co-occurrence of species that are actually present. Including empty rows in this calculation would distort the index, so EcoSim removes the empty rows from both observed and simulated matrices before calculating co-occurrence indices. Note that the decision to include or exclude degenerate matrices (those with empty rows or columns) is a distinct issue from the calculation of pattern in a degenerate matrix that has been retained by EcoSim.
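The pencil-and-paper check suggested above can also be run as a short Python sketch. The index definitions here are assumptions made for illustration (checkerboard units taken as (r_i - S)(r_j - S) following Stone and Roberts 1990, population variances, and empty rows dropped first), and the function names and example matrix are hypothetical.

```python
from itertools import combinations


def occupied_rows(matrix):
    return [row for row in matrix if sum(row) > 0]


def without_empty_sites(matrix):
    """Remove columns whose total is zero."""
    keep = [j for j, col in enumerate(zip(*matrix)) if sum(col) > 0]
    return [[row[j] for j in keep] for row in matrix]


def c_score(matrix):
    units = []
    for a, b in combinations(occupied_rows(matrix), 2):
        shared = sum(x * y for x, y in zip(a, b))
        units.append((sum(a) - shared) * (sum(b) - shared))
    return sum(units) / len(units)


def species_combinations(matrix):
    return len(set(zip(*occupied_rows(matrix))))


def population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def v_ratio(matrix):
    rows = occupied_rows(matrix)
    column_sums = [sum(col) for col in zip(*rows)]
    return population_variance(column_sums) / sum(population_variance(r) for r in rows)


# Three species by five sites; the last site is empty.
included = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
]
excluded = without_empty_sites(included)

print(c_score(included), c_score(excluded))                            # 2.0 2.0
print(species_combinations(included), species_combinations(excluded))  # 5 4
print(v_ratio(included), v_ratio(excluded))
```

In this small example the C-score is the same either way, the number of species combinations rises by exactly 1 when the empty site is counted, and the variance ratio is larger with the empty site included (about 0.78 versus about 0.33).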

5. Output
Input Matrix Tab
The Input tab shows you the original presence-absence matrix, with all of its labels. You cannot edit the data in this window, but you can refer back to the original data set as you study the simulation results.

Simulation Tab
The Simulation tab shows you the most recent simulated matrix that was created by EcoSim. By clicking back and forth between the Input and Simulation tabs, you can examine this randomized matrix and convince yourself that EcoSim has randomized the data in the way that you wished. Different randomization algorithms will change the appearance and structure of the simulated matrix. Also note that the contents of the simulated matrix will change each time you run the simulation, unless you have reentered a particular random number seed.

Pairwise Tab
This tab only appears if you have chosen the C-score as the co-occurrence index. This tab shows the number of checkerboard units calculated between each unique pair of species in the matrix. A checkerboard unit is any submatrix of the form:

1 0        0 1
0 1   or   1 0

The number of checkerboard units (CU) for each species pair is calculated as:

CU = (r_i - S)(r_j - S)

where S is the number of shared sites (sites containing both species) and r_i and r_j are the row totals for species i and j. The C-score is the average of all possible checkerboard pairs, calculated for species that occur at least once in the matrix.

Index Tab

This tab gives the actual probability test in which the observed co-occurrence index is compared to the index in the simulated communities. In the left-hand column of this tab, you will see the observed co-occurrence index. The next three columns form the histogram window, which summarizes the distribution of the co-occurrence indices for the simulated communities. The first two of these columns give the low and high boundaries of 12 evenly spaced histogram bins. In the right-hand column, the number of simulations tells you how many of the simulated indices were in each bin. These integers sum to the total number of iterations that were specified for the run. The placement of the observed index shows you, graphically, where the observation fell in the histogram distribution. You can use these data to plot the histogram and the observed value if you want to illustrate your results with a graph. The lower window gives summary statistics (mean and variance) for the co-occurrence index of the simulated communities. It then tells you the tail probability that the observed index was greater than or less than expected by chance.
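If you export the simulated indices, the contents of this tab can be approximated with a short Python sketch. Several details are assumptions, because the help text does not spell them out: the 12 bins are taken to span the minimum to the maximum of the simulated values, and the two tail probabilities are taken to be the fractions of simulated indices that are less than or equal to, and greater than or equal to, the observed index. The function names and numbers are hypothetical.

```python
def histogram(values, n_bins=12):
    """Return (low, high, count) for n_bins evenly spaced bins over the values."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # guard against identical values
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - low) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    return [(low + k * width, low + (k + 1) * width, counts[k]) for k in range(n_bins)]


def tail_probabilities(observed, simulated):
    """Fractions of simulated indices <= observed and >= observed (assumed definition)."""
    n = len(simulated)
    lower = sum(1 for s in simulated if s <= observed) / n
    upper = sum(1 for s in simulated if s >= observed) / n
    return lower, upper


simulated = [float(x) for x in range(24)]  # made-up simulated indices
observed = 30.0                            # made-up observed index

bins = histogram(simulated)
for low, high, count in bins:
    print(f"{low:8.3f} {high:8.3f} {count:4d}")
print(tail_probabilities(observed, simulated))  # (1.0, 0.0)
```

With an upper-tail fraction of zero out of N simulations, the result is conventionally reported as p < 1/N, as in the tutorial's p < 0.001 for 1000 simulated matrices.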

Degenerate Index Tab


This tab only appears if you have selected the "split" option for degenerate matrices. The output is identical in layout to that in the adjacent "index" tab. However, the "index" tab compares the observed co-occurrence index to the simulated non-degenerate matrices (those with no empty rows or columns), and the "degenerate index" tab shows the results for degenerate matrices. Often, one of these two tabs may be empty because the simulated matrices were either all degenerate or all non-degenerate.

Summary Tab
The summary tab gives the simulation conditions, including the name of the input file, the randomization algorithm, co-occurrence index, number of iterations, constraints, and random number seed. Next, it presents the information that was contained in the index and degenerate index tabs. The summary window also supplies you with the standardized effect size, which is calculated as:

(observed index - mean(simulated indices)) / standard deviation(simulated indices)

This metric is analogous to the standardized effect size that is used in meta-analyses (Gurevitch et al. 1992). It scales the results in units of standard deviations, which allows for meaningful comparisons among different tests. Roughly speaking, a standardized effect size that is greater than 2 or less than -2 is statistically significant with a tail probability of less than 0.05. However, this is only an approximation, and it assumes that the data are normally distributed, which is often not the case for null model tests. For any individual study, you should always report the actual tail probability, which is calculated directly from the simulation, and does not require any assumptions about normality of the data.

Finally, the summary tab shows the original presence-absence matrix, including the row and column labels. All of these data can be edited, deleted, or annotated. The output can then be saved (Save to File) or discarded (Close). There is also a small time clock in the lower right-hand corner so you can tell how long your simulation took.
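As a worked illustration, the standardized effect size can be computed from a list of simulated indices in a few lines of Python. The use of the sample standard deviation (dividing by n - 1) is an assumption, since the help text does not say which form EcoSim uses, and the numbers below are made up.

```python
from statistics import mean, stdev


def standardized_effect_size(observed, simulated):
    """(observed - mean of simulated) / standard deviation of simulated.

    Uses the sample standard deviation, which is an assumption.
    """
    return (observed - mean(simulated)) / stdev(simulated)


simulated = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up simulated indices
observed = 6.0                         # made-up observed index

ses = standardized_effect_size(observed, simulated)
print(round(ses, 3))  # 1.897
```

Here the simulated mean is 3 and the sample standard deviation is the square root of 2.5, so the observed index lies a little under 2 standard deviations above the mean.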

6. Caveats

The analysis of co-occurrence data has been so contentious in community ecology that it is difficult to concisely describe the caveats and issues surrounding data interpretation. EcoSim gives you many choices for this analysis, but you will be on fairly solid ground if you stick with the default options. This test has good Type I properties, which means that if you reject the null hypothesis, you can be fairly confident your data are non-random. Some of the other algorithms that are available in co-occurrence do not behave this way. In particular, you should be very suspicious of any algorithm that does not maintain fixed row totals. The tutorial provides more details on choosing among the different algorithms and indices. If you don't reject the null hypothesis, there is always the danger of a Type II error. That is, the data are non-random, but this pattern was not detected by the analysis. A careful, systematic comparison of the data set with several different null models may help to pinpoint which assumptions are generating random (or non-random) patterns in your data. However, even if the null hypothesis is rejected, you may not be able to leap to the conclusion that competition has led to less co-occurrence than expected by chance. At least two other viable hypotheses need to be considered. The first is that the pattern reflects "habitat checkerboards": species are associated with different abiotic features of the sites which leads to less co-occurrence than expected by chance. Thus, one finch species may prefer dry sites and another may prefer wet sites. Although the two finches colonize sites independently and do not interact, they may exhibit a perfect checkerboard. More detailed knowledge about the biology of habitat selection and the kinds of habitats available on the islands can provide insight into this sort of pattern. A second explanation is that historical or phylogenetic processes have led to less coexistence than expected by chance. 
In particular, allopatric speciation may lead to a pattern of little or no coexistence among congeners, whether or not there is competition occurring. The situation is even more complicated than this, because these three hypotheses for explaining non-random co-occurrence (competition, habitat checkerboards, and historical effects) are not mutually exclusive, and there may be interactions between mechanisms. For example, competitive interactions may lead to the evolution of distinct habitat preferences and a reduction in niche overlap. However, a first important step towards sorting out these ideas is to at least establish whether the patterns are random or not. That is, do we even have anything interesting to talk about in the data? Ecologists continue to struggle with these issues, but at least EcoSim can now clarify and simplify some of the problems associated with the analysis of your data.

7. Co-occurrence Tutorial
West Indian finches
Launch EcoSim and you will see the familiar opening 5 x 5 matrix of species and sites. Use the file menu to open the file called "West Indies finches.txt". This data set is a presence-absence matrix for finches (Fringillidae) of the West Indies (Gotelli and Abele 1982). These islands have been censused for over a century by many ornithologists, so the species list is probably complete. Each row is one of the 17 finch species in the West Indies, and each column is one of the 19 major islands. You can quickly find the matrix dimensions of the current data file by selecting the "about" option in the file menu. Each entry in the matrix represents the occurrence of a particular species on a particular island. A "1" means the species is present and a "0" means the species is absent. This presence-absence matrix is the object of study in the co-occurrence module. EcoSim measures pattern in this matrix with one of 4 possible co-occurrence indices. It then creates a sample of random matrices (subject to some constraints) and statistically compares the co-occurrence index in the observed and simulated data sets.

Your first co-occurrence analysis


Now select co-occurrence from the analysis menu. Immediately switch to the "general" tab and set the random number seed to 10. Normally, you should use the default seed of 0, which instructs EcoSim to get a fresh random number seed from the system clock each time a new analysis is requested. In this case, by choosing a particular random number seed, your results will match up exactly with those in this tutorial. As always, EcoSim will remember your settings until you change them or restart the program.

Switch back to the preferences tab and take a look at the 4 choices for the co-occurrence index. In this analysis, we will first use the C-score, which is EcoSim's default choice. This index, created by Stone and Roberts (1990), measures the number of checkerboard units averaged across all possible species pairs. A checkerboard unit is a submatrix of the form:

0...1
.   .
.   .
.   .
1...0

The dots indicate that the submatrix does not have to be composed of adjacent rows or columns. For any pair of species, the number of checkerboard units (CU) is:

CU = (r_i - S)(r_j - S)

where S is the number of shared sites (sites containing both species) and r_i and r_j are the row totals for species i and j. The C-score is the average of all possible checkerboard pairs, calculated for species that occur at least once in the matrix. The C-score measures the tendency for species to not occur together. The larger the C-score, the less the average co-occurrence among species pairs. If a community were structured by competition, we would expect the C-score to be large relative to a randomly assembled community.

The next pair of options asks you to make a decision about how the row and column constraints of the matrix are to be handled in the simulation. These choices are critical in determining the way the random matrices are constructed. For this analysis, we will retain EcoSim's default choice of fixed row sums and fixed column sums. These choices mean that EcoSim will create random matrices in which the row totals and the column totals are the same as in the observed matrix. Thus, each island in the simulated data will have exactly the same number of species as in the real data (column totals fixed). Similarly, each species will occur on the same number of islands as in the real data (row totals fixed).

The next option that you see asks you what to do about degenerate matrices. These are matrices that have missing species (row sums of zero) or empty islands (column sums of zero). Because the simulation procedure we are using maintains fixed row and column sums, it will never create a degenerate matrix. Therefore, the choice of this option does not make a difference, although it could matter in other simulations where degenerate matrices are created.

Using all of EcoSim's default values, run the simulation, which should take a few seconds to complete 1000 replicates. You will first see the progress bar move slowly, then start again and speed quickly through the replicates.
During the first "pass", EcoSim is making 10,000 initial transpositions to create an initial state that is very different from the original matrix. Then it begins creating the actual iterations and calculating the statistics for 1000 randomized communities you requested.
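The two passes described above can be sketched in Python as a swap-based randomization. This is a minimal sketch under stated assumptions, not EcoSim's own routine: a "transposition" is taken to mean flipping a randomly chosen 2 x 2 checkerboard submatrix (which leaves every row and column total unchanged), the 10,000 initial transpositions are counted as attempts rather than successful swaps, and one further attempt is made between successive saved matrices. The function names are hypothetical.

```python
import random


def try_swap(matrix, rng):
    """Pick two rows and two columns at random. If the 2 x 2 submatrix is a
    checkerboard (10/01 or 01/10), flip it in place. Row and column sums are kept."""
    i, j = rng.sample(range(len(matrix)), 2)
    a, b = rng.sample(range(len(matrix[0])), 2)
    is_checkerboard = (
        matrix[i][a] == matrix[j][b]
        and matrix[i][b] == matrix[j][a]
        and matrix[i][a] != matrix[i][b]
    )
    if is_checkerboard:
        matrix[i][a], matrix[i][b] = matrix[i][b], matrix[i][a]
        matrix[j][a], matrix[j][b] = matrix[j][b], matrix[j][a]
    return is_checkerboard


def swap_randomize(matrix, n_matrices, rng, burn_in=10_000):
    """First pass: burn_in swap attempts. Second pass: save n_matrices matrices,
    with one further swap attempt before each (an assumed spacing)."""
    current = [row[:] for row in matrix]
    for _ in range(burn_in):
        try_swap(current, rng)
    samples = []
    for _ in range(n_matrices):
        try_swap(current, rng)
        samples.append([row[:] for row in current])
    return samples


observed = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
samples = swap_randomize(observed, n_matrices=5, rng=random.Random(10), burn_in=1000)
for row in samples[-1]:
    print(row)
```

Because successive matrices in this scheme differ by at most one swap, they are not independent draws, which is one reason a long initial pass is useful before any matrices are saved.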


Output from co-occurrence analysis


The output screen shows you five tab windows. The first tab, "input", gives the original input matrix. The second tab, "simulation", gives one of the simulated matrices. If you carefully compare these two matrices, you will see that they are different, although they have the same row and column sums. For example, the island of Cuba (first column) has 4 species of finches. The simulated matrix also placed 4 species on Cuba, but the identities were not the same as in the original matrix. Conversely, the first species in the observed matrix, Carduelis dominicensis, occurs only on the island of Hispaniola. In the simulated matrix, C. dominicensis also occurs on only one island, but the computer placed it, by chance, on Martinique.

The third tab, "pairwise", shows the number of checkerboard units, calculated between all possible pairs of species. Notice that for some species pairs, such as the first two in the matrix, no checkerboard units were found, so the entry for those pairs is a zero. The C-score is calculated as the average of all the pairwise values for a matrix.

The C-score is calculated for the observed matrix, and then compared statistically to the C-score values calculated from the sample of simulated matrices. This comparison is shown in the fourth tab, "index". On the left, you see that the observed C-score for the finch matrix was 3.79412. In contrast, the average of the 1000 simulated matrices was 2.76246, and none of the simulated matrices had a C-score as large as the observed. So, compared to the simulated universe of random matrices with identical row and column sums, there is much less co-occurrence in the finch matrix than expected by chance (p < 0.001). All of this information is shown in the summary window, which is your complete paper trail of the analysis. As always, you can edit this window as a text file so the output can be annotated.
The output includes a standardized effect score of 7.47, indicating that the C-score for the observed finch matrix was over 7 standard deviations greater than the mean of the simulated matrices! EcoSim's default values, which were illustrated in this analysis, are probably the safest options to use if you are unsure how to select among the different models. This model is most similar to the original Connor and Simberloff (1979) analysis, although it avoids the statistical problems that emerged from that study. This model has good Type I error properties, and thus is unlikely to cause you to incorrectly reject the null hypothesis when it is true. At the same time, when used with the C-score, this analysis does a surprisingly good job of detecting significant patterns in non-random data sets that have a good deal of random "noise" in the co-occurrence patterns (Gotelli 2000).
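
The calculations behind the "pairwise" and "index" tabs can be sketched as follows. The checkerboard units for a species pair are taken as (ri - S)(rj - S), following the definition given earlier, and the C-score is their mean over all pairs. The standardized effect score is computed here as (observed - simulated mean) / simulated standard deviation. The sample standard deviation (`ddof=1`) and the upper-tail probability are assumptions; the help text does not say exactly which estimators EcoSim uses. Function names and the toy matrix are hypothetical.

```python
import numpy as np

def c_score(matrix):
    """Mean number of checkerboard units over all species pairs."""
    m = np.asarray(matrix, dtype=int)
    m = m[m.sum(axis=1) > 0]              # keep species that occur at least once
    shared = m @ m.T                      # S: shared sites for every pair
    r = m.sum(axis=1)                     # row totals
    units = (r[:, None] - shared) * (r[None, :] - shared)
    i, j = np.triu_indices(len(m), k=1)   # each unordered pair once
    return units[i, j].mean()

def summarize(observed, simulated):
    """Standardized effect score and upper-tail probability."""
    sims = np.asarray(simulated, dtype=float)
    ses = (observed - sims.mean()) / sims.std(ddof=1)
    p_upper = np.mean(sims >= observed)   # share of null values >= observed
    return ses, p_upper

# Hypothetical toy matrix: species A and B form a perfect checkerboard.
toy = np.array([[1, 1, 0, 0],    # A
                [0, 0, 1, 1],    # B
                [1, 0, 1, 0]])   # C
# Pairwise units: A-B = 2*2 = 4, A-C = 1*1 = 1, B-C = 1*1 = 1.
print(c_score(toy))                       # mean of 4, 1, 1

ses, p = summarize(3.0, [1.0, 2.0, 3.0])  # made-up null values
print(ses, p)
```

With 1000 simulated matrices and none as extreme as the observed value, the estimated tail probability is 0, which is reported as p < 0.001.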

Exploring other co-occurrence metrics


Next you should explore EcoSim's other co-occurrence metrics. Change the co-occurrence index from the C-score to the number of checkerboards and run the analysis again. The number of checkerboards is defined as the number of species pairs in the matrix that form perfect checkerboards and never co-occur. The observed number of checkerboard pairs was 91, quite close to the average of the simulated matrices, which was 89.33. The tail probability for this observation is not extreme (p = 0.251), and there is no reason to reject the null hypothesis that the number of checkerboard pairs is random. This is quite a different result from the analysis with the C-score. Although the finches co-occur less often than expected by chance, the pattern is not caused by species pairs that never co-occur, forming perfect checkerboards as Diamond (1975) had originally suggested. Instead, species pairs, on average, co-occur less frequently than expected by chance. Another metric EcoSim lets you analyze is the number of unique species combinations in the archipelago
(Pielou and Pielou 1968). Change the co-occurrence index to the number of species combinations. When you make this change, you will notice a new choice appears on the preferences window, asking you whether to include or exclude empty sites. Fortunately, you don't need to worry about this option, because the simulation algorithm that you have chosen (fixed rows, fixed columns) does not generate matrices with empty sites. Now go ahead and run the simulation. You will see from the output that there are 10 different species combinations represented among the 19 islands. However, the expected number of combinations is 15.54. Although this seems like a modest difference, the observed number of species combinations is significantly less than expected (p < 0.001), again suggesting that species co-occurrences are non-random. Finally, try the analysis with the remaining co-occurrence index, the V-ratio. This index measures the average statistical covariance between pairs by calculating the ratio of the variance of the column sums (species richness) to the sum of the variances of the rows. If this index is greater than 1.0, it indicates positive covariance between species pairs and if it is less than 1.0, it indicates negative covariance (Schluter 1984). For the West Indian finches, the observed index was 1.229. For every one of the 1000 simulated matrices, the V-ratio was also 1.229. Why did this happen? Unlike the other three co-occurrence indices (C-score, number of checkerboards, and number of species combinations), the V-ratio is determined exclusively by the row and column totals of the matrix. Therefore, since these values were fixed in the simulation procedure you used (fixed row sums, fixed column sums), there is no change in the index, even though all of the simulated matrices were "different" from the original matrix. To use the variance ratio, you must change the simulation constraints, which is the next topic we will explore. 
One point that emerges from these analyses is that different indices can be expected to give different results. Although all the indices measure some aspect of "co-occurrence" they are not measuring the same property of a presence-absence matrix. So, which index should you use? Overall, the number of checkerboards and the number of species combinations are most sensitive to "noise" or randomness in the data set (including measurement error) because a change in a single matrix entry can greatly alter the index. In contrast, the C-score and the V-ratio are much less sensitive to variability in the data, because they are based on averages that are calculated across all possible pairs of species. The V-ratio is the most difficult of the indices to understand, and the fact that its value is determined exclusively by the marginal totals, and not the actual co-occurrence patterns, means that it must be interpreted carefully. You can read more about the properties of these indices in other parts of this help file.
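
The three remaining indices can be sketched in a few lines each. The sketch makes several assumptions: checkerboard pairs are counted only among species that occur at least once, row variances are computed as p(1 - p) with p = row total / number of sites, and excluding empty sites simply drops all-zero columns before the calculation. None of these details is confirmed by the help text, and all function names are hypothetical.

```python
import numpy as np

def n_checkerboards(matrix):
    """Number of species pairs that never share a site."""
    m = np.asarray(matrix, dtype=int)
    m = m[m.sum(axis=1) > 0]              # assumption: ignore absent species
    shared = m @ m.T
    i, j = np.triu_indices(len(m), k=1)
    return int(np.sum(shared[i, j] == 0))

def n_combinations(matrix, include_empty=True):
    """Number of distinct species combinations (distinct columns)."""
    m = np.asarray(matrix, dtype=int)
    combos = {tuple(col.tolist()) for col in m.T}
    if not include_empty:
        combos.discard(tuple([0] * m.shape[0]))   # the "no species" combination
    return len(combos)

def v_ratio(matrix, include_empty=True):
    """Variance of column sums divided by the sum of the row variances."""
    m = np.asarray(matrix, dtype=float)
    if not include_empty:
        m = m[:, m.sum(axis=0) > 0]
    p = m.sum(axis=1) / m.shape[1]        # uses only the row totals
    richness = m.sum(axis=0)              # uses only the column totals
    return richness.var() / np.sum(p * (1 - p))

toy = np.array([[1, 1, 0, 0],
                [0, 0, 1, 1],
                [1, 0, 1, 0]])
toy_with_empty = np.hstack([toy, np.zeros((3, 1), dtype=int)])

print(n_checkerboards(toy), n_combinations(toy), v_ratio(toy))
print(n_combinations(toy_with_empty, include_empty=True),
      n_combinations(toy_with_empty, include_empty=False))
```

Note that `v_ratio` touches the matrix only through its row and column totals, which is why the index cannot change when both sets of totals are held fixed.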

Exploring column constraints


Let's think carefully about the constraints we have put on our null model. Because we have fixed the row totals, each species in a random matrix occurs exactly the same number of times as in the real matrix. Similarly, each island contains exactly the same number of finch species as observed in nature. The rationale for these restrictions is that we wish to retain in our null model differences among sites and among species. These differences may not reflect species-interactions, so we want to hold them constant and measure the pattern "above and beyond" the effect of the marginal constraints. However, one could argue that the marginal constraints, themselves, reflect the effect of species interactions. After all, in a null community, why should there be exactly 4 species of finches occurring on Cuba but only 1 on St. Croix? If resources are limited on small islands (Lack 1976), then the reduced number of species may, itself, reflect competitive interactions, and we don't want to "smuggle in" these effects into our null model.

Therefore, we can alter some of these constraints to reflect different sampling universes. Suppose that all the islands are equivalent to one another. That is, from the species perspective, all the islands are equally likely to be successfully colonized. To explore this simulation, change the column constraints from "fixed sum" to "equiprobable". This change instructs EcoSim to maintain fixed row sums, but to distribute the occurrences of each species (that is, its row sum) randomly among the columns (= islands). For each occurrence, a column is chosen randomly and equiprobably, although if a cell already has a 1 placed in it, another column is randomly chosen until an empty site is found. This procedure is repeated until all of the occurrences of each species are randomly distributed among the columns. If you run this simulation, you will see that the result is similar to the analysis with fixed column sums: the C-score is significantly larger than expected under the null hypothesis (p < 0.001).

The next option, "proportional", gives you a simulation that is somewhat intermediate between maintaining fixed column sums and assuming that all sites are equiprobable. In proportional, the occurrences of each species are once again distributed randomly among the columns, as in equiprobable. This time, however, the sites are not all equally likely to be selected at random. Instead, the probability that a site is "hit" is proportional to the column totals. It is as though you are throwing dice randomly to place species, but the dice are weighted, so that some numbers are more likely than others. The formula that EcoSim uses is simple: the probability that a column is chosen is the column total divided by the matrix occurrence total, which, in this case, is 55 species-occurrences (the grand sum of all matrix entries).
In the finch matrix, the probability that Cuba is randomly chosen is 4/55 = 0.073, whereas the probability that Barbados, with only 2 resident finch species, is chosen is 2/55 = 0.036. Thus, Cuba is twice as likely to be selected as Barbados, and 4 times as likely to be selected as St. Croix, because the finch species totals on these islands are 4, 2, and 1, respectively. The result of this algorithm is that species number per island does fluctuate from one run to the next, but, on average, the column sums in the random matrices are similar to those in the observed matrix. If you run this model (rows: fixed sums; columns: proportional) for the finch matrix using the C-score, you will again obtain the result that the observed C-score is significantly greater than expected by chance (p < 0.001). EcoSim offers one other option for dealing with column totals: user-defined weights. If you click this button, an edit window opens up in which each column label is displayed, along with a column weight. The default weights are all 1.0, which means that all sites are equiprobable. In fact, if you select user-defined and use the default weights, the output will be nearly identical to a run in which you had selected equiprobable for the column constraints (Run these two models and see for yourself). Instead, you will want to incorporate weights that reflect different probabilities for the different islands. A natural choice for site weights might be the area of the islands. A useful null model is the "random placement" model (Coleman et al. 1982), in which islands represent "targets" of different area, and species represent "darts" that are tossed randomly at the set of different targets. If islands passively intercept individual colonists of different species, each island behaves as a target, and its chances of getting hit are directly proportional to its area. Open up user-defined for the column constraints, and use the file menu to load the file "West Indies islands.txt". 
This file contains the areas (in square miles) of each island in the finch matrix. Running your model with these weights again gives the result that the observed C-score is significantly greater than expected by chance. Moreover, if you examine the "simulation" tab, you will see that one of the smallest islands, St. Martin, did not receive any species in this particular simulation, even though this island actually supports 2 species. In contrast, the simulation procedure placed 13 species on the large island of Cuba, even though this island actually supports 4 species of finches. The great disparity in the island areas of Cuba (44,164 mi²) and Montserrat (33 mi²) is reflected in this simulation model.

These four analyses (fixed sum, equiprobable, proportional and island area weights) greatly increase our confidence in the non-random pattern, because the results were quite robust to the way we treated the different sites in the simulation (column constraints). Let's now address the problem of how we treat the different species (row constraints).
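
The equiprobable, proportional, and user-defined column options differ only in the weights given to the columns, so one sketch can cover all three. It assumes that drawing distinct columns with NumPy's weighted sampling without replacement gives the same distribution as redrawing until an empty cell is found. The function name, the toy matrix, and the example areas are hypothetical; only the 4/55 and 2/55 probabilities come from the text.

```python
import numpy as np

def fixed_rows_weighted_columns(matrix, weights, rng):
    """Keep each row total; place occurrences in columns chosen by weight."""
    m = np.asarray(matrix, dtype=int)
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()                        # e.g. column total / grand total
    out = np.zeros_like(m)
    for i, row_total in enumerate(m.sum(axis=1)):
        # Distinct columns, so a filled cell is never used twice.
        cols = rng.choice(m.shape[1], size=row_total, replace=False, p=p)
        out[i, cols] = 1
    return out

toy = np.array([[1, 1, 0, 0],
                [0, 0, 1, 1],
                [1, 0, 1, 0]])
rng = np.random.default_rng(10)

equiprobable = fixed_rows_weighted_columns(toy, np.ones(4), rng)
proportional = fixed_rows_weighted_columns(toy, toy.sum(axis=0), rng)
areas = [500.0, 20.0, 120.0, 5.0]          # made-up island areas
by_area = fixed_rows_weighted_columns(toy, areas, rng)

# Probabilities quoted in the text for the finch matrix:
print(round(4 / 55, 3), round(2 / 55, 3))  # Cuba and Barbados
```

Row totals are preserved exactly in all three variants, while column totals are free to vary from one simulated matrix to the next.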

Exploring row constraints


From the co-occurrence preferences screen, you can see that there is a symmetric set of options available to you for altering the row constraints. Namely, you can keep the observed row totals fixed, make the rows equiprobable, proportional to observed row totals, or proportional to user-defined weights for each species. However, our recommendation is that you not depart from the default choice of fixed row sums. Why not? Simulation studies with random matrices have shown that altering the row sums from the observed values can sometimes make the analysis vulnerable to Type I errors (Gotelli 2000). In other words, a random data matrix, with no structure in it, can generate a significant test result if you do not use fixed row sums! This is in contrast to changes in the column constraints, which do not seem to affect the chances of a Type I error.

The situation is a bit more complicated than this, because the Type I error depends on both the row and column constraints and the index that is used. Table 1 (described elsewhere in the help file) shows you which combinations of constraints and indices are likely to give you problems with Type I error. You should carefully study this table when selecting constraints and indices for your simulation. The safest approach is to leave the row sums fixed. If you choose one of the other options, EcoSim will give you a warning, although it will dutifully carry out your simulation request.

One interesting simulation model is to use proportional column constraints and proportional row constraints. This simulation behaves well with the V-ratio and the number of species combinations, but should not be used with the C-score or the number of checkerboards, both of which give a high rate of false positives. In this simulation, neither the row nor the column totals are fixed. Instead, EcoSim randomly distributes the total number of species occurrences across the matrix.
Thus, in the finch matrix, the 55 species occurrences in the matrix would be randomly distributed among the 323 (= 17 rows × 19 columns) cells in the matrix. In this model, the probability that a cell is hit is proportional to the product of its row and its column total. Specifically, the probability of occurrence is (row total × column total)/(grand total)². So, in the finch matrix, the most likely cell to be filled is the occurrence of Tiaris bicolor on Hispaniola. This is the species with the greatest number of occurrences (17) and the island with the greatest number of finches (7). Conversely, the least likely occurrence in this matrix would be Carduelis dominicensis on the island of St. Croix, because this species only occurs on one island, and because this island has only one species of finch (other rare species-island combinations in this matrix also generate this result). All other occurrences have probabilities that are in between these bounds. As always, once a cell is filled in the simulation, it cannot be used again, and EcoSim will repeat its random search for an empty cell in which to place each occurrence. Try running the proportional-proportional model with the V-ratio as the co-occurrence statistic. You will find an observed V-ratio of 1.229, which is similar to that generated by the null model (mean = 1.253, p = 0.522).

One final comment about row and column constraints is that we have very little understanding of how these models behave when used with user-defined weights. As always, we recommend systematic comparisons with some of the more standard models to help you understand your results.
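
The proportional-proportional model can be sketched by treating every cell as a target with weight (row total × column total)/(grand total)² and drawing as many distinct cells as there are occurrences in the observed matrix. With the totals quoted above, the most likely cell in the finch matrix has probability 17 × 7 / 55² = 119/3025, about 0.039, and the least likely has 1 × 1 / 55² = 1/3025, about 0.0003. As before, weighted sampling without replacement stands in for "redraw until an empty cell is found", which is an assumption; the function name and toy matrix are hypothetical.

```python
import numpy as np

def proportional_proportional(matrix, rng):
    """Scatter all occurrences with cell weights row total x column total."""
    m = np.asarray(matrix, dtype=int)
    row_tot = m.sum(axis=1)
    col_tot = m.sum(axis=0)
    grand = int(m.sum())
    p = np.outer(row_tot, col_tot) / grand**2     # sums to 1 over all cells
    cells = rng.choice(m.size, size=grand, replace=False, p=p.ravel())
    out = np.zeros(m.size, dtype=int)
    out[cells] = 1                                # each cell filled at most once
    return out.reshape(m.shape)

toy = np.array([[1, 1, 0, 0, 1],
                [0, 1, 1, 0, 0],
                [1, 0, 0, 1, 0],
                [0, 0, 1, 1, 1]])
rng = np.random.default_rng(10)
sim = proportional_proportional(toy, rng)
print(sim)
print(sim.sum(axis=1), sim.sum(axis=0))   # margins vary; only the grand total is kept
```

Because neither margin is fixed, this model can produce empty rows or columns, which is why the handling of degenerate matrices matters for it.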

Analyzing degenerate matrices


In our analysis of the finch matrix, we concentrated on the row and column constraints and how they affected the analysis of pattern using the four co-occurrence indices. In this section, we will explore the ways that EcoSim deals with degenerate matrices. A degenerate matrix is one in which some species are not included in the samples, or some samples contain no species. These matrices have one or more row or column sums of zero. Unfortunately, missing species or empty sites are often not included when ecologists collect or report their data. So, one argument in favor of excluding degenerate matrices is that we want to create a sample of null matrices that are also nondegenerate, and therefore directly comparable to the observed matrix. If you accept this argument, you will probably want to use "fixed" for both the row and column constraints. This will ensure that the row and column totals are maintained in your simulation. Note that if you use fixed-fixed with an input matrix that is degenerate, EcoSim will maintain in all of the simulated matrices the empty rows and columns that were present in the input matrix. We have chosen fixed, fixed as the default because it does not involve complicated choices about the handling of degenerate matrices, and because it has surprisingly good statistical properties (Gotelli 2000). Fixed-fixed is the only simulation algorithm that will never create degenerate matrices. All of the other algorithms have the potential to create degenerate matrices. Whether they actually do or do not will depend a great deal on the matrix structure. The more "filled" the matrix is with 1s, the less likely it is that degenerate matrices will be created. Conversely, matrices with lots of zeroes will often generate degenerate matrices. In fact, for some matrices, it may be impossible for EcoSim to create anything but degenerate matrices for the constraints that you have chosen. 
An argument in favor of using degenerate matrices is that we might expect these to arise by chance in a null or randomly constructed community. With random colonization or placement of species on islands, it is easy to imagine that some samples might have no species, or that some species might not be represented in any samples. Moreover, degenerate matrices often turn up in field sampling data. Small-scale samples (pitfall traps, quadrats, transects, etc.) may often contain no individuals. It doesn't seem valid to use the fixed, fixed model in this case because the row and column totals might be very different in another random sample of the same community. For this reason, EcoSim allows you to use other simulation algorithms and to include degenerate matrices in your analyses.

Virginia ants
To illustrate the analysis of degenerate matrices, we will use the data file "Virginia ants.txt", which you should load into EcoSim. This data set shows the occurrence of 11 species of ground-foraging ants that were collected in a grid of 25 pitfall trap samples from an open field in Prince Edward County, Virginia (Arnett 1998). These data are part of a regional census of ants at 33 sites in the eastern U.S. Most of the species in the collection occurred in only a single sample, and 13 of the 25 traps contained no ants. The most common taxon (Aphaenogaster rudis complex) occurred in five of the traps, and species richness in the occupied traps ranged from only one to three species. Clearly, the fixed-fixed model is not appropriate for these data, because the empty traps probably would have accumulated species if we had sampled repeatedly or for a longer duration. A better model might be to use "fixed" row totals and "equiprobable" column totals. This model assumes that the sites (= traps) are equiprobable, which is realistic because the samples were collected on a small spatial scale in a relatively homogeneous open-field site. As before, set the random number seed to 10, set the column constraints to equiprobable, and run the simulation for C-score. The observed C-score for this matrix is 2.073 and the expected value from the simulation is 2.144. The observed value is less than expected, but well within the limits predicted by the null model (p = 0.323).

Analyses with degenerate matrices


Now, let's reconsider the analysis and how we have treated degenerate matrices. Since we used retain, EcoSim kept all of the simulated matrices, regardless of whether or not they were degenerate. Run the analysis again using "discard", and EcoSim will keep creating matrices until it accumulates 1000 nondegenerate matrices. What happened? EcoSim tried 3000 times to create a matrix that was non-degenerate and failed, so it stopped the simulation. This suggests that non-degenerate matrices are very rarely created with this algorithm and this data set. We can confirm this by running the analysis again and checking "split". This divides the data into two tabbed output screens "index" and "degenerate index". What you will find is that the "index" tab is empty, whereas the "degenerate index" tab contains the simulation results for all 1000 matrices. Finally, let's attempt to repair the degenerate matrices by checking "fix" and running the program again. Once again, EcoSim was not able to meet your request; it was unable to successfully redistribute the 1s in the matrix so that every row and every column had a non-zero sum. In retrospect, it is easy to understand these "failed" simulations by looking at the structure of the ant matrix. It simply has too many zeros in it for EcoSim to meet your requests for dealing with non-degenerate matrices. The only feasible solution in this case (and the one that is biologically most realistic) is to retain the degenerate matrices. Let's move on to another aspect of degenerate matrices. Using fixed row sums and equiprobable columns, change the co-occurrence index to the number of species combinations. Now EcoSim wants to know whether or not to include empty sites in its pattern calculation. Run the model with "include" checked. You can see that the original ant matrix had 10 species combinations represented among the 25 sites. The expected number was 11.358 (p = 0.206). 
Now run the analysis again, but this time check "exclude" for the empty sites option. The results are similar, but this time, only 9 species combinations were recorded in the ant matrix. Why the difference? When you checked "exclude", there was one less species combination counted, because the combination of "no species present" was not scored, either in the original ant matrix, or in the simulated matrices. The results are even more dramatic for the V-ratio. With the empty sites included, the observed V-ratio is 0.98, whereas with the empty sites excluded, the V-ratio is 0.43. Empty sites contribute greatly to the variance in species number among sites, so the V-ratio drops quite a bit when empty sites are excluded from the calculation. However, these empty sites are also dropped from the calculation of the V-ratio in the simulated assemblages, so the probability test gives a similar answer.

Keep in mind that the options for degenerate matrices refer to how EcoSim handles the random matrices that it has created. The empty sites option refers to how EcoSim calculates pattern in both the simulated and the original matrix. Moreover, the empty site option only appears when you select the V-ratio or the number of species combinations, because the inclusion or exclusion of empty sites does not affect the other co-occurrence indices (C-score, number of checkerboards). Whichever options you choose, EcoSim calculates the indices in the same way for both the observed and the simulated matrices.

You may wish to go back now and repeat these exercises for the West Indian finch matrix. What you will find is that the program can readily split, fix, and discard degenerate matrices as you instruct it. The reason is that the finch matrix is not so "empty", and it is easier for EcoSim to create matrices that meet the conditions of the simulation.
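
The "retain" and "discard" choices can be illustrated with a small sketch. A matrix is treated as degenerate when any row or column sums to zero, as defined above. The discard loop gives up after 3000 degenerate matrices in a row, which is one reading of the behavior described for the ant data; whether EcoSim counts consecutive or total failures is not stated, so that detail is an assumption. The "split" and "fix" options are not covered, and all names are hypothetical.

```python
import numpy as np

def is_degenerate(matrix):
    """True if any species is missing or any site is empty."""
    m = np.asarray(matrix)
    return bool((m.sum(axis=1) == 0).any() or (m.sum(axis=0) == 0).any())

def collect_null_matrices(make_matrix, n_wanted, policy="retain", max_failures=3000):
    """Gather n_wanted simulated matrices under a retain or discard policy."""
    if policy not in ("retain", "discard"):
        raise ValueError("policy must be 'retain' or 'discard'")
    kept, failures = [], 0
    while len(kept) < n_wanted:
        m = make_matrix()
        if policy == "discard" and is_degenerate(m):
            failures += 1
            if failures >= max_failures:
                raise RuntimeError("too many consecutive degenerate matrices")
            continue
        failures = 0
        kept.append(m)
    return kept

rng = np.random.default_rng(10)

def sparse_null():
    """Two species, one occurrence each, five sites: always has empty sites."""
    m = np.zeros((2, 5), dtype=int)
    m[0, rng.integers(5)] = 1
    m[1, rng.integers(5)] = 1
    return m

retained = collect_null_matrices(sparse_null, 100, policy="retain")
try:
    collect_null_matrices(sparse_null, 100, policy="discard")
    gave_up = False
except RuntimeError:
    gave_up = True
print(len(retained), gave_up)
```

The toy generator mimics the ant matrix in one respect only: it is so sparse that every simulated matrix has empty sites, so the discard policy can never succeed.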

Comparative Table of Type I errors


This table will serve as an important guide in your selection of algorithms and indices. It summarizes the results of simulation studies with a number of different kinds of "random" matrices. Each row represents one of the 9 different algorithms that can be created by choosing fixed (Fix), equiprobable (Eq), or proportional (Pro) constraints for rows and columns (3 x 3 = 9). Each column is one of the 4 co-occurrence indices (Checker = number of checkerboard species pairs; C-score = Stone and Roberts' (1990) C-score; V-ratio = Schluter's (1984) variance ratio; Combo = number of unique species combinations). Each entry is the average proportion of simulations in which the null model was rejected using random input matrices (two-tailed test; p < 0.05 for each tail). Details of random matrix construction are in Gotelli (2000). If a simulation is well behaved, this number should be less than or equal to 0.10, which is the proportion of rejections expected by chance alone (0.05 in each tail). Well-behaved combinations are marked with an asterisk (*). Entries larger than 0.10 indicate the combination has a substantial risk of Type I error (false positives; incorrectly rejecting the null hypothesis) and should be avoided. n.a. = not applicable.

Simulation  Rows  Columns  Checker  C-score  V-ratio  Combo
SIM1        Eq    Eq       0.72     0.76     0.14     0.34
SIM2        Fix   Eq       0.07*    0.08*    0.10*    0.00*
SIM3        Eq    Fix      0.77     0.77     0.77     0.49
SIM4        Fix   Pro      0.16     0.27     0.10*    0.02*
SIM5        Pro   Fix      0.74     0.76     0.62     0.16
SIM6        Eq    Pro      0.89     0.73     0.12     0.11
SIM7        Pro   Eq       0.77     0.73     0.12     0.11
SIM8        Pro   Pro      0.91     0.56     0.08*    0.06*
SIM9        Fix   Fix      0.08*    0.10*    n.a.     0.01*

Choosing a co-occurrence statistic


EcoSim offers you four choices for how to measure co-occurrence patterns in your input matrix. It is difficult to suggest that one index is "better" than any of the others, because they all measure slightly different aspects of pattern. In particular, matrices that are extreme for one of the indices will often show random patterns when tested with another index! What follows is some commentary to help you choose among the indices.

C-score
This index, introduced by Stone and Roberts (1990), quantifies the average co-occurrence among all unique pairs of species in the assemblage, measured as the mean number of checkerboard units per pair. In a competitively structured community, the observed C-score should be significantly larger than expected by chance. In simulation studies, this index, when used with an appropriate null model, was superior to other indices in both Type I and Type II error properties. Recommended simulations: fixed rows-fixed columns; fixed rows-equiprobable columns.

Number of checkerboard species pairs
This index follows directly from Diamond's (1975) analysis of assembly rules. In a competitively structured community, the observed number of checkerboard species pairs should be significantly larger than expected by chance. Because this index requires a perfect checkerboard to increase the score, it is more stringent than the C-score and may not always detect patterns of negative co-occurrence (as seen in the analysis of the finch matrix). It is also more sensitive to random error than the C-score. Recommended simulations: fixed rows-fixed columns; fixed rows-equiprobable columns.

Number of species combinations
Pielou and Pielou (1968) first proposed this index. In a competitively structured community, the observed number of species combinations should be significantly smaller than expected by chance, presumably because competition leads to "forbidden" combinations that will not be found (Diamond 1975). Like the number of checkerboard pairs, this index is somewhat sensitive to random error. Including empty sites will increase the number of species combinations by 1, because an empty site is counted as a species combination in both observed and simulated matrices. Recommended simulations: fixed rows-fixed columns; fixed rows-equiprobable columns; fixed rows-proportional columns; proportional rows-proportional columns.

V-ratio
This index was first proposed by Robson (1972) and has been popularized in the ecological literature by Schluter (1984). The test measures the ratio of the variance in the column sums, that is, the variance in the number of species per site, to the sum of the row variances. In a competitively structured community, the observed V-ratio should be significantly smaller than expected by chance, usually less than 1.0. Unlike all of the other indices, this one is determined exclusively by the row and column totals, so one or both of these quantities must be allowed to vary in the null model simulation. Although this index can be thought of as a measure of the average covariance between species pairs, it is more usefully thought of as a test for constancy in species number, which is the niche limitation hypothesis of Wilson et al. (1987). Recommended simulations: fixed rows-equiprobable columns, fixed rows-proportional columns, and proportional rows-proportional columns. Note that the test cannot be used with fixed rows-fixed columns, which does not generate any variation in row and column sums.

Choosing a simulation algorithm


EcoSim offers you a possibility of 9 simulation algorithms, not including variations that can be created with user-defined weights. Although all of these simulations are available and can be used to compare data sets, some of them are quite prone to Type I errors (false positives) and should not be used as formal tests. Moreover, the performance of each simulation depends on the particular index it is used with. As a general rule, we can say that maintaining fixed row sums is usually necessary to guard against Type I errors. To simplify things, we have recommended only 4 of the 9 simulation matrices. However, if you consult Table 1, you will see there is usually some combination of algorithms and indices that gives acceptable Type I error rates. Sim1) Equiprobable rows-equiprobable columns This is the "most null" of all the simulations because it assumes all species and all sites are equiprobable. It corresponds to simple randomization tests (Sokal and Rohlf 1995) in which all combinations of data are equally likely. However, it is very prone to error with all indices. NOT RECOMMENDED. Sim2) Fixed rows-equiprobable columns This simulation randomizes the occurrence of each species among the sites, assuming the sites are equiprobable. It corresponds to a simple model of community assembly in which species colonize sites independently of one another. It behaves well with all four of the co-occurrence indices. RECOMMENDED. Sim3) Equiprobable rows-fixed columns This simulation holds the observed number of species per site fixed, then randomizes the species identities, assuming all species are equiprobable. It is error prone with all indices. NOT RECOMMENDED.

Sim4) Fixed rows-proportional columns
This simulation holds the observed number of occurrences of each species fixed, then randomizes those occurrences among sites. However, unlike Sim2, the sites are not equally likely to be hit, and the probability of occurrence is proportional to the observed column total, similar to a "random placement" model of species on sites (Coleman et al. 1982). This model is error prone for the C-score and the number of checkerboards, but behaves well for the V-ratio and number of species combinations. RECOMMENDED.

Sim5) Proportional rows-fixed columns
This simulation is the inverse of Sim4. It holds column totals fixed, and then samples species in proportion to row totals. It is error prone for all the co-occurrence metrics. NOT RECOMMENDED.

Sim6) Equiprobable rows-proportional columns
This simulation assumes that species are equiprobable and that site probabilities vary in proportion to column totals. Error rates are unacceptable for the C-score and the number of checkerboards, and somewhat high for the V-ratio and number of species combinations. NOT RECOMMENDED.

Sim7) Proportional rows-equiprobable columns
This simulation assumes that sites are equiprobable and the occurrence probabilities for different species vary in proportion to row totals. Its performance is similar to Sim6. NOT RECOMMENDED.

Sim8) Proportional rows-proportional columns
This simulation assumes that neither sites nor species are equiprobable, and that occurrence probabilities are conditioned on both row and column totals. This simulation behaves well for the V-ratio and the number of species combinations, but is error prone for the C-score and the number of checkerboard pairs. RECOMMENDED.

Sim9) Fixed rows-fixed columns
This simulation maintains fixed row and column sums. Thus, no degenerate matrices are produced. Although an earlier version of this model by Connor and Simberloff (1979) was widely criticized, the version implemented in EcoSim has a good Type I error rate, and is powerful at detecting patterns in noisy data sets, particularly when used with the C-score. This model cannot be used with the V-ratio, which is determined exclusively by row and column sums. RECOMMENDED.
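As one illustration of the algorithms listed above, here is a minimal sketch of a fixed rows-equiprobable columns null matrix (Sim2). It assumes species are rows and sites are columns, and it uses Python's built-in shuffle as a stand-in for whatever randomization routine EcoSim uses internally; the function name is hypothetical.

```python
import random


def sim2_fixed_rows_equiprobable_columns(matrix, rng=random):
    """Shuffle each species' occurrences among sites, one row at a time."""
    null = []
    for row in matrix:
        shuffled = list(row)   # copy, so the observed matrix is untouched
        rng.shuffle(shuffled)  # every site is equally likely to be occupied
        null.append(shuffled)
    return null


observed = [[1, 1, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [1, 0, 0, 0, 1]]
null_matrix = sim2_fixed_rows_equiprobable_columns(observed, random.Random(42))
```

Each simulated row keeps the observed number of occurrences for that species, while the column totals are free to vary. That is why this algorithm can produce degenerate matrices with empty sites.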

Choosing species and site weights


A major criticism of null model analysis is that it is circular to use the marginal totals of the matrix to constrain the simulation, since these marginals themselves may be influenced by competition. A systematic analysis of Type I and Type II error rates suggests that marginal totals can be effectively used to detect pattern in many co-occurrence matrices (Gotelli 2000). Nevertheless, it would be very worthwhile to incorporate independent information into the simulation. EcoSim allows you to do this with a powerful and easy-to-use set of species and site weights. However, you may have to go to considerable effort to gather the data needed for these analyses.

Here are some suggestions for how you might weight sites or species. For sites, the most natural index might be the area of the site, or perhaps the logarithm of the area. One could also use measures of productivity or resource availability to provide relative rankings of site suitability. Data-rich regression models that predict species richness could also be used to generate expected species richness in a set of sites. Weights for species may be more challenging to obtain, but you could use macroecological variables such as average population size, body size, or area of the geographic range. The best thing would be to directly measure the relative colonization potential of different species, but this may be very difficult to do for a large assemblage of species.

It is almost impossible to say whether these independent weights will increase or decrease the chances of rejecting the null hypothesis. One general result is that the more disparate the site weights, the more co-occurrence is expected in the null model. Why? Because all species in these null models will tend to occur more often in some sites and less often in others. In contrast, less co-occurrence is expected if sites are treated as equiprobable. Therefore, the equiprobable weighting of sites is actually the more conservative procedure, even though it is the least biologically realistic.

Although independent measures of species and site differences would be ideal, they probably are not available for most communities. For these reasons, most null model analyses will rely on the matrix totals themselves to constrain the simulation. Although not all ecologists are happy with these procedures, the models available in EcoSim have performed well with idealized data sets and should provide useful insights into co-occurrence patterns.
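To make the idea of site weights concrete, the sketch below places one species' occurrences in sites with probability proportional to a set of weights, by mapping each weight to a segment of the [0,1] number line and drawing uniform random numbers. The function name, the example weights, and the rule of redrawing when an already occupied site is hit are all assumptions made for illustration; this help file does not say how EcoSim handles repeat hits.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def place_species(n_occurrences, site_weights, rng=random):
    """Return a 0/1 row with n_occurrences ones, sites drawn by weight."""
    usable_sites = sum(1 for w in site_weights if w > 0)
    if n_occurrences > usable_sites:
        raise ValueError("more occurrences than sites with positive weight")
    total = sum(site_weights)
    # Upper edge of each site's segment on the [0, 1] number line.
    edges = list(accumulate(w / total for w in site_weights))
    row = [0] * len(site_weights)
    placed = 0
    while placed < n_occurrences:
        u = rng.random()
        # Find the segment that u falls in; clamp to guard against rounding.
        site = min(bisect_right(edges, u), len(row) - 1)
        if row[site] == 0:  # assumption: redraw if the site is already occupied
            row[site] = 1
            placed += 1
    return row


# Hypothetical site weights, e.g. island areas; the fourth site can never be hit.
weights = [10.0, 1.0, 5.0, 0.0, 2.0]
row = place_species(3, weights, random.Random(1))
```

The same routine works for any generalized set of weights, whether they are observed column totals or independently measured quantities such as the logarithm of site area.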

Choosing degenerate matrix options


Although degenerate matrices represent a different "sampling universe", their statistical properties do not appear to be greatly different from those of the non-degenerate matrices. We have used the "split" feature to examine the behavior of degenerate and non-degenerate matrices. Although there is a greater tendency to reject the null hypothesis with degenerate matrices, both sets of data tend to give similar results. For this reason, we recommend using the "retain" option, as well as the "split" option to view the behavior of degenerate and non-degenerate matrices.

The "discard" option is available if you want to use only non-degenerate matrices, but you should be aware that this set of matrices might represent a very small space of the sampling universe. If EcoSim can't find a non-degenerate matrix for you after 3000 tries, it gives up. We don't recommend that you use the "fix" option, because the simple rules that EcoSim uses for fixing matrices are often different from the overall constraints in the simulation. A better approach might be to stick with the default fixed rows-fixed columns algorithm, which will generate only non-degenerate matrices with row and column totals maintained. However, this algorithm is not appropriate for the V-ratio.

As for the choice of including or excluding empty sites, we recommend that you choose "include". If you have decided to allow degenerate matrices in your simulations, it seems logical to us that the empty sites should be included in the measurement of pattern. Note that empty sites make no difference in the C-score or the number of checkerboards. Empty sites have a small effect on the number of species combinations and a moderate effect on the V-ratio.
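The "discard" behavior can be pictured with a short sketch. It assumes that a degenerate matrix is one with at least one empty row or empty column, and that the simulation is supplied as a function that returns a fresh null matrix on each call; the names `is_degenerate`, `draw_non_degenerate`, and `make_null` are hypothetical.

```python
MAX_TRIES = 3000  # the give-up threshold mentioned in the text


def is_degenerate(matrix):
    """True if any species (row) or any site (column) has no occurrences."""
    empty_row = any(sum(row) == 0 for row in matrix)
    empty_col = any(sum(col) == 0 for col in zip(*matrix))
    return empty_row or empty_col


def draw_non_degenerate(make_null, max_tries=MAX_TRIES):
    """Call make_null() until it yields a non-degenerate matrix, or give up."""
    for _ in range(max_tries):
        candidate = make_null()
        if not is_degenerate(candidate):
            return candidate
    return None  # no non-degenerate matrix found within max_tries
```

If nearly every simulated matrix is degenerate, the loop spends most of its time rejecting candidates. That is another way of seeing why the retained matrices may cover only a small part of the sampling universe.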

How EcoSim simulates


For the curious, this section briefly describes what goes on "under the hood" of EcoSim as it carries out your co-occurrence simulation model. Depending on the algorithm you have chosen, EcoSim uses three different procedures for randomizing co-occurrence matrices:

1) Reshuffling
If the sites (or species) are equiprobable, EcoSim takes each observation in order, and switches it with another randomly chosen site (including the possibility of its own site). Once every site has been randomly switched with another, the list has been effectively randomized.

2) Filling
If the algorithm is using proportional or user-defined weights, it is no longer possible to use the reshuffle procedure because the sites are not equally likely to be hit. Instead, each weight for a site is mapped as a segment of a [0,1] number line. A random uniform number is drawn, and the species is placed in the site that corresponds to the segment that is hit. This allows EcoSim to carry out simulation procedures with any generalized set of weights for species or for sites.

3) Transposing
The fixed rows-fixed columns algorithm requires some very special procedures, because it is not possible to sequentially fill either species or sites and simultaneously maintain the row and column sums. We followed the procedure in Stone and Roberts (1990) of switching randomly selected submatrices of the form:

1 ... 0
.     .
0 ... 1

to the form:

0 ... 1
.     .
1 ... 0

This switch produces a new matrix, but one that still maintains row and column totals. Manly (1995) showed that sequential matrices created this way are random with respect to the original matrix, as long as the sequence is distantly removed from the original matrix. EcoSim uses 10,000 preliminary transpositions before it begins retaining matrices in the null model. We have compared this procedure with several others that can be used to create these matrices and found that the results are nearly identical. The transposing algorithm is a rapid way to create random matrices that retain row and column totals.

Sanderson et al. (1998) have recently claimed that Manly's (1995) procedure used with the C-score gives biased results. We respectfully disagree. We have extensively checked the performance of Manly's (1995) algorithm by using it with random matrices and with matrices that have non-random structure (Gotelli 2000). We have also compared Manly's (1995) algorithm to other methods that generate random matrices. We are confident that the implementation in EcoSim provides you with an appropriate, random sample of matrices with fixed row and column totals.
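The transposing step can be sketched as follows. This is an illustration of the general submatrix-swap idea, not EcoSim's own code: the function names are hypothetical, and the sketch counts swap attempts (not successful swaps) toward the 10,000 preliminary transpositions, which is an assumption.

```python
import random


def swap_once(matrix, rng=random):
    """Try one submatrix swap in place; return True if a swap happened."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    r1, r2 = rng.sample(range(n_rows), 2)  # two distinct species
    c1, c2 = rng.sample(range(n_cols), 2)  # two distinct sites
    a, b = matrix[r1][c1], matrix[r1][c2]
    c, d = matrix[r2][c1], matrix[r2][c2]
    # Swappable only if the 2 x 2 submatrix is 10/01 or 01/10.
    if a == d and b == c and a != b:
        matrix[r1][c1], matrix[r1][c2] = b, a
        matrix[r2][c1], matrix[r2][c2] = d, c
        return True
    return False


def fixed_fixed_null(matrix, burn_in=10_000, rng=random):
    """Return a copy of matrix after burn_in swap attempts."""
    null = [list(row) for row in matrix]
    for _ in range(burn_in):
        swap_once(null, rng)
    return null


observed = [[1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 1, 0, 0]]
null_matrix = fixed_fixed_null(observed, burn_in=1000, rng=random.Random(3))
```

Each successful swap moves one occurrence within each of the two rows and within each of the two columns, so every row and column total is left exactly as it was in the observed matrix.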

Options for degenerate matrices


There are two questions you must address in the analysis of degenerate matrices. The first question is: what should EcoSim do with degenerate matrices that are created by the simulation? There are 4 choices you can make. The first is to retain all of the matrices that are created and not to worry about whether or not they are degenerate. The second choice is to discard all of the degenerate matrices that are created and to keep sampling until a non-degenerate matrix is created. The third choice is to split the degenerate and non-degenerate matrices into two bins, and then calculate the probability tests separately, comparing the observed matrix first to the sample of degenerate matrices, and then to the sample of non-degenerate matrices. The final choice is to fix the degenerate matrices. For this option, EcoSim will repair any degenerate matrix by randomly transferring one of the cell occurrences from an occupied column (or row) to an empty column (or row). Notice that these options all apply to degenerate matrices that are created by EcoSim in its simulation; the input matrix, even if it is degenerate, is not affected by these changes.

The second question you must address is: should empty sites be included in the data set for the purposes of calculating the co-occurrence metric? This question applies to both the input matrix and the simulated matrices. Empty sites do not affect the calculation of the C-score or the number of checkerboards, so the empty site options box is not visible when you choose either of these indices. The presence of empty sites will affect the calculation of the number of species combinations and the V-ratio.
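A small sketch shows why empty sites matter for some indices and not others. It assumes the Stone and Roberts (1990) form of the C-score, the average over species pairs of (r_i - S)(r_j - S), where r_i and r_j are the row totals and S is the number of shared sites. It also assumes that an empty site counts as its own species combination when empty sites are included. The function names are illustrative.

```python
from itertools import combinations


def c_score(matrix):
    """Mean of (r_i - S) * (r_j - S) over all pairs of species (rows)."""
    units = []
    for row_i, row_j in combinations(matrix, 2):
        shared = sum(a * b for a, b in zip(row_i, row_j))
        units.append((sum(row_i) - shared) * (sum(row_j) - shared))
    return sum(units) / len(units)


def n_species_combinations(matrix, include_empty_sites=True):
    """Number of distinct site (column) compositions in the matrix."""
    columns = set(zip(*matrix))
    if not include_empty_sites:
        columns = {col for col in columns if any(col)}
    return len(columns)


# The last site is empty.
with_empty = [[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 1, 0, 0]]
without_empty = [row[:3] for row in with_empty]
```

Dropping the empty site changes neither the row totals nor the number of shared sites, so `c_score` is identical for the two matrices, whereas `n_species_combinations` differs by one under the stated assumption.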

8. Literature Cited
Arnett, A. E. 1998. Geographic variation in life-history traits of the ant lion, Myrmeleon immaculatus: evolutionary implications of Bergmann's Rule. Ph.D. dissertation. Department of Biology. University of Vermont, Burlington.

Coleman, B. D., M. A. Mares, M. R. Willig, and Y.-H. Hsieh. 1982. Randomness, area, and species richness. Ecology 63: 1121-1133.

Gotelli, N. J., N. J. Buckley, and J. A. Wiens. 1997. Co-occurrence of Australian land birds: Diamond's assembly rules revisited. Oikos 80: 311-324.

Gotelli, N. J. 2000. Null model analysis of species co-occurrence patterns. In press, Ecology.

Gurevitch, J., L. L. Morrow, A. Wallace, and J. S. Walsh. 1992. A meta-analysis of field experiments on competition. The American Naturalist 140: 539-572.

Lack, D. L. 1976. Island biology, illustrated by the land birds of Jamaica. University of California Press, Berkeley.

Manly, B. F. J. 1995. A note on the analysis of species co-occurrences. Ecology 76: 1109-1115.

Pielou, D. P., and E. C. Pielou. 1968. Association among species of infrequent occurrence: the insect and spider fauna of Polyporus betulinus (Bulliard) Fries. Journal of Theoretical Biology 21: 202-216.

Robson, D. S. 1972. Appendix: Statistical tests of significance. Journal of Theoretical Biology 34: 350-352.

Sanderson, J. G., M. P. Moulton, and R. G. Selfridge. 1998. Null matrices and the analysis of species co-occurrences. Oecologia 116: 275-283.

Schluter, D. 1984. A variance test for detecting species associations, with some example applications. Ecology 65: 998-1005.

Sokal, R. R., and F. J. Rohlf. 1995. Biometry. W.H. Freeman and Company, New York.

Stone, L., and A. Roberts. 1990. The checkerboard score and species distributions. Oecologia 85: 74-79.

Stone, L., and A. Roberts. 1992. Competitive exclusion, or species aggregation? An aid in deciding. Oecologia 91: 419-424.

Wilson, J. B. 1987. Methods for detecting non-randomness in species co-occurrences: a contribution. Oecologia 73: 579-582.

Wilson, J. B., H. Gitay, and A. D. Q. Agnew. 1987. Does niche limitation exist? Functional Ecology 1: 391-397.

All Pages Copyright 2000 by Kesey-Bear and Acquired Intelligence, Inc. All rights reserved.
