Coexistence of cooperators and defectors in well mixed populations mediated by limiting resources

Traditionally, resource limitation in evolutionary game theory is assumed just to impose a constant population size. Here we show that resource limitations may generate dynamical payoffs able to alter an original prisoner's dilemma, and to allow for the stable coexistence between unconditional cooperators and defectors in well-mixed populations. This is a consequence of a self-organizing process that turns the interaction payoff matrix into evolutionary neutral, and represents a resource-based control mechanism preventing the spread of defectors. To our knowledge, this is the first example of coexistence in well-mixed populations with a game structure different from a snowdrift game.

Traditionally, resource limitation in evolutionary game theory (EGT) is assumed just to impose a constant population size. Here we show that resource limitations may generate dynamical payoffs able to alter an original prisoner's dilemma (PD), and to allow for stable coexistence between unconditional cooperators and defectors in well-mixed populations. This is a consequence of a selforganizing process that turns the interaction payoff matrix into evolutionary neutral, and represents a resource-based control mechanism preventing the spread of defectors. To our knowledge, this is the first example of coexistence in well-mixed populations with a game structure different from a snowdrift game. Cooperative behaviors are common in nature, and necessary for the evolutionary appearance of higher selective units -such as eukaryotic cells or multicellular lifefrom simpler components [1]. However, the survival of the fittest under the action of natural selection seems to foster selfish behaviors taking advantage of other individuals [2,3]. It is therefore intriguing how cooperative behaviors can emerge and survive in a world ruled by natural selection. This issue is frequently addressed in the context of evolutionary game theory (EGT), where the prisoner's dilemma (PD) game [4] has been used as a paradigm for understanding the evolution of cooperation, as its simplified non-iterated version is the worst scenario for the survival of cooperation [5]; if interactions between individuals follow a PD and reproductive success grows with payoffs, cooperative behavior is led to extinction in large well-mixed populations [5,6]. In the last decades some mechanisms have been found allowing cooperative behaviors to survive in the absence of genetic relatedness, but none of them works for the simplified PD in the absence of features such as memory [7,8], reputation gain [9], network structure [10][11][12] or other sensory inputs [13,14].
Here we study the influence of resource limitation on the emergence of cooperation. We find that in addition to imposing a finite population size, as usually assumed in EGT [4,5,[7][8][9][10][11][12][13][14][15], it may generate dynamic payoffs [16][17][18]. In the absence of resource limitation the interactions between cooperators and defectors fulfil a simplified PD, as determined by the selfish strategy, and thus cooperators extinguish, as expected in evolving well-mixed populations. Surprisingly, resource limitation may drive a self-organizing process that allows for stable coexistence between cooperators and defectors. In contrast to previous studies including ecological features, in which coexistence happens only in public goods games with variable interaction group sizes [19], it is transient [20] or requires spatial structure [21,22], here we find stable coexistence for pairwise interactions without population structure. This stable coexistence resembles the homeostatic equilibrium in the daisy world [23,24], as both are mediated by environmental factors driving the system out from equilibrium.
In order to study this, we develop a model consisting of an evolving well-mixed population of self-replicating individuals that receive resources from the environment and exchange resources during interactions. No population structure, memory, learning abilities or any other sensory inputs are assumed. Each individual i is represented by its internal amount of resources E i and its strategy, namely cooperate (C) or defect (D). The internal amount of resources may be interpreted as the amount that belongs to it, independently of why or how. The environment provides resources in portions E i 0 per unit time to randomly chosen individuals independently of their strategy, thus not modifying the structure of the payoffs. For simplicity, we impose a constant total resource influx E T , though results also apply for non constant fluxes (see [25]). If the amount of resources of an individual exceeds a value E s , it splits into two identical copies with half its internal amount of resources.
Defectors are characterized by the maximum amount of resources associated to an interaction: the cost spent (E c ) for stealing a reward (E r ) from the co-player. If the internal resources of a defector are smaller than the cost E c , it does not pay the cost nor receives the reward. If the interaction partner has resources below the reward, the entire amount of resources is sequestered. We assume that these quantities are inherited without mutation; they represent physiologic, morphologic or genetic characteristics intrinsic to individuals and cannot be modified by choice.
We consider large populations, simultaneous interactions and E r > E c > 0, although the same results are obtained if one assumes that every interaction is carried out by a donor and received by the co-player [18]. The interaction matrix determined by the strategies is thus and equals a simplified PD, with defectors paying a cost E c and obtaining a net reward ∆E = E r −E c > 0 (payoffs for the row player, we will omit C and D in the following matrixes). Finally, we assume that the limiting resource necessary for reproduction provides no advantage for keeping alive; therefore deaths occur at random with a frequency (rate) f relative to receiving resources and interacting, which happen equally frequently.
According to the PD structure of resource exchanges among cooperators and defectors in the absence of resources limitation, defectors should have a larger resource intake, reaching faster the splitting bound E s and thus reproducing quicker (i.e. fitness is proportional to resource exchanges). Therefore one would expect homogeneous populations of defectors as the outcome of the evolutionary process. However, the limitation of resources generates a distribution of resources among individuals in the population. As a consequence, the average reward stolen from cooperators E ′ r may decrease below E r , since the internal resources of some cooperators may fall below this quantity, thus modifying the payoffs. If the net average benefit of defectors ∆E ′ = E ′ r − E c remains negative all over the time, the payoff matrix does not correspond to a PD anymore; it is turned into a harmony game and cooperation becomes the dominant strategy. Simulations show that the model yields the later behavior and also, more interestingly, stable coexistence of cooperation and defection (Fig. 1); see [25] for details on the simulations. Dominance of cooperation was already found in a previous model assuming that resources are necessary for keeping alive [18] which, however, did not provide coexistence.
Coexistence in this scenario requires a complex feedback process whose exact analysis is quite difficult because of the complex nonlinearities involved in the dynamics. However, a simple quantitative reasoning exhibits the logic of this feedback and allows for an analytic estimation of the final stable state of the system. Let us note that an increase in the number of defectors over the equilibrium value would cause an overexploitation of cooperators, thus reducing their resource content. This would have two effects: (i) it would reduce cooperators' reproduction rate (fitness) because they become farther from the splitting bound E s , and (ii) it would also decrease the average reward obtained by defectors, which thereby reduces their fitness. If the second effect dominates over the first one, then stable coexistence becomes possible, as the feedback pushes the system back to equilibrium. A similar argument applies for a decrease in the number of defectors. The entire system is in equilibrium when the resource influxes and out fluxes in the populations of both cooperators and defectors cancel out. The balance of resources in these subpopulations contains three contributions: environmental supply, deaths, and interactions. They are expressed in the following equations E j , E j and N j denote, respectively the total resource content, average resources per individual and number of individuals of the subpopulations j = C, D ; E 0 = E T /N is the mean amount of resources received by an individual per unit time, with N = N C + N D the instantaneous population size; ρ = N C /N is the fraction of cooperators; f is the death probability per individual and interaction, and p the fraction of the population of defectors able to pay the cost (i.e. with E i > E c ). In equilibrium, the populations of cooperators and defectors become constant in time so that the resource pools E D and E C reach a constant value. One thus finds the equilibrium condition This shows that the coexistence depends on the death frequency f . For simplicity we will assume in this analytic derivation that deaths happen much less frequently than interactions, i.e. the limit f → 0; this corresponds to many interactions in a lifetime, when the effects of interactions become more relevant. Other f values are studied through simulations [25]. Since p never equals zero due to the constant resource influx, Eq. (4) reduces in this limit to which states that, in equilibrium, the cost paid by defectors equals the reward stolen from cooperators. In order to analytically predict the region of coexistence in the parameters space and the corresponding population composition, we need to know the average reward E ′ r in terms of the parameters E r and E c . This implies the calculation of the equilibrium distribution of resources for cooperators, which is a difficult task due to the nonlinearities involved in the dynamics. Instead, we can give a rough heuristic estimate as follows. The lower the fraction of cooperators in the population, the more frequent any cooperator meets a defector, thereby cooperators become overexploited and their average internal resources decrease. Thus the average reward E ′ r is expected to decrease as ρ decreases. We assume a linear relationship between both quantities, E ′ r = αρ, with α a positive constant. By the moment we consider that when ρ is close to 1, the effect of defectors is expected to be small, so that at first order we approximate the resource distribution of cooperators as uniform. For uniform distributions [25] one finds E ′ r = E r − E 2 r /(2E s ). We thus propose By combining Eq. (6) with Eq. (5) one obtains an ex-pression for the equilibrium fraction of cooperators In order to analyze in detail the behavior of the model, we have performed extensive numerical simulations covering the whole parameters space. They confirm the stability of the coexistence for all death frequencies, and show that the final stable state is independent of the initial conditions and resource influx (and thus, final population size) [25]. Figs. 2a,b show a good qualitative agreement between the predicted ρ and the outcome of the simulations. Deviations root in the linear approximation assumed in Eq. (6) (see [25]).
Let us notice that the obtained stable coexistence between cooperators and defectors presents a new outcome in the context of two-player games, where a stable mixed state is only expected in Snowdrift (or Hawk-Dove) games, which have a payoff structure different from ours. In general, symmetric two-player games can be described through the interaction matrix [26] where coefficients a and b are assumed to be constant.
Applying the replicator equation [6,15,26] to analyze the evolution of the population, three cases are possible (see Fig. 2c ; this is what occurs in Snowdrift games, when it always pays to play the opposite of the co-player. In our model, fitness is directly proportional to resource exchanges, because individuals reproduce when their resources overcome an upper bound that is the same for cooperators and defectors. Resource exchanges come from the environment and from interactions. The resource supply from the environment is the same for defectors and cooperators; it just provides a constant to all fitness values and can be omitted in the fitness matrix. The latter is thus ruled by the average resources exchanged through interactions, which aside from a scale factor translating resource exchanges to fitness, is [18] As stated above, p stands for the fraction of defectors whose resources exceed the cost E c . Let us note that this factor does not change the payoff structure in any case, as it multiplies all payoffs, and it only modifies the time scale of the dynamics. The interaction matrix can be rewritten in the form of matrix (8) by adding pE c to the second column (as adding a constant to a column does not affect the replicator dynamics [6,26]): i.e. a = −b = −p∆E ′ . According to the classification given above, this payoff matrix leads to dominance of one strategy whenever p∆E ′ = 0. In the absence of resource limitation ∆E ′ = ∆E > 0 and we have a PD. If resources are limited, there exists a wide range of parameters for which the ∆E ′ is tuned to zero for a specific mixture of cooperators and defectors (see Figs. 2a,b); thus, the stable equilibrium is the result of a dynamical self-organizing process and not of the game structure itself (see Fig. 2c). We can use the payoff matrix (10) to gain further insight into the stability of the coexistence state found in our model. In Eq. (6) we proposed the rough estimate E ′ r = αρ for the net benefit of defectors, with α > 0. Thus, we have ∆E ′ = αρ − E c . Aside from a positive factor relating fitness and payoffs in Eq. (10), the replicator equation yields which supplies three equilibria, ρ = 0, 1 and E c /α. Since p > 0, the mixed state is the stable one for 0 < E c /α < 1, in agreement with the stability of the coexistence states observed in the simulations.
We have presented a scenario which allows for stable coexistence of unconditional cooperators and defectors in well-mixed populations under pairwise interactions. This result is quite robust, since it does not depend on initial conditions, and it is also observed in small populations -though in this case fluctuations may lead to the extinction of one strategy -and under nonconstant influx of resources ( [25]). This stable coexistence roots on a selforganizing process which implicitly includes the environment, and it is the feedback induced by environmental constraints and defectors' behavior which turns the payoff matrix into evolutionary neutral and allows for the stability of the system. The evolutionary neutrality of the system (environment + individuals) and its stability as a whole, might be a first step towards the emergence of new units of selection by providing a selforganizing mechanism preventing the spread of selfish mutants alternative to central control (see [1]).
Let us also remark that, in contrast to previous models in evolutionary dynamics, the model presented here explicitly sets the issue in a nonequilibrium context, where a (resource) flux drives the system out from equilibrium. The observed selforganized coexistence state may be seen as another example of selforganizing process found in nonequilibrium systems such as, for instance, the unexpected oscillations in Belusov-Zabhotinsky reactions. This perspective may bear interest in economic contexts, another classical field of evolutionary game theory, where some authors claim that economic systems should be modeled as open, nonlinear nonequilibrium systems instead of the closed, equilibrium view dominant in traditional economics [27,28].
We In this supplementary material we provide details of the analytical derivations carried out in the main text and of the agent based simulations. The rank of the payoffs in the PD is If the game is played repeatedly by the same two individuals (iterated PD), which happens with a very low probability as we assumed big population sizes (in order to make mean field approximation), the condition 2R > T + S is also required. For a simplified PD, the condition T − R = P − S, also known as equal gains from switching, must be fulfilled. In our study, the payoffs take values T = E r − E c , R = 0, P = −E c , S = −E r and the three conditions are fulfilled by imposing II. RESOURCE DISTRIBUTION OF COOPERATORS: ESTIMATION OF E ′ r .

A. Uniform distribution of resources
Let us call P (E < E r ) the probability that a cooperator has an internal amount of resources lower than E r . The mean payoff for a defector playing against a cooperator can be written as E ′ r = P (E > E r )E r + P (E < E r )E r , where E r is the mean internal amount of resources of cooperators in the region E < E r . This may be rewritten as

For uniform distributions of resources this equation yields
B. Linear approximation for E ′ r To derive Eq.(7) in the main text we assumed the linear dependence E ′ r = αρ, where α is chosen to attain expression (3) when ρ = 1. Fig. 1 shows a reasonable agreement between this simple expression and the simulation results. As assumed, the average reward E ′ r is found to grow with the fraction of cooperators, although r /(2Es) versus fraction of cooperators in equilibrium ρ. Symbols denote simulation results expanding over the whole parameter space for f = 0.01. The straight line corresponds to the assumption of a linear relationship between E ′ r and ρ. One finds a reasonable agreement between both results with some deviations at low and high ρ. a quadratic dependence is more adequate than just a linear one. If one assumes such a quadratic dependence the agreement between predictions and simulations greatly improves (see Figs. 2b,d).
The final state of the system was found to be independent of the initial conditions, i.e. initial population size N 0 , fraction of cooperators ρ and initial distritution of resources, provided that the initial population size is big enough (see section III D below). In Figs. 2 both in the paper and the SM, we started all simulations with a population size N 0 = 10000, a uniform random distribution of internal resources and an initial fraction of cooperators of ρ = 0.5. The amount of resources for splitting was taken E s = 1000, which sets the scale of resources in the simulations. For E T we used 420000, so that the total population N attained values of several thousands, large enough to minimize finite size effects; however, the mechanism works for much smaller population sizes (smaller than 100 individuals, see fig. 5), provided that random fluctuations on the number of individuals do not allow the system to reach fixation. Simulations run over around 1000 time steps, where a time step is defined as a number of interactions equal to the population size, and stopped if a homogeneous population was reached before.

A. Resource allocation
We tested two resource allocation methods: supplying portions uniformly distributed on the interval E i 0 ∈ (0, 2E T /N ) and of fixed size E i 0 = E T /N , where N denotes the number of individuals in the population; both provided the same results, but the first one (which is the one used in the figures) introduces some extra stochasticity in the model. As said above, E T was chosen big enough as to avoid effects due to finite population sizes, while keeping feasible simulation times (mean population sizes around 10 4 individuals).

B. Updating
The updating is completely asynchronous [1] and with overlapping generations, which avoids the appearance of synchronization effects and mimics reproductive dynamics observed in nature. The implementation used is as follows: Every interaction time step six individuals are chosen at random; of them (a) two receive an amount of resources E i 0 from the environment, (b) two interact, and (c) two die with a probability f . Other asynchronous implementations where tested, as choosing just four individuals per interaction step, one to act as donor, one as recipient of the act, one to receive resources and one to die with probability f . Results using this updating were identical, but simulation times were much longer, for which reason we choose the first method to carry out the computer simulations.
C. Remarks on the parameter space.
Our model contains 5 parameters: E T , E s , E c , E r and f . If one doubles all the parameters related to the resources, i.e. all of them except f , the dynamics does not change, provided there are no finite population size effects. Then, one of the parameters in the model just defines the scale with respect to the others. We thus set E s = 1000 without loss of generality. Furthermore, as long as E T takes a large value (E T ≫ E s ), it only affects the final number of individuals, but not the strategy of the final state (this is confirmed by simulations). This leaves us with the free parameters, E c , E r (both smaller than E s ) and f . We have explored the dynamics for all values of E c and E r in the region E s > E r > E c , and for several values of f (see Fig. 2). In this way, our analysis covers the whole parameter space.

D. Independence of results on initial conditions and on resource influx
As mentioned above, the simulation results are very robust. In particular, they are independent on initial conditions aside from finite size effects when either N or ρ are too small. For instance, in figure 3 we show simulations results starting from four different initial amounts of cooperators and defectors (N C , N D ), all leading to the same final steady state.
In addition, our results are robust with respect to changes in the resource influx and, in particular, the latter is not required to be constant. Figure 4 shows simulation results for a sudden reduction of E T to its fifth. As expected, the population size N also reduces to its fifth, but the fraction of cooperators hardly changes. Indeed, it is observed a slight transient advantage for cooperators when resources are reduced, and a transient advantage of defectors when resources are increased.
Finally, Figure 5a analizes the limit of small population sizes. One observes that both the number of individuals in the population and the fraction of cooperators reach steady states with larger random flutuations. The other panels of Figure 5 study the effect of oscillatory changes of the total resource influx E T in these small populations.
The population numbers are found to follow the resource influx E T (except at fast oscillations of period w = 10, shorter than the time scale of the population dynamics). Remarkably, the fraction of cooperators remains essentially constant, though fluctuations may drive the system to the extinction of one strategy (Fig. 5f). It can be observed that in all cases, even when the initial values are far away from the final state, the system recovers its equilibrium values given there is enough time. In the case in which the initial number of defectors is much bigger than its final stable value (five times bigger, green lines), or that of cooperators is very small (13 times smaller, black lines), the population of cooperators/defectors (respectively) decreases initially, and thus this situation could drive one of the populations to extinction if its initial number was small enough (see