Q-learning agents in a Cournot oligopoly model
Introduction
In this paper, we model the learning behavior of firms in repeated Cournot oligopoly games using Q-learning. Q-learning is a reinforcement learning model of agent behavior originally developed in the field of artificial intelligence (Watkins, 1989). The model is based on two assumptions. First, for each possible strategy an agent is assumed to remember some value indicating that strategy's performance. This value, referred to as a Q-value, is determined based on the agent's past experience with the strategy. Basically, the Q-value of a strategy is calculated as a weighted average of the payoffs obtained from the strategy in the past, where more recent payoffs are given greater weight. The second assumption of Q-learning states that, based on the Q-values, an agent probabilistically chooses which action to play. A logit model is used to describe the agent's choice behavior. The assumptions made by Q-learning can also be found in other reinforcement learning models. The models of Sarin and Vahid (1999, 2001) and Kirman and Vriend (2001) use ideas similar to Q-values, while the models of, for example, Mookherjee and Sopher (1997) and Camerer and Ho (1999) use a logit model to describe the way in which an agent chooses an action. Q-learning distinguishes itself from other reinforcement learning models in that it combines these two elements in a single model. In the economic literature, the combination of these elements has, to our knowledge, not been studied before.
In this paper, we show that the use of Q-learning for modeling the learning behavior of firms in repeated Cournot oligopoly games generally leads to collusive behavior. This is quite a remarkable result, since most Q-learning firms that we study do not have the ability to remember what happened in previous stage games. The firms therefore cannot use trigger strategies, that is, they cannot threaten to punish each other in case of non-collusive behavior. There is also no possibility for explicit communication between firms. However, despite the absence of punishment and communication mechanisms, collusive behavior prevails among firms. Apart from Q-learning, there are almost no models of the learning behavior of individual economic agents that predict collusive behavior in Cournot games. The only model of which we are aware is the so-called trial-and-error model studied by Huck et al. (2004a). Yet, experimental results (for an overview, see Huck et al., 2004b) indicate that with two firms collusive behavior is quite common in Cournot games. Q-learning is one of the few models that does indeed predict this kind of behavior.
Models of the learning behavior of economic agents are studied both in agent-based computational economics (e.g., Tesfatsion, 2003, 2006) and in game theory (e.g., Fudenberg and Levine, 1998). In agent-based computational economics the methodology of computer simulation is typically adopted, whereas in game theory the analytical methodology is predominant. It seems rather difficult to obtain analytical results for the behavior of multiple Q-learning agents interacting with each other in a strategic setting. In the field of artificial intelligence, it has been proven that under certain conditions a single Q-learning agent operating in a fixed environment learns to behave optimally (Watkins and Dayan, 1992). However, for settings with multiple agents learning simultaneously almost no analytical results are available. Given the difficulty of obtaining analytical results, most of the results that we present in this paper are based on computer simulations. Analytical results are provided only for the special case in which Q-learning firms in a Cournot duopoly game can choose between exactly two production levels, the production level of the Nash equilibrium and some other, lower production level. The analytical results turn out to be useful for obtaining some basic intuition why Q-learning firms may learn to collude with each other.
The remainder of this paper is organized as follows. First, in Sections 2 and 3, we provide an overview of related research and we introduce Q-learning. Then, in Section 4, we discuss the Cournot oligopoly model with which we are concerned throughout the paper. We consider our computer simulations in Sections 5 and 6, in which we discuss the simulation setup and present the simulation results. We provide some analytical results in Section 7. Finally, in Section 8, we draw conclusions.
Related research
The literature on modeling the learning behavior of economic agents is quite large. Overviews of this literature are provided by Brenner (2006) and Duffy (2006). One can distinguish between individual learning models and social learning models (Vriend, 2000). In individual learning models an agent learns exclusively from its own experience, whereas in social learning models an agent also learns from the experience of other agents. Below, we first discuss the modeling of individual learning
Q-learning
In this paper, Q-learning is applied as follows. An agent plays a repeated game. At the beginning of the stage game in period t, the agent's memory is in some state $s_t$. This state may be determined by, for example, the actions played by the agent and its opponents in the stage game in period $t-1$. Taking into account the state of its memory, the agent chooses to play some action $a_t$. The choice of an action is made probabilistically based on the so-called Q-values of the agent. Playing action
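The two ingredients described above, a recency-weighted Q-value and a logit choice rule, can be sketched as follows. This is a minimal illustration of the textbook Q-learning update with Boltzmann (logit) action selection; the exact update rule and parameterization used in the paper are not shown in this excerpt, so the function names and the parameters `alpha` (learning rate), `gamma` (discount factor) and `temperature` are assumptions.

```python
import math
import random


def boltzmann_probabilities(q_values, temperature):
    """Logit (Boltzmann) choice probabilities over actions."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, temperature, rng=random):
    """Sample an action index according to the logit probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def update_q(q_table, state, action, payoff, next_state, alpha, gamma):
    """Textbook Q-learning update (assumed form, not taken from the paper).

    The new Q-value is a weighted average of the old Q-value and the
    latest payoff plus the discounted best Q-value in the next state,
    so more recent payoffs receive greater weight.
    """
    best_next = max(q_table[next_state])
    target = payoff + gamma * best_next
    q_table[state][action] = (1 - alpha) * q_table[state][action] + alpha * target


# A memoryless agent can be represented with a single dummy state 0.
q_table = {0: [0.0, 0.0, 0.0]}
update_q(q_table, state=0, action=1, payoff=10.0, next_state=0,
         alpha=0.5, gamma=0.0)
probs = boltzmann_probabilities(q_table[0], temperature=1.0)
sampled = choose_action(q_table[0], temperature=1.0)
```

Lowering `temperature` concentrates the choice probabilities on the action with the highest Q-value, which is one common way to make experimentation vanish over time.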
Cournot oligopoly model
We consider a simple Cournot oligopoly model with the following characteristics: the number of firms is fixed, firms produce perfect substitutes, the demand function is linear, firms have identical cost functions, and marginal cost is constant. The inverse demand function is given by

$$p = u - v \sum_{i=1}^{n} q_i,$$

where n denotes the number of firms, p denotes the market price, $q_i$ denotes firm i's production level, and $u$ and $v$ denote two parameters. Firm i's total cost equals

$$c_i = w q_i,$$

where the
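To make the model concrete, the sketch below computes profits under linear inverse demand and constant marginal cost, together with the symmetric Nash and joint-profit-maximizing (collusive) production levels that follow from the standard first-order conditions. The parameter names `u`, `v`, `w`, the numerical values, and the zero floor on the price are illustrative assumptions, not values confirmed by this excerpt.

```python
def market_price(quantities, u, v):
    """Linear inverse demand; the floor at zero is an assumption."""
    return max(u - v * sum(quantities), 0.0)


def profit(i, quantities, u, v, w):
    """Profit of firm i: (price - marginal cost) * own quantity."""
    return (market_price(quantities, u, v) - w) * quantities[i]


def nash_quantity(n, u, v, w):
    """Per-firm quantity in the symmetric Cournot-Nash equilibrium."""
    return (u - w) / (v * (n + 1))


def collusive_quantity(n, u, v, w):
    """Per-firm quantity when the monopoly output is split equally."""
    return (u - w) / (2 * v * n)


# Hypothetical parameter values for a duopoly.
u, v, w, n = 40.0, 1.0, 4.0, 2
q_nash = nash_quantity(n, u, v, w)
q_coll = collusive_quantity(n, u, v, w)
profit_nash = profit(0, [q_nash] * n, u, v, w)
profit_coll = profit(0, [q_coll] * n, u, v, w)
print(q_nash, q_coll, profit_nash, profit_coll)  # 12.0 9.0 144.0 162.0
```

With these assumed values each firm earns more under collusion (162) than in the Nash equilibrium (144), which is why a joint quantity below the Nash level is read as collusive behavior.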
Setup of the computer simulations
In this paper, we focus on the long-run behavior of Q-learning agents when the probability of experimentation approaches zero. In this respect, the approach that we take is similar to the approach that is typically taken to analyze evolutionary game-theoretic learning models (e.g., Vega-Redondo, 1997; Alós-Ferrer, 2004; Bergin and Bernhardt, 2005). We further focus on settings in which the learning behavior of all agents is modeled using Q-learning. An alternative would be to consider settings
Results of the computer simulations
In this section, we present the results of the computer simulations that we performed. We first consider the simulations with firms that did not have a memory, and we then consider the simulations with firms that did have a memory.
Simulations with firms that did not have a memory were performed for various values for both the number of firms n and the learning rate $\alpha$. For each combination of values for n and $\alpha$, Table 1 shows firms’ joint quantity produced and joint profit. Since we focus on the
Analytical results
In the previous section, we presented simulation results showing that the use of Q-learning for modeling the learning behavior of firms in a Cournot oligopoly game generally leads to collusive behavior. This turned out to be the case not only for firms with a memory but also for firms without a memory. This is quite remarkable, since firms without a memory cannot use trigger strategies, that is, they cannot threaten to punish each other in case of non-collusive behavior. So, collusive behavior
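The special case with exactly two production levels per firm can be illustrated by tabulating the stage-game payoffs of a duopoly. The sketch below assumes linear inverse demand with hypothetical parameters `u`, `v`, `w` and takes the equal split of the monopoly output as the "lower" production level; neither choice is confirmed by this excerpt.

```python
def profit(own_q, other_q, u, v, w):
    """Duopoly profit under linear inverse demand (price floored at zero)."""
    price = max(u - v * (own_q + other_q), 0.0)
    return (price - w) * own_q


# Hypothetical parameters; LOW is an assumed lower production level.
u, v, w = 40.0, 1.0, 4.0
NASH = (u - w) / (3 * v)  # duopoly Cournot-Nash quantity
LOW = (u - w) / (4 * v)   # equal split of the monopoly quantity

levels = {"low": LOW, "nash": NASH}
# payoffs[(a, b)] is the profit of a firm playing a against a rival playing b.
payoffs = {
    (a, b): profit(levels[a], levels[b], u, v, w)
    for a in levels
    for b in levels
}

for (a, b), value in payoffs.items():
    print(f"own={a:4s} rival={b:4s} profit={value:6.1f}")
```

Under these assumed numbers the ordering is 180 > 162 > 144 > 135: producing the Nash quantity is the better reply to either rival action, yet both firms earn more when both choose the lower level. The two-action game therefore has the structure of a prisoner's dilemma, which is the kind of setting in which collusion without punishment is surprising.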
Conclusions
We have studied the use of Q-learning for modeling the learning behavior of firms in repeated Cournot oligopoly games. Q-learning, which belongs to the family of reinforcement learning models, combines two elements that, individually, can also be found in other models of the reinforcement learning type. On the one hand, the way in which the performance of a strategy is measured is similar to the way in which this is done in the models of Sarin and Vahid (1999, 2001) and Kirman
Acknowledgments
We would like to thank Maarten Janssen, Joost van Rosmalen, three anonymous referees, the associate editor, and the editor for their comments. These comments have significantly improved the paper.
References (35)
Cournot versus Walras in dynamic oligopolies with memory. International Journal of Industrial Organization (2004)
Genetic algorithm learning and the cobweb model. Journal of Economic Dynamics and Control (1994)
Agent learning representation: advice on modelling economic learning
Keeping up with the Joneses: competition and the evolution of collusion. Journal of Economic Behavior and Organization (2000)
Endogenous fluctuations under evolutionary pressure in Cournot competition. Games and Economic Behavior (2002)
Agent-based models and human subject experiments
Evolution in games with randomly disturbed payoffs. Journal of Economic Theory (2007)
Two are few and four are many: number effects in experimental oligopolies. Journal of Economic Behavior and Organization (2004)
Evolving market structure: an ACE model of price dispersion and loyalty. Journal of Economic Dynamics and Control (2001)
Quantal response equilibria for normal form games. Games and Economic Behavior (1995)
Learning and decision costs in experimental constant sum games. Games and Economic Behavior
Cooperation as a result of learning with aspiration levels. Journal of Economic Behavior and Organization
Learning in extensive form games: experimental data and simple dynamic models in the intermediate term. Games and Economic Behavior
Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems
Payoff assessments without probabilities: a simple dynamic model of choice. Games and Economic Behavior
Predicting how people play games: a simple dynamic model of choice. Games and Economic Behavior
Agent-based computational economics: a constructive approach to economic theory