Inspired by mathematical models of learning in games, I set up a simplistic model of a learner who slowly works her way towards new knowledge, and analyse how two such model students can interact. The results suggest something interesting about the timing of group-work in courses.

Learning in Games

One context in which economists enjoy studying models of learning is that of players engaged in some kind of repeated game. The predictions of game theory for the outcome of such games depend strongly on whether the game is played once or multiple times, possibly infinitely many times. But outcomes also depend crucially on whether all the players have all the relevant information at their disposal. Whenever a game runs over multiple rounds and players have the opportunity to observe the behaviour of their fellow players, they might mitigate any information deficiencies they have by learning. An example of such a study is the one by Gale, Binmore & Samuelson [2]. An interesting aspect of their analysis is the conclusion that players can actually know too much.

The argument for this is roughly as follows. Suppose a player in a game can choose between two strategies but is unsure which strategy is the optimal one. As a result of this uncertainty her choices will vary between the two available strategies, i.e. there is a certain probability that she will choose one and the complementary probability that she will choose the other. Her strategy is a so-called mixed strategy. A convenient measure of how much information is lacking in this mixed strategy is the Entropy of the probability distribution. The highest entropy, Log[2], is reached when she is entirely indifferent between the two strategies she can play. The lowest entropy, 0, is reached when she always selects one of the two with certainty.
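To make the two extremes concrete, here is a minimal Python sketch of the entropy of a two-strategy mixed strategy. The function name `mixed_strategy_entropy` is just an illustrative choice, and the natural logarithm is assumed so that the maximum comes out as Log[2].

```python
import math


def mixed_strategy_entropy(p):
    """Shannon entropy (natural log) of a two-strategy mixed strategy.

    p is the probability of playing the first strategy and 1 - p the
    probability of playing the second.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # the term q * log(q) tends to 0 as q -> 0
            entropy -= q * math.log(q)
    return entropy


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9, 1.0):
        print(f"p = {p:.1f}  entropy = {mixed_strategy_entropy(p):.4f}")
```

An indifferent player (p = 0.5) sits at the maximum, and the entropy falls towards 0 as she commits more firmly to one strategy.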

Knowing too much in this context would mean that she selects strategies with certainty whereas her opponents in the game respond to her much more randomly. As a result, her ‘calculation’ of the optimal strategy, based on the optimal responses the others would play, isn’t realised, and hence her outcomes may be worse than they could be if she selected her own strategies with a slightly higher degree of uncertainty. The model in [2] demonstrates a situation where this is indeed the case.

That result makes you wonder: is the learning player in that game actually determining a trade-off between expected pay-off and the utility of Entropy?

Gradient learning

To study that question it would be good to have a more general model of what learning is. A popular model from computer science, AI, neural network theory and machine learning is that of gradient-learning. Let’s discuss for a moment what it is and how a human student might come to adopt gradient-learning as a strategy.

Suppose that a student’s preference for knowing an amount or quality x of knowledge is given by a utility function

Screenshot (170)

The Log is always increasing with increasing x, so we are assuming that this agent always prefers to learn more if she has the choice. The constant k is always assumed positive. Now if the agent had full information about the maximal quantity she can learn and about the form of her preference for knowledge, she might decide simply to learn the optimal, i.e. full, amount. But suppose, more realistically, that she knows neither whether there is a maximal amount nor what the exact form of her preferences is. Suppose that all she knows is the following: if her level of knowledge is x, she can tell whether a small increase in x would generate a small increase in utility or not. Said slightly more technically: she knows her marginal utility of knowledge at the level she has obtained, but not her full utility function.

In such a situation the model student of this post could decide to pursue the following strategy. She checks her marginal utility of knowledge and then learns a small amount of knowledge proportional to that marginal utility. If we model this behaviour mathematically for the utility function given above, this leads to a so-called first-order differential equation that allows us to calculate the learning curve our model student will go through:

Screenshot (171)

This is an equation we can actually solve exactly to find x[t] when we know the level x[0] at which this model student starts

Screenshot (164)

If we plot these learning curves for our model student, for different values of the product of k with alpha, we get the following picture.

Screenshot (165)

Our model student always keeps learning but as her levels of learning increase her rate-of-learning drops because the marginal utility of the new knowledge diminishes. But because in this simple model there is no cost to learning she has no incentive to stop until she hits an exogenously determined maximum level of knowledge. Larger values of k times alpha give rise to steeper curves, but the slowing down is generic for all of them.
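The equations in this post were images, so the following Python sketch rests on an assumption: that the utility function has the simplest form consistent with the description, U[x] = k Log[x]. Under that assumption gradient-learning reads dx/dt = alpha k / x, with exact solution x[t] = sqrt(x[0]^2 + 2 alpha k t). The function names and parameter values are illustrative only.

```python
import math


def closed_form_level(t, x0, k, alpha):
    """Exact solution of dx/dt = alpha * k / x with x(0) = x0 > 0.

    Assumes the utility U(x) = k * log(x); this form is an assumption.
    """
    return math.sqrt(x0 * x0 + 2.0 * alpha * k * t)


def euler_learning_curve(x0, k, alpha, t_end, dt=1e-3):
    """Step the gradient-learner forward with a simple Euler scheme.

    In each small time step she adds an amount of knowledge proportional
    to her marginal utility at her current level.
    """
    steps = int(round(t_end / dt))
    x = x0
    curve = [x]
    for _ in range(steps):
        marginal_utility = k / x  # U'(x) for the assumed U(x) = k * log(x)
        x += alpha * marginal_utility * dt
        curve.append(x)
    return curve


if __name__ == "__main__":
    for k_alpha in (0.5, 1.0, 2.0):
        curve = euler_learning_curve(x0=0.5, k=k_alpha, alpha=1.0, t_end=5.0)
        exact = closed_form_level(5.0, 0.5, k_alpha, 1.0)
        print(f"k*alpha = {k_alpha}: Euler {curve[-1]:.4f}, exact {exact:.4f}")
```

Because the marginal utility k / x shrinks as x grows, each step is smaller than the one before, which is the slowing-down described above. Only the product of k and alpha enters the curve.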

Some teachers might indeed consider this a model student, as she is definitely not procrastinating! She learns rapidly at first and then slows down as she makes progress, whereas a procrastinator would learn little initially and then rush to catch up.

Back to the Game

In [1] I connect this notion of Entropy with the model of gradient-learning to show that in a game like the one in [2], with ‘noisy’ opponents, a player could seek to optimise the sum of the pay-off u[x] of her strategies and the utility of her randomness S[x],

Screenshot (163)

where the constant in front of the Entropy S[x] is the ‘marginal utility of Entropy’, i.e. the utility of a little more uncertainty in how she selects her strategy each turn. This allows me to plot a graph of how the learning curves of the student approach the optimal outcome depending on the marginal utility of Entropy. Depending on the value of beta these curves look like this.

Screenshot (162)

For low beta the optimal outcome of the learner is a probability distribution where she plays both strategies with equal likelihood. For high beta the optimal outcome is to play the strategy with the highest pay-off with near-certainty. The important thing here is to realize that beta is not a choice of the player but a measure of the disorder or randomness of her environment of fellow players and opponents.
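The two limits can be checked with a small Python sketch. It assumes, since the original formula was an image, that the constant in front of the Entropy is 1/beta, so the objective for two strategies is p u1 + (1 - p) u2 + S[p] / beta, with p the probability of playing the first strategy. Under that assumption the maximiser is the standard logit formula. The names `objective` and `optimal_probability` and the pay-offs used are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of the distribution (p, 1 - p)."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0.0)


def objective(p, u1, u2, beta):
    """Expected pay-off plus entropy weighted by 1/beta (assumed form)."""
    return p * u1 + (1.0 - p) * u2 + entropy(p) / beta


def optimal_probability(u1, u2, beta):
    """Probability of strategy 1 that maximises the assumed objective.

    Setting the derivative with respect to p to zero gives the logit rule
    p = 1 / (1 + exp(-beta * (u1 - u2))). Requires beta > 0.
    """
    return 1.0 / (1.0 + math.exp(-beta * (u1 - u2)))


if __name__ == "__main__":
    for beta in (0.01, 1.0, 5.0, 20.0):
        p = optimal_probability(u1=1.0, u2=0.0, beta=beta)
        print(f"beta = {beta:5.2f}  P(best strategy) = {p:.4f}")
```

For small beta the entropy term dominates and the optimum sits near equal likelihood; for large beta the pay-off term dominates and the better strategy is played with near-certainty.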

Collective learning

There is a huge literature on collective or social learning models [3]. I want to keep it simple here, so my question is: what could a simple model of gradient-learning students tell us about how their learning might interact? To study that, in [1] I have looked at the simplest possible system of two model students. I have given them the following preferences concerning their own level of knowledge and that of their peer

Screenshot (172)

The two constants c12 and c21 describe the interaction between the two model students. If c12 and c21 are both larger than 0, both students prefer that they either both have a knowledge level below 1 or both above 1. If c12 and c21 are both negative, each model student prefers the knowledge level of her peer to be on the opposite side of 1 relative to her own. If c12 and c21 have different signs, one model student has the first preference and the other has the second. So in a simplistic manner the coefficients c12 and c21 determine whether these agents are cooperative, antagonistic, or one of each.

If we now assume that neither of them knows this, and that both decide how to learn on the basis of the marginal utility of their own level of knowledge, we get the following set of coupled differential equations

Screenshot (166)

which we cannot solve exactly any more. But we can solve them numerically and compare them to the learning curve of the model student we discussed earlier whose preference only involves her own level of knowledge.

Cooperative learning

If we solve the two equations for the learning curves assuming that the model students are cooperative, we get the following graphs.

Screenshot (167)

The dotted line is the learning curve of the single model student who has no preferences regarding her peers. For early times she outperforms her two cooperative peers (solid and dashed curves), in the sense that she learns faster than the other two. But after a while the cooperative learners both overtake her, even though their c12 and c21 are both positive but not equal.

The reason for this is actually quite simple. Early on all three students know very little, i.e. less than x=1. But the cooperative learners ‘suffer disutility’ from that, and this negatively affects the marginal utility of their own knowledge stock. As a result, as gradient-learners their rate of learning is lower. But once they pass the threshold of 1 the opposite happens. The single learner does not suffer the initial disutility but also does not benefit in the later stages.
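This mechanism can be reproduced with a short numerical sketch in Python. The original equations were images, so the form used here is an assumption chosen to match the description: U1 = k Log[x1] + c12 Log[x1] Log[x2] (and likewise for student 2), which gives dx1/dt = alpha (k + c12 Log[x2]) / x1. The parameter values, the starting level of 0.5 and the plain Euler integrator are illustrative choices, not those of [1].

```python
import math


def simulate_pair(c12, c21, x0=0.5, k=1.0, alpha=1.0, t_end=10.0, dt=1e-3):
    """Euler-integrate a single learner and an interacting pair.

    Assumed dynamics (not taken verbatim from the source):
        single:  dx/dt  = alpha * k / x
        pair:    dx1/dt = alpha * (k + c12 * log(x2)) / x1
                 dx2/dt = alpha * (k + c21 * log(x1)) / x2
    Returns a list of tuples (t, x_single, x1, x2).
    """
    steps = int(round(t_end / dt))
    xs, x1, x2 = x0, x0, x0
    history = [(0.0, xs, x1, x2)]
    for n in range(1, steps + 1):
        rate_single = alpha * k / xs
        rate_1 = alpha * (k + c12 * math.log(x2)) / x1
        rate_2 = alpha * (k + c21 * math.log(x1)) / x2
        xs += rate_single * dt
        x1 += rate_1 * dt
        x2 += rate_2 * dt
        history.append((n * dt, xs, x1, x2))
    return history


def levels_at(history, t, dt=1e-3):
    """Look up the stored levels at time t (t must be a multiple of dt)."""
    return history[int(round(t / dt))]


if __name__ == "__main__":
    history = simulate_pair(c12=0.5, c21=0.3)  # both positive: cooperative
    for t in (0.3, 1.0, 3.0, 10.0):
        _, xs, x1, x2 = levels_at(history, t)
        print(f"t = {t:4.1f}  single = {xs:.3f}  x1 = {x1:.3f}  x2 = {x2:.3f}")
```

While both students are below 1 the logarithm of the peer's level is negative, so a positive coefficient lowers the learning rate; above 1 it raises it. Passing negative values for both coefficients gives an antagonistic pair under the same assumptions.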

Antagonistic learning

If both c12 and c21 are negative you might view the collectively learning pair as antagonistic or perhaps competitive. Each agent prefers to be above 1 while the other is below 1, or the other way around. It is tempting to call this competitive, but there is an element of ‘spite’ in this preference, so I decided to call it antagonistic. If I compute a set of learning curves then this is what they look like.

Screenshot (168)

The dotted curve is again the single model student, and the antagonistic pair are the dashed and solid lines. In the phase of low levels of knowledge the antagonistic pair outperform the single learner, but they lose that advantage once enough time has passed. The reason for this is very similar to what we saw in the previous case, but now with opposite signs.

Can we draw conclusions from this?

Probably not! But I would like to speculate a bit. If we consider this a very simplistic example of a pair of students engaged in group-work compared with a student studying alone, then this simulation suggests something I had not thought about before: the timing of group-work matters. The model student-pair can always outperform the single student (in this model) if they learn antagonistically in the early stages of the ‘course’ and cooperatively in the later stages.
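A hedged Python sketch of this timing idea follows. It assumes the same dynamics as suggested by the description of the preferences, dx1/dt = alpha (k + c12 Log[x2]) / x1, and adds a hypothetical switching rule of my own choosing: the interaction coefficient is negative (antagonistic) while the peer is below 1 and positive (cooperative) once the peer is above 1. Neither the rule nor the parameter values come from [1].

```python
import math


def simulate_switching_pair(strength12=0.5, strength21=0.3, x0=0.5,
                            k=1.0, alpha=1.0, t_end=10.0, dt=1e-3):
    """Single learner versus a pair that switches interaction mode.

    Hypothetical rule: each coefficient is -strength while the peer's
    level is below 1 (antagonistic) and +strength once it is at or above
    1 (cooperative). Returns a list of tuples (t, x_single, x1, x2).
    """
    steps = int(round(t_end / dt))
    xs, x1, x2 = x0, x0, x0
    history = [(0.0, xs, x1, x2)]
    for n in range(1, steps + 1):
        c12 = strength12 if x2 >= 1.0 else -strength12
        c21 = strength21 if x1 >= 1.0 else -strength21
        rate_single = alpha * k / xs
        rate_1 = alpha * (k + c12 * math.log(x2)) / x1
        rate_2 = alpha * (k + c21 * math.log(x1)) / x2
        xs += rate_single * dt
        x1 += rate_1 * dt
        x2 += rate_2 * dt
        history.append((n * dt, xs, x1, x2))
    return history


if __name__ == "__main__":
    history = simulate_switching_pair()
    for index in (300, 1000, 3000, 10000):
        t, xs, x1, x2 = history[index]
        print(f"t = {t:4.1f}  single = {xs:.3f}  x1 = {x1:.3f}  x2 = {x2:.3f}")
```

With this rule the interaction term c Log[x] is never negative, so in this sketch the pair's marginal utility is never below that of the single learner at the same level. Whether a real course could switch modes this cleanly is a separate question.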

In real-world classrooms with real-world students things are never as simple and as clear as in the model-world of differential equations. The preferences of model-students are simply determined by me by shifting the values of some parameters. Nevertheless, in the real world a teacher will have influence on these preferences through the incentives he or she sets as part of how the course- and class-work is organised. A word of caution is proper here as well: this model does not contain anything resembling a model-student’s “self-confidence”, and the effects of collective learning on that variable could be extremely important for the learning outcomes achieved! Antagonistic learning could perhaps be so destructive for the self-confidence of one of the pair that the damage outweighs any early-phase benefit. Nevertheless, the next time I am designing a course with group- or pair-work components I will definitely give some more thought to how to set it up for different parts of the course.

Utility of Entropy and social rewards for smart-asses

I started this post by contemplating the game in [2] and the role of ‘chaos’ or disorder, i.e. entropy, in the trade-off the learners are trying to make. Communities of learners do not necessarily reward individuals that show outstanding levels of knowledge. I think we would be wrong to interpret that as spite by the community or actively punitive habits. In part it is simply the entropy of the wider community irresistibly making its influence felt.

Look ahead

In my next post I want to deal with some of the fundamental flaws of the model discussed here. The model students here had no opportunity to choose their final outcomes, or to weigh up the costs and benefits of the learning curve against the rewards for a particular outcome. A third and final post in this series will explore the role of Entropy in much greater detail and depth.


Posts in the category “stuff I should know about” usually are paraphrased extracts from working papers I am making available here [1]. They are less detailed and technical than the working papers and as a result allow a little more for speculation!


[1] Witte F.M.C. The Econophysics of Learning: Uninformed learners in equilibrium. FW-03-(12/2019).

[2] Gale J, Binmore KG, Samuelson L. Learning to be imperfect: The ultimatum game. Games and Economic Behavior. 1995 Jan 1;8(1):56-90.

[3] Mobius M, Rosenblat T. Social learning in economics. Annu. Rev. Econ. 2014 Aug 2;6(1):827-47.
