Inspired by mathematical models of learning in games, I set up a simplistic model of a learner who gradually works her way into new knowledge, and analyse how two such model students can interact. The results suggest something interesting about the timing of group-work in courses.

**Learning in Games**

One context in which economists enjoy studying models of learning is that of players engaged in some kind of repeated game. The predictions of game theory for the outcome of such games depend strongly on whether the game is played once or multiple times, possibly infinitely many times. But outcomes also depend crucially on whether all the players in the game have all the relevant information at their disposal. Whenever a game runs over multiple rounds and players can observe the behaviour of their fellow players, they can mitigate any information deficiencies they have by *learning*. An example of such a study is the one by Gale, Binmore & Samuelson [2]. An interesting aspect of their analysis is the conclusion that players can actually *know too much*.

The argument for this is roughly as follows. Suppose a player in a game can choose between two strategies but is unsure which strategy is the optimal one. As a result of this uncertainty, her strategic choices will vary between the two available strategies, i.e. there is a certain probability she will choose one and the complementary probability she will choose the other. Her strategy is a so-called *mixed* strategy. A convenient measure of how much information is lacking in this mixed strategy is the Entropy of the probability distribution. The highest possible entropy, Log[2], occurs when she is entirely indifferent between the two strategies and plays each with probability 1/2. The lowest entropy, 0, occurs when she always selects one of the two with certainty.
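For a two-strategy mixed strategy in which she plays the first strategy with probability *p* (our notation), the Entropy is S = -p Log[p] - (1-p) Log[1-p]. The short Python sketch below assumes natural logarithms and the usual convention 0·Log[0] = 0; it evaluates S for a few values of *p* and makes the two extremes concrete.

```python
import math


def mixed_strategy_entropy(p: float) -> float:
    """Entropy (natural log) of playing strategy A with probability p and B with 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # convention: 0 * log(0) = 0
            entropy -= q * math.log(q)
    return entropy


for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}: S = {mixed_strategy_entropy(p):.4f}")
print(f"Log[2] = {math.log(2):.4f}")
```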

*Knowing too much* in this context would mean that she selects her strategies with certainty, whereas her opponents in the game respond to her much more randomly. As a result, her ‘calculation’ of the optimal strategy, based on the optimal responses the others would play, isn’t realised, and her outcomes may be worse than they would be if she selected her own strategies with a slightly higher degree of uncertainty. The model in [2] demonstrates a situation where this is indeed the case.

That result makes you wonder: is the learning player in that game actually making a trade-off between expected pay-off and the utility of Entropy?

**Gradient learning**

To study that question it would be good to have a more general model of what *learning* is. A popular model from computer science, AI, neural network theory and machine learning is that of *gradient-learning*. Let’s discuss for a moment what it is and how a human student might come to adopt gradient-learning as a strategy.

Suppose that a student’s preference for knowing an *amount or quality* *x* of knowledge is given by the utility function

$$U[x] = k\,\log x.$$

The Log always increases with increasing *x*, so we are assuming that this agent always prefers to learn more if she has the choice. The constant *k* is always assumed positive. If the agent had full information about the maximal quantity she can learn and about the form of her preference for knowledge, she might decide to simply learn the optimal, i.e. full, amount. But suppose, more realistically, that she knows neither whether there is a maximal amount nor the exact form of her preferences. Suppose that all she knows is this: at her current level of knowledge *x*, she knows whether a small increase in *x* would generate a small increase in *utility* or not. Said slightly more technically: she knows her *marginal* utility of knowledge at the level she has obtained, but not her full utility function.

In such a situation the model student of this post could decide to pursue the following strategy: she checks her marginal utility of knowledge and then learns a small amount of knowledge proportional to that marginal utility, with some positive proportionality constant alpha (α). If we model this behaviour mathematically for the utility function given above, this leads to a so-called first-order differential equation that allows us to calculate the learning curve our model student will go through:

$$\frac{dx}{dt} = \alpha\,\frac{dU}{dx} = \frac{\alpha k}{x}.$$
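The learning rule can be sketched in a few lines of Python. This is a minimal forward-Euler sketch, assuming the utility U[x] = k Log[x] and a learning-speed constant alpha; the step size `dt`, the finite-difference width `h` and the starting level 0.1 are illustrative choices, not values from [1]. Note that the learner never uses a formula for U′[x]: she only probes how her utility changes for a small change in *x*, which is all she is assumed to know.

```python
import math
from typing import Callable, List


def marginal_utility(utility: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dU/dx at x: the only thing the learner is assumed to know."""
    return (utility(x + h) - utility(x - h)) / (2.0 * h)


def gradient_learning(utility: Callable[[float], float], x0: float,
                      alpha: float, dt: float, steps: int) -> List[float]:
    """Forward-Euler gradient learning: each step adds alpha * U'(x) * dt to x."""
    path = [x0]
    x = x0
    for _ in range(steps):
        x += alpha * marginal_utility(utility, x) * dt
        path.append(x)
    return path


K = 1.0  # assumed positive constant k in U[x] = k Log[x]


def log_utility(x: float) -> float:
    return K * math.log(x)


path = gradient_learning(log_utility, x0=0.1, alpha=1.0, dt=0.01, steps=1000)
increments = [b - a for a, b in zip(path, path[1:])]
print(f"knowledge after t = 10: {path[-1]:.3f}")
print(f"first increment: {increments[0]:.4f}, last increment: {increments[-1]:.5f}")
```

Because the marginal utility k/x falls as *x* grows, each increment is smaller than the previous one, even though the learner keeps making progress.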

This is an equation we can actually solve exactly to find *x[t]* when we know the level *x[0]* at which this model student starts:

$$x[t] = \sqrt{x[0]^2 + 2\alpha k\,t}.$$
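Assuming the learning equation is dx/dt = αk/x (gradient learning on U[x] = k Log[x]), separating variables gives x dx = αk dt, so x[t]² = x[0]² + 2αk t. The Python sketch below checks numerically that this closed form satisfies the equation and that a larger product αk gives a steeper curve; the starting level 0.1 and the αk values are illustrative.

```python
import math


def learning_curve(t: float, x0: float, alpha_k: float) -> float:
    """Closed form x[t] = sqrt(x0^2 + 2*alpha*k*t); alpha_k is the product alpha*k."""
    return math.sqrt(x0 ** 2 + 2.0 * alpha_k * t)


def ode_residual(t: float, x0: float, alpha_k: float, h: float = 1e-6) -> float:
    """Central-difference dx/dt minus the right-hand side alpha*k/x; close to 0 if the solution is right."""
    dxdt = (learning_curve(t + h, x0, alpha_k) - learning_curve(t - h, x0, alpha_k)) / (2.0 * h)
    return dxdt - alpha_k / learning_curve(t, x0, alpha_k)


x0 = 0.1
for alpha_k in (0.5, 1.0, 2.0):
    samples = [learning_curve(t, x0, alpha_k) for t in (0.0, 1.0, 5.0, 10.0)]
    worst = max(abs(ode_residual(t, x0, alpha_k)) for t in (1.0, 5.0, 10.0))
    print(f"alpha*k = {alpha_k}: x = {[round(v, 3) for v in samples]}, max residual = {worst:.1e}")
```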

If we plot these learning curves for our model student, for different values of the product of *k* with alpha, we get the following picture.

Our model student always keeps learning, but as her level of knowledge increases her rate of learning drops, because the marginal utility of new knowledge diminishes. But because in this simple model there is no cost to learning, she has no incentive to stop until she hits an exogenously determined maximum level of knowledge. Larger values of *k* times alpha give rise to steeper curves, but the slowing down is generic for all of them.

Some teachers might indeed consider this a *model student*, as she is definitely not procrastinating! She learns rapidly at first and then slows down as she makes progress, whereas a procrastinator would learn little initially and then rush to catch up.

**Back to the Game**

In [1] I connect this notion of Entropy with the model of gradient-learning to show that in a game like the one in [2], with ‘noisy’ opponents, a player could seek to optimise the sum of the pay-off *u[x]* of her strategies and the utility of her randomness *S[x]*,

$$V[x] = u[x] + \frac{1}{\beta}\,S[x],$$

where the constant 1/β in front of the Entropy *S[x]* is the ‘*marginal utility of Entropy*’, i.e. the utility of a little more uncertainty in how she selects her strategy each turn. This allows me to plot how the learning curves of the student approach the optimal outcome depending on the marginal utility of Entropy. Depending on the value of beta these curves look like this.

For low beta the optimal outcome of the learner is a probability distribution where she plays both strategies with equal likelihood. For high beta the optimal outcome is to play the strategy with the highest pay-off with near-certainty. The important thing here is to realize that beta is not a choice of the player but a measure of the disorder or randomness of her environment of fellow players and opponents.
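The two limits can be reproduced in a small sketch. It assumes the objective takes the form V[x] = u[x] + S[x]/β, that *x* is the probability of playing the first of two strategies, and that the pay-off is linear, u[x] = a x + b (1 - x), with hypothetical pay-offs a = 1 and b = 0. Under these assumptions gradient learning dx/dt = α dV/dx settles at the logistic optimum x* = 1/(1 + e^(-β(a-b))), which is close to 1/2 for small β and close to 1 for large β.

```python
import math


def entropy_slope(x: float) -> float:
    """dS/dx for the two-strategy Entropy S[x] = -x Log[x] - (1 - x) Log[1 - x]."""
    return math.log((1.0 - x) / x)


def learn_mixed_strategy(a: float, b: float, beta: float, x0: float = 0.5,
                         alpha: float = 1e-3, steps: int = 20_000) -> float:
    """Gradient learning on V[x] = u[x] + S[x]/beta with linear pay-off u[x] = a*x + b*(1 - x).

    x is the probability of playing strategy A (pay-off a); strategy B pays b.
    """
    x = x0
    for _ in range(steps):
        slope = (a - b) + entropy_slope(x) / beta
        x += alpha * slope
        x = min(max(x, 1e-12), 1.0 - 1e-12)  # safeguard: keep x strictly inside (0, 1)
    return x


def logistic_optimum(a: float, b: float, beta: float) -> float:
    """Stationary point of V[x]: x* = 1 / (1 + exp(-beta * (a - b)))."""
    return 1.0 / (1.0 + math.exp(-beta * (a - b)))


for beta in (0.1, 1.0, 5.0):
    learned = learn_mixed_strategy(a=1.0, b=0.0, beta=beta)
    print(f"beta = {beta:>4}: learned x = {learned:.4f}, "
          f"optimum = {logistic_optimum(1.0, 0.0, beta):.4f}")
```

The small learning rate `alpha` is a deliberate choice: near-certain optima make V[x] very curved close to the boundary, and a larger step would make the plain gradient iteration oscillate.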

**Collective learning**

There is a huge literature on collective or social learning models [3]. I want to keep it simple here, so my question is: what could a simple model of gradient-learning students tell us about how their learning might interact? To study that, in [1] I have looked at the simplest possible system of *two* model students. I have given them preferences that depend both on their own level of knowledge and on that of their peer.

The two constants c12 and c21 describe the interaction between the two model students. If c12 and c21 are both larger than 0, both students prefer their knowledge levels to be either *both* below 1 or *both* above 1. If c12 and c21 are both negative, each model student prefers the knowledge level of her peer to be *on the opposite side* of 1 from her own. If c12 and c21 have different signs, then one model student has one preference and the other has the other preference. So in a simplistic manner the coefficients c12 and c21 determine whether these agents are cooperative, antagonistic or whether we have one of each.
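The exact functional form of these preferences is not spelled out here. One form that matches the description, used below purely as an illustrative assumption (not necessarily the specification in [1]), adds a cross term to the single-learner utility k Log[x]:

$$
U_1[x_1, x_2] = k\log x_1 + c_{12}\,\log x_1 \log x_2, \qquad
U_2[x_1, x_2] = k\log x_2 + c_{21}\,\log x_2 \log x_1
$$

The product Log[x1] Log[x2] is positive exactly when both knowledge levels lie on the same side of 1, which is why a positive c12 rewards being on the same side. Differentiating with respect to each student's own knowledge level gives the marginal utilities a gradient-learner would act on:

$$
\frac{\partial U_1}{\partial x_1} = \frac{k + c_{12}\log x_2}{x_1}, \qquad
\frac{\partial U_2}{\partial x_2} = \frac{k + c_{21}\log x_1}{x_2}
$$

With c12 > 0 and x2 < 1 the term c12 Log[x2] is negative and lowers student 1's marginal utility; once x2 exceeds 1 the same term raises it.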

If we now assume that neither of them knows this, and that both decide how much to learn on the basis of the marginal utility of their own level of knowledge, we get a set of coupled differential equations that we can no longer solve exactly. But we can solve them numerically and compare the results with the learning curve of the model student we discussed earlier, whose preference involves only her own level of knowledge.
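A minimal numerical sketch in Python, assuming the illustrative preferences U1 = k Log[x1] + c12 Log[x1] Log[x2] and U2 = k Log[x2] + c21 Log[x2] Log[x1] (an assumed form, not necessarily the one in [1]), so that the gradient-learners follow dx1/dt = α(k + c12 Log[x2])/x1 and dx2/dt = α(k + c21 Log[x1])/x2. The pair is integrated with the classical fourth-order Runge-Kutta method and compared with the single learner's closed-form curve sqrt(x[0]² + 2αk t). The starting level 0.5 and the coefficient values are hypothetical choices.

```python
import math
from typing import Tuple


def pair_rates(x1: float, x2: float, k: float, alpha: float,
               c12: float, c21: float) -> Tuple[float, float]:
    """Right-hand sides dx1/dt, dx2/dt: each student follows only her OWN marginal utility."""
    return (alpha * (k + c12 * math.log(x2)) / x1,
            alpha * (k + c21 * math.log(x1)) / x2)


def pair_at(t_end: float, c12: float, c21: float, x0: float = 0.5,
            k: float = 1.0, alpha: float = 1.0, dt: float = 1e-3) -> Tuple[float, float]:
    """Knowledge levels (x1, x2) at t_end, integrated with classical RK4 from x1 = x2 = x0."""
    x1 = x2 = x0
    p = (k, alpha, c12, c21)
    for _ in range(int(round(t_end / dt))):
        s1 = pair_rates(x1, x2, *p)
        s2 = pair_rates(x1 + 0.5 * dt * s1[0], x2 + 0.5 * dt * s1[1], *p)
        s3 = pair_rates(x1 + 0.5 * dt * s2[0], x2 + 0.5 * dt * s2[1], *p)
        s4 = pair_rates(x1 + dt * s3[0], x2 + dt * s3[1], *p)
        x1 += dt / 6.0 * (s1[0] + 2.0 * s2[0] + 2.0 * s3[0] + s4[0])
        x2 += dt / 6.0 * (s1[1] + 2.0 * s2[1] + 2.0 * s3[1] + s4[1])
    return x1, x2


def single_at(t: float, x0: float = 0.5, k: float = 1.0, alpha: float = 1.0) -> float:
    """Closed-form single learner (c12 = c21 = 0): x[t] = sqrt(x0^2 + 2*alpha*k*t)."""
    return math.sqrt(x0 ** 2 + 2.0 * alpha * k * t)


# Hypothetical coefficients: both positive but unequal, then both negative but unequal.
for label, c12, c21 in (("cooperative", 0.3, 0.2), ("antagonistic", -0.3, -0.2)):
    for t in (0.2, 10.0):
        x1, x2 = pair_at(t, c12, c21)
        print(f"{label:>12} t = {t:>4}: x1 = {x1:.3f}, x2 = {x2:.3f}, single = {single_at(t):.3f}")
```

With these assumed values the cooperative pair trails the single learner early on and is ahead of her by t = 10, while the antagonistic pair does the opposite.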

**Cooperative learning**

If we solve the two equations for the learning curves assuming that the model students are cooperative, we get the following graphs.

The dotted line is the learning curve of the single model student who has no preferences regarding her peers. At early times she outperforms her two cooperative peers (solid and dashed curves), in the sense that she learns faster than the other two. But after a while both cooperative learners overtake her, even though their c12 and c21, while both positive, are not equal.

The reason for this is actually quite simple. Early on all three students know very little, i.e. less than *x=1*. But the cooperative learners ‘suffer disutility’ from that, and this lowers the marginal utility of their own knowledge stock. As a result, as gradient-learners their rate of learning is lower. But once they pass the threshold of 1 the opposite happens. The single learner does not suffer the initial disutility, but also does not benefit in the later stages.

**Antagonistic learning**

If both c12 and c21 are negative, you might view the collectively learning pair as antagonistic or perhaps *competitive*. Each agent prefers to be *better* and the other *worse* than 1, or the other way around. It is tempting to call this competitive, but there is an element of ‘*spite*’ in this preference, so I decided to call it *antagonistic*. If I compute a set of learning curves, this is what they look like.

The dotted curve is again the single model student and the antagonistic pair are the dashed and solid lines. In the phase of low levels of knowledge the antagonistic pair outperform the single learner, but they lose that advantage once enough time has passed. The reason is very similar to what we saw in the previous case, but with opposite signs.

**Can we draw conclusions from this?**

Probably not! But I would like to speculate a bit. If we consider this a very simplistic example of a pair of students engaged in group-work, compared with a student studying alone, then this simulation suggests something I had not thought about before: *timing of group work matters*. The model student-pair can always outperform the single student (in this model) if they learn competitively in the early stages of the ‘course’ and cooperatively in the later stages.
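The timing idea can be checked in the same illustrative setting. Assuming again the hypothetical preferences U1 = k Log[x1] + c12 Log[x1] Log[x2] (and symmetrically for student 2), suppose each coefficient is negative (antagonistic) while the peer is below 1 and positive (cooperative) once the peer is above 1. Then c12 Log[x2] is never negative, so d(x1²)/dt = 2α(k + c12 Log[x2]) ≥ 2αk and each student's knowledge should never fall behind the single learner's. The switching rule, the strengths 0.3 and 0.2 and the forward-Euler scheme below are assumptions chosen for illustration; the single learner is obtained from the same code with zero coupling, so both are integrated identically.

```python
import math
from typing import Tuple


def switching_coefficient(peer_level: float, strength: float) -> float:
    """Hypothetical rule: antagonistic (negative) while the peer is below 1, cooperative afterwards."""
    return -strength if peer_level < 1.0 else strength


def simulate(t_end: float, s12: float, s21: float, x0: float = 0.5,
             k: float = 1.0, alpha: float = 1.0, dt: float = 1e-3) -> Tuple[float, float]:
    """Forward-Euler integration of two gradient-learners with switching coefficients.

    With s12 = s21 = 0 both students reduce to the single learner dx/dt = alpha*k/x.
    """
    x1 = x2 = x0
    for _ in range(int(round(t_end / dt))):
        dx1 = alpha * (k + switching_coefficient(x2, s12) * math.log(x2)) / x1
        dx2 = alpha * (k + switching_coefficient(x1, s21) * math.log(x1)) / x2
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2


for t in (0.2, 1.0, 5.0, 10.0):
    single, _ = simulate(t, 0.0, 0.0)
    pair = simulate(t, 0.3, 0.2)
    print(f"t = {t:>4}: single = {single:.3f}, switching pair = ({pair[0]:.3f}, {pair[1]:.3f})")
```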

In real-world classrooms with real-world students things are never as simple and clear as in the model-world of differential equations. The preferences of the model students are simply set by me through the values of a few parameters. Nevertheless, in the real world a teacher will have influence on these preferences through the incentives he or she sets as part of how the course- and class-work is organized. A word of caution is in order here as well: this model contains nothing resembling a model student’s “self-confidence”, and the effects of collective learning on that variable could be extremely important for the learning outcomes achieved! Antagonistic learning could perhaps be so destructive for the self-confidence of one of the pair that the damage outweighs any early-phase benefit. Still, the next time I design a course with group- or pair-work components I will definitely give more thought to how to set it up for different parts of the course.

**Utility of Entropy and social rewards for smart-asses**

I started this post by contemplating the game in [2] and the role of ‘chaos’ or disorder, i.e. entropy, in the trade-off the learners are trying to make. Communities of learners do not necessarily reward individuals who show outstanding levels of knowledge. I think we would be wrong to interpret that as *spite* by the community or as actively punitive habits. In part it is simply the entropy of the wider community irresistibly making its influence felt.

**Look ahead**

In my next post I want to deal with some of the fundamental flaws of the model discussed here. The model students here had no opportunity to choose their final outcomes, or to weigh the costs and benefits of the learning curve against the rewards for a particular outcome. A third and final post in this series will explore the role of Entropy in much greater detail and depth.

**Disclaimer:**

Posts in the category “*stuff I should know about*” are usually paraphrased extracts from working papers I am making available here [1]. They are less detailed and technical than the working papers and, as a result, allow a little more room for speculation!

**Bibliography**

[1] Witte F.M.C. *The Econophysics of Learning: Uninformed learners in equilibrium*. Working paper FW-03-(12/2019). https://ucl.academia.edu/FrankWitte#papers

[2] Gale J, Binmore KG, Samuelson L. Learning to be imperfect: The ultimatum game. Games and Economic Behavior. 1995 Jan 1;8(1):56-90.

[3] Mobius M, Rosenblat T. Social learning in economics. Annual Review of Economics. 2014 Aug 2;6(1):827-47.