"The winner is the one who makes the next-to-last mistake."—GM Savielly Tartakower
Online chess has made it remarkably easy for people to connect with others around the world and play the game. Because it takes only seconds to join a server and start playing (especially on free sites like lichess), people often play many games in a row.
To those who do not play much chess, it would seem plausible that this is an ideal situation: someone can play a game, learn from their mistakes and successes, and start a new game immediately, with knowledge of the last game fresh in mind. From this perspective, one might think that each game is a new opportunity to apply the skills from the last game and that progress is somewhat linear.
Anyone who has played many chess games in a row will quickly tell you that this is not the case. Rather than learning from the mistakes of a past game, they know that losing a game can frustrate them and hurt their performance in the next one—a feeling commonly referred to as "being tilted". Conversely, it's common to feel invigorated by a great victory and to feel as though we play better afterwards—a feeling I call "being hyped".
These intuitions are informal, to be sure, but they gel nicely with the psychological literature on sequence effects: how one decision affects a subsequent decision. For example, people often believe that scoring or winning at one moment makes them more likely to score or win again (the so-called "hot hand" effect).
To test whether tilt and hype affect performance in online chess, I analysed ~1 million games from 2014, taken from lichess' free database of online chess games. These results are by no means complete (I just analysed them for fun in my free time). At the end, I'll highlight additional work that could be done to shore up these effects and explore them in more detail. If this interests you and you'd like to take it to the next step, shoot me an email. :)
In the next section I summarize how I processed the data and the statistical techniques I used to model them. If this doesn't interest you, just skip to the results. Also, throughout the post I'll include statistics in brackets (e.g., b=some number, p < .0001). Again, if you're just here for the big picture, ignore that too!
After downloading the data, I preprocessed it using some custom code in R. For simplicity, I removed drawn games from the dataset. This simplified the analyses by creating a binary outcome (lose = 0, win = 1) and affected only a small proportion of games (less than 3%). I also restricted the analyses to players with at least 2 games in the database.
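For concreteness, here's a minimal sketch of what this preprocessing could look like in R with dplyr. The data frame and column names (`games`, `player`, `result`, `game_time`) are hypothetical stand-ins, not the actual parsed lichess fields.

```r
# Minimal preprocessing sketch (assumed columns: player, result, game_time).
library(dplyr)

games_clean <- games %>%
  filter(result != "draw") %>%                   # drop drawn games (< 3% of the data)
  mutate(win = as.integer(result == "win")) %>%  # binary outcome: lose = 0, win = 1
  group_by(player) %>%
  filter(n() >= 2) %>%                           # keep players with at least 2 games
  arrange(game_time, .by_group = TRUE) %>%       # order each player's games in time
  ungroup()
```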
I modeled the data using mixed-effects logistic regression, fit with the lme4 package in R. This approach accounted for dependencies between games played by the same player, explicitly modeling trends within players' games and reducing the influence of players who played very few games. Continuous variables were centered at their mean within-player, and binary variables were effects (deviation) coded. Because multilevel models sometimes fare poorly when the number of level-1 units (games, in this case) is very uneven across clusters (players), I also ran a single-level logistic regression, which yielded largely the same results.
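As a rough illustration (not the exact specification from the analysis code), the core mixed-effects model could be fit along these lines, where `prev_win_ec` is a hypothetical effects-coded version (loss = -1, win = +1) of the previous game's outcome:

```r
# Sketch of the mixed-effects logistic regression with lme4 (variable names assumed).
library(lme4)
library(dplyr)

games_model <- games_clean %>%
  group_by(player) %>%
  mutate(prev_win_ec = lag(2 * win - 1)) %>%  # previous game's outcome, effects coded
  ungroup() %>%
  filter(!is.na(prev_win_ec))                 # each player's first game has no predecessor

fit <- glmer(
  win ~ prev_win_ec + (1 + prev_win_ec | player),  # random intercepts and slopes by player
  data   = games_model,
  family = binomial(link = "logit")
)
summary(fit)  # the prev_win_ec coefficient is the tilt/hype effect on the log-odds scale
```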
Finally, I modeled the sequential effects using a single-level logistic model with a decay parameter on the outcomes of past games. I describe this model in more detail at the end of the post, as well as showcase its predictive power. All code is available on GitHub.
Excluding draws, the average likelihood of winning a game was 50.07%. This is somewhat of a non-statement, since every game has to have a winner and loser.
What is more interesting is that this likelihood is not the same for all games. Specifically, the odds of winning a game depend on the outcome of the last game someone played (b = 0.25, SE = 0.0014, p < 2.22e-16).
As can be seen in this graph, players are significantly more likely than average to lose a game after they have just lost one, and more likely to win after they have just won. This matches the tilt vs. hype feeling I described earlier, where the sting of a loss lingers into the next game, handicapping one's performance, while the surge of victory invigorates one's play.
Similarly, many players have a sense that these feelings can persist longer than just the last game, in effect creating losing and winning streaks.
This is exactly what we see in the data. The horizontal axis in the figure shows the number of games back, with 1 indicating the last game played, and the vertical axis shows the (symmetrical) effect of that game on the current game. In other words, larger values on the vertical axis mean that the outcome of the game X steps back strongly predicted the outcome of the current game, either worsening a player's chances of winning if they lost in the past or bolstering them if they won. Even the outcome of 7 games ago has some bearing on the outcome of the current game (b = 0.12).
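One way to estimate these lagged effects is to enter the (effects-coded) outcomes of the previous one to seven games as simultaneous predictors. A hedged sketch, reusing the assumed variable names from above:

```r
# Sketch of the lagged-effects model (variable names assumed, not the actual analysis code).
library(dplyr)
library(lme4)

lagged <- games_clean %>%
  group_by(player) %>%
  mutate(win_ec = 2 * win - 1) %>%  # effects code: loss = -1, win = +1
  mutate(
    lag1 = lag(win_ec, 1), lag2 = lag(win_ec, 2), lag3 = lag(win_ec, 3),
    lag4 = lag(win_ec, 4), lag5 = lag(win_ec, 5), lag6 = lag(win_ec, 6),
    lag7 = lag(win_ec, 7)
  ) %>%
  ungroup()

fit_lags <- glmer(
  win ~ lag1 + lag2 + lag3 + lag4 + lag5 + lag6 + lag7 + (1 | player),
  data = lagged, family = binomial
)
```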
Tilt and hype are persistent...
A common strategy to break out of a funk is to take a break—the idea being that some time away will "reset" someone's play.
This seems to be borne out by the data as well: longer breaks reduce the effect of the last game on the current game, while shorter breaks strengthen the tilt/hype effect. This corresponds to the sense many players have of "good or bad days"—when they play many games in the same day and feel on top of the world or completely in the dumps depending on their track record.
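A simple way to test this moderation is an interaction between the previous game's outcome and (some transform of) the time elapsed since it. A sketch under the same assumed variable names, with `log_break_c` standing in for a hypothetical centered log break length:

```r
# Sketch of the break-length moderation (log_break_c is an assumed, centered predictor).
fit_break <- glmer(
  win ~ prev_win_ec * log_break_c + (1 | player),
  data = games_model, family = binomial
)
# A negative interaction coefficient would mean longer breaks dampen the carry-over
# of the last game's outcome; the same recipe applies to the rating and
# rating-difference moderators discussed below.
```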
Combined with the result in the last section, it's clear that players can end up in a slump or spree mentality, creating streaks of failure or success, especially when games are played close together in time.
You might think that with experience comes wisdom, so that higher rated players become less upset over losing and less elated over winning. If this were true, we should see that they are less sensitive to the hype and tilt effects we've been discussing.
According to the data, this doesn't seem to be the case. While statistically significant (p < 2.22e-16), the moderating effect of rating is very small, and the figure above makes clear that while the overall probability of winning increases as players become more highly rated, the influence of the last game remains strong even among elite players. So even pros get tilted and hyped by the outcome of their last game.
If global rating doesn't matter for the tilt/hype effect, maybe the ease of the game does. For instance, even if I'm tilted, I may still feel like I can beat someone much lower rated than me, thus reducing the tilt.
Again, though, we see that the last game still influences the current one. Even when controlling for the difference in ratings between the players, which is unsurprisingly a strong predictor of who wins the game (p < 2.22e-16), the last game still exerts an important influence over the actual outcome. In the figure above, we can see that as the discrepancy between players' ratings increases (in either direction), the likelihood of winning follows, but there is always a clear difference based on whether the player won or lost the last game.
So far we have seen evidence that the outcome of someone's last game(s) of online chess influences their likelihood of winning the current game. But what does this really mean for a given player? Above I've included a figure of the rolling win average of a single player (username not included). Here we can clearly see trends that match the tilt/hype effect. For instance, around game 131 the player starts to win and continues to win until a series of losses around game 211, at which point a losing streak begins.
This pattern of streaks and slumps is characteristic of a tilt/hype effect, where the effects of the last games bleed into the current ones and create these peaks and valleys. Notably, I chose this user because they played many games during the period these data were sampled, which showcases the tilt/hype effect over a long time frame.
If you're not interested in statistics, you can skip this part and go straight to the conclusion!
In the models above, I only included the last game, or a fixed number of games back, as predictors. This worked to get the main point across, but it isn't very general. To address this, I fit a simple logistic model with a decay factor to these data. The model has the following form:
\[ \delta_{i} = \beta_0 + \beta_1\textrm{ELO}_{i} + \beta_2\textrm{Rating Diff}_{i} + \beta_3\bar{w}_{i} \\ p(\textrm{Win}_{i}) = \frac{1}{1+e^{-\delta_{i}}}, \]
where \(\delta_{i}\) is the log-odds of winning game \(i\), \(\text{ELO}_i\) is the player's centered and scaled ELO rating at the time of the game, \(\text{Rating Diff}_{i}\) is the rating difference between the players in game \(i\), \(e\) is Euler's number, and \(\bar{w}_{i}\) is the exponentially weighted sum of past wins up to game \(i\), which is updated according to
\[ \bar{w}_{i+1} = \lambda\bar{w}_{i} + o_i, \] where \(\lambda\) is the decay parameter and \(o_i\) is the outcome of game \(i\) (0 = lose, 1 = win). Altogether, there are five free parameters to fit (\(\beta_0\), \(\beta_1\), \(\beta_2\), \(\beta_3\), and \(\lambda\)). The key value of interest for our purposes is \(\lambda\): if it is greater than zero, the outcomes of games beyond the most recent one also influence the present game, with their weight decaying exponentially the further back they are. This model is therefore a more general model of sequence effects than the linear models presented earlier (though it has other limitations, discussed at the end).
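Concretely, the per-player negative log-likelihood could be written and minimized roughly like this. This is only a sketch of one way to implement it; the actual code (linked below) may differ, and the column names (`win`, `elo_z`, `rating_diff_z`) and the username are assumptions.

```r
# Sketch: negative log-likelihood of the decay model for a single player's games,
# in chronological order. Column names (win, elo_z, rating_diff_z) are assumed.
neg_loglik <- function(par, outcome, elo, rating_diff) {
  b0 <- par[1]; b1 <- par[2]; b2 <- par[3]; b3 <- par[4]; lambda <- par[5]
  w_bar <- 0   # exponentially weighted sum of past wins
  nll   <- 0
  for (i in seq_along(outcome)) {
    delta <- b0 + b1 * elo[i] + b2 * rating_diff[i] + b3 * w_bar
    p_win <- 1 / (1 + exp(-delta))                 # inverse logit of the log-odds
    nll   <- nll - dbinom(outcome[i], size = 1, prob = p_win, log = TRUE)
    w_bar <- lambda * w_bar + outcome[i]           # update: w_{i+1} = lambda * w_i + o_i
  }
  nll
}

one_player <- subset(games_clean, player == "some_user")  # hypothetical username
fit_decay <- optim(
  par         = c(0, 0, 0, 0, 0.5),                # starting values: betas = 0, lambda = 0.5
  fn          = neg_loglik,
  outcome     = one_player$win,
  elo         = one_player$elo_z,
  rating_diff = one_player$rating_diff_z,
  method      = "L-BFGS-B",
  lower       = c(-Inf, -Inf, -Inf, -Inf, 0),      # constrain lambda to [0, 1]
  upper       = c( Inf,  Inf,  Inf,  Inf, 1)
)
```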
Fitting this model (code here) to each user's game data separately using maximum likelihood estimation shows that 85.94% of the estimated \(\lambda\) parameters are above zero, indicating that for most players the influence of past games extends beyond the most recent one, decaying exponentially the further back they are.
Another advantage of this model is that it is generative and can be used to predict behaviour. Using the equations above and the per-player estimated parameters to predict choice probabilities, and sampling wins/losses for each user 1000 times, we can see that the model does a reasonably good job of capturing the basic hype/tilt effect, though it overestimates it a little. At the end of the next section I'll discuss ways this model could be improved, but given its simplicity and qualitatively good performance, it seems to be capturing something clearly present in the data: namely, that the past influences the future when it comes to winning games in online chess.
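For reference, the simulation step could look roughly like this (again a sketch with assumed names, reusing the fitted parameter vector from the previous sketch):

```r
# Sketch: simulate win/loss sequences from a player's fitted parameters and
# compare the simulated sequential effect to the observed one.
simulate_player <- function(par, elo, rating_diff, n_sims = 1000) {
  b0 <- par[1]; b1 <- par[2]; b2 <- par[3]; b3 <- par[4]; lambda <- par[5]
  n <- length(elo)
  replicate(n_sims, {
    w_bar   <- 0
    outcome <- integer(n)
    for (i in seq_len(n)) {
      delta      <- b0 + b1 * elo[i] + b2 * rating_diff[i] + b3 * w_bar
      outcome[i] <- rbinom(1, size = 1, prob = 1 / (1 + exp(-delta)))
      w_bar      <- lambda * w_bar + outcome[i]    # same decay update as the likelihood
    }
    outcome
  })
}

sims <- simulate_player(fit_decay$par, one_player$elo_z, one_player$rating_diff_z)
# e.g., compare P(win | previous win) vs. P(win | previous loss) in sims and in the data
```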
In this blog post, using over a million online chess games, I've shown that players' performance in their last game influences their performance in subsequent games. I've called this the tilt/hype effect, wherein players who do poorly get tilted (i.e., frustrated), play worse, and continue to suffer, whereas players who win get hyped (i.e., invigorated), play better, and keep succeeding. This explains the common feeling chess players have of being on a winning or losing streak, and the methodological approach I've adopted here connects that feeling to more fundamental theories in cognitive science and psychology.
I have really only scratched the surface here. In no particular order, below is a (necessarily partial) list of other analyses and extensions that could improve this work and strengthen the argument I've made. If you've read this far and find it interesting enough to get involved, don't hesitate to reach out!