I wrote this article right after I published the first version of instant-glicko-2. It is meant to document how to implement a Glicko-2 algorithm that allows for instant feedback after games.
Great! Glicko-2 is a very cool rating system. And a popular choice too! Lichess, CS:GO, and Splatoon 2 all use the Glicko-2 system.
It also offers unique advantages.
Glicko-2 builds on the original Glicko rating system. Glicko aims to improve on Elo by adding a measure of rating uncertainty, the "ratings deviation" (RD).
Using this value, we can calculate a confidence interval in which the player's actual strength most likely lies.
If r is the rating, the player's actual strength is expected to lie between r - 2RD and r + 2RD in 95% of cases.
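As a tiny sketch, that interval is just r ± 2RD (the function name here is my own, not from any particular library):

```rust
/// 95% confidence interval for a player's actual strength,
/// given their rating and ratings deviation (RD).
fn confidence_interval(rating: f64, deviation: f64) -> (f64, f64) {
    (rating - 2.0 * deviation, rating + 2.0 * deviation)
}

fn main() {
    // A fresh Glicko player starts at rating 1500 with an RD of 350.
    let (low, high) = confidence_interval(1500.0, 350.0);
    println!("We are 95% sure the player's strength lies between {low} and {high}");
}
```

So for a brand-new player the system is only willing to say "somewhere between 800 and 2200", which is exactly why the RD needs to shrink as games come in.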
This RD value always decreases with every game the player plays - after all, a played game is a good clue to the player's actual strength. And when the player doesn't play games, it increases with the time of inactivity. So if a player stops playing rated games for a year, we are less certain about their strength when they come back.
To model this RD growth over time, Glicko also introduces a little devil named the "rating period". But we'll think about that when we actually try to use Glicko-2 for our game.
Glicko-2 aims to further improve on Glicko by introducing another variable to the rating, the "rating volatility" (σ). This value describes the expected fluctuation in rating. If the value is high, the player is expected to have some high fluctuation in performance, and if it is low, they are expected to be very consistent. The value does not affect the confidence interval discussed above.
The average value will be higher for games that, for example, require some amount of luck, or where fewer games per match are played.
The value doesn't change during times of inactivity.
Now, if you'll allow me to go on a small tangent here, I see potential for some great marketing in this rating volatility too. Give it a catchy name, and you can have stories about how players with a high X-Factor have the most exciting and dramatic performances. Anything can happen when they're on stage.
Meanwhile, players with a very low X-Factor are walls. They are extremely solid, experts in dealing with every playstyle you can throw at them, and they are a true test of strength. If you beat them, your improvement paid off. After all, beating them is very unlikely to be a fluke.
These stories happen organically in competitive games. For example, on Smash Bros. broadcasts you'll often hear commentators remark on how consistent Dabuz is and what a wall he is on stage. He has even been crowned "King of Consistency" by respected Smash Bros. community ranking authority PGstats.
The same article that names Dabuz as a very consistent player also names Marss as a player who is the opposite.
> When Marss is hot, he is nigh unbeatable by anyone outside of the top 5 players in the world. When he is playing at his best, his potential is limitless. The problem is consistency [...]
I find the possibility of capturing those stories in a value very exciting, even for players who are not at the very top and who will not have such articles written about them.
The implementation should be relatively straightforward. We just look at the steps described in Glickman's paper, and we're good. Just one little problem...
Do you spot it?
I brushed away the little devil named "rating period" earlier, and now it's coming back to haunt us.
We can only calculate ratings when such a rating period completes, and they don't complete after every game! In fact, Glickman recommends that at least 10-15 games per player should happen every rating period. So this is something we need to work around if we want to show our players how their rating changed after a game. There are multiple approaches.
One simple approach is described in a blogpost by Ryan Juckett titled "The Online Skill Ranking of INVERSUS Deluxe". But this approach also has drawbacks. The later blogpost "Additional Thoughts on Skill Ratings" addresses these, and proposes a potential solution.
This solution seems to be very similar or even identical to the one Lichess uses. And one great thing is: Lichess is open source!
The crux of the solution is to allow fractional rating periods.
We can now evaluate temporary ratings for a specific point in time within a rating period, and work with that.
The secret sauce can be found in the RatingCalculator class in the Lichess implementation.
Or, alternatively, in me own repo for which I stole it :).
So our new strategy for calculating a player's rating at a given point in time is:
- If necessary, close every rating period for our players that hasn't been closed yet and commit their ratings. We do this by just performing the steps described in the paper.
- Get every result for the player in the current rating period.
- Get the current player rating by using the results in the current rating period, as well as Lichess' (or our) cool fractional period secret sauce.
And that's it really.
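The heart of that fractional-period secret sauce is letting the rating deviation grow continuously with elapsed time instead of jumping once per full period. On the internal Glicko-2 scale this looks something like the following sketch (my own naming, not the actual Lichess or instant-glicko-2 code):

```rust
/// Rating deviation after `elapsed_periods` rating periods of inactivity,
/// on the internal Glicko-2 scale. `phi` is the deviation at the last
/// update and `sigma` the rating volatility. Crucially, `elapsed_periods`
/// may be fractional - that's what lets us rate a game mid-period.
fn deviation_after(phi: f64, sigma: f64, elapsed_periods: f64) -> f64 {
    (phi * phi + elapsed_periods * sigma * sigma).sqrt()
}

fn main() {
    let (phi, sigma) = (1.2, 0.06);
    // Half a rating period has passed: the deviation has grown a little...
    let half = deviation_after(phi, sigma, 0.5);
    // ...and after a full period it matches the standard per-period step.
    let full = deviation_after(phi, sigma, 1.0);
    println!("phi after half a period: {half:.4}, after a full period: {full:.4}");
}
```

With `elapsed_periods = 1.0` this reduces to the ordinary Glicko-2 pre-period deviation update, so whole periods still behave exactly as the paper describes.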
Blogpost on how INVERSUS Deluxe implements Glicko-2
Blogpost on how the dev of INVERSUS Deluxe would want to implement Glicko-2

@frostu8 That's a very interesting effect, I have not seen this before. But I can confirm it also happens with my implementation. I think this might be a direct consequence of the maths used.

This equation calculates the new rating:

$$\mu' = \mu + \phi'^2 \sum_{j=1}^{m} g(\phi_j)\,\bigl(s_j - E(\mu, \mu_j, \phi_j)\bigr)$$

Here, $\phi'$ is the new rating deviation on the internal glicko-2 scale, and the sum runs over the games played in the period. The more games are played within the rating period, the lower $\phi'$ will be. So maybe if you play a second game against a very low rated opponent, $\phi'^2$ drops low enough to offset the term that game would add to the sum. If this is true, playing more games in a rating period could indeed hurt your rating coming out of the period in the glicko-2 system (though your rating deviation will of course also be lower). However, I'm not super familiar with the logic behind all the maths, so I can't confirm this. If you're really curious, you could try to contact Dr. Glickman to ask about this.
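To make the one-game-versus-two-games question concrete, here is a minimal sketch of the new-rating computation on the internal scale (roughly steps 3 and 7 of the paper). It deliberately skips the volatility update and the pre-period deviation inflation, and all names are mine, so treat it as an illustration rather than a faithful implementation:

```rust
use std::f64::consts::PI;

/// g(phi) from the paper: discounts a game by the opponent's uncertainty.
fn g(phi: f64) -> f64 {
    1.0 / (1.0 + 3.0 * phi * phi / (PI * PI)).sqrt()
}

/// E(mu, mu_j, phi_j): expected score against opponent j.
fn e(mu: f64, mu_j: f64, phi_j: f64) -> f64 {
    1.0 / (1.0 + (-g(phi_j) * (mu - mu_j)).exp())
}

/// New (mu', phi') after rating `games` = [(mu_j, phi_j, score)].
/// Simplified: volatility update and pre-period inflation are skipped.
fn update(mu: f64, phi: f64, games: &[(f64, f64, f64)]) -> (f64, f64) {
    // Estimated variance of the rating from game outcomes alone (1/v).
    let v_inv: f64 = games
        .iter()
        .map(|&(mu_j, phi_j, _)| {
            let e_j = e(mu, mu_j, phi_j);
            g(phi_j).powi(2) * e_j * (1.0 - e_j)
        })
        .sum();
    // phi' shrinks as more games enter the sum...
    let phi_new = 1.0 / (1.0 / (phi * phi) + v_inv).sqrt();
    // ...and mu' is mu plus phi'^2 times the weighted score sum.
    let score_sum: f64 = games
        .iter()
        .map(|&(mu_j, phi_j, s_j)| g(phi_j) * (s_j - e(mu, mu_j, phi_j)))
        .sum();
    (mu + phi_new * phi_new * score_sum, phi_new)
}

fn main() {
    // One win against an equally rated, equally uncertain opponent...
    let (mu_one, phi_one) = update(0.0, 2.0, &[(0.0, 2.0, 1.0)]);
    // ...versus that same win plus a win over a much weaker opponent.
    let (mu_two, phi_two) = update(0.0, 2.0, &[(0.0, 2.0, 1.0), (-3.0, 0.5, 1.0)]);
    println!("one game:  mu' = {mu_one:.3}, phi' = {phi_one:.3}");
    println!("two games: mu' = {mu_two:.3}, phi' = {phi_two:.3}");
}
```

Playing with the opponent numbers here is an easy way to probe whether the extra-game penalty actually shows up for a given matchup; the second game always lowers $\phi'$, while its effect on $\mu'$ depends on how the shrunken $\phi'^2$ trades off against the extra term in the sum.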
As for how to fix this, if this is really an issue with the glicko-2 algorithm and not just with our implementations, there will be no simple way to fix it with this approach. You could maybe try to implement a minimal rating gain/loss per win/loss, but that wouldn't be exactly "true to glicko-2" if you care about that.
There is also one alternative approach to implementing glicko-2 that I was experimenting with some time back but didn't write about. The approach is to mostly ignore rating periods and to instead actually update ratings after every single game. When calculating the new rating, the new deviation is calculated using the fractional rating period approach, but other than that everything is standard glicko-2.

One problem with this is that technically you want both the player's and the opponent's rating (especially the deviation) to represent the same point in time, in particular the time the player's rating was last updated. I'm not sure how possible that is since both ratings will have been updated at different points in time. If the opponent's rating was updated last, maybe you could try to calculate backwards to what the deviation would have been when the player's rating was last updated, but I'm not sure how reliable or desirable that would be. Maybe one would just need to accept some error when choosing this approach.
Because only a single game is rated at a time with this approach, I believe it would sidestep the issue. Of course it also isn't exactly "true to glicko-2", but I think it's a logical approach to "glicko-2, but without the rating periods".