Create a Data-Driven Elo Rating System for 2v2 Games
The Elo Rating
Rating Algorithm Exploration
Database Design and Modeling
Web Application Development
Application Features
Data Visualization
Conclusion

From friendly matches to intense competition, foosball has found its niche in corporate culture, providing a unique way for teams to connect and compete.

This article explores the mathematics behind a 2v2 Elo-based rating system that can be applied to foosball or any other 2v2 game. It also examines the architecture that supports data processing, and presents the creation of a web application that provides real-time rankings and data analysis using Python.

The Elo rating system is a method used to determine the relative skill level of players in zero-sum games. It was first developed for chess but is now applied as a rating system in many other sports such as baseball, basketball, various board games and e-sports.

One well-known example of this method is in chess, where the Elo rating system is used to rank players worldwide. Magnus Carlsen, also known as the “Mozart of Chess”, holds the highest Elo rating in the world with a rating of 2,853 in 2023, demonstrating his extraordinary skill in the game.

The Elo rating formula has two parts: first, it calculates the expected outcome for a given set of players, and then it determines the rating adjustment based on the actual result of the match and the expected outcome.

Expected Outcome Calculation

Consider the following example in chess with Player A and Player B, with ratings R𝖠 and R𝖡 respectively. The equation for the expected score of Player A against Player B is the following:

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

The Elo algorithm uses a parameter that can be adjusted to control how strongly the winning probability is influenced by the players’ ratings. In this example, it is set to 400, which is typical for most sports, including chess.

Now let’s look at a more realistic example, where Player A has a rating of 1,500 and Player B a rating of 1,200.

The same equation seen above can calculate Player A’s expected score against Player B:

$$E_A = \frac{1}{1 + 10^{(1200 - 1500)/400}} \approx 0.849$$

With this calculation, we know that Player A has an 84.9% probability of winning against Player B.

To find the estimated probability of Player B winning against Player A, the same formula is used, but the order of the ratings is reversed:

$$E_B = \frac{1}{1 + 10^{(1500 - 1200)/400}} \approx 0.151$$

The sum of the probabilities of Player A winning and Player B winning equals 1 (0.849 + 0.151 = 1). In this scenario, Player A therefore has an 84.9% probability of winning, leaving Player B with only a 15.1% probability.
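The worked example above can be checked with a few lines of Python. This is a minimal sketch; the function name `expected_score` and the `scale` parameter are illustrative names, not taken from the application's code.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of a player rated rating_a against a player rated rating_b."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


e_a = expected_score(1500, 1200)  # Player A against Player B
e_b = expected_score(1200, 1500)  # same formula with the ratings swapped
print(round(e_a, 3), round(e_b, 3))  # 0.849 0.151
```

Because the two exponents are opposites of each other, the two expected scores always sum to 1.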

Rating Calculation

The difference in rating between the winner and the loser determines the total number of points won or lost after each game.

  • If a player with a much higher Elo rating wins, they receive fewer points for their victory, and their opponent loses only a few points for their defeat.
  • In contrast, if the lower-ranked player wins, this achievement is considered much more significant, so the reward is bigger and the higher-ranked opponent is penalized accordingly.

The formula to calculate the new rating of Player A playing against Player B is the following:

$$R'_A = R_A + K \, (S_A - E_A)$$

In this formula, (S𝖠 − E𝖠) represents the difference between Player A’s actual score and expected score. The additional variable K determines how much a player’s rating can change after a single match. In chess, this variable is set to 32.

If Player A wins, the actual score, which is 1 in this case, is greater than the expected score of 0.849, creating a positive difference.

This means that Player A performed better than initially expected. Consequently, the Elo rating system recalibrates the ratings of both players:

  • Player A’s rating will increase due to the win
  • Player B’s rating will decrease due to the loss

Once again, the same equation can calculate the new ratings of Player A and Player B:

$$R'_A = 1500 + 32 \, (1 - 0.849) \approx 1504.8$$

$$R'_B = 1200 + 32 \, (0 - 0.151) \approx 1195.2$$
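The update step for this example can be sketched in Python as follows. The function names are illustrative, and K = 32 and the scale of 400 are the chess values quoted above.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of a player rated rating_a against a player rated rating_b."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def updated_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Classic Elo update: move the rating by K times (actual - expected)."""
    return rating + k * (actual - expected)


e_a = expected_score(1500, 1200)
new_a = updated_rating(1500, e_a, actual=1.0)      # Player A wins
new_b = updated_rating(1200, 1 - e_a, actual=0.0)  # Player B loses
print(round(new_a, 1), round(new_b, 1))  # 1504.8 1195.2
```

The favorite gains fewer than 5 points here because the win was already expected with a probability of about 85%.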

In summary, the Elo rating system offers a robust and efficient method for evaluating and comparing players’ skills dynamically and fairly. It continuously updates a player’s rating after each match, considering the skill difference between the two opponents.

This approach rewards risk-taking, as winning against a higher-rated player results in a more significant increase in a player’s rating, as shown in the table below:

FIGURE I: Example of the Elo System in Chess | Table by the author

On the other hand, if a higher-rated player defies their winning probability and loses against a lower-rated player, their rating will be significantly impacted: they will lose more points, and their opponent will gain more points.

In short, when a player wins a match, the lower their winning probability was, the more points they win.

In its current state, this rating formula, originally designed for chess, is not fully adapted to foosball.

In fact, foosball has more variables than chess, such as:

  • It’s a four-player game with teams of two (2v2)
  • Each team member can positively or negatively influence their teammate
  • Unlike the binary outcome in chess, the margin of victory or defeat in foosball can vary considerably depending on the teams’ scores

The focus here is on adapting the Elo rating system to the unique requirements of foosball games, which involve four players divided into two teams.

Winning Probability

To start calculating new player ratings, a refined formula must be established to determine the expected outcome of a game involving four players in two teams.

To illustrate this, consider a hypothetical four-player foosball game: Player 1, Player 2, Player 3, and Player 4, each with a different rating that represents their skill level.

FIGURE II: Scenario with 4 Players Playing Foosball | Table by the author

To calculate the expected score of Team 1 against Team 2 in the revised Elo rating system, the expected score of each player involved in the game must be determined.

Player 1’s expected score, denoted E𝖯𝟣, can be calculated by averaging their expected scores against each opponent, using the Elo formula as follows:

$$E_{P1} = \frac{1}{2} \left( \frac{1}{1 + 10^{(R_{P3} - R_{P1})/500}} + \frac{1}{1 + 10^{(R_{P4} - R_{P1})/500}} \right)$$

After extensive testing, the divisor applied to the rating difference in the expected score formula was set to 500, rather than the traditional value of 400 used in chess. This larger value means that a player’s rating has a smaller impact on their expected score.

A primary reason for this adjustment is that, unlike chess, foosball involves a slight element of chance. With a value of 500, game outcomes can be predicted more accurately, and a reliable rating system can be developed.

To calculate the expected score of Player 2, denoted E𝖯𝟤, against Player 3 and Player 4, the same method as for Player 1 can be used:

$$E_{P2} = \frac{1}{2} \left( \frac{1}{1 + 10^{(R_{P3} - R_{P2})/500}} + \frac{1}{1 + 10^{(R_{P4} - R_{P2})/500}} \right)$$

The expected score of the team, denoted E𝖳𝟣, can then be calculated by taking the average of E𝖯𝟣 and E𝖯𝟤:

$$E_{T1} = \frac{E_{P1} + E_{P2}}{2}$$

Once the expected scores for each player are computed, they can be used to predict the outcome of the match. The team with the highest expected score is more likely to win. Averaging the expected scores of the team members accounts for skill differences within the team.

The table below shows the expected scores of Players 1 and 2 against Players 3 and 4.

  • P1’s expected scores against P3 and P4 are 0.091 and 0.201, corresponding to a 14.6% probability of winning
  • P2’s expected scores against P3 and P4 are 0.201 and 0.387, giving a combined winning probability of 29.4%
  • For P1, partnering with a stronger player like P2 increases their overall chances of winning, as demonstrated by the team’s 22% winning probability

FIGURE III: Expected Score Based on the Scenario Shown in Figure II | Table by the author
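The team calculation described above can be sketched in Python. The ratings used below (1000, 1200, 1500, 1300) are hypothetical: they are chosen only because their differences reproduce the expected scores quoted in the list, and they are not necessarily the values in Figure II. The function names are illustrative as well.

```python
def expected_score(rating: float, opponent: float, scale: float = 500.0) -> float:
    """Elo expected score with the foosball divisor of 500."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


def player_expected_score(rating: float, opponents: list[float]) -> float:
    """Average of a player's expected scores against each opponent."""
    return sum(expected_score(rating, o) for o in opponents) / len(opponents)


def team_expected_score(team: list[float], opponents: list[float]) -> float:
    """Average of the teammates' individual expected scores."""
    return sum(player_expected_score(r, opponents) for r in team) / len(team)


team_1 = [1000, 1200]  # hypothetical ratings for P1 and P2
team_2 = [1500, 1300]  # hypothetical ratings for P3 and P4

e_p1 = player_expected_score(team_1[0], team_2)
e_p2 = player_expected_score(team_1[1], team_2)
e_t1 = team_expected_score(team_1, team_2)
e_t2 = team_expected_score(team_2, team_1)
print(round(e_p1, 3), round(e_p2, 3), round(e_t1, 2))  # 0.146 0.294 0.22
```

With this averaging scheme, the two team expected scores still sum to 1, so they can be read directly as winning probabilities.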

If the team of P1 and P2 wins, P1 gains fewer points than their individual expected score would suggest, because P2, who is higher ranked, also contributes to the win and raises the team’s expected score above P1’s individual one (22% versus 14.6%).

On the other hand, P2 gains more points because of having a lower-ranked teammate. In case of a win, P2 is rewarded for taking a risk, while P1 earns fewer points because it is assumed that P2 contributed more significantly to the victory, and vice versa if they lose.

Rating Parameters

Now that the expected outcome of a four-player match has been determined, this information can be incorporated into a new formula that considers multiple variables affecting the match and the player ratings.

As discussed earlier, the K-value can be modified to better fit the needs of the rating system. This new formula considers the number of games played by each player, reflecting their seniority, as well as the result of the game.

For instance, in the 2014 World Cup semi-final, Germany defeated Brazil by a score of 7–1. This was one of the most shocking and humiliating results in World Cup history, as Brazil was the host nation and had not lost a competitive match at home since 1975.

If we were to apply the rating system to this match, we would expect Germany to gain a significant number of points, while Brazil would lose a considerable number of points, reflecting the difference in their performance and skill level.

K-Value
The K-value, denoted K𝟣 for Player 1 in this case, determines how much a player’s rating will change after one game. This revised K-value takes into account the number of games the player has played, to balance the effect of each game on their rating. After numerous tests, a formula was developed for calculating the K-value for each player.

For Player 1, this is expressed as:

This formula for the K-value is designed to have a greater impact on the rating of new players while providing stability and less rating fluctuation for experienced players. Specifically, after 300 games, a player’s rating becomes more representative of their skill level.

FIGURE IV: K-value Over Time | Chart by the author

Figure IV shows the effect of the number of games played on the K-value. Starting at 50, the K-value decreases as the number of games played increases, reaching a halved value of 25 after 300 games. This ensures that the impact of each game on a player’s rating decreases as experience increases.
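The exact K-value formula is not reproduced in the text. One simple form that matches the behavior described for Figure IV (a value of 50 for a new player, halved to 25 after 300 games) is a hyperbolic decay. The sketch below is written under that assumption and should not be read as the author's confirmed formula; the function and parameter names are illustrative.

```python
def k_value(games_played: int, k_start: float = 50.0, halving_games: float = 300.0) -> float:
    """Assumed K-value curve: k_start at 0 games, k_start / 2 at halving_games."""
    return k_start / (1.0 + games_played / halving_games)


for games in (0, 100, 300, 900):
    print(games, round(k_value(games), 2))
# 0 50.0
# 100 37.5
# 300 25.0
# 900 12.5
```

Under this assumed form the K-value never reaches zero, so even very experienced players keep a small amount of rating movement.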

Point Factor
To take into account the points scored by each team, a new variable, called the “point factor”, was introduced into the equation. This factor multiplies the K parameter of each player and depends on the absolute difference in points between the two teams. The impact of a match should be greater when a team wins by a large margin, i.e., an overwhelming victory.

To calculate the point factor, the following formula was used:

$$\text{point factor} = 2 + \left( \log_{10}\left( |S_{T1} - S_{T2}| + 1 \right) \right)^3$$

where S_{T1} and S_{T2} are the points scored by the two teams. This formula takes the absolute difference between the scores of the two teams, adds 1, and computes the base-10 logarithm of the result. This value is then cubed, and 2 is added to obtain the final value of the point factor.

FIGURE V: Point Factor | Chart by the author
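The point factor as described in words above translates directly into Python. The function name is illustrative.

```python
import math


def point_factor(score_a: int, score_b: int) -> float:
    """2 + (log10(|score difference| + 1)) cubed."""
    return 2.0 + math.log10(abs(score_a - score_b) + 1) ** 3


print(round(point_factor(11, 10), 2))  # 2.03 (narrow win)
print(round(point_factor(11, 2), 2))   # 3.0  (nine-point margin)
print(round(point_factor(11, 0), 2))   # 3.26 (largest possible margin)
```

For games played to 11, the factor therefore stays between roughly 2.03 and 3.26, so a shutout weighs about 60% more than a one-point win.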

Final Rating Calculation

After adjusting all the necessary variables, an improved formula was developed to calculate the new rating of each player involved in a game.

Each player’s rating now takes into account their previous rating, the ratings of their opponents, the impact of their teammate, their playing history, and the score of the game. This formula ensures that each player is rewarded according to their true performance, taking into account the fairness of each match.

Building on the previous example, the new formula for a player’s rating is the following:

This improved formula rewards players based on their actual performance, encourages risk-taking and provides a more balanced rating system for both new and experienced players.
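The final formula is not reproduced in the text. Based on the description, a plausible reading is that the classic Elo update is kept, with the per-player K-value multiplied by the point factor and the team's expected score used in place of the individual one. The sketch below assumes exactly that combination; it is an illustration of how the pieces could fit together, not the author's confirmed formula.

```python
def new_rating(rating: float, k: float, point_factor: float,
               actual: float, team_expected: float) -> float:
    """Assumed update: rating + K * point factor * (actual - team expected score)."""
    return rating + k * point_factor * (actual - team_expected)


# Hypothetical inputs: a new player (K = 50) on a team with a 22% expected score
# wins by nine points (point factor = 3.0).
winner = new_rating(1000, k=50.0, point_factor=3.0, actual=1.0, team_expected=0.22)

# An experienced opponent (K = 25) on the 78% favorite team loses the same game.
loser = new_rating(1500, k=25.0, point_factor=3.0, actual=0.0, team_expected=0.78)

print(round(winner, 1), round(loser, 1))  # 1117.0 1441.5
```

Because K differs per player, the points gained by the winners do not have to equal the points lost by the losers, unlike in the classic two-player system with a shared K.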

Now that we have an Elo algorithm, we can move on to database modeling.

The proposed database model adopts a relational approach, organizing data into interconnected tables through the use of Primary Keys (PKs) and Foreign Keys (FKs). This structured organization facilitates data management and analysis, making PostgreSQL a suitable choice as the database management system. PKs and FKs help maintain data consistency and minimize redundancy within the database.

FIGURE VI: Diagram Model of the Database | Image by the author

Two types of relationships exist between tables in this database model: one-to-many and many-to-many.

The relationship between the ‘Player’ table and the ‘Match’ table is many-to-many, since a player can take part in numerous matches, and multiple players can be involved in a single match. A junction table called ‘PlayerMatch’ bridges this relationship, containing two foreign keys: ‘player_id’ (referencing the participating player) and ‘match_id’ (referencing the corresponding match).

This structure ensures the accurate association of players and matches, as demonstrated in the code below:

CREATE TABLE PlayerMatch (
    player_match_id serial PRIMARY KEY,
    player_id INT NOT NULL REFERENCES Player(player_id),
    match_id INT NOT NULL REFERENCES Match(match_id)
);

The same logic applies to the ‘TeamMatch’ table, which serves as a junction between the ‘Match’ and ‘Team’ tables, allowing a team to play multiple matches and a match to involve multiple teams.

Separate tables for ‘PlayerRating’ and ‘TeamRating’ have been designed to streamline rating analysis over time. These tables connect to the ‘PlayerMatch’ and ‘TeamMatch’ tables respectively through ‘player_match_id’ and ‘team_match_id’.

Data Integrity

In addition to using PKs and FKs, this database model also uses appropriate data types and CHECK constraints for data integrity:

  • The ‘winning_team_score’ and ‘losing_team_score’ columns in the ‘Match’ table are integers, preventing non-numeric entries
  • CHECK constraints enforce that the ‘winning_team_score’ is exactly 11
  • CHECK constraints enforce that the ‘losing_team_score’ is between 0 and 10, in line with the game rules
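The effect of these constraints can be tried out with a small, self-contained sketch. It uses Python's built-in `sqlite3` module with an in-memory database instead of PostgreSQL so that it runs anywhere (an assumption made for illustration), and the table is reduced to the two score columns. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
conn.execute('''
    CREATE TABLE "Match" (
        match_id INTEGER PRIMARY KEY,
        winning_team_score INTEGER NOT NULL CHECK (winning_team_score = 11),
        losing_team_score INTEGER NOT NULL CHECK (losing_team_score BETWEEN 0 AND 10)
    )
''')


def try_insert(winning_score: int, losing_score: int) -> bool:
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                'INSERT INTO "Match" (winning_team_score, losing_team_score) VALUES (?, ?)',
                (winning_score, losing_score),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(11, 7))   # True: valid result
print(try_insert(10, 7))   # False: winner must have exactly 11 points
print(try_insert(11, 11))  # False: loser must have between 0 and 10 points
```

Enforcing the rules in the database means invalid results are rejected even if they bypass the form validation in the front end.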

As seen in the code snippet below, a sequence was created for each primary key to facilitate data entry. This automation simplifies the overall procedure when later using Python loops for the data entry process.

CREATE SEQUENCE player_id_seq START 1;
CREATE SEQUENCE team_id_seq START 1;
CREATE SEQUENCE match_id_seq START 1;
CREATE SEQUENCE player_match_id_seq START 1;
CREATE SEQUENCE player_rating_id_seq START 1;
CREATE SEQUENCE team_match_id_seq START 1;
CREATE SEQUENCE team_rating_id_seq START 1;

Data Processing

The major challenge was to find a way to process the match data in an order that would allow the IDs of the data already inserted into the database to be retrieved.

These IDs could then serve as foreign keys for the remaining data, creating the necessary relationships in the process. In other words, the first step was to identify and store specific data (IDs) from the raw data, and then use these IDs as a bridge to link and process the rest of the data.

The data was processed step by step, using increasingly complex Python loops. Each new entry was assigned a unique primary key generated from the table’s sequence.

  1. The first step was to handle the individual players and obtain their IDs.
  2. Next, teams were processed using the player IDs. For each unique pair of players in a match, an entry was created in the ‘Team’ table, with the player IDs as foreign keys.
  3. Following this, matches were handled using the winning and losing team IDs. After processing the matches, the ‘PlayerMatch’ and ‘TeamMatch’ tables were filled by retrieving the corresponding match, player, and team IDs.
  4. Once all the necessary data had been processed, the ‘PlayerMatch’ and ‘TeamMatch’ IDs, together with the match timestamps, were used in the ‘PlayerRating’ and ‘TeamRating’ tables to track the evolution of ratings over time.
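The first three steps above can be illustrated with an in-memory sketch. Dictionaries and `itertools.count` stand in for the database tables and sequences, and the function names and sample scores are hypothetical; the real application inserts into PostgreSQL instead.

```python
from itertools import count

# In-memory stand-ins for the database sequences and tables (assumption)
player_seq, team_seq, match_seq = count(1), count(1), count(1)
players = {}  # player name -> player_id
teams = {}    # frozenset of two player_ids -> team_id
matches = []  # rows that would be inserted into the Match table


def get_or_create_id(table, key, sequence):
    """Return the existing ID for key, or draw a new one from the sequence."""
    if key not in table:
        table[key] = next(sequence)
    return table[key]


def process_match(winners, losers, winning_score, losing_score):
    # Step 1: players first, so their IDs exist before anything references them
    winner_ids = [get_or_create_id(players, name, player_seq) for name in winners]
    loser_ids = [get_or_create_id(players, name, player_seq) for name in losers]
    # Step 2: one team per unique pair of player IDs, regardless of order
    winning_team_id = get_or_create_id(teams, frozenset(winner_ids), team_seq)
    losing_team_id = get_or_create_id(teams, frozenset(loser_ids), team_seq)
    # Step 3: the match row references both team IDs
    match = {
        "match_id": next(match_seq),
        "winning_team_id": winning_team_id,
        "losing_team_id": losing_team_id,
        "winning_team_score": winning_score,
        "losing_team_score": losing_score,
    }
    matches.append(match)
    return match


first = process_match(("Matthieu", "Gabriel"), ("Wissam", "Malik"), 11, 7)
second = process_match(("Malik", "Wissam"), ("Gabriel", "Matthieu"), 11, 9)
print(first["winning_team_id"], second["winning_team_id"])  # 1 2
```

The rematch reuses the existing player and team IDs, which is what keeps the junction tables free of duplicates.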

The objective of the web application is to allow users to enter game results, verify data, and interact directly with the database. This ensures that the data is up to date and available in real time, so that users are always able to access the rankings or visualize their metrics.

Moreover, I wanted to make the web app mobile-friendly, because who would want to drag a laptop around to play foosball? That would not be very practical or fun.

Technology Stack

Backend
After comparing Django and Flask, two popular frameworks for building web applications in Python, Flask was chosen for its beginner-friendly approach. The Flask web framework is used to handle user requests, process data, and interact with the PostgreSQL database.

Frontend
The frontend consists of static HTML and CSS files, which define the structure and styling of the web application. JavaScript is used for form validation and handling user interactions. This ensures that the data submitted by users is consistent and accurate before being sent to the backend.

Data Visualization
When it comes to data visualization, the most important challenge is having up-to-date data. To overcome this limitation, the data visualization layer uses Plotly, a Python library, to generate interactive charts and graphs that visualize player ratings over time. This component receives data from the backend, processes it, and presents it to users in a user-friendly format.

Database
PostgreSQL was used for both the local development environment and the production environment on AWS, via Heroku. Automatic database backups are handled by Heroku, ensuring that data is protected and can be easily restored if necessary.

UI/UX Research

For the UI/UX design, inspiration was drawn from the modern web designs of Spotify and the new Bing search engine. The goal was to create a familiar and intuitive user experience.

FIGURE VII: Mockup of the Application | Image by the author

Let’s dive into the features of the application with a concrete scenario. Team 1 (Matthieu and Gabriel) wants to play against Team 2 (Wissam and Malik). All players have a different rating that is representative of their skill level, shown below.

Calculate Odds

The first thing players want to do before any match is to calculate their winning probability.

To do this, the “Calculate Odds” view allows users to select four players using the drop-down menus and generate the winning probability for the chosen teams.

FIGURE VIII: Calculate Odds | Image by the author

This feature is primarily used before a game to check that a match is balanced and to inform players about their winning probability. For instance, Team 1 has a higher probability of winning (64.19%) than Team 2, which has a 35.81% probability of winning. This view informs each player of the stakes and the risk taken.

Once the form is submitted, the application computes only the first part of the algorithm, which consists of calculating the expected outcome of a game given the four selected players.

Upload a Game

The “Upload a Game” view serves as the home page of the application. It’s designed for user convenience, allowing them to upload a game immediately upon opening the app.

FIGURE IX: Upload a Game & Match Uploaded | Image by the author

Before the form is submitted, the application performs data validation using JavaScript to ensure that:

  • Four different players are selected
  • Scores are non-negative integers
  • There is only one winning team with a score of exactly 11, with no draws allowed

When the validation is successful, the application processes the data using the full algorithm, updates the corresponding tables in the database, and gives users a confirmation of their upload.
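The application performs these checks in JavaScript, but the same rules can be mirrored on the server. The following Python sketch is a hypothetical server-side equivalent of the three rules listed above; the function name and error messages are illustrative.

```python
def validate_match(team_1, team_2, score_1, score_2):
    """Return a list of validation errors; an empty list means the match is valid."""
    errors = []

    # Rule 1: two teams of two, with four different players in total
    if len(team_1) != 2 or len(team_2) != 2 or len(set(team_1) | set(team_2)) != 4:
        errors.append("Four different players must be selected.")

    # Rule 2: scores are non-negative integers (bool is excluded explicitly)
    scores = (score_1, score_2)
    if not all(isinstance(s, int) and not isinstance(s, bool) and s >= 0 for s in scores):
        errors.append("Scores must be non-negative integers.")
    # Rule 3: exactly one team reaches 11, the other stays at 10 or below
    elif max(scores) != 11 or min(scores) > 10:
        errors.append("Exactly one team must win with a score of 11.")

    return errors


print(validate_match(("Matthieu", "Gabriel"), ("Wissam", "Malik"), 11, 8))   # []
print(validate_match(("Matthieu", "Gabriel"), ("Gabriel", "Malik"), 11, 8))
# ['Four different players must be selected.']
```

Repeating the checks on the server is a common precaution, since client-side validation can be bypassed.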

The “Match Uploaded” view is designed to show users the effect of each match on their individual ratings. It calculates the difference between the players’ ratings before and after the match was uploaded.

As shown above, the game doesn’t have the same effect on each player’s rating. This is because the parameters of the algorithm are specific to each player: their expected score, their number of games, their teammate and the opposing team.

Elo Rating

The “Player Rating” view allows users to access the real-time monthly ranking and compare themselves with other players. Users can see their rating, the number of games they played during the month, and the last game they played, which shows their latest rating.

FIGURE X: Player Rating | Image by the author

Once the “Player Rating” view is accessed or a new period is submitted, the application queries the database using common table expressions (CTEs).

This involves joining all the necessary tables and displaying the most recent rating update, using the period selector to filter the query:

from datetime import datetime


def get_latest_player_ratings(month=None, year=None):
    # Default to the current month and year when no period is selected
    now = datetime.now()
    default_month = now.month
    default_year = now.year
    selected_year = int(year) if year else default_year
    selected_month = int(month) if month else default_month
    # get_last_day_of_month is a helper defined elsewhere in the application
    start_date = f'{selected_year}-{selected_month:02d}-01 00:00:00'
    end_date = f'{selected_year}-{selected_month:02d}-{get_last_day_of_month(selected_month, selected_year):02d} 23:59:59'

    # Both BETWEEN clauses take (start_date, end_date) as query parameters
    query = '''
    WITH max_player_rating_timestamp AS (
        SELECT
            pm.player_id,
            MAX(pr.player_rating_timestamp) as max_timestamp
        FROM PlayerMatch pm
        JOIN PlayerRating pr ON pm.player_match_id = pr.player_match_id
        WHERE pr.player_rating_timestamp BETWEEN %s AND %s
        GROUP BY pm.player_id
    ),
    filtered_player_match AS (
        SELECT
            pm.player_id,
            pm.match_id
        FROM PlayerMatch pm
        JOIN max_player_rating_timestamp mprt ON pm.player_id = mprt.player_id
    ),
    filtered_matches AS (
        SELECT match_id
        FROM Match
        WHERE match_timestamp BETWEEN %s AND %s
    )
    SELECT
        CONCAT(p.first_name, '.', SUBSTRING(p.last_name FROM 1 FOR 1)) as player_name,
        pr.rating,
        COUNT(DISTINCT fpm.match_id) as num_matches,
        pr.player_rating_timestamp
    FROM Player p
    JOIN max_player_rating_timestamp mprt ON p.player_id = mprt.player_id
    JOIN PlayerMatch pm ON p.player_id = pm.player_id
    JOIN PlayerRating pr ON pm.player_match_id = pr.player_match_id
        AND pr.player_rating_timestamp = mprt.max_timestamp
    JOIN filtered_player_match fpm ON p.player_id = fpm.player_id
    JOIN filtered_matches fm ON fpm.match_id = fm.match_id
    GROUP BY p.player_id, pr.rating, pr.player_rating_timestamp
    ORDER BY pr.rating DESC;
    '''

The primary goal in developing this comprehensive solution was to provide users with a real-time rating system that serves as a visual representation of each player’s performance.

Although powerful tools like PowerBI and Qlik are available for data visualization, a fully mobile-compatible solution was chosen, allowing users to gain real-time insights on their devices without incurring licensing fees.

Two methods were used to achieve this:

  • First, Dash, a Python framework from Plotly that allows developers to build interactive, data-driven applications on top of Flask, was used
  • Second, various SQL queries and static HTML pages were employed to pull information from the database and display it, ensuring that users always have access to real-time data

Rating Evolution

This visualization allows players to observe the impact of each game on their rating and to identify broader trends. For instance, they can see exactly when someone overtakes them, or the impact of consecutive wins or losses.

FIGURE XI: Rating Evolution | Image by the author

When accessing the “Rating Evolution” view, the application queries the database for each selected player, retrieving the most recent rating update for each day a game was played:

SELECT DISTINCT ON (DATE_TRUNC('day', m.match_timestamp))
    DATE_TRUNC('day', m.match_timestamp) AS day_start,
    CASE WHEN p.first_name = '{player}' THEN pr.rating ELSE NULL END AS rating
FROM PlayerMatch pm
JOIN Player p ON pm.player_id = p.player_id
JOIN PlayerRating pr ON pm.player_match_id = pr.player_match_id
JOIN Match m ON pm.match_id = m.match_id
WHERE p.first_name = '{player}'
ORDER BY DATE_TRUNC('day', m.match_timestamp) DESC, m.match_timestamp DESC

The retrieved data table is then transformed into a line chart, with the columns converted into axes using Dash.

To reduce the database load and simplify the data presentation in the chart, only the latest rating update is displayed for each day.
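The “latest update per day” rule that the query applies with `DISTINCT ON` can also be expressed in plain Python, which may help to see what ends up on the chart. The function name and the sample ratings below are hypothetical.

```python
from datetime import datetime


def latest_rating_per_day(updates):
    """Keep only the last rating of each day from (timestamp, rating) pairs."""
    latest = {}
    # Process updates chronologically so later ones overwrite earlier ones
    for timestamp, rating in sorted(updates, key=lambda update: update[0]):
        latest[timestamp.date()] = rating
    return sorted(latest.items())  # ascending by day, ready for a line chart


updates = [
    (datetime(2023, 5, 1, 18, 30), 1024.5),
    (datetime(2023, 5, 1, 12, 0), 1010.0),
    (datetime(2023, 5, 3, 9, 15), 1018.2),
]
points = latest_rating_per_day(updates)
print([(day.isoformat(), rating) for day, rating in points])
# [('2023-05-01', 1024.5), ('2023-05-03', 1018.2)]
```

Days without a game simply produce no point, so the line chart connects one game day directly to the next.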

Player Metrics

Inspired by Spotify Wrapped, the idea is to provide insights derived from continuous data collection. While there’s immense potential to visualize player insights, the focus is on metrics that highlight individual performance and connections between players.

FIGURE XII: Player Metrics | Image by the author

These metrics are organized into three color-coded categories: partners, games, and rivals, with each metric accompanied by a title, a value, and a sub-measure for more detail.

Game Metrics
These metrics are centered on the screen and displayed in blue for neutrality. They include the total number of games played since data collection began.

Partner Metrics
The partner metrics appear on the left side of the screen. They’re displayed in green because of their positive connotation.

  • The top box highlights the primary partner, with whom the selected player has played the most games
  • The second metric identifies the player’s best partner, defined by the highest win percentage
  • The third metric in this category is the selected player’s worst partner, calculated based on the lowest win percentage (or highest loss percentage)
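The three partner metrics above can be derived from a player's game history with a simple aggregation. The sketch below assumes the history is available as (partner, won) pairs; the function name and the sample data are hypothetical, and the application itself computes these values with SQL queries.

```python
from collections import defaultdict


def partner_metrics(games):
    """Compute primary, best and worst partner from (partner_name, won) pairs."""
    totals = defaultdict(lambda: [0, 0])  # partner -> [wins, games played]
    for partner, won in games:
        totals[partner][0] += int(won)
        totals[partner][1] += 1
    win_rate = {partner: wins / played for partner, (wins, played) in totals.items()}
    return {
        "primary_partner": max(totals, key=lambda partner: totals[partner][1]),
        "best_partner": max(win_rate, key=win_rate.get),
        "worst_partner": min(win_rate, key=win_rate.get),
    }


history = [
    ("Gabriel", True), ("Gabriel", True), ("Gabriel", False),
    ("Wissam", True),
    ("Malik", False), ("Malik", False),
]
print(partner_metrics(history))
# {'primary_partner': 'Gabriel', 'best_partner': 'Wissam', 'worst_partner': 'Malik'}
```

A real implementation would probably require a minimum number of games per partner, since a single win otherwise yields a 100% win rate. The same aggregation applies to the rival metrics if the pairs hold opponents instead of partners.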

Rival Metrics
Rival metrics are displayed in red to indicate opposition. They represent the competitive relationship between players.

  • The top box shows the most common opponent, with a sub-metric indicating the number of games played against them, as with the partner metrics
  • The second metric, “Easiest Rival”, represents the opponent against whom the player has the highest win rate. This indicates a weaker opponent
  • The final metric is the player against whom the selected player has the lowest win rate. This metric indicates the most difficult opponent

As I write this, the application has been in use for 6 months, and these are the results so far:

  1. This rating system based on the Elo system predicts match results and accurately ranks players based on their actual performance
  2. Players have become more competitive, as they are now increasingly aware of their performance thanks to data visualization
  3. Players have become more inclusive thanks to an improved formula that rewards players who take risks. Players who wouldn’t normally play together now have an incentive to pair up

By adopting a data-driven strategy, this project has highlighted the profound influence and importance of data.

Going beyond simple analysis of player performance, this project has initiated a change in the way players approach foosball games and interact with other players as well as newcomers. The power of data has truly cultivated a more inclusive and competitive environment.
