Hattrick University

General
Understanding Hattrick's game engine can be divided into two parts:
1. The relation between a team's players and its match ratings.
2. The relation between a team's match ratings and the match results.
Understanding the relation between ball possession (which is based on both teams' midfield ratings) and the number of attack opportunities that each team gets in a match is crucial for understanding the game engine.



Game Engine Model
To find the function that describes the relation between ball possession in one half and the number of attack opportunities that each team gets, we should first decide on a model for the problem:
1. Attacks in a Hattrick game can be divided into three types: regular attacks, counter-attacks, and special events. We assume that ball possession affects only the regular attacks, and that each half has a total of five such attacks for both teams combined. Of the five regular attacks in each half, only those that resulted in a goal or almost a goal are listed in the match report.
2. Ball possession depends on several factors and might change during a game if certain situations take place. Some of these situations are reported, such as a player's injury or a red card, and some are not reported, such as a drop in a player's midfield contribution due to bad stamina. The reported ball possession in each half is correct only for the last minute of that half.
3. For every occurrence of a regular attack, the current ball possession ratio is taken. From this ratio, a probability is assigned to the home team and to the away team for receiving the attack.
Based on these probabilities, it is randomly decided which team receives the attack opportunity.
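The model above can be sketched as a small simulation. This is only an illustration under the stated assumptions, not the actual game engine: the function name simulate_half and the attack_probability parameter (a function that turns home possession into the home team's probability of receiving an attack) are hypothetical names introduced here.

```python
import random

REGULAR_ATTACKS_PER_HALF = 5  # assumption of the model: five regular attacks per half


def simulate_half(possession_at_attack, attack_probability, rng=random):
    """Return (home_attacks, away_attacks) for one half.

    possession_at_attack: home possession (0..1) at the moment of each regular attack
    attack_probability:   hypothetical function mapping home possession to the
                          probability that the home team receives the attack
    """
    home = 0
    for possession in possession_at_attack:
        # One random draw per attack decides which team receives it.
        if rng.random() < attack_probability(possession):
            home += 1
    return home, len(possession_at_attack) - home


# Example: possession stays at 55% for the whole half, and (purely for
# illustration) the probability is taken to be equal to possession.
home, away = simulate_half([0.55] * REGULAR_ATTACKS_PER_HALF, lambda p: p)
```

Passing possession as a list, one value per attack, reflects point 2 of the model: possession may change between attacks within the same half.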

Methodology
Our goal is to describe the number of chances per team as a function of ball possession. We will do this in two steps:
1. Finding the estimated number of chances for each ball possession value.
2. Finding a function that represents our results from step 1.

Finding the estimated number of chances
To do this, we collect all match halves with a specific ball possession and find the average number of chances in these halves. This average is a good estimator of the true expected number of chances for that possession value.
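A minimal sketch of this averaging step, assuming each match half is stored as a (possession, chances) pair with possession as a whole percentage. The function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def average_chances_by_possession(halves):
    """Group halves by possession and return {possession: (mean_chances, count)}."""
    totals = defaultdict(lambda: [0, 0])  # possession -> [sum of chances, number of halves]
    for possession, chances in halves:
        totals[possession][0] += chances
        totals[possession][1] += 1
    return {p: (total / count, count) for p, (total, count) in totals.items()}


# Made-up sample: two halves at 55% possession, one at 60%.
sample = [(55, 3), (55, 4), (60, 4)]
averages = average_chances_by_possession(sample)
```

The count is kept next to each average because the fitting step weights every possession value by the number of halves behind it.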

Finding the function
From step 1, we now have the estimated number of chances for each ball possession value. We will use least squares estimation to find the desired function, weighting each ball possession value by the number of match halves that have this possession.

Combining the two steps
Since the average is the value that minimizes the sum of squared errors, the solution we have found is actually the least squares estimator over all the match halves. Therefore, we can fit the function directly to the match halves, without first finding the estimate for each ball possession value.
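This equivalence can be checked numerically. The sketch below fits a straight line for brevity (the argument does not depend on the shape of the function) and uses made-up data; weighted_line_fit is a helper written for this example only.

```python
def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    w_sum = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_mean = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return y_mean - b * x_mean, b


# Made-up halves: (possession in percent, chances for the team)
halves = [(40, 1), (40, 2), (50, 2), (50, 3), (50, 3), (60, 4)]

# Route 1: fit directly on all halves, each with weight 1.
direct = weighted_line_fit(
    [p for p, _ in halves], [c for _, c in halves], [1] * len(halves)
)

# Route 2: average per possession value first, then fit the averages,
# weighting each one by the number of halves behind it.
groups = {}
for p, c in halves:
    groups.setdefault(p, []).append(c)
possessions = list(groups)
means = [sum(v) / len(v) for v in groups.values()]
counts = [len(v) for v in groups.values()]
via_means = weighted_line_fit(possessions, means, counts)
```

Both routes give the same intercept and slope, up to floating-point rounding.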

Data gathering
The two values that are important for us are the ball possession in a half and the number of chances for each team in that half. Since Hattrick reports ball possession only at the end of each half, and since we assume that there are a total of five chances in each half, we should use only match halves that meet these criteria:
- The number of reported chances in the half is exactly five.
- The ball possession did not change during the half. Ball possession might change during a half if, for example, one of the players gets injured or receives a red card. We will not use such halves for our estimations.
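The two criteria can be expressed as a simple filter. This is a sketch only: the Half record and its field names are hypothetical, and the possession_changed flag can only capture reported events such as injuries and red cards.

```python
from dataclasses import dataclass


@dataclass
class Half:
    # All field names are hypothetical, chosen for this example.
    possession: int           # reported possession of the team, in percent
    team_chances: int         # reported chances for the team in this half
    reported_chances: int     # reported chances for both teams combined
    possession_changed: bool  # True if a reported event (injury, red card, ...) occurred


def is_usable(half, expected_total=5):
    """Keep a half only if all five chances were reported and possession was stable."""
    return half.reported_chances == expected_total and not half.possession_changed


halves = [
    Half(possession=55, team_chances=3, reported_chances=5, possession_changed=False),
    Half(possession=48, team_chances=1, reported_chances=3, possession_changed=False),
    Half(possession=61, team_chances=4, reported_chances=5, possession_changed=True),
]
usable = [h for h in halves if is_usable(h)]
```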

Results
Graph I - visualization

The number of chances in a half for a team against the ball possession acquired by this team:


Graph II - analysis
It is easy to see from the previous graph that the relation between ball possession and number of chances is not linear. Still, for comparison, the following graph shows a linear fit to the values.



Graph III - further analysis
One can easily see that most of the points in the previous graph lie far from the line, which indicates a large error.
The following graph shows a polynomial fit to the values:


The model line passes close to the center of mass of the points, which indicates a better estimate than the linear model.
The previous graph shows a polynomial model of the third degree. We also tried other degrees and checked which one gives the smallest sum of squared differences. The model with the smallest value was declared the best model and is the one we used. The sum of squared differences for each degree of the model was:

Degree - Sum of squared differences
1 - 544
2 - 386
3 - 351
4 - 357
5 - 380
According to these values, we selected the third degree as the most accurate model. The following function was found to produce the smallest error:

if ball_possession <= 0.5:
    chances_prob = 4 * ball_possession ** 3
else:
    chances_prob = 1 - 4 * (1 - ball_possession) ** 3
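
The function above can be wrapped for reuse as in the sketch below. The input check and the expected_chances helper are additions made for this example. The helper assumes that chances_prob is the probability of receiving each single regular attack, that possession stays constant during the half, and that there are five regular attacks per half, as in the model.

```python
def chances_prob(ball_possession):
    """Probability from the fitted function; ball_possession is a fraction in 0..1."""
    if not 0.0 <= ball_possession <= 1.0:
        raise ValueError("ball_possession must be between 0 and 1")
    if ball_possession <= 0.5:
        return 4 * ball_possession ** 3
    return 1 - 4 * (1 - ball_possession) ** 3


def expected_chances(ball_possession, attacks_per_half=5):
    """Expected regular attacks for the team in one half (assumption: constant possession)."""
    return attacks_per_half * chances_prob(ball_possession)


# At 60% possession the function gives 1 - 4 * 0.4^3 = 0.744,
# which under the assumptions above is about 3.7 of the five attacks.
prob_at_60 = chances_prob(0.6)
```

The two branches mirror each other: the values for a possession p and for 1 - p always sum to 1, so the two teams' probabilities are consistent.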