Hi,

As you may have noticed, we've been experimenting with tactic evaluation, trying an "Expected Goals" (xG) based evaluation instead of the "Points" based one.
It turns out that the xG based evaluation isn't reliable; it has a serious issue.
At first glance the xG stat is more "attractive" because it fluctuates less than Points.
For example, when we rank tactics by xG, first place goes to a tactic that exploits the "Much Higher" defensive line setting. That setting greatly increases "Goals For" but also increases "Goals Against", and so it reduces the overall Points. The "xG Against" stat, however, "sleeps" through this: it "misses" the extra scoring chances conceded because of the "Much Higher" defensive line. For some reason it doesn't rate those chances highly enough, so it doesn't rise enough to reflect the massive increase in "Goals Against", even though we can see that Goals Against is massively increased and Points are massively decreased as a result.
https://fm-arena.com/thread/18077-barbie-v36/
So it turns out that xG isn't reliable, and soon we are going to switch back to the Points based evaluation, but we will leave xG visible.
Cheers.
I’m not sure why you conclude that xG against is the problem here. In your example you could equally well say xG for is also below the actual goals scored and therefore the error is in the other direction. It seems very unlikely that the match engine can make a mistake that only affects goals scored by one of the two teams. After all every goal against one team is a goal for the other team. To trust actual goals more than xG you would have to believe that when the match engine says a shot is worth 0.1 xG it then calculates the result using a probability that is different than 10% chance of a goal. That would be a massive error by SI.
From what we know about real life, it is quite possible for actual points scored to be discrepant from xG difference in a season. But it is the xG difference that is the better predictor of future points totals. Of course in a simulation we should expect the two things to converge if the sample size is large enough. So are you concluding that this is not happening? If this is the case then I would agree that points is the only measure you can trust, but not that you have actually diagnosed the problem with xG, only that something in the match engine is very broken.
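The convergence argument above can be sketched in a few lines. This is only an illustration under a stated assumption: that each shot is converted with a probability exactly equal to its displayed xG. Whether the FM match engine actually works this way is precisely what the thread is debating. The shot mix (xG drawn uniformly between 0.02 and 0.40) and the function names are made up for the example.

```python
import random


def simulate_goals(xg_values, rng):
    """Count goals if every shot is scored with probability equal to its xG."""
    return sum(1 for p in xg_values if rng.random() < p)


def goals_to_xg_ratio(n_shots, rng):
    """Return actual goals divided by total xG for n_shots simulated shots."""
    # Hypothetical shot mix: xG per shot drawn uniformly from [0.02, 0.40].
    xg_values = [rng.uniform(0.02, 0.40) for _ in range(n_shots)]
    total_xg = sum(xg_values)
    return simulate_goals(xg_values, rng) / total_xg


rng = random.Random(0)
for n in (100, 10_000, 1_000_000):
    # The ratio should drift towards 1 as the number of shots grows.
    print(n, round(goals_to_xg_ratio(n, rng), 3))
```

If goals and xG stay apart in a very large sample, then under this model the displayed xG is not the probability the engine is really using, for one side or for both.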
At the end of the day, you win leagues not because you have the highest possession or xG but because you have the most points, so points are king.
MeanOnSunday said: I’m not sure why you conclude that xG against is the problem here. In your example you could equally well say xG for is also below the actual goals scored and therefore the error is in the other direction. It seems very unlikely that the match engine can make a mistake that only affects goals scored by one of the two teams. After all every goal against one team is a goal for the other team. To trust actual goals more than xG you would have to believe that when the match engine says a shot is worth 0.1 xG it then calculates the result using a probability that is different than 10% chance of a goal. That would be a massive error by SI.
From what we know about real life, it is quite possible for actual points scored to be discrepant from xG difference in a season. But it is the xG difference that is the better predictor of future points totals. Of course in a simulation we should expect the two things to converge if the sample size is large enough. So are you concluding that this is not happening? If this is the case then I would agree that points is the only measure you can trust, but not that you have actually diagnosed the problem with xG, only that something in the match engine is very broken.
xG isn't a perfect metric either. And if you try to optimize for the highest xG, the tactic might not actually function as well as you'd like. It is possible to inflate xG by taking lots of shots that "never really have a chance to go in", and while statistically you may "deserve" to score one, two or three goals based on xG, it's not realistic to expect that to happen week after week. On the other hand, there are certain counterattacks with relatively low xG (let's say 0.26 or so) that more often than not end up being a goal.
At the end of the day, given a large enough sample size, I think points are a good metric to evaluate tactics. You have the option to sort by xG, GD, GF, GA as well if you prefer one of those.
Lapidus said: At the end of the day, you win leagues not because you have the highest possession or xG but because you have the most points, so points are king.
But in real life it’s been shown that xG difference is the better predictor of the future. The point of the testing is to know which tactic will be the most likely to be best when you use it in your game. The points gained in the testing are an inaccurate measure of this and do not decide who wins in your league; they are only a prediction with an error.
Yarema said: xG isn't a perfect metric either. And if you try to optimize for the highest xG, the tactic might not actually function as well as you'd like. It is possible to inflate xG by taking lots of shots that "never really have a chance to go in", and while statistically you may "deserve" to score one, two or three goals based on xG, it's not realistic to expect that to happen week after week. On the other hand, there are certain counterattacks with relatively low xG (let's say 0.26 or so) that more often than not end up being a goal.
At the end of the day, given a large enough sample size, I think points are a good metric to evaluate tactics. You have the option to sort by xG, GD, GF, GA as well if you prefer one of those.
The question I’m raising is whether the sample size is in fact enough. Clearly the thinking was to use xG difference because it gives more precision for any fixed sample size. And this is correct in real life too. I’m not trying to be annoying or rude, but I don’t see how you can just look at the results and decide that there is some particular flaw about how xG is counted. If xG difference is not accurately predicting the points winner as the number of simulations increases towards infinity then this is a massive, massive problem with the match engine. It is literally saying that when SI programs a shot to have a probability x of scoring then either a) the probability is not x, or b) SI can’t correctly add up x over the game.
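One rough way to put numbers on "is the sample size enough" is the standard error of points per match. The sketch below assumes matches are independent and uses made-up win/draw probabilities (50% win, 25% draw); neither the probabilities nor the match counts are FM Arena figures.

```python
import math


def points_per_match_sd(p_win, p_draw):
    """Standard deviation of points from one match (3 for a win, 1 for a draw, 0 for a loss)."""
    mean = 3 * p_win + 1 * p_draw
    mean_of_squares = 9 * p_win + 1 * p_draw
    return math.sqrt(mean_of_squares - mean ** 2)


def standard_error(sd, n_matches):
    """Standard error of the average points per match over n independent matches."""
    return sd / math.sqrt(n_matches)


# Hypothetical tactic: wins 50% of matches, draws 25%, loses 25%.
sd = points_per_match_sd(0.50, 0.25)
for n in (38, 1_000, 10_000):
    print(n, round(standard_error(sd, n), 4))
```

Two tactics whose true points per match differ by less than a couple of standard errors cannot be reliably separated at that sample size, which is the reason a lower-variance measure was attractive in the first place.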
MeanOnSunday said: But in real life it’s been shown that xG difference is the better predictor of the future. The point of the testing is to know which tactic will be the most likely to be best when you use it in your game. The points gained in the testing are an inaccurate measure of this and do not decide who wins in your league; they are only a prediction with an error.
Pal, this isn't real life, it's a PC game.
It could be that in Football Manager the xG stat is just a cosmetic thing, like a thousand other things in the game that are purely cosmetic.
As you might have noticed, many crazy tactical approaches work in Football Manager that would never work in real life.
The game developers want you to "feel" like it's real life when you play FM, so they add things from real life, but there's no guarantee that those things work like they do in real life.
MeanOnSunday said: The question I’m raising is whether the sample size is in fact enough. Clearly the thinking was to use xG difference because it gives more precision for any fixed sample size. And this is correct in real life too. I’m not trying to be annoying or rude, but I don’t see how you can just look at the results and decide that there is some particular flaw about how xG is counted. If xG difference is not accurately predicting the points winner as the number of simulations increases towards infinity then this is a massive, massive problem with the match engine. It is literally saying that when SI programs a shot to have a probability x of scoring then either a) the probability is not x, or b) SI can’t correctly add up x over the game.
The main reason it works better in real life is sample size. The number of games in a season is a lot lower than the number of shots. FM Arena sample sizes are much bigger and thus less prone to variance. Even in real life, most leagues use xP (expected points) rather than xG to sort teams, because it's a better representation of situational nuances.
As for xG difference, it's not quite so simple. As I said, you can inflate the number artificially without really adding much to your points total by simply taking more shots, even if they are bad on paper. The variance also differs depending on what kind of shots you take. For example, you can take 1 shot with 1 xG and end up scoring 1 goal, or take 100 shots at 0.01 xG per shot totalling 1 xG, but it's actually very likely you'll score 0.
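The 1-shot versus 100-shot comparison can be worked through directly. The sketch assumes shots are independent and that each shot's xG is its true scoring probability; the function names are only for illustration.

```python
def prob_no_goal(n_shots, xg_per_shot):
    """Probability of scoring zero goals from n independent shots of equal xG."""
    return (1.0 - xg_per_shot) ** n_shots


def goal_variance(n_shots, xg_per_shot):
    """Variance of the goal count (binomial: n * p * (1 - p))."""
    return n_shots * xg_per_shot * (1.0 - xg_per_shot)


# Both cases total exactly 1 xG.
for n_shots, xg in ((1, 1.0), (100, 0.01)):
    print(n_shots, xg, round(prob_no_goal(n_shots, xg), 3), round(goal_variance(n_shots, xg), 3))
```

Under these assumptions the 100 low-quality shots produce no goal a little over a third of the time (0.99^100 is about 0.366), compared with never for the single certain shot. The expected number of goals is 1 in both cases; what changes is the spread around it.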
You can also create scenarios where tactics become "win more", in the sense that when they win, they win big with a high xG difference, which adds up over the season but doesn't actually contribute any additional points.
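A toy example of the "win more" effect, using per-match goal difference as a stand-in for xG difference. The two six-match runs below are invented numbers, chosen so that both have the same total difference.

```python
def points(goal_diffs):
    """League points from a list of per-match goal differences (3/1/0 scoring)."""
    return sum(3 if d > 0 else 1 if d == 0 else 0 for d in goal_diffs)


# Hypothetical six-match runs with the same total difference of +6.
win_more = [5, 5, -1, -1, -1, -1]   # two big wins, four narrow losses
steady = [1, 1, 1, 1, 1, 1]         # six narrow wins

for name, diffs in (("win_more", win_more), ("steady", steady)):
    print(name, sum(diffs), points(diffs))
```

The totals match, but the points do not, because points depend only on the sign of each match's difference and not on its size. A ranking by summed difference cannot see that distinction.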
Tsubasa said: Pal, this isn't real life, it's a PC game.
It could be that in Football Manager the xG stat is just a cosmetic thing, like a thousand other things in the game that are purely cosmetic.
As you might have noticed, many crazy tactical approaches work in Football Manager that would never work in real life.
The game developers want you to "feel" like it's real life when you play FM, so they add things from real life, but there's no guarantee that those things work like they do in real life.
Yes, it could be that SI presents a meaningless number. But why? IRL, xG is only an approximate prediction, but in the game there must be a known probability of scoring for the match engine to work. So why would SI invent a different (wrong) number when they have already calculated the correct one? It’s exactly because it’s a PC game that it should be even more predictive, because it is the causal mechanism of goals being scored.
Yarema said: The main reason it works better in real life is sample size. The number of games in a season is a lot lower than the number of shots. FM Arena sample sizes are much bigger and thus less prone to variance. Even in real life, most leagues use xP (expected points) rather than xG to sort teams, because it's a better representation of situational nuances.
As for xG difference, it's not quite so simple. As I said, you can inflate the number artificially without really adding much to your points total by simply taking more shots, even if they are bad on paper. The variance also differs depending on what kind of shots you take. For example, you can take 1 shot with 1 xG and end up scoring 1 goal, or take 100 shots at 0.01 xG per shot totalling 1 xG, but it's actually very likely you'll score 0.
You can also create scenarios where tactics become "win more", in the sense that when they win, they win big with a high xG difference, which adds up over the season but doesn't actually contribute any additional points.
This is just not correct. Larger sample size makes xG difference work better irl, and xP is a worse predictor than xG difference. There is a lot of research to show this. This is why the site was moving in that direction.
Your final point may be correct in FM but look back to the OP where it was the goals against that was being hypothesized as the source of the problem, not the goals for.
MeanOnSunday said: This is just not correct. Larger sample size makes xG difference work better irl, and xP is a worse predictor than xG difference. There is a lot of research to show this. This is why the site was moving in that direction.
Your final point may be correct in FM but look back to the OP where it was the goals against that was being hypothesized as the source of the problem, not the goals for.
It's a worse predictor but a better descriptor. The correlation of xP to actual points tends to be better than xGD to points (unless new research has come out lately). Also we cannot completely mirror the game and real life because real life suffers from smaller sample. We could argue if in a large enough sample size (say 10000 games) xP would be a better predictor as well.
On top of that the player quality in FM Arena simulations are even unlike IRL.
And since it's a game you can somewhat inflate xG numbers without adding any actual points.
As I said use whatever you feel is the best metric.