A philosophical question regarding simulations

Posted by contrarian23 on 4/29/2016 9:33:00 AM (view original):
Without crunching the numbers, it's very unlikely for a hitter in the .290-.310 range to hit .390 across 450 at bats. It's too many standard deviations away from the mean. The .390 batting average in 1980 is powerful evidence that at that moment, Brett's true ability was quite a bit higher than .300.

I just ran the simulation assuming an average of .305. His output across 100 seasons ranged from .241 to .363.

This is partially why I don't think leagues based on career averages are the answer. You definitely need to mirror the rise and fall of a player's ability over time.
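The experiment described above is easy to reproduce. Here is a minimal sketch, assuming 450 at bats per season and treating each at bat as an independent coin flip with a .305 chance of a hit (a toy model that ignores platoon splits, parks, and everything else):

```python
import random

def simulate_seasons(true_avg=0.305, at_bats=450, seasons=100, seed=42):
    """Simulate seasons for a fixed-talent hitter; each AB is an
    independent Bernoulli trial with p = true_avg."""
    rng = random.Random(seed)
    averages = []
    for _ in range(seasons):
        hits = sum(rng.random() < true_avg for _ in range(at_bats))
        averages.append(hits / at_bats)
    return min(averages), max(averages)

lo, hi = simulate_seasons()
print(f"lowest season: {lo:.3f}, highest season: {hi:.3f}")
```

With only 450 trials the spread is wide, but .390 sits roughly four standard deviations above a .305 mean, so even 100 simulated seasons should rarely, if ever, produce it.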
Hmmm, yeah, there is likely to be a little more to it. I did mention normalization, but I'm not convinced the differences are great enough to matter that much.

Of course, just because something is "very unlikely" does not mean it is impossible. Is it more likely that he's a .330 hitter who hit "under" his level most of his career, or a .305 hitter who had one "very unlikely" season?

What about hitters with career years that weren't quite as outrageous? Just how far away are some of these players?

This would in theory be a bit before its time in Brett's case, but could steroids or other drugs have an impact on the numbers here?
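To put rough numbers on the "which is more likely" question, a quick binomial tail calculation (again a toy model: 450 independent at bats, with 176 hits as the .390 threshold, both illustrative round numbers) compares a true .330 hitter against a true .305 hitter:

```python
from math import comb

def tail_prob(p, n=450, k=176):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k hits."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p330 = tail_prob(0.330)
p305 = tail_prob(0.305)
print(f".330 hitter reaches .390: {p330:.2e}")
print(f".305 hitter reaches .390: {p305:.2e}")
print(f"likelihood ratio: {p330 / p305:.0f}x")
```

Both probabilities are small, but the season is dramatically more likely for the .330 hitter, which is the shape of the argument even without settling on exact inputs.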
4/29/2016 9:56 AM
Brett's BABIP in 1980, if I am calculating it correctly, was .368.

Here's an amazing stat...he homered 24 times...and struck out 22 times.
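For anyone checking the arithmetic, here is the standard BABIP formula applied to Brett's commonly cited 1980 line (449 AB, 175 H, 24 HR, 22 SO, 7 SF):

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: hits that aren't homers,
    divided by at bats that end with the ball in play."""
    return (h - hr) / (ab - so - hr + sf)

# George Brett, 1980: 175 H, 24 HR, 449 AB, 22 SO, 7 SF
print(f"{babip(175, 24, 449, 22, 7):.3f}")  # 0.368, matching the figure above
```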
4/29/2016 10:52 AM
Identifying the true talent level of an MLB player is something analytics people, both those employed by MLB teams and independents at sites such as BP or Fangraphs, have been trying to refine for years. contrarian, I think what you are saying is absolutely correct: Brett was not a true-talent .390 hitter in 1980; he overperformed due to random variability. Projection systems such as ZiPS and PECOTA use regression to the mean (in this case, bringing Brett's BA back to earth) to project future performance. But these projections are just averages of what might happen; overperformance and underperformance of true talent level happen all the time. This site takes an observed occurrence and treats it like the mean. It would probably be way too inexact to have a simulation engine like this estimate true talent level from a string of seasons and use that as the mean.
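Regression to the mean can be sketched with a simple back-of-envelope shrinkage estimator (this is not how ZiPS or PECOTA actually work; the league average and ballast size below are illustrative assumptions): mix the observed line with a chunk of league-average performance.

```python
def regressed_avg(hits, at_bats, league_avg=0.260, ballast_ab=1200):
    """Shrink an observed average toward the league mean by mixing in
    'ballast_ab' at bats of league-average hitting. The ballast size
    is an illustrative guess, not a fitted constant."""
    return (hits + league_avg * ballast_ab) / (at_bats + ballast_ab)

# Brett's 1980 line, shrunk toward a .260 league
est = regressed_avg(175, 449)
print(f"regressed estimate: {est:.3f}")
```

The smaller the sample, the harder the observed number gets pulled back toward the league mean, which is exactly the "back to earth" effect described above.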

I often bring this up to my classes at school: the grade you get is a data point, an estimate of your true "talent level". I joke and tell them I want to start a movement away from exact grades and toward a confidence interval that estimates your true "talent level" (grade) while accounting for random variation.
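The confidence-interval idea translates directly to batting averages. A minimal sketch, using the normal approximation to the binomial and again pretending each at bat is an independent trial:

```python
from math import sqrt

def avg_interval(hits, at_bats, z=1.96):
    """Normal-approximation 95% interval around an observed average,
    treating each at bat as an independent trial."""
    p = hits / at_bats
    se = sqrt(p * (1 - p) / at_bats)
    return p - z * se, p + z * se

lo, hi = avg_interval(175, 449)  # Brett's 1980 season
print(f".390 observed, 95% interval: {lo:.3f} to {hi:.3f}")
```

Even a full season of at bats leaves an interval several dozen points of batting average wide, which is the whole point about single data points.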
4/29/2016 1:46 PM
Here's Bill James, today, on this issue:

I always wondered what the hell happened to Carl Yastrzemski in 1969: 40 home runs, walk and strikeout rates basically in line with what he was doing in his surrounding years but a .255 batting average. Lo and behold, he had a .241 batting average on balls in play. Is that just a vicious run of bad luck or was something else going on that year?
Asked by: kingferris
Answered: 4/29/2016
It's just a vicious run of bad luck; that's all it is. But it presents an interesting problem for a Game Maker or model maker: Do you model the underlying skills, or do you model the results? And, in the end, you will find that you HAVE to have respect for the actual results, or the entire process degenerates in your hands.
4/29/2016 2:47 PM
AKlopp, thank you for your post. A very clear exposition of what I was getting at.

Yes, I know MLB teams and other analysts work on this problem, typically from two perspectives:
1.) Descriptive - what did happen, and how valuable was it?
2.) Predictive - what will happen in the future (and how much should we pay for that)?

With a simulation, we have to decide which of these two options we prefer:
-- Option 1: Treat 1980 Brett as a .335 hitter, in which case there is very little chance that he will hit .390, but his range of outcomes will be very reasonable
-- Option 2: Treat 1980 Brett as a .390 hitter, in which case there is a much greater chance that he will hit .390, but also a significant chance that he will deliver a completely unreasonable level of performance

And of course we could also choose to land anywhere between those two extremes, but all that does is ask the question in a different way: now we have to weight the two scenarios and decide which one should carry more weight in our algorithm.
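The "anywhere between" choice reduces to a single blending parameter. A minimal sketch, using the .335 career and .390 season figures from the options above (the weight values are arbitrary):

```python
def blended_talent(career_avg, season_avg, w_season=0.5):
    """Anywhere between Option 1 (w_season=0.0) and Option 2
    (w_season=1.0): a weighted mix of the two estimates."""
    return w_season * season_avg + (1 - w_season) * career_avg

for w in (0.0, 0.5, 1.0):
    print(f"w_season={w}: treat 1980 Brett as {blended_talent(0.335, 0.390, w):.3f}")
```

Choosing the weight is exactly the judgment call being debated; the code only makes the trade-off explicit.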

I generally prefer Option 1, though I understand that not everyone would agree (see Bill James's post quoted immediately above), and I recognize that it opens a gigantic can of worms.
4/29/2016 2:53 PM
Posted by contrarian23 on 4/29/2016 2:47:00 PM (view original):
Here's Bill James, today, on this issue:

I always wondered what the hell happened to Carl Yastrzemski in 1969: 40 home runs, walk and strikeout rates basically in line with what he was doing in his surrounding years but a .255 batting average. Lo and behold, he had a .241 batting average on balls in play. Is that just a vicious run of bad luck or was something else going on that year?
Asked by: kingferris
Answered: 4/29/2016
It's just a vicious run of bad luck; that's all it is. But it presents an interesting problem for a Game Maker or model maker: Do you model the underlying skills, or do you model the results? And, in the end, you will find that you HAVE to have respect for the actual results, or the entire process degenerates in your hands.
This is a difficult choice. Some players of OOTP play "stats only" - no ratings of skills, just statistical probability based on performance over a three-year period; others shut the stats off and play ratings based only on abilities - contact hitting, pitching "stuff", movement, control, etc.

Those who play with fictional players, like those who play here at WIS on HD Dynasty, I assume have to rely only on ratings, since the stats are themselves generated by the sim and are not real-life stats anyway.

So in SIM baseball here at WIS we play stats from one season (not even an average or weighting of three or so), while at HD Dynasty it is all ratings.

None of these is a perfect solution, and I think the real answer is NOT what kind of game a simulation creator should make, but what kind of game players want to play: historical replays of individual seasons; ongoing "progressives" that follow whole careers; alternative baseball worlds in which players' talents, skills, and performances vary significantly from real life; fictional leagues and players creating a whole different baseball world; alternative historical leagues (in which Satchel Paige and Oscar Charleston are the best players ever); individual best-ever team match-ups; random debut leagues in which players from all eras enter at different ages and times than they did historically; etc.

Each of these can be a legit way to play a baseball simulation. The more options the better. And any of the parameters - skills, age-based performance, historical accuracy with real life stats, individual season replays, etc. can be fine, depending on what the game is meant to achieve.
4/29/2016 2:58 PM
Fascinating lines of thought. I think a good mathematician could build a simulation algorithm that would treat Brett's .390 for what it was: an anomaly at the tail of the distribution of predicted performance. A good algorithm, it seems to me, would start with career averages, include expected performance by age around the mean, and, finally, build in actual performance as an upper limit for those rare cases like Brett's and Norm Cash's where actual performance is so many standard deviations from the norm. They should have called him Deviation From The Norm Cash.
I don't mind a Brett who hits .375 in a 1980 Progressive, I don't want one who hits .425.
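The recipe above (career mean, age adjustment, real-life season as a ceiling) can be sketched in a few lines. Every constant here, the peak age, the age penalty, and the ceiling margin, is an illustrative guess rather than a fitted value, and the binomial at-bat model is the same toy assumption as before:

```python
import random

def simulate_season(career_avg, age, observed_avg, at_bats=450,
                    peak_age=27, age_step=0.0015, seed=7):
    """Sketch of the idea above: talent = career average adjusted
    for age; the simulated season is clamped so it cannot run far
    past the real-life number. All constants are illustrative."""
    rng = random.Random(seed)
    talent = career_avg - age_step * abs(age - peak_age)
    hits = sum(rng.random() < talent for _ in range(at_bats))
    ceiling = observed_avg + 0.030   # soft upper limit taken from real life
    return min(hits / at_bats, ceiling)

avg = simulate_season(career_avg=0.335, age=27, observed_avg=0.390)
print(f"simulated 1980 Brett: {avg:.3f}")
```

Under these assumptions a simulated 1980 Brett can have a monster year, but the clamp keeps .425-type seasons off the table.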
4/29/2016 3:39 PM
And naturally any simulation algorithm would then have to figure in the other variables: the pitcher being faced, ballpark effects, fatigue, and so on. It should still be able to come up with a reasonable mean probability for each possible result, from a strikeout to a home run (assuming an at bat): a more elaborate version of the dice game many of us have played, with a random throw for each plate appearance into this range of statistical parameters.
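The dice-game core is just one weighted roll per plate appearance. A minimal sketch with made-up outcome probabilities (not fitted to any player; a real engine would adjust these for pitcher, park, fatigue, and so on before rolling):

```python
import random

# Illustrative per-PA outcome probabilities (made-up numbers that sum to 1.0)
OUTCOMES = {"out": 0.64, "single": 0.16, "double": 0.05,
            "triple": 0.005, "home_run": 0.035, "walk": 0.09,
            "hbp": 0.02}

def plate_appearance(rng=random):
    """One weighted 'dice roll': pick an outcome by cumulative probability."""
    roll, cum = rng.random(), 0.0
    for outcome, p in OUTCOMES.items():
        cum += p
        if roll < cum:
            return outcome
    return "out"  # guard against floating-point rounding at the boundary

print(plate_appearance(random.Random(3)))
```

Running this once per plate appearance over a season reproduces the familiar tabletop mechanic, with the probability table playing the role of the player card.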
4/29/2016 4:08 PM

