Updated Park Factors Topic

However, one possible source of skew would be a city like Seattle sitting in a world where the other 31 ballparks were super hitter-friendly. The data might then say that Seattle depresses offensive production more in that world than in other worlds. The way around potential outliers like that is simply to get more data.
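To make the Seattle scenario concrete, here is a minimal Python sketch under a deliberately simplified, hypothetical model: every park multiplies a baseline run rate by a fixed factor, away games are spread evenly over the other 31 parks, and team quality is ignored. The function name `measured_pf` and all of the multipliers are made up for illustration.

```python
from statistics import mean


def measured_pf(home_multiplier: float, other_multipliers: list[float]) -> float:
    """Home/away park factor under a toy multiplicative model.

    Hypothetical assumptions: each park scales a baseline run rate by a fixed
    multiplier, away games are spread evenly across the other parks, and team
    quality is ignored.
    """
    return home_multiplier / mean(other_multipliers)


SEATTLE = 0.90                 # hypothetical "true" multiplier for Seattle
neutral_pool = [1.00] * 31     # the other 31 parks average out to neutral
hitter_pool = [1.20] * 31      # the other 31 parks are all hitter-friendly

print(f"neutral pool: {measured_pf(SEATTLE, neutral_pool):.3f}")  # 0.900
print(f"hitter pool:  {measured_pf(SEATTLE, hitter_pool):.3f}")   # 0.750
```

With the same hypothetical 0.90 effect, Seattle measures 0.900 against a neutral pool but 0.750 against an all-hitter's-park pool, purely because the denominator changed.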

A definite source of skew shows up in my actual Seattle Somethings data: my home ERA has been consistent over time, but my away ERA hops around depending on which stadiums are in my NL as owners come and go. In the past, this league had many hitter's parks, such as Colorado and Albuquerque, but more recently there has been a HEAVY concentration of extreme pitcher's parks, which has done crazy things to league ERA.

Season  Home ERA  Away ERA
27      3.84      5.38      (boy those were the days)
28      4.15      4.87
29      3.66      4.69
30      3.14      4.43
31      3.22      3.85
32      3.14      4.01
33      2.99      3.52
34      2.88      3.56
35      2.96      3.40
36      2.72      3.54
37      2.45      3.23
38      2.62      2.98
39      2.18**    2.83
40      2.88      3.46
41      2.51      2.52
42*     2.47      2.86
*S42 is a partial season (109 games).

**Side-note: I scrolled through as many leagues as I could to see if this was an all-time HBD record, but I found one team in Iowa City with a 2.13 season and another in Burlington with a 2.15, so unfortunately no :(

But as you can see, over time the league's stadium pool has skewed heavily toward negative-PF (pitcher's) parks, so much so that in S41 my home and away ERAs were almost identical, which means the league-average PF that season was just insane.

Therefore, as alluded to previously, this variability in the stadium pool from league to league, and even within an individual league over time, is always going to be your primary source of error when data-mining home/away splits on HBD. Real-life data good, HBD data bad.
1/25/2018 4:40 PM (edited)
Posted by MikeT23 on 1/25/2018 3:46:00 PM:
I'll disagree, but it's your program.

We had the "Too many homers" debate years ago. It wasn't that there was too much power in the game; there were simply good hitters facing minor-league-quality pitchers in unbalanced worlds.

So there's a difference between the ratio of how often an event happens at home vs. away ballparks and the total number of times that event has occurred.

A city in a poorly run world might see a total of 800 runs produced at its home ballpark and 760 runs produced in its games away from home. That would result in an overall park factor of 1.053 (800 / 760).

That same city in a well-run world might see a total of 600 runs produced at its home ballpark and 570 runs produced in its games away from home. That would still result in an overall park factor of 1.053 (600 / 570).

Just because there is more offensive production in a poorly run world, due to poor pitchers and fielders, doesn't change the ballpark's influence on total run production. Park factors aren't concerned with the total number of runs being produced; they simply measure the ratio of offensive production in games played at home to games played away from home. If we were looking at the total number of occurrences of an event (the number of home runs, in your example), then the relative skill level of the players involved would need to be controlled for.
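The arithmetic in the two examples above can be checked with a short Python sketch. `park_factor` is a hypothetical helper; the optional per-game normalization is an addition not in the post, and only matters if home and away game counts differ.

```python
def park_factor(home_runs: float, away_runs: float,
                home_games: int = 1, away_games: int = 1) -> float:
    """Runs per game in a team's home games divided by runs per game in its away games.

    The game-count arguments only matter when home and away game counts differ;
    with the defaults this is the plain home/away ratio used in the post.
    """
    return (home_runs / home_games) / (away_runs / away_games)


poorly_run = park_factor(800, 760)   # high-offense world
well_run = park_factor(600, 570)     # low-offense world
print(f"{poorly_run:.3f} {well_run:.3f}")  # 1.053 1.053

# Scaling both totals by the same factor (e.g. worse pitching inflating
# everything by 25%) leaves the ratio unchanged.
print(f"{park_factor(800 * 1.25, 760 * 1.25):.3f}")  # 1.053
```

This only demonstrates the uniform-scaling case. MikeT23's counterpoint, that poor pitching may be hurt more in a hitter's park than it is helped in a pitcher's park, is a non-uniform effect that this sketch does not model.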
1/25/2018 4:30 PM
Again, disagree.

Poor pitching will distort park factors more in a hitter's park than in a pitcher's park. IOW, it will be hurt worse by positive PF than helped by negative PF.

An average ballplayer crushes little league pitching. He can only be so bad against MLB pitching.
1/25/2018 4:35 PM
Oops, in my table I actually meant to include the PF decimal calculation.

"PF" = runs at home / runs on the road, approximated as home ERA / away ERA (hERA / aERA):
Season  PF
27      0.714
28      0.852
29      0.780
30      0.709
31      0.836
32      0.783
33      0.849
34      0.809
35      0.871
36      0.768
37      0.759
38      0.879
39      0.770
40      0.832
41      0.996
42*     0.864
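The PF column is just home ERA divided by away ERA, rounded to three places. Here is a minimal Python sketch that recomputes it from the ERA table (`ERA_BY_SEASON` and `era_ratio_pf` are illustrative names). ERA only reflects Seattle's own pitchers and earned runs, so this is a proxy rather than a full runs-per-game park factor.

```python
# (season, home ERA, away ERA) copied from the ERA table.
# S42 is a partial season (109 games).
ERA_BY_SEASON = [
    (27, 3.84, 5.38), (28, 4.15, 4.87), (29, 3.66, 4.69), (30, 3.14, 4.43),
    (31, 3.22, 3.85), (32, 3.14, 4.01), (33, 2.99, 3.52), (34, 2.88, 3.56),
    (35, 2.96, 3.40), (36, 2.72, 3.54), (37, 2.45, 3.23), (38, 2.62, 2.98),
    (39, 2.18, 2.83), (40, 2.88, 3.46), (41, 2.51, 2.52), (42, 2.47, 2.86),
]


def era_ratio_pf(home_era: float, away_era: float) -> float:
    """Proxy park factor: home ERA divided by away ERA (below 1.0 favors pitchers)."""
    return home_era / away_era


pf_by_season = {s: era_ratio_pf(h, a) for s, h, a in ERA_BY_SEASON}
for season, pf in pf_by_season.items():
    print(season, f"{pf:.3f}")
```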

So in S41, the PF of 0.996 is effectively saying "Seattle is a neutral park," even though we know for a fact that Safeco is intrinsically great for pitchers. But that season's away data includes Burlington, Portland, San Francisco, and Fresno.

PF is a relative statistic, so when the frame of reference (the denominator) is a moving target, the number isn't reliable. In real life, the frame of reference has stayed almost exactly the same except for expansion teams, relocations, and stadium closures/openings. In general, data noise is demonstrably lower in real life than in HBD.
1/25/2018 4:56 PM (edited)
Posted by MikeT23 on 1/25/2018 4:35:00 PM:
Again, disagree.

Poor pitching will distort park factors more in a hitter's park than in a pitcher's park. IOW, it will be hurt worse by positive PF than helped by negative PF.

An average ballplayer crushes little league pitching. He can only be so bad against MLB pitching.
So I went back to the data I pulled to refresh my memory on which worlds the original post's data came from, and it was actually close to the set of leagues you suggested I pull.

Moonlight Graham, Cooperstown, Capra, Big Sky Alumni, BBWAA, and around five others, none of which would typically be considered a poorly run world.
1/25/2018 7:37 PM
Posted by MikeT23 on 1/25/2018 12:20:00 PM:
Not to poop on parades but there is a problem with data-collecting actual results from all worlds. There are more bad worlds than good. A good world will see total runs allowed around 700, a WHIP under 1.35 and an ERA under 4.10. What happens in a world with 800 RA, WHIP of 1.45 and ERA over 4.50 will skew the results. Those worlds are A) using terrible fielders because the dude can hit, B) using AAA-type pitchers because they don't know any better or C) both because there's no penalty for giving up 5+ runs a game and winning 48 times.

And, in all fairness, it's more fun to win 10-8 games.

If it's not too much of a pain in the ***, run the data on all of Coop and MG, 90-something seasons, and 3 random worlds (30-some-odd seasons each, to equal the Coop+MG total). The 30-some-odd-season worlds would have slower rolls and, I'm willing to bet, much worse pitching data.
So I went back and reread this post and noticed you didn't want me to just pull data from the "good" worlds, but to compare "good" worlds to "bad" worlds. So I pulled three random worlds that were around 35 seasons in and compared their park factors to the data in the original post.

There's no discernible difference between the park factors calculated from the two sets of data. I didn't run any t-tests to statistically rule out differences because this was a quick check, but across the 92 cities WiS lets us play in, the average difference is .000904, which is less than 1/10th of 1 percent. Half of the parks fall within .0059 (about 6/10ths of a percent) of the original post's value. I could do more rigorous statistical tests, but I don't see a reason to at this point.
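For anyone wanting to take the comparison a step further, here is a minimal Python sketch of the paired statistics involved. A mean of signed differences lets positive and negative gaps cancel, so it is worth reporting next to the median absolute difference (which is what "half of the parks fall within .0059" describes). The park factors below are made-up placeholders, not the actual WiS data, and `compare_park_factors` is a hypothetical helper. The t statistic comes without a p-value; `scipy.stats.ttest_rel` computes both if SciPy is available.

```python
import math
from statistics import mean, median, stdev


def compare_park_factors(original: list[float], new: list[float]) -> dict[str, float]:
    """Summarize paired differences between two sets of park factors for the same parks."""
    if len(original) != len(new) or len(original) < 2:
        raise ValueError("need two equal-length lists with at least 2 parks")
    diffs = [n - o for o, n in zip(original, new)]
    mean_diff = mean(diffs)                        # signed: + and - gaps cancel
    median_abs = median(abs(d) for d in diffs)     # typical size of a gap
    sd = stdev(diffs)
    if sd == 0:
        t_stat = 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    else:
        t_stat = mean_diff / (sd / math.sqrt(len(diffs)))  # paired t statistic
    return {"mean_diff": mean_diff, "median_abs_diff": median_abs, "t_stat": t_stat}


# Made-up park factors for four parks in two data sets.
original = [0.90, 1.10, 1.00, 1.25]
new = [0.906, 1.094, 1.006, 1.244]
for name, value in compare_park_factors(original, new).items():
    print(f"{name}: {value:.6f}")
```

Here every park moves by about 0.006, yet the signed mean is essentially zero because the ups and downs cancel.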
1/26/2018 1:39 AM
Fair enough. The data is what the data is. I still believe crap pitching will be hurt more in a hitter's park than it will be helped in a pitcher's park.
1/26/2018 9:09 AM
Posted by MikeT23 on 1/26/2018 9:09:00 AM:
Fair enough. The data is what the data is. I still believe crap pitching will be hurt more in a hitter's park than it will be helped in a pitcher's park.
To be honest, I didn't bother to read everything in this thread. But this sentence alone I agree with, based on my experience.

1/26/2018 9:16 AM
The more data you gather, the closer to the middle the numbers will get. I'm thinking in terms of a few seasons; chimeara is using hundreds or thousands of seasons. If you roll a die 10 times, you might get five 2s. If you roll it 6,000 times, it's much more likely that each of the 6 numbers will come up around 1,000 times. You damn sure won't get 50% of one number.
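The die analogy is easy to simulate. Here is a minimal Python sketch using the standard library's `random` module with a fixed seed so runs are repeatable (the helper name `roll_counts` is illustrative):

```python
import random
from collections import Counter


def roll_counts(n_rolls: int, seed: int = 42) -> Counter:
    """Count how often each face of a fair six-sided die comes up in n_rolls rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(rng.randint(1, 6) for _ in range(n_rolls))


for n in (10, 6000):
    counts = roll_counts(n)
    top_share = max(counts.values()) / n
    print(f"{n:>5} rolls: {dict(sorted(counts.items()))}  "
          f"most common face share: {top_share:.1%}")
```

With 10 rolls one face can easily grab a large share; with 6,000 rolls every face should land close to 1,000, in line with the law of large numbers.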
1/26/2018 9:21 AM
In a normal population distribution, which is what park factors in WiS are likely to follow, the more data you get, the closer to the actual mean you'll get, as Mike said. With 40 seasons of world data, you are generating roughly 1,300 individual seasonal park factors (40 seasons × 32 parks = 1,280). Because park factors are relative numbers and the composition of parks in a world changes every season, as PJ and I noted, if you base your conclusions on one or two seasons there is a high probability that whatever number you arrive at is wildly different from the true mean. In PJ's example, Seattle played almost like a neutral park in a single season, but we know that over the course of 200+ seasons it's likely to decrease offensive production by about 10 percent. I'm sure I could look in the data and pull individual seasons where Colorado plays almost like a neutral park, and seasons where it boosts offensive production by 50 percent compared to the average park, but over the course of many seasons it'll regress to its true mean.
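The regression-to-the-mean argument can be sketched with a small simulation. This assumes, hypothetically, that each season's park factor is an independent normal draw around a true value of 0.90 with a season-to-season standard deviation of 0.07; both numbers are made up for illustration, as is the helper `simulate_seasonal_pfs`. In a real league, consecutive seasons are correlated because the park pool changes gradually (as the Seattle ERA table shows), so actual averages would converge more slowly than the sd / sqrt(n) printed here.

```python
import random
from statistics import mean, stdev


def simulate_seasonal_pfs(true_pf: float, season_sd: float, n_seasons: int,
                          seed: int = 7) -> list[float]:
    """Draw hypothetical single-season park factors scattered normally around true_pf."""
    rng = random.Random(seed)
    return [rng.gauss(true_pf, season_sd) for _ in range(n_seasons)]


TRUE_PF = 0.90     # hypothetical long-run factor (about -10% offense)
SEASON_SD = 0.07   # hypothetical season-to-season scatter from the park pool
pfs = simulate_seasonal_pfs(TRUE_PF, SEASON_SD, 200)

print(f"single seasons: min {min(pfs):.3f}, max {max(pfs):.3f}, sd {stdev(pfs):.3f}")
for n in (1, 2, 10, 40, 200):
    # Standard error of a mean of n independent seasons is sd / sqrt(n).
    print(f"mean of first {n:>3} seasons: {mean(pfs[:n]):.3f} "
          f"(expected error ~{SEASON_SD / n ** 0.5:.3f})")
```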
1/26/2018 12:17 PM (edited)
© 1999-2024 WhatIfSports.com, Inc. All rights reserved. WhatIfSports is a trademark of WhatIfSports.com, Inc. SimLeague, SimMatchup and iSimNow are trademarks or registered trademarks of Electronic Arts, Inc. Used under license. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.