crazy box score Topic

this may be the most educated and reasonable discussion I have ever witnessed on this board. Kudos, jskenner. I agree with you wholeheartedly about the variability and how WIS seems to diverge significantly from what would be statistically probable, more often than should be expected. The weighted coin is an apt analogy. A bell curve would still apply without an added variable.
12/1/2009 12:21 PM
Thanks, colorblind. By the way, that's a stud Maryland team you've built.
12/1/2009 1:11 PM
Quote: Originally posted by jskenner on 12/01/2009
Great points, billyg. Re: the HCA factor you mention, I believe this DOES account for some of what we're discussing. But I also see results, on a reasonably regular basis, where the ROAD team gets the good end of such an odd result. It may be that in some cases, HD uses HCA to put a negative weight on the home team, leading to such unexplainable results for the visiting team. I think you'd mentioned that before, and I think this is reasonable.

yeah, i agree completely. its really tough to separate out luck and hca, its fairly easy to get a handle on the rest but how do you know with hca what luck was? i was trying to think of a way to convince others there was a random factor there. well, its damn near impossible with all our games home and away. i tried to think of a process, where you'd predict the outcome, take away your guess of HCA, and come up with a "luck factor", but i still don't see any indicator of if HCA has a random factor outside of the shape of the bell curve, and that is too subjective.

anyway, to me its got to come down to neutral court games. im not sure if ive played 200 or what but ive also put more focus on those 200 than the rest of the games i've played combined... so i feel my handle on those games is just as good as any. but, almost beyond shadow of a doubt, i can predict the outcome of a neutral court game much more reliably than a regular season game. i should have played enough regular season games by now to have a decent feel on the average weight of HCA, which, without random factors, is a constant. so, the only explanation to me, is HCA has got to be multiplied by a random every game, possibly pushing the advantage negative. there is also the fact that the teams you play on neutral court are generally NT caliber teams and it may be easier (less variance or volatility) to guess the outcome there than for the 250 rpi team where the ratings gaps are bigger. but, if you just consider the same caliber of teams in the regular season, you still get a plenty big enough sample, and i feel the exact same way.

well, i know that doesn't prove anything, but maybe some of the long term coaches can think about their neutral court games, and see if they feel similarly. i'd really be curious what people felt there. is there less volatility than in the regular season? my experience is yes, by a wide margin, but im also paying more attention then, and my team is more refined, so its tough to say for certain. still, when i lose an away game in d1 with a really good team, it doesn't worry me. it used to, but not anymore. in no way does losing to a team you are a 15 point favorite against on the road by 15, a couple times a season even, preclude you from winning the championship (possibly in dominating fashion). HCA is just not a big enough factor ON AVERAGE to justify that. but at its worst, HCA is devastating. that is my theory, that's why my metric for "does my d1 team have any hope of winning the championship" is really a look for a 0 in the home losses category. maybe 1 is ok, but, its a lot more worrying than 4 away losses.
12/1/2009 1:51 PM
Very good thoughts, gillispie. Very well-informed analysis, and much further along as a theory than mine. I'm glad you're on "my" side. ;)
12/1/2009 2:02 PM
This is a fascinating discussion! Kudos to Gill and Jskenner for putting the time in!!
12/1/2009 2:08 PM
Thanks, emy. And to clarify further, I contend that with the top C in my example (who SHOULD average 60% shooting, where the engine causes him to average only 52%), the variability is increased by CHANGING that 52% from game to game. One game, depending on gillispie's theorized HCA positive/negative factor, it's 57%, and another it's 41%. This is a HUGE oversimplification, mind you. The way it might actually work is that all inputs are affected by the factor. Sometimes your team gets an advantage, sometimes a disadvantage. So instead of getting just a pure calculation that is the same every time, you get an additional factor that changes the expected rates of performance, up or down, or sometimes very little or not at all.

Let's say (to take one small example) that in calculating a C's chance to hit a basket, the calculation is:

Chance of score = (Scorer ability / Defender ability) * 0.5 = (LP + ATH + 0.5*SPD + IQ) / (DEF + ATH + 0.5*SPD + IQ) * 0.5. This would produce an output chance. Let's say Scorer / Defender = 1.2, so chance of make is 60%. And when S/D = 0.8, chance is 40%. With such consistent calculations, unaffected by the sort of HCA randomizer we are discussing, those chances would produce (I contend) much tighter, machine-like game outcomes. My theory is that this randomizer is included in each calculation, and that it may be the same each time it's used for a particular act (shooting, passing, rebounding, stealing, etc.), so that the calculation above would become (S/D) * 0.5 * random factor. If this were the case, it could produce the greatly variant outcomes we see. gillispie thinks the random factor might be different for each "area" of the game (inside shooting, outside shooting, rebounding, ball handling, passing, etc.) whereas I think it may be the same for all aspects within one team's game performance. But the key that I think we agree on (in theory) is that such factor or factors change from game to game, but they are the SAME within a game. If they changed randomly within the game, this would actually produce fairly tight game to game performances, since the random factor or factors would tend to cluster around a consistent average for a game, instead of being set high/low for the entire game.
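
A minimal sketch of the difference between a factor that is fixed for a whole game and one that is redrawn on every shot. Everything in it is an assumption made for illustration and says nothing about how the actual HD engine works: the S/D ratio of 1.0, the 60 shots per game, and a factor drawn uniformly between 0.8 and 1.2 are all made-up values.

```python
import random
import statistics

BASE = 0.5                 # the 0.5 multiplier from the example formula
SHOTS_PER_GAME = 60        # assumed, not an HD number
FACTOR_RANGE = (0.8, 1.2)  # assumed spread of the hypothetical random factor


def makes_in_game(s_over_d, rng, per_game_factor):
    """Count made shots in one simulated game.

    per_game_factor=True: one factor is drawn and reused for every shot.
    per_game_factor=False: a fresh factor is drawn for each shot.
    """
    game_factor = rng.uniform(*FACTOR_RANGE)
    makes = 0
    for _ in range(SHOTS_PER_GAME):
        factor = game_factor if per_game_factor else rng.uniform(*FACTOR_RANGE)
        chance = min(1.0, s_over_d * BASE * factor)
        if rng.random() < chance:
            makes += 1
    return makes


def spread(per_game_factor, n_games=4000, s_over_d=1.0, seed=0):
    """Standard deviation of made shots across many simulated games."""
    rng = random.Random(seed)
    results = [makes_in_game(s_over_d, rng, per_game_factor)
               for _ in range(n_games)]
    return statistics.pstdev(results)


if __name__ == "__main__":
    print("factor fixed per game  :", round(spread(True), 2))
    print("factor redrawn per shot:", round(spread(False), 2))
```

Under these assumptions the per-game version shows a noticeably wider game-to-game spread, because a factor redrawn on every shot averages out over the 60 shots.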
12/1/2009 2:34 PM
Warning: long post ahead! Actual numbers and statistics included!

Very interesting discussion. It seems like a large part of the issue is whether or not a simple random number generator can account for the large variation we see in HD. (By simple, I mean that just the stats we see, or know exist but don't know, like HCA, go into the calculation, and that there is not anything like an "F.U. factor" that just randomly decides that one team is going to get a relative boost.)

So I decided to write a program that would calculate team scores for a bunch of games where each possession is simmed individually. As a starting point I used the stats from my Stonehill, Crum team, as well as our summed opponent stats to come up with representative averages.

http://wisjournal.com/hd/TeamProfile/GameLog.aspx?tid=8343

For context, this is a pretty good team, top 10 RPI and ranking, 14-3 after 17 games. I'll be using the stats as they are now as my inputs. I don't know enough about how individual events are simmed (a particular shot, who gets a board ...) so my approach is to estimate an expected result for each possession.

Here are the input stats:
Through a long formula I don't want to type in here I have determined that Stonehill games average 68.56 possessions per team (rounded to 69 for sim purposes). Stonehill scores 1.025 pts/poss. Looking at my goal totals, I score a 2 bucket on 27.3% of possessions, a 3 on 10.3% and average 17.1% FT made/poss. The FT numbers are a bit trickier since there can obviously be 1-3 FT made per FT scoring possession.

What I really want to know is my chance of scoring 1, 2 or 3 on each possession (I'm ignoring the relatively rare 3 + 1 FT scenario). So I'll massage the numbers to account for this while keeping the 1.025 pts/poss. I just increased the 2 and 3 % slightly and decreased the FT %. I eventually came up with a chance of scoring 1 = 6.5%, 2 = 30%, 3 = 12% on any individual possession. This gives me a total of 1.025 pts/poss. I then did the same thing for my average opponent: 1 = 6.5%, 2 = 28.75%, 3 = 8%. Total 0.88 pts/poss.

Obviously these are estimates and to do better I would need to go over every play-by-play line-by-line. I don't want to do this, but if someone else does, more power to you.

Then I generated a uniformly random number (all calculations done in Matlab, by the way) for each possession to see if each team scored 0, 1, 2 or 3 (again, ignoring the rare 4 pt play.) I ran this sim for 50000 games and collated the results.
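
The original program was written in Matlab and is not shown in the thread, so the following is only a Python sketch of the procedure as described: one uniform random number per possession, mapped to 0, 1, 2 or 3 points using the per-possession chances quoted above. The function names and the seed are hypothetical, and the default game count is kept small so it runs quickly.

```python
import random
import statistics

POSSESSIONS = 69
# Per-possession chances of scoring 1, 2 or 3 points, as quoted in the post.
STONEHILL = {1: 0.065, 2: 0.30, 3: 0.12}
OPPONENT = {1: 0.065, 2: 0.2875, 3: 0.08}


def sim_possession(probs, rng):
    """Map one uniform draw to 0, 1, 2 or 3 points."""
    u = rng.random()
    cumulative = 0.0
    for points in (1, 2, 3):
        cumulative += probs[points]
        if u < cumulative:
            return points
    return 0


def sim_game(probs, rng, possessions=POSSESSIONS):
    """Total points for one team over a full game."""
    return sum(sim_possession(probs, rng) for _ in range(possessions))


def sim_many(n_games, seed=0):
    """Simulate n_games and return the two lists of team scores."""
    rng = random.Random(seed)
    team = [sim_game(STONEHILL, rng) for _ in range(n_games)]
    opp = [sim_game(OPPONENT, rng) for _ in range(n_games)]
    return team, opp


if __name__ == "__main__":
    team, opp = sim_many(5000)
    wins = sum(t > o for t, o in zip(team, opp))
    print("Stonehill mean/std:", statistics.mean(team), statistics.pstdev(team))
    print("Opponent  mean/std:", statistics.mean(opp), statistics.pstdev(opp))
    print("Stonehill win share:", wins / len(team))
```

Raising the game count toward the 50000 used in the post tightens the estimates at the cost of run time.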

1) The average points per game match the expected numbers: HD: Stonehill 70.3 pts/game, MatlabSim 70.7 pts/game. Opponent 60.4 pts/game, MatlabSim 60.7 pts/game. The matlab sim numbers are slightly higher because each team had 69 rather than 68.56 possessions in the sim. This doesn't mean anything really, except that if the results didn't match something would have to be wrong with my program.

2) The standard deviation of each team roughly matches that of HD: Stonehill 10.6 (HD) vs. 9.47 (sim), Opponents 8.98 (HD) vs. 8.96 (sim). The HD values are based on only 17 games so I'm not surprised that there is a difference in the Stonehill number. Not to mention that playing different teams should increase the randomness.

For both teams the standard deviation is pretty close to average pts/sqrt(possessions), as expected.

Take away message: almost 1/3 of the time, your team will not score within 10 points of its average. Almost 1/6 of the time, or approx. 5 games per season, your team will score 10 or more points below its season average.

3) Given 50000 games, the biggest win was by 65 and the worst loss was by 45. This seems like a lot of games but HD sims this many games in about 2 weeks.

4) Stonehill, which is on average 10 points better, wins only about 76% of the time.

5) I looked at sets of 30 simmed games to see what we could reasonably expect out of a 30 game season. In one typical season, Stonehill ended with a record of 22-8, largest margin of victory of 35, worst loss of 10, highest score 87, lowest score 55.

Looking at about fifty 30-game series, the record varies from 27-3 to 18-12.
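
The standard deviations in point 2 and the win rate in point 4 can be cross-checked without running any games. This is a sketch, not part of the original analysis: it treats possessions as independent draws with the per-possession chances quoted earlier, and it uses a normal approximation for the scoring margin, which ignores ties and the fact that scores are whole numbers.

```python
import math

POSSESSIONS = 69
# Per-possession chances of scoring 1, 2 or 3 points, as quoted in the post.
STONEHILL = {1: 0.065, 2: 0.30, 3: 0.12}
OPPONENT = {1: 0.065, 2: 0.2875, 3: 0.08}


def game_mean_std(probs, possessions=POSSESSIONS):
    """Mean and standard deviation of a team's score in one game."""
    mean_pp = sum(pts * p for pts, p in probs.items())
    var_pp = sum(pts ** 2 * p for pts, p in probs.items()) - mean_pp ** 2
    return possessions * mean_pp, math.sqrt(possessions * var_pp)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(team, opp):
    """Normal-approximation chance that team outscores opp."""
    mean_t, std_t = game_mean_std(team)
    mean_o, std_o = game_mean_std(opp)
    margin_std = math.hypot(std_t, std_o)  # independent scores: variances add
    return normal_cdf((mean_t - mean_o) / margin_std)


if __name__ == "__main__":
    print("Stonehill mean, std:", game_mean_std(STONEHILL))
    print("Opponent  mean, std:", game_mean_std(OPPONENT))
    print("Approx. win probability:", win_probability(STONEHILL, OPPONENT))
```

The per-game standard deviations come out at about 9.45 and 8.95, in line with the simulated 9.47 and 8.96, and the approximate win probability lands in the high 70s percent, the same neighborhood as the 76% from the sim.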

Finally, I know there are a lot of things I didn't take into account (no HCA, using an average opponent glosses over many factors including team variation, team growth over the season ...) but I'm not trying to replicate the HD sim.

But I think that this shows that, at the very least, large variations in HD can be accounted for by random number generators applied fairly and consistently to the team. Whether or not the number of large variations is in line with my results I don't know. It would take a large study of HD stats to really determine this.

If anyone would like to see my matlab code, send me a site mail. My apologies that it is in matlab and not some more popular format.
12/1/2009 5:17 PM
my favorite sim result of all time had two fairly equal teams with a halftime score of 50-2. Then the team behind stages a rally and has a 45-7 edge in the second half.

Not quite your everyday 57-47 final.
12/1/2009 5:34 PM
spintronic - interesting analysis. thanks for the work!

anyway, i thought the almost 1/3rd of the time, your team will be outside 10 points of its average was pretty interesting. i have often felt 20 points was a safe figure for the 95% confidence interval, on the variation off of the expected value for the differential in score between two teams - if memory serves, that is the confidence interval for 2 std deviations, and 1 is 67%... is that right? however, that was kind of a mix of regular season and neutral court. neutral court i would put it closer to 16 points a game, or 8 for 1 std deviation. also, i was considering something a little different than you, the variation on score differential vs expected, not variation on 1 teams points per game. i think it might be the same as what you looked at though, mathematically? i thought about it intuitively and am half convinced it is the same but didn't do the integration, if you could think that one over i'd appreciate it :)

also, i agree that large variations can be accounted for by the RNG. but, that just means any single case can be explained, not all of them, as you mentioned. i don't think the variance of a team following the variance of all teams means anything, the same random factors would be generated for each game for each team without discrimination (theoretically), so i would expect teams to follow the same pattern. really, i feel the one stat you mentioned that really captures something is that 10 points being a roughly 2/3rd confidence interval. even with the fudging, guessing the 3 point plays % etc. is probably a decent approximation.

now, you might think, hey 8 points and 10, that is pretty close with all the fudging/simplification of the analysis and the countless imperfections in gillispie's perception of the sim engine. but, i say that is vastly different, because in your sim, there were about 70 random numbers generated per team (one per possession). in the sim we play, there are several fold more, presumably. randoms for when you will get steals, who will get rebounds, if fouls will occur, who will get the ball, and so on. more random number generations decreases volatility, so IMO, this should result in a vastly higher volatility in your sim than my perception, i.e. your standard deviation should be way higher than mine. but its not, which to me, suggests there are additional random factors. as well, there's certainly an issue of weight on the randomizations, the random on a steal is not as important as the random on a 3 point shot, i would think. i'd be curious what figures you get when you run two to five times as many possessions per team (to approximate, roughly, the other randomizations), i think the number of additional randoms in the real sim is probably in that interval, somewhere.

maybe my 16 points a game = 95% confidence interval on a neutral court figure is off, im curious if any vets have an opinion on that? ive always thought of it as coming from the question: what is your chance to win if your average differential is <blank>? well, that is a % chance of winning, which is the confidence level plus half of what's left up to 100 (so at the 2 std deviation mark its a 97.5% chance of winning, not the 95% of the confidence interval). i tried to adjust for that with 16 but who knows how successful i was, my general rule of thumb is if you are a 20 point differential in the NT you are winning at least 95% of the time, probably easily (meaning its probably more like 97 or 98%). i think that translates to about 16 points for 2 std deviations but that is a number that... what would you call it, in software development when people ask how i picked a number i just kind of picked, i wave my finger in the air like i'm trying to gauge the wind. i guess you can say i pulled it out my ***, in an educated guess sort of way. really, i'd love to know if other coaches could wager a guess at the figure, and see if we have some sort of common consensus.
12/1/2009 9:02 PM
Good stuff, spintronic. I have to think about this in relation to the random factor theory. Thanks for doing all that work.
12/2/2009 9:23 AM
Thanks guys.

Just a few more thoughts here:

1) Yes, 68% is for one-sigma and 95% is for two-sigma deviations (two-sided). The 1/6 figure is the one-sided one-sigma deviation. So to take one of my thoughts a little further, on average, once or twice a season you should expect to be + or - 20 points off your average. And pretty close to once a season, your team will have one of those head-scratcher games where you score 20 or more points below average and shoot in the low 30%s.

2) Most of my experience is at D2 where HCA seems to be less important compared to D1, according to forum wisdom.

3) Definitely the HD sim program uses a lot more random numbers per game than I did. Some of the resultant outcomes are not independent so we can't just assume that more RNG calls will reduce the spread in final scores. Some RNG calls might not really affect the final score. For example, I don't know how assists are calculated. Does one player have the ball and decide between shooting or passing? Or does a player make a shot and then the HD sim calls another random number to see if there should be an assist? Or, does it really matter which person gets the rebound on your team? I don't know how the sim does these things so it's hard to say anything concrete about it. There was a thread a long time ago, maybe 6 months to a year, where seble or someone else from WIS talked about the logical flow of the decision-making process but I can't find it right now.

I'm also not sure how I would increase the random number calls in my program while keeping the scores consistent with actual team data. If we just multiplied the number of possessions by N but then renormalized, then the standard deviation would decrease by a factor of sqrt(N). That would make both standard deviations calculated above go out of whack relative to the HD values.

4) Humans are not very good at correctly perceiving random events. Our brains are built to be extremely good at seeing patterns in events, but a side effect is that we sometimes see patterns where there are none. To the scientist in me, this just means that I don't really trust my intuition about these types of things.
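
The sqrt(N) remark in point 3 can be checked with a short sketch. It reuses the Stonehill per-possession chances quoted earlier in the thread, plays N times as many possessions, and divides the total by N so the average score is unchanged. The function names, seed and game count are hypothetical choices for illustration.

```python
import random
import statistics

POSSESSIONS = 69
# Stonehill per-possession chances of scoring 1, 2 or 3 points, from the post.
PROBS = {1: 0.065, 2: 0.30, 3: 0.12}


def draw_points(rng):
    """Map one uniform draw to 0, 1, 2 or 3 points."""
    u = rng.random()
    cumulative = 0.0
    for points in (1, 2, 3):
        cumulative += PROBS[points]
        if u < cumulative:
            return points
    return 0


def rescaled_score(n, rng):
    """Play n times as many possessions, then divide the total by n."""
    total = sum(draw_points(rng) for _ in range(n * POSSESSIONS))
    return total / n


def score_std(n, n_games=3000, seed=0):
    """Game-to-game standard deviation of the rescaled score."""
    rng = random.Random(seed)
    scores = [rescaled_score(n, rng) for _ in range(n_games)]
    return statistics.pstdev(scores)


if __name__ == "__main__":
    base = score_std(1)
    for n in (1, 2, 4):
        print(n, round(score_std(n), 2), "ratio to N=1:", round(base / score_std(n), 2))
```

With N = 4 the spread should come out at roughly half of the N = 1 value, which is why simply adding more independent draws would pull the simulated standard deviations well below the HD figures.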

To sum up, I don't really know but my opinion is that there are no "unknown unknowns" in the game.
12/2/2009 2:05 PM

Quote: Originally posted by jskenner on 12/01/2009
They do happen, but not at anywhere near the rate they do in RL.

Can you back up this speculation about the variance of outcomes from evenly matched teams in RL? I think you would be quite surprised at the distribution of scores for teams where the true mean for MoV results is very near zero. If anything, I would have guessed that variance may be understated in HD and that the number of outlying performances is severely understated by the sim.
12/3/2009 2:06 AM