First of all, Happy Easter to those of you who celebrate it.
Now, if you all remember, we had a little controversy over range factor in WIS in another thread a while ago. I strongly defended WIS on the basis that they use an official, recognized stat, range factor, originally developed by Bill James and adopted by MLB, which consists of (PO + A) per 9 innings, and then compare that stat for each player with those of other players in the same season, giving a letter grade to range. This means range is clearly defined for players relative to others from the same season, but it needs to be mentally normalized when comparing the same letter grade across baseball eras.
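As a rough sketch of the basic calculation described above, here is range factor per nine innings in Python. The function name and the sample season line are made up for illustration, and the letter grades WIS derives from the same-season comparison are not reproduced because the thresholds are not stated anywhere in this thread.

```python
def range_factor_per_nine(putouts, assists, innings):
    """Range factor: putouts plus assists, scaled to nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * (putouts + assists) / innings


# Hypothetical season line, for illustration only (not a real player).
rf = range_factor_per_nine(putouts=250, assists=450, innings=1260)
print(round(rf, 2))  # 5.0
```

Note that nothing in the formula knows who was pitching or where the ball was hit, which is exactly the limitation discussed below.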
And I stand by that defense in the sense that WIS is using a standard system used by MLB and by Sabermetricians.
BUT... reading Bill James' book "Win Shares", a copy of which I finally found and which arrived a couple of days ago, I see that he makes significant corrections to standard, simple range factor as defined above. We need to think these through, not so much for WIS play, since as many here point out the rules here are what they are and to win you need to win by the way this system works, but rather to think about how realistic play is compared to RL baseball and how it might be improved someday.
So that said, let’s note some of James’ corrections of the limitations of Range Factor as a standard.
For one thing, he notes that Richie Ashburn anomalously figures as having several of the top seasons ever by an OF for Putouts. But he finds that Ashburn’s team’s pitching staff threw a record number of fly ball outs and a record low of ground ball outs. So, one point for the critics here at WIS (with whom I was arguing) in noting the lack of realism in merely counting up PO and A and calling it range factor. When you adjust for this, Ashburn was not better than Willie Mays, though he was not that far behind.
Second, James uses the examples of Bill Buckner and Steve Garvey to discuss un-assisted putouts at First Base. Buckner and Garvey were competing for the Dodgers’ first base job in the early 70s – Buckner’s knees did not allow him to play outfield anymore and Garvey’s arm (about which more below) was too weak for third base.
Some statistical schools show Buckner as a better fielder than Garvey because he had more assists. Except, as James points out, assists by a First Baseman are largely about getting the ball hit to you and then either going to the bag ahead of the runner on your own (why throw the ball and risk throwing it away when you can just step on the bag?), as Garvey usually did, or throwing the ball to the pitcher who runs over to cover (as Buckner, his knees not allowing him to run, nearly always did, sometimes to pitchers’ chagrin). Since Buckner could not get to the bag ahead of the runner, his assists are high, an illusion due to his inability to fully play the position on that play. Garvey, as almost anyone who actually watched a game could have told you, was a better fielder.
Further, left handed pitching results in the use of more right-handed batters by the other team, and so more plays by shortstops and third basemen and fewer opportunities for either un-assisted putouts or 3-1 assist plays, and needs to be corrected for. James’ numbers show that a ball is hit to first base a little over one time per game historically and that for every ten innings a left-handed pitcher is on the mound, one fewer ball is hit to first base.
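To put a number on that last rule of thumb, here is a tiny Python sketch. It only encodes the "one fewer ball per ten left-handed innings" figure quoted above; the function name and the 400-inning example are my own assumptions, not anything from James' book.

```python
def fewer_balls_to_first(lhp_innings):
    """Estimated reduction in balls hit to first base, using the quoted
    rule of thumb: one fewer ball per ten innings thrown by left-handers."""
    if lhp_innings < 0:
        raise ValueError("lhp_innings cannot be negative")
    return lhp_innings / 10


# Hypothetical staff with 400 innings from left-handed pitchers.
print(fewer_balls_to_first(400))  # 40.0
```

Over a full season that is a meaningful number of chances, which is why a first baseman behind a lefty-heavy staff can look worse than he is.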
So, again, on the details, a point for the critics here about realism in the WIS system, and in the sabermetrics stats.
Now a fun and interesting one: first basemen’s arms !
This might seem a minor issue, but James shows that it is not. Nearly every play on which a First Baseman has an assist that is NOT a 3-1 play at first base is a critical play:
- throwing to shortstop to start a double play (AMAZING stat here: from 1979-1983, Keith Hernandez fielded a ground ball with a runner on first and fewer than two outs 206 times, and started 49 3-6-3 or 3-6-1 double plays – 20 more than any other first baseman. He started a double play 24% of the time in that situation, while Steve Garvey, for example, fielded 113 ground balls in that situation over those 5 years and started exactly 3 3-6-3 double plays – 2.7% !).
- throwing to the catcher to get a runner from third out at home
- throwing to SS to get a force and keep a runner from advancing.
- Throwing to the 2B at first on a bunt play (a range factor issue not calculated into Range Factor)
- And though rare, throwing across the diamond to get a runner advancing to third base.
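The Hernandez and Garvey percentages in the first item of the list are just double plays started divided by chances. A quick Python check, using only the counts quoted above (the function name is mine):

```python
def dp_start_rate(double_plays_started, chances):
    """Share of double-play chances the first baseman turned into a started DP."""
    if chances <= 0:
        raise ValueError("chances must be positive")
    return double_plays_started / chances


hernandez = dp_start_rate(49, 206)  # 1979-1983 counts quoted in the post
garvey = dp_start_rate(3, 113)

print(f"Hernandez: {hernandez:.1%}")  # Hernandez: 23.8%
print(f"Garvey: {garvey:.1%}")        # Garvey: 2.7%
```

So the 24% figure is 23.8% rounded up, and Hernandez started double plays roughly nine times as often as Garvey in the same situation.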
Clearly these are all important moments in a game. So a first baseman’s arm can be estimated as the number of:
Double Plays by the Team’s shortstops
Plus Assists by the First Baseman
Minus Double Plays by the Second Baseman (since the 1B virtually never throws to the 2B to start a DP, but rather to the SS)
Minus putouts by the Pitchers
Plus or minus an adjustment for left-handed pitchers (and hence chances at 1B).
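The arithmetic of that list can be sketched in Python as follows. This is only my reading of the components as listed here, not James' published formula: the class and field names are invented, the sample numbers are made up, and since the size of the left-handed pitching adjustment is not given above it is left as a value the caller supplies.

```python
from dataclasses import dataclass


@dataclass
class TeamFieldingLine:
    """Team-level season totals used in the sketch (field names are hypothetical)."""
    ss_double_plays: int
    first_base_assists: int
    second_base_double_plays: int
    pitcher_putouts: int


def first_base_arm_score(line, lhp_adjustment=0.0):
    """SS double plays + 1B assists - 2B double plays - pitcher putouts,
    plus or minus a caller-supplied adjustment for left-handed pitching."""
    return (
        line.ss_double_plays
        + line.first_base_assists
        - line.second_base_double_plays
        - line.pitcher_putouts
        + lhp_adjustment
    )


# Made-up team totals, for illustration only.
example = TeamFieldingLine(
    ss_double_plays=100,
    first_base_assists=120,
    second_base_double_plays=95,
    pitcher_putouts=80,
)
print(first_base_arm_score(example))  # 45.0
```

Subtracting the pitchers' putouts is what strips out the routine 3-1 plays, so what is left is mostly the throws that actually test the arm.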
The ratio of DPs at SS rather than 2B is an indicator of the arm of the 1B: in 1990 the Cubs (Mark Grace at 1B), Yankees (Don Mattingly) and Rangers (Rafael Palmeiro) had the highest ratio between SS DPs and 2B DPs, while the lowest ratios were for the Angels (an injured Wally Joyner), Twins (Hrbek), and White Sox (Carlos Martinez and Frank Thomas).
In 1980 these ratios were highest for Boston (Tony Perez) and St. Louis (Keith Hernandez), and lowest for Baltimore (Eddie Murray) and Philadelphia (Pete Rose).
In 1970 they were highest for Boston (Yaz and George Scott) and lowest for Washington (Mike Epstein).
So the better fielding 1Bs consistently stand out using this seemingly indirect way of gleaning 1B arm factors.
In 1995, First Baseman assists minus Pitchers’ putouts were highest for Houston (Bagwell – 45 such assists!), NY (Mattingly) and Toronto (Olerud). The lowest total, 2, was for Chicago (Frank Thomas).
In 1985 the highest total for 1B assists minus pitchers’ putouts was by the Mets (Keith Hernandez – 42), and the lowest was minus 3 (-3), by Steve Balboni.
So this is clearly a useful stat.
Steve Garvey is shown to have had good range at 1B (un-assisted putouts) but a poor arm (very few assists minus pitchers’ putouts).
Keith Hernandez, Don Mattingly, Wes Parker, Jim Spencer etc. are shown to have had good arms at 1B. So some future update at WIS, and OOTP too if they don’t already incorporate these stats (I doubt it but don’t know, need to ask on their forum), should include stats such as un-assisted putouts by 1B or 1B arm factor, account for the pitching staff’s tendency to get ground ball outs, fly ball outs or Ks when calculating IF and OF range factors, and adjust for the presence and proportion of LH pitching.
Having said that what can we conclude?
- The criticism that the WIS method of determining fielding and range is limited and weak is to some extent justified. WIS quite reasonably uses an official stat (and I stand by my defense of it as is, given the official use of Range Factor, until an update can improve the ways of calculating this), but it is true that this stat is only a little more useful than the traditional fielding percentage (which at the major league level, after the invention of leather, is pretty useless – as James points out, the average 1B has a .975 fielding percentage, so the difference is between 8 and 13 errors a year, 5 plays, not much).
- HOWEVER, the wider criticism – that you CANNOT measure fielding ability or range – is WRONG. What is needed is improved stats, better measurements and more precision in accounting for all the variables. So while there are clearly limitations to any baseball stat, and especially so in fielding stats, these are being improved as we speak, and we need the better and more useful ones to catch on, not a retreat into anti-intellectualism.
Finally, a note (you knew this was coming, it’s me after all) about methodology. One of James’ strengths is to have noted that fielding cannot be a completely individual statistic for various reasons (read his books for the full argument or his online blog at billjamesonline.com). The key to understanding fielding is to mediate the fielding stats through the team stats at each position and then award Win Shares as he does (he is at work on a book now on Win Shares and Loss Shares) – something that applies to hitting and pitching as well but which he came to as a result of needing the right methodology for fielding, which does not consist of discrete at-bats or pitches as the other stats do.
Otherwise you are comparing the fielder directly to the league. But since every team MUST get 27 outs in 9 innings, unlike hits etc. these are merely DISTRIBUTED differently across different teams, not an absolute difference. There is no SS that makes 28 assists in a game compared to one that makes only 18. Any out not made by a SS may be made by the 2B, 3B, CF etc.
I continue to think that Win Shares, with its flaws, is much better methodologically than WAR because of this team context, which is then applied as well in different ways to pitching and to hitting. It allows for taking game situation and context into account – giving up a hit when pitching with 2 outs, no one on and your team ahead 10-0 as opposed to when it is 2-2 with the bases loaded in the 9th. Or grounding to the left side to move a runner over, or getting a HR with no one on in the 9th and your team behind 10-0 so the pitcher on the other team is throwing strikes and avoiding walks.
Running stats through the team’s wins (team wins being, we forget so easily, the WHOLE POINT of baseball, not increasing WAR) allows us to at least implicitly address such context.
This has led, most recently in the GREAT, GREAT talk that James gave about a year ago at a college, here:
https://www.youtube.com/watch?v=kHEoEEUUHhw
to James’ acknowledging that Sabermetricians were wrong early on to discount such issues as team leadership and teamwork as factors because they were not visible or measurable.
He now argues that these factors ARE measurable and visible in the difference that can be found between individual players’ potential and their actual performance. That gap narrows when a good team with real teamwork plays together.
Whatever actual quantitative measurements he will subsequently come out with demonstrating this effect (there is already a little on his online blog – see link above), this acknowledgement leads me to a couple of thoughts:
- Some of what is off-putting about sabermetrics to some people is the seeming arrogance and certainty (his talk is all against certainty in that video – watch it, it is brilliant) of stat people when arguing with those unable to do the math or unwilling to try to at least wrap their heads around it (I am admittedly very poor at this, but try). So the criticisms serve a useful role WHEN they seek to show that certain factors that MAY be important are not being considered. BUT NOT when they argue that there is no way to show or measure or ascertain a certain performance or the why of an outcome. The former is helpful, the latter is anti-scientific.
- One problem with sabermetrics is not with its content but its use of names for things. Why is Runs Created called that when it is not actually looking at actual runs that scored in the real world? Why not call it Expected Runs Created, or Total Bases Adjusted or Batter’s Contribution or something that it really is? Why call Range Factor that when it is measuring putouts and assists? And when it leaves out so many variables as we have noted above? Why not Fielders Events? Or something of the sort? I think the reason goes back to the first point above: the arrogance of early sabermetricians, which as James points out in his talk, helped at first to combat the parallel arrogance of the mainstream status quo, convinced of stuff merely because everyone had thought it for so long. This is why I had patience at first with the tone of the “New Atheists” (Richard Dawkins, Sam Harris, Christopher Hitchens) – the other side had held the field for so long and with such certainty in its own prejudices (diverse and often mutually hostile within the religious camp) that the tone of the atheists captured attention and was a needed corrective. Then I and everyone else got real tired of them when we saw that they were just as arrogant, certain of themselves, prejudiced and close-minded as their opponents. So it has been with sabermetrics versus the traditional baseball view. Are we sure that Mark McGwire created 165 runs in 1998 as James argues in Win Shares? No, since we are not actually talking about runs that actually scored (we know he created 70 in the real world for sure all by himself plus some help from chemistry), but we are trying to get a measurement of his contribution to the 1998 Cardinals, and James’ method (and maybe WAR etc.) are good starts, no more. Claiming that we know the contribution exactly, as though that same performance, with only Park Factor adjusted for, would have happened for any other team, or that it is comparable with certainty to other performances in history with only statistical adjustments for league averages over time, is too great a claim. But no one who is serious makes such claims. The names of tools that measure our knowledge to this point, and our ignorance, should make this clear to avoid misunderstanding.
- As to the importance of team play as a factor, I kept wondering when someone would notice that on the 1998 Yankees, Homer Bush hit .380 (!), Shane Spencer hit .373 with 10 homers in 73 at bats, and Scott Brosius had essentially a .300 20 HR season. To think that it is just coincidence that this all happened that year and that the Yankees were just lucky is not very scientific. To merely treat these as facts (thank you Mr. Durkheim, but please contact Dr. Lakatos for more help) is to state the obvious or engage in circular logic – the teams that players perform better for do better that year etc.
What is at stake here is, as Bill James says in that video talk, that once performance takes place we forget about potential, letting it drop to the floor. But potential always remains and players rarely play to their potential, he argues, and when they do, or come close to it, it happens BECAUSE they are playing for a certain team in a certain year (when will Manager factor be taken into account as well? James wrote a whole book on Managers but while it was a fun read it was disappointing in not having a lot of rigor and not asking these kinds of questions).
Further, there is a methodological problem it seems to me he has still not considered and which perhaps it takes not a statistician but a social scientist to raise:
When you study how players contributed to a team’s success it is easy to forget that the players are the team and the team is the players. For example, we now rightly adjust for the fact that players should not be evaluated on having played for a good or poor team, so a pitcher’s W-L record cannot be taken at face value (this was a central early James contribution). But the team won those games with that pitcher pitching, and so to call it a “good team” is to miss that it would not have been as good a team without that pitcher pitching. There is the Replacement level player as a concept to address this. But aside from a number of problems with this concept, despite its usefulness to a point (in the real world how many winning pitchers have been replaced with AAA level performers? Almost by definition none, since any pitcher that can pitch in the majors is better than any that can’t), the team won those games as a TEAM with THAT pitcher on the mound, with a sense of confidence, or lack thereof, in the pitcher, with fielders knowing what they had to do or did not need to worry about because of who was on the mound etc., and the same goes for any of the position players as well to a greater or lesser extent.
Put differently, imagine that I were to publish a sociological article on the contribution of various activist members or staff to a local labor union’s contract or organizing campaign, or a company’s competitive efforts in a certain market and the contributions of specific departments or individual employees, or of the staff or active members of an NGO or non-profit or church activities organization. If I were to suggest that a certain individual employee or member or staff person would, based on some statistical analysis, prove just as able at a corporate department in a marketing campaign as they were in organizing members into a union, or in organizing a human rights campaign for an NGO, regardless of context or organizational culture (a MAJOR concept in Organizational Behavior, Management Studies, Human Resources and Labor Relations etc.), my colleagues in the field would be, to put it mildly, very critical of such an individualistic methodology and approach.
If I were to try to measure whether Dr. Martin Luther King had a bigger impact or less impact on the Civil Rights Movement than say William Lloyd Garrison or Frederick Douglass had on the Abolitionist Movement, or Susan B. Anthony or Elizabeth Cady Stanton had on the Women’s Movement, and I did so by abstracting Dr. King out of his time and place, organizational context, organizational culture (the Black Church for example) as though these had little to do with his relative impact, it would not be taken very seriously by either sociologists, political scientists or historians. To be sure, things like normalization or park factor are precisely to help us make such distinctions in baseball and perhaps we could develop analogous methodological tools for social science (it would be interesting) but it can’t be done in a way that assumes that the Civil Rights Movement would not have been changed if Dr. King instead were participating in the Women’s Rights Movement (traded for Gloria Steinem and a third round draft pick) for example, and that the organizational cultures, team chemistry etc. of both those organizational forms and movements would not have been different.
So I look forward to seeing more from James on teamwork, leadership etc. and to seeing how, perhaps in a different way than Nate Silver at 538 has applied some sabermetric stuff to politics (though to the most superficial aspect – election results, not legislative outcomes, foreign policy successes etc.), we might learn from baseball’s advanced approaches to say something about which sectors of a company or of the economy are really contributing what, which social classes are contributing what to the overall outcome (I think sooner or later people will realize that Karl Marx in Capital with the Labor Theory of Value gave us the Sabermetric basis for this long ago and was dismissed by the mainstream economists just as James and sabermetrics were, though for longer because the vested interests are more powerful and even more is at stake), and more.
What we should not do is decide that because a certain methodology, tool, concept or statistic is not good enough to capture enough of the whole reality, we should abandon any attempt to overcome our (collective and individual) ignorance and keep learning more. The more we understand about baseball the more fun it should be (same for everything else, including the universe, evolution etc.), and at no point, as James makes clear, will we be in a position to reduce everything to “we know who the best player is because this one statistic explains everything so we have made it boring.” That is not what is happening, and those with that perspective, either for or against it, do not realize how little we still know and understand and how much work we still need to do.