On some of the limits of WAR Topic

Ain’t Gonna Study WAR No More

By Bill James from billjamesonline.com
February 13, 2015
Nothing I will say here is correctly understood to involve any criticism of WAR (Wins Above Replacement) whatsoever. I know that people will take it that way, but that’s not what I’m saying. Nothing I have said should be taken to suggest that Win Shares/Loss Shares is any better than WAR, or that the concept of Replacement Level is not a valid analytical approach.
Rather, what I am trying to say is merely something that should be obvious, but which I believe some people have lost track of in their analytical over-confidence: that it is entirely impossible to accurately measure a complex, multi-dimensional reality in one dimension, much as we might want to do so. That being the case, it is my preference, when we need to use a one-dimensional measurement of value, to use measurements which are easy to unpack into their various components—wins and losses, offense and defense, playing time and performance level, etc.
This research arose as a result of a discussion in the article/comments section, specifically The Fielding Jones, Parts LXI – LXV, published here on February 6. In Part 62, The Compression Issues, I wrote the following:
Suppose that a player, who we will call Ted Williams, 1953, hits .400 with 13 homers and a ridiculously high OPS, but plays only 37 games for one reason and another, while another player, who we will call Eddie Bressoud, 1966, plays 133 games in a season but hits .225 and makes 23 errors at the four infield positions. WAR might say that these two players each have a value of 2.0 WAR, and thus implicitly says that these two seasons are the same—but they’re not the same; they’re not the same at all.
I wasn’t intending to say anything original or argumentative here; I thought I was merely stating the obvious, but several other people responded that they did not believe this to be true. "Can someone explain why it's obvious that Williams '53 and Bressoud '66 did not provide equivalent value?" asked one reader, "Was Williams clearly more valuable because he was so much better on a per-PA basis? Or was Bressoud clearly more valuable because he played four times as many games? To me, it seems entirely plausible they provided equal value."
"Speaking for myself," posited another positer, "I don't see that it's obvious that Williams' season is less or more valuable than Bressoud's. They were very different, obviously, but that is what causes the problem in my mind. How to compare such different seasons? That's what I thought Win Shares and WPA were doing, and I've enjoyed both those stats for those very reasons."

"The Ted Williams/Eddie Bressoud dispute can be resolved in this way," I responded. "Let us say that these two players each have a WAR of 2.0 (which is what I found on Baseball Reference.) The question is, do teams that have Ted Williams-type 2.0s on them have an equal chance to win as teams that have Eddie Bressoud-type 2.0 WAR? Actually, the Ted Williams example is SO extreme that there probably is no comparison group; you’d have to find a somewhat less bizarrely dominant season to form the study. . .Willie McCovey in 1959 or something. But on the general question. .. .seasons of a certain WAR in limited playing time (Type Williams) vs. seasons of the same WAR in abundant playing time (Type Bressoud), do teams that have Williams-type seasons on their roster have the same chance of winning the pennant or the same chance of winning 90 games as teams that have a Bressoud-type season?
It would seem obvious that they would not. It would seem obvious that if you were to study THAT issue, you would find that teams that had Williams Type seasons on their roster were more likely to have successful seasons. Which, if true, would demonstrate that their value is not, in fact, equal."

Departing from that, I have spent two days doing that study, which I will get to in a moment. Well, no, I guess I will get to it now; hard to figure out the line of march here with the discussion wandering all over the place.
My study
I took all players (non-pitchers) in the National League in 1975, and copied into a spreadsheet their full-career "value" stats, including Playing Time, copied from Baseball Reference. There are a total of 254 players, who played a total of 3,197 seasons, although this counts a player who plays for two teams in one season as two players. I eliminated from the study all seasons with negative WAR, for obvious reasons, and I eliminated all seasons from the 1981 strike-shortened season, also for obvious reasons. Then I formed "matched sets" of players, a "Ted Williams Group" and an "Eddie Bressoud Group". The Ted Williams Group was players who had a certain value—1.0 WAR, 1.5 WAR, 2.0 WAR, 3.0 WAR, whatever—in a small number of games and plate appearances, and the "Eddie Bressoud Group" was players who had identical WAR, but in a larger number of games and plate appearances. For example, Merv Rettenmund in 1970 played 106 games, 385 plate appearances, and was credited with 4.8 WAR—quite exceptional performance. Pete Rose in 1970 was also credited with 4.8 WAR, but played 159 games and had 730 plate appearances.
Rettenmund’s team won 108 games; Rose’s team won 102 games, so. . .no big difference. Reggie Smith in 1980 played 92 games, 362 plate appearances, and had 4.3 WAR; Gary Matthews in 1979 played 156 games, 695 plate appearances, and also had 4.3 WAR. Smith’s team was 24½ games better, but on the other hand, Keith Hernandez with the Mets, 1983, and Bobby Murcer, 1970, also both had 4.3 WAR, Hernandez in 95 games and Murcer in 159. In that case Murcer’s team was 25 games better, so that balances the Smith/Matthews example.
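The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the study: the article says the sets were chosen systematically but does not give the rule, so the pairing rule here (within each WAR value, pair the lowest-PA season with the highest-PA season while the gap is at least `min_pa_gap`) is an assumption, as are the names `Season`, `match_sets` and `min_pa_gap`. The four seasons are the ones quoted in the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Season(NamedTuple):
    player: str
    year: int
    games: int
    pa: int      # plate appearances
    war: float   # WAR as published, to one decimal place


def match_sets(seasons, min_pa_gap=200):
    """Pair low-playing-time and high-playing-time seasons with identical WAR.

    ASSUMPTION: the pairing rule and the min_pa_gap threshold are hypothetical;
    the article does not say how its matched sets were selected.
    """
    by_war = defaultdict(list)
    for s in seasons:
        # Exclusions stated in the article: negative WAR and the 1981 strike season.
        if s.war < 0 or s.year == 1981:
            continue
        by_war[s.war].append(s)

    pairs = []
    for war in sorted(by_war, reverse=True):
        group = sorted(by_war[war], key=lambda s: s.pa)
        lo, hi = 0, len(group) - 1
        # Pair the extremes first, then work inward while the gap stays large.
        while lo < hi and group[hi].pa - group[lo].pa >= min_pa_gap:
            pairs.append((group[lo], group[hi]))  # (Williams type, Bressoud type)
            lo += 1
            hi -= 1
    return pairs


# Seasons quoted in the text above.
SEASONS = [
    Season("Merv Rettenmund", 1970, 106, 385, 4.8),
    Season("Pete Rose", 1970, 159, 730, 4.8),
    Season("Reggie Smith", 1980, 92, 362, 4.3),
    Season("Gary Matthews", 1979, 156, 695, 4.3),
]

for low, high in match_sets(SEASONS):
    print(f"{low.war} WAR: {low.player} ({low.pa} PA) vs. {high.player} ({high.pa} PA)")
```

Once the pairs exist, the comparison is simply the won-lost record of the teams on each side of the pairing.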
This can be seen as a study of active and inert ingredients. The player’s value is the active ingredient. The playing time without value is the inert ingredient. We are studying the impact of adding inert ingredients to a package of value.
I made 225 matched sets of players like this. Here is a chart of twenty more:



Willie McCovey, 1959, wound up matched against Darrell Evans, 1975, as you can see. I selected the matched sets in a systematic fashion with no subjective input; I could have selected more, but the difference between the two groups was so large that to extend the study would have been pointless. The performance of the teams which had the Ted Williams-Group players was 859 games better than the performance of the teams with the Eddie Bressoud-Group Players. The teams with the Ted Williams Group Players were 18,501-17,750, a .510 winning percentage. The teams with the Eddie Bressoud-Group Players were 17,687-18,654, a .487 winning percentage. The teams which invested fewer games and plate appearances in the value package were, on average, almost 4 games better than the other teams.
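The arithmetic behind those figures can be checked directly. The sketch below assumes that "games better" means the difference between the two groups in games above .500, which is one reading that reproduces the 859 figure; the function names are illustrative.

```python
def win_pct(wins, losses):
    """Winning percentage from a won-lost record."""
    return wins / (wins + losses)


def games_over_500(wins, losses):
    """Games above (positive) or below (negative) a .500 record."""
    return (wins - losses) / 2


WILLIAMS_GROUP = (18501, 17750)   # combined team records quoted in the text
BRESSOUD_GROUP = (17687, 18654)
MATCHED_SETS = 225

# ASSUMPTION: "games better" = difference in games over .500 between the groups.
gap = games_over_500(*WILLIAMS_GROUP) - games_over_500(*BRESSOUD_GROUP)

print(f"Williams group: {win_pct(*WILLIAMS_GROUP):.3f}")   # 0.510
print(f"Bressoud group: {win_pct(*BRESSOUD_GROUP):.3f}")   # 0.487
print(f"Gap in games: {gap:.1f}")                           # 859.0
print(f"Gap per matched set: {gap / MATCHED_SETS:.2f}")     # 3.82
```

Spread over 225 matched sets, the gap works out to a little under four games per pair, which is the "almost 4 games" cited above.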

This difference, in fact, is so large that it demonstrates that something else is being measured here, other than just the difference in the value of the players; the difference in the value of the players can’t be that large. There’s a selection bias that is augmenting the totals. Let’s set that issue aside, however, because I realized as I was doing the study that I could probably explain why this HAS to be the right answer, why the teams with Ted Williams Group players have to win more games.
Suppose that you take the extreme cases; suppose that you have a player on Team A who has just one plate appearance and has 0.0 WAR, and on the other end (Team Z) you have a player who plays 162 games and also has 0.0 WAR. Is the information that we have about these two teams the same, or different?
Of course it is different. About Team A we know almost nothing; therefore, we would have to assume that the team will probably win half of their games, a .500 team. About Team Z, on the other hand, we know that they got no value at all from a player who played every inning. It is likely, then, that the team had a losing record. The average of all "Team Zs" would certainly have a losing record. Team A has an advantage over Team Z.
Suppose, then, that we add some amount of "value" to each package; let’s say that each player hits an additional 5 home runs, so that each player has a value of 1.0 WAR. The relationship between the two is the same. If X+1 = Z + 1, then X = Z.
If you creep upward in value, so long as it is the same value on both ends, the relationship doesn’t change. If you reduce the amount of the inert ingredient (the playing time differential), that reduces the amount by which Team A might be better than Team Z, but it does not change the fact that Team A IS better off than Team Z. It has to be true.
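A minimal numerical sketch of that argument, under stated assumptions: the team has a fixed number of plate appearances to fill in a lineup slot, and whatever plate appearances the player does not use go to someone who produces wins above replacement at a typical positive rate. The slot size (700 PA) and the rate (2.0 WAR per 600 PA) are illustrative assumptions, not figures from the study, and `expected_slot_war` is a hypothetical helper.

```python
def expected_slot_war(war, pa_used, slot_pa=700, typical_war_per_pa=2.0 / 600):
    """Expected wins above replacement a team gets from one lineup slot.

    ASSUMPTION: the player supplies `war` in `pa_used` plate appearances, and
    the rest of the slot is filled at `typical_war_per_pa`. Both defaults are
    illustrative, not taken from the article.
    """
    remaining_pa = max(slot_pa - pa_used, 0)
    return war + remaining_pa * typical_war_per_pa


# Extreme case from the text: one plate appearance vs. a full season, both 0.0 WAR.
edge_at_zero = expected_slot_war(0.0, pa_used=1) - expected_slot_war(0.0, pa_used=700)

# Add the same value (1.0 WAR) to both players: the edge does not move.
edge_at_one = expected_slot_war(1.0, pa_used=1) - expected_slot_war(1.0, pa_used=700)

# Shrink the playing-time gap: the edge shrinks but stays positive.
edge_narrow = expected_slot_war(1.0, pa_used=300) - expected_slot_war(1.0, pa_used=500)

print(f"Team A edge at 0.0 WAR: {edge_at_zero:.2f} wins")
print(f"Team A edge at 1.0 WAR: {edge_at_one:.2f} wins")
print(f"Edge with a 200-PA gap: {edge_narrow:.2f} wins")
```

Under these assumptions the edge depends only on the playing-time gap and the rate at which the team turns spare plate appearances into wins. As long as that rate is above zero, the team that spent fewer plate appearances on the same value comes out ahead.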
In the real case, the "production" which is added is of a mixed type—some innings at shortstop, some triples, some stolen bases. That confuses the comparison, but it doesn’t fundamentally change it. We can put this in one sentence, one short sentence: When teams play players who have no value, that hurts their chance of winning. If that isn’t obvious to you, because of your commitment to the concept of WAR, then might I suggest the possibility that you’ve lost perspective?
Again, I am not criticizing WAR; WAR is a good stat, a valuable part of the catalogue of measurements we use to make sense of baseball. But no statistic will tell us everything we need to know.
Having devoted about 30 work hours to this study, I went back to the comments section for the relevant article, and realized that the reader who had more or less provoked the study had, in a subsequent note, acknowledged that the study would turn out as it did—but had then confused the issue by adding in irrelevant parameters.
Guy123
The study Bill suggests is fine, with the obvious stipulation that each matched pair of teams have the same total WAR. That is how we will find out if Williams' 2.0 WAR is truly more valuable than Bressoud's. And I agree the outcome seems obvious -- there will be no measurable difference in wins -- but maybe we'll be surprised.

If you run the study without that condition, you would no longer be comparing the value of the two 2.0 WAR seasons, but would instead be measuring what teams do when a superstar goes down (or emerges late). Whether those teams would actually be above-average as Bill guesses, I don't know -- an injury to a star player is not an obvious formula for success -- but in any case it won't answer the question we are wrestling with here.

Wait a minute. . .what?
Look, if Player A’s team may be expected to win more games than Player B’s team, based on the performance of Player A vs. Player B, what that means is that Player A is more valuable than Player B. That is how we define value for a baseball player, in a perfect world; it’s winning more games for your team. A player who hits 30 homers has more value than a player who hits 10 homers, for one reason and one reason only: that his team will win more games. If grounding into double plays made your team win more games, grounding into double plays would add value.
"If you run the study without that condition, you would no longer be comparing the value of the two 2.0 WAR seasons."
Yes, we are. If you throw in some crap about the performance of other players on the team, you’re not measuring the value of the 2.0 WAR seasons.
What Guy123 is doing here is starting with the assumption that WAR = value, and then reasoning to the conclusion that WAR = value.
"Clearly, if the only two things we know about a pair of teams is that one got 2 wins from 110 PA, while the other got 2 wins from 464 PA, we would expect the first team to be stronger. That simply means that teams typically generate more than zero wins from 354 PA, which is obviously true. If that's all the matched pair study is designed to show, I think we can all stipulate to that and save the labor."

But if teams typically generate more than zero wins from replacing the player, what that means is that the replacement level is not the actual replacement level. That’s true; it isn’t. Replacement Level is a contrivance, a fiction that we have invented by which to compare superstars to featureless AAA players. It is a very useful contrivance, a useful fiction—but it is a fiction.
Look, I’m not knocking WAR, and I’m not knocking Guy123. WAR is a useful concept; it is not a perfect concept. One of many reasons it is not a perfect concept is that it compresses Wins and Losses into one unit, and when you compress Wins and Losses into one unit, that results in distorted comparisons. Suggesting that Eddie Bressoud in 1966 has value equal to Ted Williams in 1953 is one result of that compression distortion. It’s not true; it’s not accurate. It’s not reasonable. It’s ridiculous. That’s not value.
When we produced data, in the 1970s, showing that run support for different starting pitchers does not even out over the course of a season, and therefore won-lost records for starting pitchers often had nothing to do with how well the pitcher pitched, there were lots and lots and lots of people who had lots of reasons why it couldn’t be true. We tried to talk to those people, for a month or so, and then we said, "OK, we’re moving on now; if you get it, you get it, and if you don’t you don’t." And that’s where we are now: This is not a debatable point; this is obvious. WAR is not value. If you get it, you get it, and if you don’t, you don’t, but I’m moving on now.
2/24/2016 4:17 AM