Posted by tommy_cian on 8/24/2010 4:03:00 PM (view original):
Posted by goyankees2 on 8/24/2010 3:51:00 PM (view original):
I was gonna do pitchers pretty soon (I literally just did hitters a day ago), but I would be less confident. Any basic pitcher stat on Player Search (how I'm getting the data) is just too dependent on other factors, as you said.
A potential solution to your problem: either a) ignore the 3rd-5th pitches altogether, since they probably aren't that important, or b) use a dummy variable to indicate whether a pitcher HAS a third pitch. Correlation does not equal causation -- my guess is that you're getting your result because usually only starters have a 3rd pitch, and starters are usually worse per inning than relievers. Assign 0 if a pitcher has no 3rd pitch, and 1 if he does. Might fix your problem.
I'm not new to stats (or WhatIf, as I played SimLeague a while back on an account I no longer have the email for), but I am new to HBD, so this stuff is interesting to me as well.
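A rough sketch of the 0/1 dummy idea in Python, in case it helps. The (3P rating, ERA) pairs are made up purely for illustration, and the function names are hypothetical. It assumes a 3P rating of 0 means the pitcher has no third pitch. With a single dummy and nothing else in the regression, the fitted coefficient is just the difference in group means, which is all this computes.

```python
from statistics import mean


def third_pitch_dummy(p3_rating):
    """Return 1 if the pitcher has a third pitch, else 0.

    Assumption: a rating of 0 means "no third pitch".
    """
    return 1 if p3_rating > 0 else 0


def dummy_coefficient(rows):
    """Difference in mean outcome between pitchers with and without a 3rd pitch.

    rows is a list of (p3_rating, outcome) pairs. For a regression on one 0/1
    dummy plus an intercept, this difference equals the dummy's coefficient.
    """
    with_p3 = [y for p3, y in rows if third_pitch_dummy(p3) == 1]
    without_p3 = [y for p3, y in rows if third_pitch_dummy(p3) == 0]
    return mean(with_p3) - mean(without_p3)


# Made-up (3P rating, ERA) pairs, for illustration only.
sample = [(0, 3.50), (0, 3.70), (55, 4.20), (70, 4.00)]
print(dummy_coefficient(sample))
```

In a real model you'd include the dummy alongside the other ratings, so the starter/reliever effect gets absorbed by the dummy instead of distorting the 3P coefficient.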
Or create an ordinal variable: 0 = no 3rd pitch, 1 = third pitch with rating under XXX, 2 = third pitch with rating over XXX. To figure out what XXX equals, try graphing the stat against the dependent variable and see if there is a clear break in the function (a point where the slope of the line changes; you may even find two breaks).
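Here's one way the ordinal coding and the search for a break could be sketched in Python. Everything here is an assumption for illustration: the function names are hypothetical, a rating exactly at the cutoff is coded as 2, and the break search simply tries every split point, fits a straight line to each side, and keeps the split with the smallest total squared error. Eyeballing a graph, as suggested above, is just as valid.

```python
def ordinal_third_pitch(p3_rating, cutoff):
    """0 = no 3rd pitch, 1 = rating under cutoff, 2 = rating at or over cutoff."""
    if p3_rating <= 0:
        return 0
    return 1 if p3_rating < cutoff else 2


def line_sse(points):
    """Sum of squared errors of the least-squares line through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return sum((y - (my + slope * (x - mx))) ** 2 for x, y in points)


def find_break(points, min_side=2):
    """Return the x value that starts the right-hand segment of the best split.

    Each side of the split must contain at least min_side points.
    Returns None if there are too few points to split.
    """
    pts = sorted(points)
    best = None
    for i in range(min_side, len(pts) - min_side + 1):
        sse = line_sse(pts[:i]) + line_sse(pts[i:])
        if best is None or sse < best[0]:
            best = (sse, pts[i][0])
    return None if best is None else best[1]


# Made-up (3P rating, outcome) pairs: flat up to 50, then falling.
demo = [(10, 40), (20, 40), (30, 40), (40, 40),
        (50, 40), (60, 38), (70, 36), (80, 34)]
cutoff = find_break(demo)
print(cutoff, [ordinal_third_pitch(r, cutoff) for r, _ in demo])
```

If the graph shows two breaks, the same search could be run again on each side of the first one.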
You could definitely try that, but I'd guess that, when you weight by the pitchers actually getting playing time, pitchers with non-zero 3P are gonna have their 3P be normally distributed. I could be wrong, and testing is the only way to be sure, but if you're like me and aren't anal-retentive, you won't waste too much time with a relatively unimportant attribute. (Not saying anal-retentive is bad; in this area it's quite good. But most people are probably like me in that they're lazy.) Besides, if you're going to use techniques that go beyond looking at each attribute individually, there are others that will make a much larger difference.
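For anyone who does want to test it, a quick first check could look like the sketch below. It assumes you have each pitcher's 3P rating and innings pitched (the function name and inputs are hypothetical) and computes the innings-weighted mean, standard deviation, and skewness of the non-zero ratings. A skewness near zero is consistent with a roughly symmetric distribution, but it doesn't prove normality on its own.

```python
import math


def weighted_moments(ratings, innings):
    """Return the innings-weighted (mean, std, skewness) of non-zero ratings.

    Pitchers with a rating of 0 (assumed to mean "no third pitch") are skipped.
    """
    pairs = [(r, w) for r, w in zip(ratings, innings) if r > 0 and w > 0]
    total = sum(w for _, w in pairs)
    mean = sum(r * w for r, w in pairs) / total
    var = sum(w * (r - mean) ** 2 for r, w in pairs) / total
    std = math.sqrt(var)
    skew = 0.0
    if std > 0:
        skew = sum(w * (r - mean) ** 3 for r, w in pairs) / (total * std ** 3)
    return mean, std, skew


# Made-up ratings and innings, for illustration only.
print(weighted_moments([0, 40, 50, 60], [80, 100, 200, 100]))
```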