Deciding lineups based on sums Topic

Posted by tommy_cian on 8/24/2010 4:03:00 PM:
Posted by goyankees2 on 8/24/2010 3:51:00 PM:
I was gonna do pitchers pretty soon (I literally just did hitters a day ago), but I would be less confident. Any basic pitcher stat on Player Search (how I'm getting the data) is just too dependent on other factors, as you said. 

A potential solution to your problem: either a) ignore 3rd-5th altogether, since they probably aren't that important, or b) use dummy variables to indicate whether a pitcher HAS a third pitch. Correlation does not equal causation -- my guess is that you're getting your result because usually only starters have a 3rd pitch, and that starters are usually worse per inning than relievers. Assign 0 if a pitcher has no 3rd pitch, and 1 if he does. Might fix your problem.

I'm not new to stats (or WhatIf, as I played SimLeague a while back on an account I no longer have the email for), but I am new to HBD, so this stuff is interesting to me as well.

Or create an ordinal variable: 0 = no 3rd pitch, 1 = third pitch with rating under XXX, 2 = third pitch with rating over XXX. To figure out what XXX equals, try graphing the stat against the dependent variable and see if there is a clear break in the function (a point where the slope of the line changes; you may even find two breaks).

You could definitely try that, but I'd guess that, when you weight by the pitchers actually getting playing time, pitchers with non-zero 3P are gonna have their 3P be normally distributed. I could be wrong, and testing is the only way to be sure, but if you're like me and aren't anal-retentive, you won't waste too much time with a relatively unimportant attribute. (Not saying anal-retentive is bad; in this area it's quite good. But most people are probably like me in that they're lazy.) Besides, if you're going to use techniques beyond each individual attribute, there are others that will make a much larger difference.
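
For anyone who wants to try the two encodings discussed above, here is a rough Python sketch. It assumes a third-pitch rating of 0 means the pitcher has no third pitch, uses a made-up cutoff of 60 as a stand-in for XXX, and treats a rating exactly at the cutoff as the upper tier. The ratings and function names are invented for illustration only.

```python
def has_third_pitch(p3):
    """Dummy variable: 0 if the pitcher has no third pitch, 1 if he does."""
    return 1 if p3 > 0 else 0


def third_pitch_tier(p3, cutoff):
    """Ordinal variable: 0 = no third pitch, 1 = rated under cutoff, 2 = at or over it."""
    if p3 <= 0:
        return 0
    return 1 if p3 < cutoff else 2


# Made-up third-pitch ratings, not real HBD data.
ratings = [0, 0, 45, 62, 0, 71]
CUTOFF = 60  # hypothetical stand-in for XXX; pick it from your own graph

dummies = [has_third_pitch(r) for r in ratings]
tiers = [third_pitch_tier(r, CUTOFF) for r in ratings]
print(dummies)
print(tiers)
```

Either column can then go into the regression in place of the raw 3rd-pitch rating.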
8/24/2010 4:35 PM
Posted by jvford on 8/24/2010 4:34:00 PM:
Posted by goyankees2 on 8/24/2010 3:03:00 PM:
I just ran a multiple linear regression -- predicting players' OPS by their batting attributes. I only had a half-season of data, but I still got an R^2 of 58%, which is pretty solid considering it ignores park factor, platooning, etc. It also isn't as advanced statistically as it could be, although I will say it's not just using each attribute as a variable straight-up. I made a little calculator and plan on using it regularly, especially for the draft.
The formula I use to predict OPS gave me an R^2 of 79% using career data (while ignoring players with significant ABs before full development or during sharp decline). 

I've had very little luck producing anything of real value for pitching.
I'd guess that the main difference between mine and yours is the data (although you could very well have a better technique than me). Are you getting the career data fairly easily?
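
For readers following along, a minimal sketch of this kind of regression in plain Python is below. It is not either poster's actual formula (neither was shared). The two attribute columns ("contact" and "power"), the ratings, and the OPS values are all made up, and the fit uses the ordinary least-squares normal equations with no transformations of the inputs.

```python
def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted OPS for one player's attribute row."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


def r_squared(coefs, X, y):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up (contact, power) ratings and OPS values, for illustration only.
X = [(50, 60), (70, 40), (80, 80), (60, 90), (90, 55), (40, 45)]
y = [0.700, 0.690, 0.860, 0.810, 0.830, 0.620]

coefs = fit_ols(X, y)
r2 = r_squared(coefs, X, y)
print(coefs, r2)
```

With real data you would have more attribute columns and far more players; six rows are only enough to show the mechanics.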
8/24/2010 4:37 PM
early results not great for pitchers...  still trying.
8/24/2010 4:42 PM
Posted by goyankees2 on 8/24/2010 4:37:00 PM:
Posted by jvford on 8/24/2010 4:34:00 PM:
Posted by goyankees2 on 8/24/2010 3:03:00 PM:
I just ran a multiple linear regression -- predicting players' OPS by their batting attributes. I only had a half-season of data, but I still got an R^2 of 58%, which is pretty solid considering it ignores park factor, platooning, etc. It also isn't as advanced statistically as it could be, although I will say it's not just using each attribute as a variable straight-up. I made a little calculator and plan on using it regularly, especially for the draft.
The formula I use to predict OPS gave me an R^2 of 79% using career data (while ignoring players with significant ABs before full development or during sharp decline). 

I've had very little luck producing anything of real value for pitching.
I'd guess that the main difference between mine and yours is the data (although you could very well have a better technique than me). Are you getting the career data fairly easily?
No, not easily.  I had to manually click on each player and it took about 2 hours one day to do it for half of the world (about 160 players).
8/24/2010 4:44 PM
Posted by tommy_cian on 8/24/2010 4:42:00 PM:
early results not great for pitchers...  still trying.
Every time I thought I was getting somewhere with pitchers, some pitcher would make my work look stupid.
8/24/2010 4:46 PM
Jeez. Well, you earned it then.
8/24/2010 4:47 PM
I figured career numbers were the only way to minimize park factors and platooning.

Anyway, not something I'm looking forward to doing again.  I just hope they don't change the engine.
8/24/2010 4:51 PM
Posted by jvford on 8/24/2010 4:46:00 PM:
Posted by tommy_cian on 8/24/2010 4:42:00 PM:
early results not great for pitchers...  still trying.
Every time I thought I was getting somewhere with pitchers, some pitcher would make my work look stupid.
X2

I eventually gave up on pitchers. I tried separating by DUR/STM, GS/RA, grouping (to widen the fields), and any number of other things. I'd get it to work with the top 2-3 on a staff, and 4-8 would be scrambled.
8/24/2010 5:42 PM
Posted by MikeT23 on 8/18/2010 8:50:00 AM:
I have a formula for hitters that runs almost perfect for OPS vs L/R.   It's not opinion-based.  It's result-based. 

I won't give it out, but a player has to have at least 100 AB vs. the handedness before I'll count it. Out of 10 players who qualify, there is usually one overperformer and one underperformer. The rest run 1-10 in order.
Is this the formula that has helped you shape your Rochester and Charleston teams?
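
For anyone wanting to run the same kind of check on their own formula (drop players under 100 AB, then see how closely the predicted order matches the actual OPS order), one option is a Spearman rank correlation. The sketch below is only an illustration and not the quoted poster's method; the players, AB totals, and OPS values are invented, and the ranking helper assumes no ties.

```python
MIN_AB = 100  # qualifying threshold taken from the quoted post


def ranks(values):
    """Rank 1 = highest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(pred, actual):
    """Spearman rank correlation between two equal-length lists (no ties)."""
    n = len(pred)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(pred), ranks(actual)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# (name, AB vs. the handedness, predicted OPS, actual OPS); all made up.
players = [
    ("A", 150, 0.850, 0.870),
    ("B", 120, 0.800, 0.760),
    ("C", 60, 0.900, 0.500),   # under 100 AB, so not counted
    ("D", 210, 0.720, 0.790),
    ("E", 130, 0.690, 0.700),
]

qualified = [p for p in players if p[1] >= MIN_AB]
rho = spearman([p[2] for p in qualified], [p[3] for p in qualified])
print(rho)
```

A value near 1 means the formula puts the qualifiers in nearly the right order; a single overperformer or underperformer pulls it down.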
8/24/2010 7:33 PM
Wow. I just bat guys in order of their push/pull rating.
8/24/2010 10:29 PM
So what if we analyzed pitchers within a specific range (let's say starters in the 69-74 range) and looked at which key areas led to success, given ballpark... I'm drunk tonight; tomorrow I'll figure this out. That would help with deciding who to sign within this specific group (e.g., two starters around 72: should I sign the guy with higher splits or the guy with the higher 1st pitch?).
8/25/2010 1:44 AM
Posted by cjlancaster on 8/24/2010 7:33:00 PM:
Posted by MikeT23 on 8/18/2010 8:50:00 AM:
I have a formula for hitters that runs almost perfect for OPS vs L/R.   It's not opinion-based.  It's result-based. 

I won't give it out, but a player has to have at least 100 AB vs. the handedness before I'll count it. Out of 10 players who qualify, there is usually one overperformer and one underperformer. The rest run 1-10 in order.
Is this the formula that has helped you shape your Rochester and Charleston teams?
It's one thing to know what works and an entirely different one to actually be able to acquire it. I knew before the season started that I'd be struggling for .500. I didn't know my Rochester team would be incapable of scoring (although we aren't even 20 games into the season).
8/25/2010 6:42 AM
Posted by tommy_cian on 8/25/2010 1:44:00 AM:
So what if we analyzed pitchers within a specific range (let's say starters in the 69-74 range) and looked at which key areas led to success, given ballpark... I'm drunk tonight; tomorrow I'll figure this out. That would help with deciding who to sign within this specific group (e.g., two starters around 72: should I sign the guy with higher splits or the guy with the higher 1st pitch?).

The biggest problem I ran into with pitchers is defense.  It has such a big effect on pitchers, it's hard to measure, and it can't be minimized by a large pool of data (like park effects).

8/25/2010 8:48 AM

That's why I only did it with my teams. My defense doesn't change that much from year to year. I think the variance has a lot to do with competition/parks. A SP is going to get 36 starts (more or less). Depending on how the schedule works, he may face good offensive teams in hitter's parks 75% of the time. Or maybe he won't. Position players will get 500 AB under all sorts of conditions because they're playing every day.

One could probably break it down to opponents/parks but then the sample size would be so tiny it would be pointless.

8/25/2010 9:26 AM
To try to eliminate the ballpark effect for batters, I'm only harvesting data from players that played in a city with zeros for all park effects.  The list includes Syracuse, Rochester, Charlotte, Ottawa, Richmond, Vancouver, Little Rock, and Trenton.  You can't get stats for Home & RHP/LHP, but I figure just taking against RHP or LHP should be enough, as the effects of the away ballparks should be normally distributed.
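
If the harvested data ends up in a spreadsheet or list, the neutral-park filter described above could look something like this Python sketch. The city list comes from the post; the record layout (name, city, ops_vs_rhp), the sample players, and the non-neutral example city are all made up for illustration.

```python
# Cities listed in the post as having zeros for all park effects.
NEUTRAL_CITIES = {
    "Syracuse", "Rochester", "Charlotte", "Ottawa",
    "Richmond", "Vancouver", "Little Rock", "Trenton",
}


def neutral_only(players):
    """Keep only records whose home city is on the neutral-park list."""
    return [p for p in players if p["city"] in NEUTRAL_CITIES]


# Hypothetical records; the field names and values are assumptions.
sample = [
    {"name": "Player A", "city": "Syracuse", "ops_vs_rhp": 0.812},
    {"name": "Player B", "city": "Colorado Springs", "ops_vs_rhp": 0.901},
    {"name": "Player C", "city": "Trenton", "ops_vs_rhp": 0.745},
]

kept = neutral_only(sample)
print([p["name"] for p in kept])
```

The filtered rows still include away games in non-neutral parks, which is the part the post assumes will even out.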
8/25/2010 10:11 AM