Saturday, August 22, 2009

Advanced Statistics (Defense)

Probably the most overlooked, undervalued, and difficult portion of the game of baseball is defense. Until recently (roughly the last 5 years), teams and fans did not realize how important good defense was. This ignorance towards defense can be attributed to two things. The first is the era we are living in. As you know, the last 15 years have been by far the most potent in terms of offensive production, due to advances in PEDs and players' realization that staying in shape reduces injuries and lengthens careers (which amounts to more money).
Now I could go into a whole discussion on why the way we measure defense now (at least in the mainstream) is wrong, but instead I will just show you the way advanced statistics are tackling defense now.
There have been multiple ways created to measure defensive value in the last few years. A couple of major ones are Dewan's Plus/Minus system and Mitchel Lichtman's Ultimate Zone Rating, known as UZR. While the Plus/Minus system is a huge step up from fielding percentage and the like, I believe that UZR is the best system that advanced statistics has in place, so that is what we will use. So the obvious question: how does UZR work?
The baseball field is first broken down into 64 zones. Almost all of this work is done by computers. On a league-wide basis, a computer tracks, for each position, the number of hits in each zone, the run value of a hit in that zone, and the number of outs recorded in that zone. The computer then tracks each player at a fielding position and measures two things: the number of hits in that zone while the player was on the field at that position, and the number of outs recorded by that player, at that position, in that zone. Here is an example from Mitchel Lichtman to help you understand how UZR works; it was written in 2003.

"Let's use the data to calculate Mike Bordick's UZR runs in zone 56 (the area between third base and shortstop). First we establish the out rate for all ground balls hit into zone 56. That is 1419 divided by 2474 (1419 plus 1055), or .57. That is, 57% of all ground balls hit into zone 56 in 2002 were turned into outs (by all fielders). Therefore, the "extra" value of a "caught ball" by a fielder in zone 56 is 1 minus .57, or .43 balls. Since Bordick caught 18 balls in zone 56, he has 18 times .43, or 7.7 "extra" caught balls so far.

Now what about the hits? There were 79 hits in zone 56 while Bordick was playing SS. Surely he is not responsible for all of those hits. How many is he responsible for? Well, since an average SS catches 294 balls in zone 56 out of 1419, or 20.7% of the outs, Bordick is responsible for 20.7% of the 79 hits as well, or 16.4 hits (the third baseman is responsible for the other 62.6 hits). I told you it was going to be tricky! Now, just like the "extra" positive value of a "caught ball" is 1 minus .57, the "extra" negative value of a hit is the .57 itself (an average ball hit into zone 56 gets caught 57% of the time, so when a ball isn't caught, the responsible fielders, in this case the SS and third baseman, get "docked" .57 balls). Since Bordick is responsible for 16.4 of the 79 hits in zone 56, he has 16.4 times .57, or 9.4 "negative" caught balls added to his 7.7 "positive" ones, for a total of -1.7 "extra" caught balls. In other words, given the number of balls hit into zone 56 while Bordick was at SS, he caught 1.7 fewer balls than the average SS in the AL in 2002.

Now we want to convert those "extra" balls into runs saved or cost. For that, we use the average run value of a hit in zone 56 - which is .47 runs. Since a 2002 AL out is worth -.29 runs, the "swing" between an out and a hit is .47 plus .29, or .76 runs. Since Bordick caught 1.7 fewer balls in zone 56 than an average SS, he has cost his team 1.7 times .76, or 1.3 runs so far (i.e., his UZR runs in zone 56 is 1.3). If we do this for every zone in which any SS made at least one out (i.e., the applicable SS zones), and we add up all the runs Bordick saved or cost in each zone, we get a total of +6.2 runs, or 6.2 runs saved by Bordick while playing SS (he must have done well in the other zones)."
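
The zone 56 arithmetic in the quote above can be retraced with a short Python sketch. The function name and parameter names are my own for illustration, not Lichtman's; the numbers are the ones quoted, and the sketch carries full precision, so intermediate values differ slightly from the rounded figures in the quote.

```python
def zone_range_runs(league_outs, league_hits, player_outs, hits_while_fielding,
                    position_outs, hit_run_value, out_run_value):
    """Return (net extra caught balls, runs) for one fielder in one zone."""
    # Share of all balls in the zone that were turned into outs (about .57).
    out_rate = league_outs / (league_outs + league_hits)
    # Each ball the fielder caught is worth (1 - out_rate) "extra" balls.
    credit = player_outs * (1 - out_rate)
    # The fielder is charged with hits in proportion to his position's
    # share of the outs in that zone (294 / 1419, about 20.7%).
    charged_hits = hits_while_fielding * (position_outs / league_outs)
    # Each charged hit costs out_rate balls.
    debit = charged_hits * out_rate
    net_balls = credit - debit
    # Swing between a hit and an out: .47 - (-.29) = .76 runs.
    swing = hit_run_value - out_run_value
    return net_balls, net_balls * swing


net_balls, runs = zone_range_runs(
    league_outs=1419, league_hits=1055, player_outs=18,
    hits_while_fielding=79, position_outs=294,
    hit_run_value=0.47, out_run_value=-0.29,
)
print(round(net_balls, 1), round(runs, 1))
```

A negative result means the fielder caught fewer balls than average in that zone; here it comes out to about 1.7 fewer balls and about 1.3 runs cost.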

A lot to read, I know, but absolutely incredible stuff. There are other components that add to this as well: errors, turning double plays for infielders, and arm strength for outfielders. Let's just focus on errors. Here is some more of Mitchel Lichtman's work:

"The average SS committed 169 ROE (reached on base errors) errors in 5218 balls gotten to (outs plus ROE's) in all zones. That is an error rate of 169 divided by 5218, or .032. Since Bordick got to a total of 277 balls in all zones, he should have committed .032 times 277, or 8.9 errors. Instead, Bordick committed only 1 error, for a net gain in errors of 7.9. Since an infield error is worth around .49 runs, the swing between an error and an out is .49 plus .29, or .78 runs. Therefore, Bordick saved another .78 times 7.9, or 6.2 runs, by virtue of his "good hands". So far, we have Bordick saving 6.2 runs with his range and another 6.2 runs with his sure hands.

There is one final thing to consider Bordick's non-ROE errors. Like ROE errors, that is easily done.

The average SS committed 45 non-ROE errors and Bordick none. If we do the same calculations as above, using .3 as the value of a non-ROE error, we come up with Bordick saving another .72 runs. So it looks like even at the ripe old age of 36, Mike Bordick saved his team last year a total of 13 runs by virtue of his outstanding play (range and hands) at SS!"
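
The error calculations in the quote can be sketched the same way. This is only an illustration: the function name is hypothetical, and I am assuming the non-ROE calculation uses the same 5218 and 277 "balls gotten to" as the ROE one, which is what "the same calculations as above" suggests and which reproduces the quoted .72.

```python
def error_runs(league_errors, league_chances, player_chances, player_errors,
               runs_per_error):
    """Runs saved (positive) or cost (negative) relative to average error rate."""
    # Errors an average fielder would have made in the same number of chances.
    expected_errors = league_errors / league_chances * player_chances
    return (expected_errors - player_errors) * runs_per_error


# ROE errors: swing between an error and an out is .49 + .29 = .78 runs.
roe_runs = error_runs(169, 5218, 277, 1, 0.49 + 0.29)
# Non-ROE errors: valued at .3 runs each, per the quote.
non_roe_runs = error_runs(45, 5218, 277, 0, 0.30)
# Range runs of 6.2 are taken directly from the quote.
total_runs = 6.2 + roe_runs + non_roe_runs
print(round(roe_runs, 1), round(non_roe_runs, 2), round(total_runs))
```

The three pieces add up to roughly 13 runs, matching the total Lichtman gives for Bordick.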

So in determining infield defense you add range runs (RngR), error runs (ErrR), and double play runs (DPR) together to come up with a single number, which is UZR runs. For outfielders it's RngR, ErrR, and arm strength runs (ARM). For example, let's add up Nyjer Morgan's season this year in terms of UZR. Here are the numbers: ARM 10.1, RngR 17.9, ErrR 0.6. These rounded components sum to 28.6, and his listed UZR is 28.7 runs above average. Nyjer Morgan, under UZR, is the best defensive outfielder in baseball.

A few things UZR doesn't do, at least for now, are measure pitcher and catcher defense or account for the park the game is played in, the speed of the ball, the pitcher's groundball/flyball tendencies, batter handedness, and combinations of outs and runners on base. The UZR ratings of every player in baseball are available online in case you would like to see some more examples. The data only goes back to 2002, so don't go searching for Willie Mays' UZR in 1954. The next section will deal with positional adjustments.

Also, here is a link to Mitchel Lichtman's own article on UZR.

Tuesday, August 18, 2009

Advanced Stats (Offense Part 4)

In Part 3 we discussed wOBA and how it combines the ideas behind OBP and SLG, using weights to form a number that resembles an OBP. I then raised the question: what makes wOBA so special? Why is it better than OPS? The first reason was covered well in Part 3. A major flaw in OPS is that it treats a .370/.430 (OBP/SLG) hitter the same as a .400/.400 hitter. Both players would have the same OPS (.800), but the latter player is actually a better hitter. wOBA handles this through its linear weights methodology.
But clearly, the most amazing thing about wOBA is that it is incredibly simple to convert into runs (aka wRAA, weighted runs above average). You may ask, why convert to runs? I would begin answering this by asking another question: what does wOBA really tell you? Sure, it's good for evaluating an offensive player's performance, but how much is a .380 wOBA worth? The formula for runs weighs wOBA against the rest of the league. So if the league average wOBA is .335, a .380 wOBA will probably be worth a lot of runs. But how many, you ask? Here is the formula:
((player wOBA minus league average wOBA) divided by scale) multiplied by plate appearances
The formula is pretty self-explanatory except for the scale portion. The scale is 1.15, which reflects that the wOBA weights are 15 percent larger than the "standard" run-value weights they were calculated from. That inflation is what makes wOBA more closely match that season's OBP, and dividing by the scale undoes it so the wOBA figure can be converted into runs above average. Yes, it's confusing. My advice: don't worry about it and just remember that the number is 1.15.
So let's use the example I used before, a player with a .380 wOBA against a league average wOBA of .335, and we will say that he had 600 plate appearances. Let's use the formula:
(.380 - .335) / 1.15 = 0.0391
0.0391 * 600 = 23.46
So this fictional player used his .380 wOBA to add roughly 23.5 runs to his team during the season. In case you were wondering, a wRAA in that range is good, around the top 15 in baseball last season.
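As a quick check on the arithmetic, here is a minimal Python sketch of the wRAA formula. The function name and the default scale argument are my own choices for illustration; the 1.15 scale and the example numbers come from this post.

```python
def wraa(player_woba, league_woba, plate_appearances, scale=1.15):
    """Weighted runs above average: ((wOBA - league wOBA) / scale) * PA."""
    return (player_woba - league_woba) / scale * plate_appearances


# The fictional player from the example: .380 wOBA, .335 league average, 600 PA.
example_wraa = wraa(0.380, 0.335, 600)
print(round(example_wraa, 1))
```

Carrying full precision gives about 23.5 runs, in line with the hand calculation.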
This concludes the Advanced Stats (Offense) series. Next I will cover defense, replacement, and positions.

Wednesday, August 12, 2009

Advanced Stats- Offense (Part 3)

At the conclusion of Part 2 of offensive statistics I asked which is more important, OBP or SLG. While OPS is a very useful statistic, there is actually a more effective one. This statistic is called Weighted On Base Average.
Weighted On Base Average (wOBA) is a statistic that takes different offensive outcomes and gives each of them a number, which we call a weight. The weights are determined by something we call run value. I'm not going to go into a lot of detail about run values, but if you would like to learn more, follow the link at the end of the post. Each run value is then scaled up by 15% (multiplied by 1.15) so that the result looks more like an On Base Percentage. So whenever you see wOBA, think OBP. A .400 wOBA is well above average, .340 is about average, and under .300 is well below average.
Here are the weights:
Non-intentional walk- 0.72
Hit by pitch- 0.75
Single- 0.90
Double- 1.24
Triple- 1.56
Home Run- 1.95
(Some people like to include reaching base on an error, worth 0.92)
So in order to complete the formula, you multiply the number of times each outcome occurred by its weight, then add the results together. You divide that total by the number of plate appearances the player has.
Now to answer the question: which is more important? It turns out that getting on base more will lead to a higher wOBA. Let's use two players as examples. The first player is Jason Bartlett; the second is one of my personal favorites, Pablo Sandoval.
Jason Bartlett has a line of .391/.541/.932 (that's OBP/SLG/OPS). Pablo Sandoval's line is .378/.549/.927. The two are relatively close in terms of OPS (a .005 difference), with Bartlett having the higher OBP and Sandoval the higher slugging. Bartlett just happens to have a .408 wOBA, top 10 in baseball. Sandoval's is considerably lower at .391 (a .017 difference). Bartlett's extra on-base skills help him maintain a higher wOBA. Remember, when looking at wOBA think OBP; when you do this, you see that both players are well above average hitters this year, with Bartlett being the better of the two.
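To make the steps concrete, here is a small Python sketch of the calculation as described above. The weights are the ones listed in this post; the function name, the dictionary keys, and the stat line are hypothetical and only there for illustration. Reaching on an error is left out, since it is optional.

```python
# Weights from the list above (reached-on-error omitted).
WOBA_WEIGHTS = {
    "walk": 0.72,     # non-intentional walks only
    "hbp": 0.75,
    "single": 0.90,
    "double": 1.24,
    "triple": 1.56,
    "home_run": 1.95,
}


def woba(outcomes, plate_appearances):
    """Sum of (count * weight) over all outcomes, divided by plate appearances."""
    weighted_total = sum(WOBA_WEIGHTS[name] * count
                         for name, count in outcomes.items())
    return weighted_total / plate_appearances


# A made-up season, not any real player's numbers.
hypothetical_line = {"walk": 60, "hbp": 5, "single": 100,
                     "double": 30, "triple": 5, "home_run": 20}
example_woba = woba(hypothetical_line, plate_appearances=600)
print(round(example_woba, 3))
```

For this made-up line the result is about .368, which on the OBP-like scale would be a clearly above-average hitter.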
But why do we need another offensive statistic? What makes this so special? The secret behind wOBA will be revealed in part 4, the final section of Advanced Stats- Offense.

Advanced Stats- Offense (Part 2)

In Part 1 we discussed how OBP is a better statistic than BA. Towards the end I pointed out that Aaron Rowand and another player by the name of Nelson Cruz have a similar OBP, but Cruz gets more big hits. This difference in power is shown by a statistic called Slugging Percentage (SLG). Slugging Percentage is simply total bases divided by at bats. Total bases are all the bases that a player accumulates on hits: a single is worth 1, a double is worth 2, a triple is worth 3, and a home run is worth 4. Walks are not factored into slugging percentage. The average team slugging percentage since 2007 is .419. However, a good or acceptable slugging percentage really depends on which position a player plays. A .420ish SLG may be acceptable for a shortstop, but is almost definitely unacceptable for a first baseman.
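
For anyone who prefers to see it spelled out, here is a minimal Python sketch of slugging percentage. The function name and the stat line are made up for illustration and do not belong to any real player.

```python
def slugging(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at bats; walks are not part of the formula."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season: 100 singles, 30 doubles, 5 triples, 20 homers in 500 AB.
example_slg = slugging(100, 30, 5, 20, at_bats=500)
print(round(example_slg, 3))
```

That hypothetical line works out to 255 total bases and a .510 SLG.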

So, when we compare the slugging percentages of Aaron Rowand and Nelson Cruz we notice that Cruz has an enormous advantage. Cruz has a .547 SLG compared to Rowand's .436. What does this tell us? It tells us that Cruz is just as good as Rowand in terms of drawing walks and getting on base and is much better at hitting for power. Overall, Nelson Cruz is a better hitter.

Although better than BA, OBP and SLG have their flaws. OBP doesn't take extra base hits into account, while SLG doesn't take walks into account. One way that people have combated this problem is by combining OBP and SLG. This combined statistic is called OPS (On Base Percentage plus Slugging Percentage). The formula is exactly how it sounds: you add OBP and SLG together. This stat helps when comparing players with different OBP and SLG. For example, let's use Nelson Cruz again and replace Aaron Rowand with Jayson Werth.

Jayson Werth has an OBP of .381 compared to Cruz's .337. Werth has a .503 SLG compared to Cruz's .547. When combined, we notice that Werth and Cruz have a nearly identical OPS. Werth's OPS is .883 while Cruz's OPS is .884. This suggests that the two players are pretty much equal in terms of hitting.
Going back to that league average slugging percentage, we can see that what counts as an acceptable slugging percentage depends not only on the position you play but also on how often you get on base. As stated before, a .420 SLG for a first baseman is below average, but if he gets on base at a .420 clip his OPS would be .840. That's pretty acceptable for a first baseman.
But what is more important? Getting on base more or hitting for power? Is Werth's OBP more important than Cruz's power, or is it the other way around? Wait for Part 3 to find out.
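Since OPS is just addition, a sketch in Python is almost trivial, but it makes the comparisons above easy to repeat. The function name is my own; the inputs are the rounded OBP and SLG figures quoted in this post, so a result can differ from a player's listed OPS by a point.

```python
def ops(obp, slg):
    """On-base plus slugging, rounded to three decimals like a stat page."""
    return round(obp + slg, 3)


cruz_ops = ops(0.337, 0.547)
# The first baseman example: a .420 OBP paired with a .420 SLG.
first_baseman_ops = ops(0.420, 0.420)
print(cruz_ops, first_baseman_ops)
```

Because OPS adds the two numbers without weighting them, very different hitters can land on the same total, which is the question the last paragraph of this post raises.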

Monday, August 10, 2009

Advanced Stats- Offense (Part 1)

Before I start posting articles, I want to let everyone know about some of the advanced statistics that will be used on this site. The next few posts will make it easier for people to understand what I will be talking about in the future.
Over the last several years it has become pretty clear that players were being extremely overvalued because of a certain statistic. Most people have learned that while batting average is a neat and somewhat worthwhile stat to judge a player on, there are much more important statistics that one can use to gauge a player's performance. The two main ones that have recently been brought into the spotlight are On Base Percentage and Slugging Percentage.
What does On Base Percentage tell us that Batting Average doesn't? The easiest way to win a baseball game is to not make outs. How does one not get out? Get on base. Sure, it is a lot cooler to get a hit in baseball than it is to get a walk, but getting a walk is just as important as getting a hit because in both situations an out is avoided. The formula for Batting Average (BA) is simply hits divided by at bats (AB). On Base Percentage (OBP) is hits plus walks (BB) plus hit by pitch (HBP), divided by the sum of at bats, walks, HBP, and sacrifice flies (SF). So when looking at a player's stat page, look first at his OBP.
Let's use two familiar players as examples. The first player is Jose Lopez of the Mariners; the second is Aaron Rowand of the Giants. Jose Lopez currently has a .278 BA, which is a bit above average. Aaron Rowand only has a .275 BA. 15 years ago people would look at those numbers and say, "Well, since Lopez has a sizable edge in terms of RBIs and homers, it's safe to say that Lopez is a better hitter than Rowand." OBP tells a much different story. Jose Lopez likes to swing... a lot. He currently has only 16 walks on the season (3.8% of his AB), good for an OBP of only .305. Rowand, on the other hand, has an OBP of .332, a huge improvement over Lopez.
It has been made pretty evident over the years that taking pitches and drawing walks is a definite skill. It's very rare for the same player to have widely varying walk rates from year to year. Hit totals, on the other hand, change constantly.
But what about power hitters? Is it fair to compare a guy like Aaron Rowand to, say, Nelson Cruz? Cruz has a similar OBP to Rowand (.337 for Cruz, .332 for Rowand), but he has more big hits like home runs and doubles. This problem will be covered in Part 2.
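The two formulas above are easy to mix up in prose, so here is a minimal Python sketch of both. The function names and the stat line are hypothetical, chosen only to illustrate the arithmetic, and are not any real player's numbers.

```python
def batting_average(hits, at_bats):
    """BA = H / AB."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / opportunities


# Made-up season: 150 hits in 500 AB, 50 walks, 5 HBP, 5 sacrifice flies.
example_ba = batting_average(150, 500)
example_obp = on_base_percentage(150, 50, 5, 500, 5)
print(round(example_ba, 3), round(example_obp, 3))
```

This hypothetical hitter bats .300 but reaches base at about a .366 clip, and the gap comes entirely from the walks and hit by pitches that batting average ignores.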