Earlier, we talked about how hard it is to predict the future. As our old professor Larry Sabato used to say:
He who lives by the crystal ball ends up eating ground glass.
So, with that in mind, let’s fry up a delicious glass omelet!
Jim Albright’s system gives us the means to translate Japanese into American, so to speak: numbers from NPB become numbers from MLB. Of course, the translations overlook several factors. They do not account for park effects, for one. Another: they don’t adjust for age. And finally: they don’t account for league difficulty. These are problems I’ll try to tackle at some point in the future; but for now, we’ll overlook our beauty queen’s gapped teeth and barely noticeable moustaches, and get her ready for the swimwear competition.
The first set of translations is easy: we simply hold IP constant and multiply the other statistics by the translation factors. As mentioned yesterday, 100 hits in 100 IP become 107 hits in 100 IP. Et cetera. (Stats we don’t have translation factors for, like HBP and WP, were left unchanged.) The largest adjustment turns out to be for home runs; despite the bigger parks in America, pitchers have trouble keeping the ball in the yard when they make the trip over here.
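For the code-minded, here is a rough sketch of that step in Python. The only factor taken from the text is the 1.07 for hits; the function name, the dictionary layout, and the HBP and WP values in the sample line are made up for illustration, and the remaining factors would have to come from Albright's tables.

```python
def translate_line(npb_line, factors):
    """Scale an NPB stat line to an MLB equivalent.

    IP is held constant; any stat without a factor is left unchanged.
    """
    mlb_line = {}
    for stat, value in npb_line.items():
        if stat == "IP":
            mlb_line[stat] = value  # innings are never scaled
        else:
            mlb_line[stat] = value * factors.get(stat, 1.0)
    return mlb_line


# Illustrative line: 100 H in 100 IP (from the text); HBP and WP are invented.
sample = {"IP": 100.0, "H": 100, "HBP": 4, "WP": 3}
translated = translate_line(sample, {"H": 1.07})
```

With only the hits factor supplied, the sample line keeps its 100 IP, its hits rise to about 107, and HBP and WP pass through untouched.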
So, we’ve translated hits, home runs, strikeouts, and walks to MLB equivalents. What now? We use Bill James’ component ERA formula to calculate an ERC for each player. Then we work backward from ERC and innings pitched to the number of earned runs that player would have allowed.
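A sketch of that calculation, assuming the commonly published form of James' component ERA (the text doesn't say which version was used, so treat the constants as an assumption). IP here is a true decimal (200.333, not the box-score 200.1), and the function names are ours.

```python
def component_era(ip, h, hr, bb, hbp, bfp, ibb=0):
    """Component ERA, in the commonly published form (assumed, not confirmed by the text)."""
    # "Pitcher's total bases": extra weight on home runs, plus free passes.
    ptb = 0.89 * (1.255 * (h - hr) + 4 * hr) + 0.56 * (bb + hbp - ibb)
    raw = ((h + bb + hbp) * ptb) / (bfp * ip) * 9
    erc = raw - 0.56
    if erc < 2.24:
        # Published low-end adjustment for very good lines.
        erc = raw * 0.75
    return erc


def earned_runs_from_erc(erc, ip):
    """Back into earned runs: ERC is earned runs per nine innings."""
    return erc * ip / 9
```

For instance, a 4.50 ERC over 200 innings backs out to 100 earned runs.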
Aside: the ERC formula requires BFP (which we don’t have for all years) as one of its inputs. Using the Lahman database and Excel, I regressed BFP on IPouts, H, BB, K, HBP, and HR. I used the weights from this regression to estimate BFP. For a full season’s worth of batters faced (800+ BFP) the calculated value is usually within 5 BFP and rarely further than 15 BFP from the correct value. Click here to see the regression results.
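The regression itself was run in Excel, but for anyone who wants to reproduce it, here is a minimal pure-Python sketch of an ordinary least squares fit. It is a generic solver, not the actual workbook: the function names are ours and no real coefficients are implied.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one feature list per pitcher-season; an intercept is added here.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs, row):
    """Apply fitted weights to one row, e.g. to estimate BFP."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))
```

To mirror the aside, each row would hold (IPouts, H, BB, K, HBP, HR) for one pitcher-season from the Lahman pitching table, with that season's BFP as the target.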
Then, using their historical ratio of R to ER, we take a stab at guessing how many unearned runs they might have allowed in addition. If we wanted, we could also try to guess how many wins and losses a player would have had based on their RA, an assumed team RA and run context. For now, we’ll just leave wins and losses out of our translated statistics.
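The unearned-run step is just a ratio, sketched below. The function name is ours, and we're assuming the ratio is taken over the pitcher's actual Japanese totals.

```python
def total_runs(translated_er, historical_r, historical_er):
    """Scale translated earned runs by the pitcher's historical R/ER ratio."""
    ratio = historical_r / historical_er
    return translated_er * ratio
```

A pitcher who has historically allowed 110 runs for every 100 earned runs, with 90 translated earned runs, comes out at 99 total runs, or 9 unearned.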
After we have all our translations done, we should adjust everything for age and park. And maybe we will, later. But for now, a simple flat 3/2/1 projection without mean regression will have to suffice. What that means in English: we will assign each of the last three years a weight of either 3 (for the most recent year), 2, or 1. We will then calculate the weighted average for stats like BB, H, K, IP, etc. using those weights. ERC, ER, and R are re-calculated as described above. Finally, we will re-calculate starts and innings pitched based on the assumption that Japanese pitchers will throw fewer pitches per start (but start more frequently) in America, and pro-rate other stats accordingly.
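Here is a small sketch of the 3/2/1 weighting. The function name and the dictionary layout are ours; seasons are passed most recent first, and every season is assumed to carry the same set of counting stats.

```python
def weighted_321(seasons):
    """Weighted average of counting stats over up to three seasons.

    seasons: list of stat dicts, most recent first. Weights are 3, 2, 1.
    """
    weights = (3, 2, 1)
    used = list(zip(weights, seasons))  # drops any season beyond the third
    total_weight = sum(w for w, _ in used)
    stats = used[0][1].keys()
    return {
        stat: sum(w * line[stat] for w, line in used) / total_weight
        for stat in stats
    }
```

With made-up innings totals of 200, 170, and 140 (most recent first), the weighted figure is (3×200 + 2×170 + 1×140) / 6 = 180 IP. Rate stats like ERC are not averaged this way; they get recomputed from the averaged components.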
Aside: Since the start of the 2000 season, 420 pitchers have started at least 25 games with one team during a season while making no relief appearances. I calculated the average number of batters those pitchers faced per start — it’s 26.7. From this number, we can assume either a number of starts or a number of batters faced and back into innings pitched (and hence other numbers) that way.
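Backing into innings from an assumed number of starts might look something like this. The 26.7 batters per start comes from the aside above; the function name, the requirement that the line carry a BFP entry, and the choice to scale every stat by the same factor are our assumptions.

```python
def prorate_to_starts(line, starts, bf_per_start=26.7):
    """Rescale a projected line so BFP equals starts * bf_per_start."""
    target_bfp = starts * bf_per_start
    scale = target_bfp / line["BFP"]
    # Every counting stat (IP, H, BB, K, ...) shrinks or grows in proportion.
    prorated = {stat: value * scale for stat, value in line.items()}
    prorated["GS"] = starts
    return prorated
```

Thirty starts at 26.7 batters apiece is 801 batters faced, so a line built on 890 BFP gets scaled by 0.9 across the board.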
Here’s how Igawa did in Japan the past three years:
Pretty good numbers (although lots of home runs). 228 K in 200 IP looks great. Watch what happens after the translation:
Some good, some bad. Note that Igawa’s BB/K ratios are always pretty good, though he gave up too many baserunners and homers in ’04 and ’05. Hard to find anything wrong with the translated 2006 line, although we find it a tad too optimistic a translation. Keep in mind that GS has not been adjusted, and it’s unlikely that Igawa would have stayed in each start as long as these stats would lead you to believe. We will adjust for that in his projection.
Aside: You might wonder why Igawa’s Japanese ERA was nearly identical in 2004 and 2005 yet translated so differently. The first numbers use his actual Japanese ERA; the second estimate what his ERA would have been in America given his component stats. Thus, despite posting similar ERAs in Japan in 2004 and 2005, Igawa’s components indicate he pitched much better in 2004 than he did the following year.
Now, we project his stats using the model described above. We will assume he makes 30 starts and faces 26.7 batters per start. Also, we assume he plays for a team that scores 4.85 runs per game (splitting the difference between the AL and the NL, as this projection applies to neither league in particular). Finally, we’ll assume he gets a decision for every 9 IP and calculate his winning percentage using James’ Pythagorean formula with an exponent of 1.82. That gives us this:
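The win-loss piece can be sketched as follows. The 4.85 runs per game, the 1.82 exponent, and the one-decision-per-9-IP rule are from the text; the function name is ours, and we're assuming the pitcher's own runs allowed per nine innings stands in for the team's runs allowed.

```python
def pythagorean_record(ip, runs_allowed, team_rs_per_game=4.85, exponent=1.82):
    """Estimate winning percentage, wins, and losses for a starter."""
    ra9 = runs_allowed * 9 / ip  # pitcher's total runs allowed per nine
    rs_pow = team_rs_per_game ** exponent
    wpct = rs_pow / (rs_pow + ra9 ** exponent)
    decisions = ip / 9  # one decision per nine innings
    wins = wpct * decisions
    return wpct, wins, decisions - wins
```

A pitcher who allows exactly 4.85 runs per nine comes out at .500, so 180 innings of that works out to a 10-10 record.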
The ERA is deceptive – he’s giving up a lot of unearned runs. Basically a league-average starter. This line is somewhat similar to Matt Clement or Jeremy Bonderman ca. 2005. If he can match this projection, he’s worth Jeff Suppan money.
Kuroda apparently re-signed with the Carp already, but let’s take a look anyway. Japanese actual stats:
He doesn’t strike out a ton of hitters, but he keeps the ball in the park and has great control (his R/ER ratios are surprising – they’re very low for a groundball pitcher, as he reportedly is). Translated:
Those hold up very well, mainly because he doesn’t walk anyone and keeps the ball down. Note that the 2005 translated ERA of 3.17 matches his actual 3.17 ERA only by a lucky quirk: his Japanese peripherals suggested he was unlucky to have an ERA as high as it was. Projected to 2007:
A Cy Young candidate in the National League. Two important caveats: first, there’s no age adjustment, and he’s on the bad side of 30, so an adjusted projection would take a hit. Second, it seems unlikely that a guy who relies on control could allow so many balls in play but so few over the fence. This projection is probably at least a run too low.
Ahh, my favorite player in NPB. His numbers are fantastic; how will they hold up? Actual numbers:
Not a lot of innings in 2004 and 2005; was he hurt? Tons of runs in 2004 despite pretty good peripherals, too. Translations:
Not a bad 2006, huh? The H/9 looks too low, though. He would have run away with the Cy Young if he put up those numbers in MLB. 2007 projection:
Sign me up! I’m not sure if he would be able to sustain the BABIP, though. I think this projection is a tad optimistic, but I buy it more than Kuroda’s. Note that I didn’t give Saitoh 30 starts, as that would have been a reach given the number of innings he’s thrown recently.
What we’ve all been waiting for. Actual stats:
Absolutely dominant. 138 hits in 186 innings is incredible. He did miss a few starts in 2004 to injury. Translated:
It’s hard not to get excited. K/BB is still over 5. HR rates are low. Wow. And the projection:
Wonder why teams are bidding $25 million just to talk to this guy? Now you know. He probably won’t be this good — his projected BABIP is too high, for instance. But you never know…