*** The information used here was obtained free of
charge from and is copyrighted by Retrosheet. Interested
parties may contact Retrosheet at 20 Sunset Rd.,
Newark, DE 19711. ***
Without game logs and play-by-plays, conveniently provided by Retrosheet,
a study this intricate would not have been easily possible.
Retrosheet's comprehensive site has been an invaluable source of historical data.
-----------------------------------------------------------------
The two fundamental things a starting (or any) pitcher strives to do are 1) to prevent opponents from scoring runs and 2) to win games (and not lose them).
The process strives to score each of these categories for every qualifying starting pitcher season since 1890, express each as a 'winning percentage' and standardize them. Finally, the two scores are simply added to derive a final score. All scores are then ranked.
In order to complete an in-depth analysis of this magnitude, thanks to
Retrosheet*, the final scores for over 200,000 MLB games since 1890 were loaded into a relational database. In addition, each year's relevant statistics (from a variety of sources) for each pitcher were loaded into the database in order to
perform comparison analysis. Finally, over 16 million event (play-by-play)
records from Retrosheet were loaded into the relational database.
Many processes have been created to rate starting pitchers. This one is unusual in that it is not solely based upon a pitcher's relative value to his team or league. Rather, its foundation is to rate the 'achievement value' of his
numbers, considering all of the things that affect the level of difficulty.
For example, nearly everyone agrees that it is much better in 2019 (all things being equal) to have a pitcher throw 250 innings with an ERA of 3.00 than it is to have one who throws 200 innings with the same ERA. But it's the same ERA. The 250 IP starter is much more valued, but the achievement is essentially the same, with a modest advantage for the 250 inning pitcher.
Likewise, a winning percentage on an average team of 16-8 is better than 22-12
(again, all other things being equal). A team wants the second pitcher more, but the first has achieved slightly more because he won a higher percentage of his games.
Below is a brief, step-by-step explanation of the process.
THE PROCESS
DEFENSE FACTOR
Determining to what extent fielders may have helped or hurt a starter's E.R.A. is the most complex piece to this study. This is a separate issue than the difference between earned and unearned runs. This analysis focuses only on earned runs. Still, the range and other abilities of fielders certainly affects opponents' ability to score runs. The actual E.R.A. is revised based on these things.
First up: The foundation of what I do is count bases, any way a base can be obtained, by going through the play-by-plays from
Retrosheet. Example 1: Runners on 1st and 2nd = 3 (1 + 2). After the play (an out), runners on 2nd and 3rd (2 + 3 = 5). Two bases have been gained. Had the batter walked, it would have been 3 bases. Example 2: Runner on 1st (1). Double play = 1 base lost.
Fielder's choice, runner at first is replaced? 0 bases.
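The base-counting examples above can be sketched in Python. This is only an illustration, not the study's actual code: the names `base_sum` and `bases_gained` and the idea of describing runners as a list of occupied bases are assumptions made for the example, and it covers only plays like those above, where no runner scores.

```python
def base_sum(occupied):
    """Sum the base numbers that hold a runner (1 = first, 2 = second, 3 = third)."""
    return sum(occupied)


def bases_gained(before, after):
    """Bases gained (positive) or lost (negative) between two base states."""
    return base_sum(after) - base_sum(before)


# Example 1: runners on 1st and 2nd move up to 2nd and 3rd on an out.
example_out = bases_gained([1, 2], [2, 3])

# Example 1 (variant): the batter walks instead, loading the bases.
example_walk = bases_gained([1, 2], [1, 2, 3])

# Example 2: runner on 1st is erased by a double play.
example_double_play = bases_gained([1], [])

# Fielder's choice: the runner at first is replaced by the batter.
example_fielders_choice = bases_gained([1], [1])

print(example_out, example_walk, example_double_play, example_fielders_choice)
```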
__________________________________________________ ____________________
Next, I sum and break down these bases gained (or lost) by 6 categories of events (5 really, since,
for the most part, bases are not gained following strikeout events):
1) Non-HR hits.
2) Non-Strikeout outs (balls in play)
3) Strikeouts
4) Walks/HBP
5) Home Runs
6) Errors
All other events are included in the category where they best fit. For example, a stolen base, in effect, is the stretching
of a walk or single into a double. It is placed into Non-HR hits as 2 bases. Pickoffs and caught stealings are placed into
Non-Strikeout outs, etc. All major events are categorized in one of the above to make the process less convoluted while
still giving a good approximation.
-------------------------
The Details (an example). Note: After this process was written, additional data became available and the process
was slightly tweaked. The numbers may vary but the concept is the same.
Bob Gibson-1968: 304.67 IP (914 outs)
Bases given up via:
1) Non-HR hits: 345 (on 187 hits)
2) Non-Strikeout outs: 19 (on 636 outs -- 914 minus 268 strikeouts)
3) Strikeouts: # of bases is negligible (on 268 strikeouts)
4) Walks/HBP: 81 (on 69 BB/HB)
5) Home Runs: 57 (on 11 HR)
6) Errors: 26 (on 13 errors)
National League-1968: 14,681 IP (44,043 outs). Note: A few play-by-plays were not available
when this was written, so the numbers below may vary a little, as may the park factor (PFOPP) number:
1) Non-HR hits: 25,559 (on 12,187 hits)
2) Non-Strikeout outs and other events: 2611 (on 33,668 outs and other events -- roughly 44,043 minus 9338 strikeouts)
3) Strikeouts: # of bases is negligible (on 9338 strikeouts)
4) Walks/HBP: 5911 (on 4543 BB/HB)
5) Home Runs: 4682 (on 884 HR)
6) Errors: 1565 (on 774 errors)
Pro-rated for Gibson's 304.67 IP (an average league pitcher with Gibson's IP):
1) Non-HR hits: 530.4 bases (on 252.9 hits)
2) Non-Strikeout outs: 54.2 bases (on 698.7 outs)
3) Strikeouts: negligible
4) Walks/HBP: 94.3 walks/HBP
5) Home Runs: 18.3 home runs
6) Errors: 16.1 errors
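The pro-rating step is a simple scaling of each league total by the share of league outs the pitcher recorded. A minimal sketch using the 1968 figures quoted above follows; the function name `prorate` and the dictionary keys are illustrative assumptions, not names from the study's database.

```python
def prorate(league_total, league_outs, pitcher_outs):
    """Scale a league total down to one pitcher's workload (measured in outs)."""
    return league_total * pitcher_outs / league_outs


NL_1968_OUTS = 44_043   # league outs, from the text above
GIBSON_OUTS = 914       # 304.67 IP

# League totals quoted above for the 1968 National League.
league_1968 = {
    "non_hr_hit_bases": 25_559,
    "non_hr_hits": 12_187,
    "non_k_out_bases": 2_611,
    "non_k_outs": 33_668,
    "walks_hbp": 4_543,
    "home_runs": 884,
    "errors": 774,
}

prorated = {
    name: round(prorate(total, NL_1968_OUTS, GIBSON_OUTS), 1)
    for name, total in league_1968.items()
}

for name, value in prorated.items():
    print(f"{name}: {value}")
```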
-----------------------------------------------------------------
The Process:
a) The league average pitchers gave up 584.6 bases (530.4 + 54.2) on non-HR hits and non-KO outs.
b) Gibson gave up 364 (345 + 19)
The league average pitcher's 584.6 is based upon 951.6 events (698.7 non-KO outs + 252.9 non-HR hits).
Gibson's is based upon 823 events (636 non-KO outs + 187 non-HR hits).
We need to adjust to bring Gibson's 823 up to 951.6, by adding 128.6 events. How many non-KO outs and non-HR hits do we add?
* Based upon Gibson's breakdown of the two, we add:
- 29.2 non-HR hits (equating to 61.3 bases by league average)
- 99.4 non-KO outs (equating to 7.7 bases by league average)
Then we add 61.3 and 7.7 to Gibson's 364 to get 433. This standardizes Gibson's events to make it comparable to league average.
c) 584.6 (league average) minus 433 = 151.6. This is his Base Figure (get it?). Then, we make some adjustments based on miscellaneous factors:
* BB/HBP: League average over 304.67 IP (94.3) minus Gibson's 69 = 25.3. Based on league average, this equates to 32.9 bases. 151.6 minus 32.9 = 118.7.
* HRs: League average over 304.67 IP (18.3) minus Gibson's 11 = 7.3. Based on league average, this equates to 38.9 bases. 118.7 minus 38.9 = 80.7.
* Errors: League average over 304.67 IP (16.1) minus Gibson's team's 13 = 3.1. Based on league average, this equates to 6.1 bases saved. 80.7 + 6.1 = 86.8 bases.
* Park Factor/Qual of Opp. Bats (PFOPP): I have calculated this to .889, which is quite low. (1.000 is avg.)
86.8 * .889 = 77.1. Had Gibson faced more average conditions, he would have given up more bases, reducing his base figure.
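Steps a) through c) for Gibson, up to the Base Figure and before the miscellaneous adjustments, can be sketched as follows. This is a hedged reconstruction from the worked numbers, not the study's own code; the function name `standardized_bases` and its parameter names are assumptions. The added events are split by the pitcher's own mix of hits and outs and valued at league-average bases per event, as the Gibson example describes.

```python
def standardized_bases(p_hits, p_outs, p_bases,
                       lg_hits, lg_outs, lg_hit_bases, lg_out_bases):
    """Bring a pitcher's ball-in-play events up to the league-average count.

    p_*  : the pitcher's non-HR hits, non-strikeout outs, and bases allowed on them
    lg_* : league figures already pro-rated to the pitcher's innings
    """
    p_events = p_hits + p_outs
    extra_events = (lg_hits + lg_outs) - p_events

    # Split the added events using the pitcher's own hit/out breakdown.
    extra_hits = extra_events * p_hits / p_events
    extra_outs = extra_events * p_outs / p_events

    # Value the added events at league-average bases per event.
    added_bases = (extra_hits * lg_hit_bases / lg_hits
                   + extra_outs * lg_out_bases / lg_outs)
    return p_bases + added_bases


# Gibson 1968, using the figures quoted above.
gibson_std = standardized_bases(
    p_hits=187, p_outs=636, p_bases=345 + 19,
    lg_hits=252.9, lg_outs=698.7, lg_hit_bases=530.4, lg_out_bases=54.2,
)
league_bases = 530.4 + 54.2
base_figure = league_bases - gibson_std

print(round(gibson_std, 1), round(base_figure, 1))
```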
---------------------------------------------------------
What was the league average bases per run?
Let's move to Jim Palmer-1973, and we strongly suspect his defense helped significantly:
The Details:
296.33 IP (889 outs)
Bases given up via:
1) Non-HR hits: 436 (on 208 hits)
2) Non-Strikeout outs: 18 (on 718 outs -- 889 minus 158 strikeouts, unsure why 13 outs are missing)
3) Strikeouts: # of bases is negligible (on 158 strikeouts)
4) Walks/HBP: 150 (on 116 BB/HB)
5) Home Runs: 73 (on 16 HR)
6) Errors: 13 (on 11 errors)
American League-1973: 17,397 IP (52,191 outs)
1) Non-HR hits: 34,026 (on 15,641 hits)
2) Non-Strikeout outs and other events: 3104 (on 42,072 outs and other events -- roughly 52,191 minus 9851 strikeouts)
3) Strikeouts: # of bases is negligible (on 9851 strikeouts)
4) Walks/HBP: 9405 (on 7044 BB/HB)
5) Home Runs: 8450 (on 1552 HR)
6) Errors: 2006 (on 975 errors)
Pro-rated for Palmer's 296.33 IP (an average league pitcher with Palmer's IP):
1) Non-HR hits: 579.6 bases (on 266.4 hits)
2) Non-Strikeout outs: 52.9 bases (on 716.6 outs)
3) Strikeouts: n/a
4) Walks/HBP: 120.0 walks/HBP
5) Home Runs: 26.4 home runs
6) Errors: 16.6 errors
-----------------------------------------------------------------
The Process:
a) The league average pitchers gave up 632.5 bases (579.6 + 52.9) on non-HR hits and non-KO outs.
b) Palmer gave up 454 (436 + 18)
The league average pitcher's 632.5 is based upon 983.0 events (716.6 non-KO outs + 266.4 non-HR hits).
Palmer's is based upon 926 events (718 non-KO outs + 208 non-HR hits).
We need to adjust to bring Palmer's 926 up to 983.0, by adding 57.0 events. How many non-KO outs and non-HR hits do we add?
* Based upon Palmer's breakdown of the two, we add:
- 12.8 non-HR hits (equating to 26.9 bases by league average)
- 44.2 non-KO outs (equating to 1.1 bases by league average)
Then we add 26.9 and 1.1 to Palmer's 454 to get 482.
c) 632.5 (league average) minus 482 = 150.5. This is his Base Figure. Then, we make some adjustments based on misc. factors.
* BB/HBP: League average over 296.33 IP (120.0) minus Palmer's 116 = 4.0. Based on league average, this equates to 5.3 bases. 150.5 minus 5.3 = 145.2.
* HRs: League average over 296.33 IP (26.4) minus Palmer's 16 = 10.4. Based on league average, this equates to 47.6 bases. 145.2
minus 47.6 = 97.5.
* Errors: League average over 296.33 IP (16.6) minus Palmer's team's 11 = 5.6. Based on league average, this equates to 6.6 bases saved. 97.5 + 6.6 = 104.2 bases.
* Park Factor/Qual of Opp. Bats (PFOPP): I have calculated this to 1.026. (1.000 is avg.)
104.2 * 1.026 = 106.9. Had Palmer faced more average conditions, he would have given up fewer bases, increasing his base figure.
---------------------------------------------------------
What is the league average bases per run?
= 56,991 divided by 8314 runs = 6.9. Palmer's 106.9 bases divided by 6.9 = 15.6 runs saved by the defense.
We add the 15.6 (as earned runs) to his actual and it moves his ERA from 2.40 to 2.87. (His FIP was 3.38.)
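The conversion from bases to runs, and then to an adjusted ERA, can be checked with a short sketch. The helper names `runs_from_bases` and `adjusted_era` are assumptions for illustration; the inputs are the 1973 American League and Palmer figures quoted above.

```python
def runs_from_bases(bases, league_bases, league_runs):
    """Convert a base total into runs using the league's bases-per-run rate."""
    bases_per_run = league_bases / league_runs
    return bases / bases_per_run


def adjusted_era(era, innings, extra_earned_runs):
    """Add (or subtract) earned runs and restate the result per nine innings."""
    return era + 9 * extra_earned_runs / innings


# 1973 AL bases across the five counted categories, from the text above.
AL_1973_BASES = 34_026 + 3_104 + 9_405 + 8_450 + 2_006   # 56,991
AL_1973_RUNS = 8_314

palmer_runs = runs_from_bases(106.9, AL_1973_BASES, AL_1973_RUNS)
palmer_era = adjusted_era(2.40, 296.33, palmer_runs)

print(round(palmer_runs, 1), round(palmer_era, 2))
```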
__________________________________________________
This method can approximate FIP (Fielding Independent Pitching) but it's not exactly the same.
Note: After this documentation was written, additional data was loaded and minor changes were made.
As of the most recent processing of this method, below is a comparison of it versus WAR 2.0:
Defensive runs saved / Defensive support:
Bob Gibson-1968: 12.5 runs; WAR 2.0: 11
Sandy Koufax-1966: 5.6; WAR 2.0: 5
Jim Palmer-1973: 17.9; WAR 2.0: 18
Nolan Ryan-1973: 1.7; WAR 2.0: -4
Pedro Martinez-1999: 5.7; WAR 2.0: -3
Pedro Martinez-2000: 11.7; WAR 2.0: 9
Walter Johnson-1912: 23.0; WAR 2.0: 22
Using this process for the ~12,000 seasons since 1890**, the range of runs saved or lost per season is roughly +/- 25. The total deviation from zero of all ~12,000 seasons is just ~8000 runs.
** Where play-by-play data is not available (1890-1909), two approaches were taken to emulate the above:
1901-1909: Game totals (hits, walks, HRs, etc.) were used to guesstimate the number of bases needed to produce the
runs scored by a team. This was done by using a categorized matrix of details from known games from 1910-1919. The
totals were then pro-rated to starting pitchers based on their career IP per start.
For 1890-1900, the game total details are not even available from Retrosheet. Pitchers' ERAs were used to guesstimate
the number of bases allowed, using similarly high scoring games from the lively ball era (1920-39) that best matched.
It is assumed all pitchers threw complete games, as was generally the case.
PARK FACTOR
1) For every team season, the park factor is determined by simply dividing the total runs scored in a given park (by the team and its opponents) by the total runs scored in away games (by the team and its opponents).
2) In some processes, it is assumed the starting pitcher started exactly half his games at home and half on the road. Not here. Each starter game entails a different set of circumstances. Thus, the average PF a starter has experienced in a year is based upon the average of each of his individual starts, wherever they were. Where IP vs. each opponent is available, the PF is weighted accordingly.
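A minimal sketch of the two park factor steps follows. The function names and all sample numbers are made up for illustration (they are not real team or pitcher data), and the innings-weighted average is one plausible reading of "weighted accordingly."

```python
def park_factor(runs_in_home_park, runs_in_away_games):
    """Team-season park factor: total runs (both teams) at home over total runs away."""
    return runs_in_home_park / runs_in_away_games


def starter_park_factor(starts):
    """Innings-weighted average park factor over a starter's individual starts.

    starts: list of (park_factor_of_that_game, innings_pitched) pairs.
    """
    total_ip = sum(ip for _, ip in starts)
    return sum(pf * ip for pf, ip in starts) / total_ip


# Hypothetical team: 700 total runs in its home park, 650 in its road games.
team_pf = park_factor(700, 650)

# Hypothetical starter: 6 IP in a 1.20 park and 3 IP in a 0.90 park.
season_pf = starter_park_factor([(1.20, 6.0), (0.90, 3.0)])

print(round(team_pf, 3), round(season_pf, 3))
```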
OPP FACTOR
1) For every team season, the offensive ability for the starter's opponents to score runs (relative to the teams' own park factors) is determined in order to determine the approximate strength of the opposing bats. Like park factor, this can also affect a starter's ERA.
2) Park Factor and OPP Factor are merged into a 'PFOPP' metric. This allows for gauging how difficult it was in each game for the starter to keep earned runs by the opponents off the scoreboard.
The Final ERA SCORE is calculated by applying the IP Factor, the Park Factor, and the OPP Factor to his actual ERA, then comparing it to his League ERA (with the starter's team factored out). Finally, using the Pythagorean Theorem (PT), this adjusted ERA is converted to a winning percentage (Adjusted ERA relative to League ERA). The score is then standardized, representing the number of Standard Deviations he was over an average starter in the field of nearly 12,000 starter seasons.
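The last two steps, converting an adjusted ERA to a winning percentage and then standardizing it, might look like the sketch below. Two things here are assumptions rather than details stated above: the exponent of 2 in the Pythagorean formula (the classic form; the study may use a different exponent) and the use of a population standard deviation. The function names and sample values are illustrative only.

```python
from statistics import mean, pstdev


def pythagorean_pct(runs_for, runs_against, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is an assumed default."""
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


def era_win_pct(adjusted_era, league_era, exponent=2.0):
    """Treat league ERA as 'runs for' and the starter's adjusted ERA as 'runs against'."""
    return pythagorean_pct(league_era, adjusted_era, exponent)


def standardize(value, population):
    """Number of standard deviations a value sits above the population mean."""
    return (value - mean(population)) / pstdev(population)


# Hypothetical values: a 2.00 adjusted ERA in a league with a 4.00 ERA.
example_pct = era_win_pct(2.00, 4.00)

# Hypothetical field of winning percentages to standardize against.
field = [0.40, 0.45, 0.50, 0.55, 0.60]
example_z = standardize(0.60, field)

print(round(example_pct, 3), round(example_z, 3))
```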
W% SCORE
DECISION FACTOR
Winning % is significantly more difficult to score because of the many variables involved in how pitchers win and lose games. This approach attempts to correlate W% to ERA as strongly as possible, but also to provide a means of comparison across the ~150 years of MLB history.
Of course, including the factors below, the #1 thing that can affect W% is ERA (runs given up, actually), and this plays a major factor in this analysis.
1) Generally, throughout MLB history, pitchers obtain 1 decision for every 9 innings they throw. Where possible, only each starter's W-L as a starter is used as the starting point. 75-100 years ago, it was quite normal for starters to be used as relievers. In many cases, a pitcher would have more decisions than starts. In recent times, because starters generally no longer throw 8 or 9 innings, many no-decisions result.
2) For starters where the W-L (as starter) is not easily available (pre-1901), their actual W-L is adjusted downward to equal their number of starts, using the Pythagorean Theorem (using the team's runs scored and runs given up in the games the starter started).
AVERAGE FACTOR
1) The next step is to calculate the starter's W-L versus what an average pitcher might have done, if the average pitcher had gotten the same run support as the starter. Using the PT, the number of runs an average starter might have given up (adjusted for park factors, since the starter himself experienced these park factors) is weighed against the runs the team scored -- to derive an average pitcher's W%.
2) To calculate a starter's expected W%, 3 steps are involved:
a) So many things affect a starter's W-L record. Nevertheless, his actual W-L (as starter) is still important because it provides a metric that entails actual game by game situational conditions that go beyond any expectations that math provides. Since an average pitcher starts at 0-0, it is determined how many games over .500 each starter was, using his actual W-L.
For example, Steve Carlton was 27-10 in 1972 and starts at +17.
This number is then modified, upward or downward, based upon the relative (to league) run support he got. Carlton lands at 17.7 wins above average, leaving 19.3 decisions (37 - 17.7). We use the PT* to determine that in those 19.3 decisions, he 'should have' won 13.8. An average NL pitcher with Carlton's run support might have won 9.5 of these.
The difference is 13.8 - 9.5 = 4.3.
* When possible, the Pythagorean Theorem uses the runs scored for the starter while he was in the game and runs attributed against the starter.
b) We add that difference of 4.3 to Carlton's 17.7 and get 22.1.
If this absolute value for expected games over (or under) is lower than the pitcher's actual absolute value of W-L, we take that difference through another step of pitcher's expected Wins minus an average pitcher's expected wins. Carlton's 22.1 is higher than 17 and he remains at 22.1 wins over .500. This is the sum score.
c) Then, an efficiency value (based on the usage required to reach 22.1) is calculated:
We take the 22.1 and multiply it by 55 / 37 decisions for Carlton. (Since no one in this study has 55 decisions, using this number allows us to normalize starters across eras.)
We then divide this number by the value required to place the efficiency score on the same scale as the sum score, roughly 1.78.
(22.1 * (55/37)) / 1.78 = 18.4.
The average of the sum score (22.1) and the efficiency score (18.4), multiplied by the strength of schedule (OPPWL of .528 divided by .500), is ~21.4 games over .500. Adding 81 gives an expected W-L of ~102-60, or .632.
Note: When comparing the starter's expected W-L with an average starter's, the PFOPP is applied to the average league starter, since the starter himself experienced this.
This method does mitigate pitchers whose W-L was overly inflated due to very high
run support. For example, Whitey Ford-1961 starts at +21 (25-4), but ends at +15.
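The final arithmetic of the Carlton example (step c and the strength-of-schedule step) can be reproduced with a short sketch. The function names are assumptions, and treating the added 81 as half of a 162-game schedule is an inference from the example rather than a stated rule; the constants 55 and 1.78 come from the text above.

```python
def efficiency_score(sum_score, decisions, norm_decisions=55, scale=1.78):
    """Normalize the sum score to 55 decisions, then rescale to the sum score's scale."""
    return sum_score * (norm_decisions / decisions) / scale


def expected_record(sum_score, decisions, oppwl, games=162):
    """Average the sum and efficiency scores, apply OPPWL, and express as a W-L record."""
    eff = efficiency_score(sum_score, decisions)
    games_over_500 = (sum_score + eff) / 2 * (oppwl / 0.500)
    wins = games / 2 + games_over_500   # 81 + games over .500 for a 162-game slate
    return wins, games - wins, wins / games


# Steve Carlton, 1972: sum score 22.1 on 37 decisions, OPPWL of .528.
carlton_eff = efficiency_score(22.1, 37)
carlton_wins, carlton_losses, carlton_pct = expected_record(22.1, 37, 0.528)

print(round(carlton_eff, 1), round(carlton_wins), round(carlton_losses),
      round(carlton_pct, 3))
```

Unrounded intermediate values land within rounding distance of the figures quoted in the example.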
OPPWL FACTOR
1) To help determine a starter's expected W-L, the relative strength of the opponents he faced is determined. This certainly contributes to any starter's W%. For every team and starter he faced in a given year, the actual W-L record of the opponents (only in the games that each opposing starter started) is calculated.
** For example, Billy Pierce went 'only' 15-10 in 1955 even though his ERA was quite low. In looking at the teams and starters he faced that year, we find that -- with those starters -- those teams won games at a rate of .548, much higher than average.
In his 26 starts, the opposing starters he faced, and the W-L records of the teams when those opposing starters started were:
Early Wynn (x3) ......... 57-36 {In games Wynn started, the Indians were 19-12. Pierce faced Wynn 3 times.}
Whitey Ford (x2) ........ 48-18
Jim Wilson (x2) ......... 26-36
Bob Porterfield (x2) .... 22-32
Steve Gromek (x2) ....... 30-20
Bob Turley .............. 20-14
Herb Score .............. 19-13
Ned Garver .............. 13-19
Bob Lemon ............... 20-11
Mike Garcia ............. 20-11
Willard Nixon ........... 17-14
Eddie Lopat ............. 8-11
Bobby Shantz ............ 7-10
George Susce ............ 7-8
Bill Wight .............. 6-8
Bob Feller .............. 5-6
Mel Parnell ............. 6-3
George Zuverink ......... 3-3
Rip Coleman ............. 3-3
Glenn Cox ............... 0-2
-------------------------------
Total ................... 337-278 (.548)
Note: Pursuant to the example above, the process was revised to allot all no-decisions by starters
as 1/2 win and 1/2 loss, rather than use the team's W-L in the game. This is more indicative of
modern starters who have many more no-decisions than starters in the past, with their teams relying
more heavily on relievers for their decision.
The OPPWL is then calculated a second way: The median W% of the 26 starts is determined. (This helps alleviate skewing due to the vast differences in the number of starts by all opposing starters.) In the case of Pierce in 1955, the median is .522. The average of .548 and .522 is about .535 and this is the OPPWL used.
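The two OPPWL calculations can be sketched from the Pierce table. This is an illustration under assumptions: the function names are invented, each row's record is taken as already multiplied by the number of times Pierce faced that starter (as the Wynn note indicates), and the median is taken over one winning percentage per start. Computed from the table as printed, the median may come out slightly different from the .522 quoted above.

```python
from statistics import median

# (times faced, wins, losses) per opposing starter, from the 1955 table above.
PIERCE_1955 = [
    (3, 57, 36), (2, 48, 18), (2, 26, 36), (2, 22, 32), (2, 30, 20),
    (1, 20, 14), (1, 19, 13), (1, 13, 19), (1, 20, 11), (1, 20, 11),
    (1, 17, 14), (1, 8, 11), (1, 7, 10), (1, 7, 8), (1, 6, 8),
    (1, 5, 6), (1, 6, 3), (1, 3, 3), (1, 3, 3), (1, 0, 2),
]


def aggregate_pct(rows):
    """Overall W% of all opposing starters' team records, pooled together."""
    wins = sum(w for _, w, _ in rows)
    losses = sum(l for _, _, l in rows)
    return wins / (wins + losses)


def median_pct(rows):
    """Median of the opposing starter's W%, counted once per start faced."""
    per_start = []
    for times_faced, w, l in rows:
        per_start.extend([w / (w + l)] * times_faced)
    return median(per_start)


def oppwl(rows):
    """Average of the pooled W% and the per-start median W%."""
    return (aggregate_pct(rows) + median_pct(rows)) / 2


pooled = aggregate_pct(PIERCE_1955)
pierce_oppwl = oppwl(PIERCE_1955)
print(round(pooled, 3), round(pierce_oppwl, 3))
```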
2) The final W% Score, like the ERA Score, is then standardized. The two are on a similar (but not identical) scale.
The STDEVs for ERA and W% are added to derive the starter's overall effectiveness that year. The two values correlate at 82%. Since that is so high, why use both metrics? Because they are two distinct views of the same data and complement each other.
PLAYOFFS
As of Version 5 (2024) all playoff games have been added. They count no more than a regular season game.
In the ERA part of the analysis, the 'starting ERA' (after adjusted for defense, park factors, etc.)
is modified by the literal IP and earned runs given up by the starter in the post season.
In the W-L part of the analysis, the expected wins (RF versus RA) of the starter in the post season, over
average, is added to the expected win total during the regular season.
QUALIFICATIONS
1) The original qualifications were to include any pitcher (starter or reliever) who had 1 IP / team game. But, since much of the available data for this study relates solely to starters, this analysis is limited to starters (who may also have relieved).
2) If a pitcher failed to meet the 1 IP/game requirement then a second qualification may apply: IF that pitcher had enough decisions to compensate for the shortfall of IP (assuming 1 decision = 9 IP), then they also qualify. (i.e. In a 162-game schedule, an expected number of decisions is 162/9 = 18. If a pitcher had 19 decisions, then the extra '9 IP' could be applied to his actual innings.) This exception will not apply to the 60-game 2020 season.
3) Starters' IP per game has been drastically falling since 2000. It has become the new normal. As of Version 5b, all seasons since 1900 of at least 15 starts and at least 100 IP as starter were included. Although these seasons will count toward a starter's career score, they will not count among the best seasons anywhere else. For 2020, the standard 1 IP per team decision is used.
4) The addition of playoff games could also cause a change in qualifications.
IP for a pitcher and team decisions in the playoffs will be added in.
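Qualifications 1) and 2) can be expressed as a small check. This is a sketch of the rule as described, not the study's code: the function name `qualifies` and the `allow_decision_credit` flag (switched off for a season like 2020) are assumptions, and the 15-start / 100-IP guideline and playoff additions are not modeled.

```python
def qualifies(ip, decisions, team_games, allow_decision_credit=True):
    """Apply the 1 IP per team game rule, with the optional decisions exception."""
    required_ip = team_games            # 1 IP per team game
    if ip >= required_ip:
        return True
    if not allow_decision_credit:
        return False

    # Each decision beyond the expected number is credited as 9 extra innings.
    expected_decisions = team_games / 9
    extra_ip = max(0.0, decisions - expected_decisions) * 9
    return ip + extra_ip >= required_ip


# Hypothetical cases in a 162-game schedule (18 expected decisions).
print(qualifies(162.0, 10, 162))   # meets the innings requirement outright
print(qualifies(153.0, 19, 162))   # 9 IP short, but 1 extra decision covers it
print(qualifies(150.0, 19, 162))   # 12 IP short, 1 extra decision is not enough
```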
--------------------------------------------------------
Version 1 (2020).
Version 2 (2021): Added to the database the entire careers of NL HOF pitchers, even if some seasons go back to the 1890s. They are: Cy Young, Kid Nichols, Amos Rusie, Joe McGinnity, Vic Willis, and Jack Chesbro. However, various data from this era are not available and need to be estimated.
Also added were 5 of the best NL seasons of the 1890s: Billy Rhines (1890), Bill Hoffer (1895), Al Maul (1898), Clark Griffith (1898), and Jay Hughes (1899). All other seasons from the 1890s have been omitted.
Version 3 (2022): Updated to include the 2020 and 2021 seasons. Also enhanced the Win% part of the study and refined the way multi-team players were being considered.
Version 4a (2023): Updated to include the 2022 season.
Added more noteworthy seasons from the 1890s, including those which complete the careers of several noteworthy starters who pitched primarily after 1900.
Added 157 single and multi-team seasons to the analysis, after expanding the IP requirement to factor in the reduced usage of starters in recent times.
Also added the Defense Factor.
Version 4b (2023): Loaded all available event (play-by-play) records since 1913 in order to more precisely capture what starters and teams did when the starter was in the game, rather than pro-rating numbers out to the entire game. That is, use data by IP rather than by starts.
Version 5 (2024): Added 2023 and all playoff games since 1903 to the analysis.
Corrected data and minor bugs.
Slightly altered part of the OPPWL metric definition.
Version 5b (2024): Added ~1900 previously non-qualifying seasons
in order to include more of starters' careers. The seasons will
not be shown among the Top 150 seasons but will count toward
career scores. The guideline was at least 15 starts in the
regular season and at least 100 IP (as a starter).
Version 6 (2025): Greatly modified and enhanced the defense
piece to extensively use play-by-play data to derive bases
and bases advanced saved or lost.
Added Retrosheet seasons 1912 and 2024.
Version 7 (2026): Added IP per start as an additional means
to gauge pitcher usage and workload.
Added Retrosheet seasons 1910, 1911, and 2025.
Revised and enhanced the web site, turning it from just web pages
into a more contemporary experience that accesses a database and
includes a 'Contact Me'. Moved it to statxmanx.com
Named the final metric ROP (Ranks Over Peers).
1) Dead Ball Era (1900-1919); indicated by very few runs, dominance by 6 teams, and a low competitiveness among players (more easily dominated by a few)
2) Lively Ball Era / pre-Integration (1920-46); indicated by many runs, dominance by few teams, but more competitiveness among players
3) Integration/Expansion Era (1947-69); indicated by gradual inclusion of Black/Latin players, gradually diminishing team and player dominance, 50% additional teams
4) Balanced Era (1970-1993); indicated by the onset of the DH, very competitive leagues, players, and teams, but gradually decreasing Black players and increasing Latins/Asians
5) Slugging Era (1994-2015); indicated by moderate competitiveness, start of PEDs, increased SLG%, Pct. of Blacks goes (and remains) below 15%, and widespread changes in how starters are used
6) Home Run Era (2016-); indicated by dominance of few teams, spikes in home runs and strikeouts, and vastly reduced starter innings pitched