*** The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711. ***








OVERVIEW








Without game logs and play-by-plays, conveniently provided by Retrosheet,
a study this intricate would not have been easily possible.
Retrosheet's comprehensive site has been an invaluable source of historical data.
-----------------------------------------------------------------
-----------------------------------------------------------------

The two fundamental things a starting (or any) pitcher strives to do are 1) to
prevent opponent runs from scoring and 2) to win games (and not lose them).

The process strives to score each of these categories for every qualifying starting
pitcher season since 1890, express each as a 'winning percentage' and standardize
them. Finally, the two scores are simply added to derive a final score. All
scores are then ranked.

In order to complete an in-depth analysis of this magnitude, thanks to
Retrosheet*, the final scores for over 200,000 MLB games since 1890 were loaded
into a relational database. In addition, each year's relevant statistics (from a
variety of sources) for each pitcher were loaded into the database in order to
perform comparison analysis. Finally, over 16 million event (play-by-play)
records from Retrosheet were loaded into the relational database.

Many processes have been created to rate starting pitchers. This one is somewhat
unique in that it is not solely based upon a pitcher's relative value to his team
or league. Rather, its foundation is to rate the 'achievement value' of his
numbers, considering all of the things that affect the level of difficulty.

For example, generally everyone agrees that it is much better in 2019 (all things
being equal) to have a pitcher throw 250 innings with an ERA of 3.00 than it is to
have one who throws 200 innings with the same ERA. But, it's the same ERA. The
250 IP starter is much more valued but the achievement is essentially the same,
with a modest advantage for the 250 inning pitcher.

Likewise, a 16-8 record (.667) on an average team is better than 22-12 (.647)
(again, all other things being equal). A team wants the second pitcher more, but
the first has achieved slightly more because he won a higher percentage of his games.

Below is a brief, step-by-step explanation of the process.


THE PROCESS



DEFENSE FACTOR



Determining to what extent fielders may have helped or hurt a starter's E.R.A. is the most complex
piece of this study. This is a separate issue from the difference between earned and unearned runs.
This analysis focuses only on earned runs. Still, the range and other abilities of fielders certainly affect
opponents' ability to score runs. The actual E.R.A. is revised based on these things.

First up: The foundation of what I do is count bases, any way a base can be obtained, by going through the play-by-plays from
Retrosheet. Example 1: Runners on 1st and 2nd = 3 (1 + 2). After the play (an out), runners on 2nd and 3rd (2 + 3 = 5). Two bases
have been gained. Had the batter walked, it would have been 3 bases. Example 2: Runner on 1st (1). Double play = 1 base lost.
Fielder's choice, runner at first is replaced? 0 bases.
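
The base counting above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual database process. The function names are made up here, and the treatment of a runner who scores (counted as reaching base 4) is an assumption that the text does not spell out.

```python
def base_sum(runners):
    """Sum of occupied base numbers, e.g. runners on 1st and 2nd -> 1 + 2 = 3."""
    return sum(runners)


def bases_gained(before, after, runs_scored=0):
    """Net bases gained (negative = lost) by the offense on one play.

    ASSUMPTION (not stated in the text): a runner who scores is counted
    as having reached base 4.
    """
    return base_sum(after) + 4 * runs_scored - base_sum(before)


# Example 1: runners on 1st and 2nd, an out moves them to 2nd and 3rd.
print(bases_gained({1, 2}, {2, 3}))      # 2
# Same situation, but the batter walks and loads the bases.
print(bases_gained({1, 2}, {1, 2, 3}))   # 3
# Example 2: runner on 1st, a double play empties the bases.
print(bases_gained({1}, set()))          # -1
# Fielder's choice: the runner at first is replaced by the batter.
print(bases_gained({1}, {1}))            # 0
```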
______________________________________________________________________
Next, I sum and break down these bases gained (or lost) by 6 categories of events (5 really, since
for the most part bases are not gained following strikeout events):

1) Non-HR hits.
2) Non-Strikeout outs (balls in play)
3) Strikeouts
4) Walks/HBP
5) Home Runs
6) Errors

All other events are included in the category where they best fit. For example, a stolen base is, in effect, the stretching
of a walk or single into a double. It is placed into Non-HR hits as 2 bases. Pickoffs and caught steals are placed into
Non-Strikeout outs, etc. All major events are categorized in one of the above to make the process less convoluted while
still giving a good approximation.
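
As a rough sketch of the categorization, the snippet below tallies bases gained by category. The event labels and the sample plays are hypothetical and are not Retrosheet event codes. Only the placements of stolen bases, pickoffs and caught steals come from the text.

```python
from collections import defaultdict

# Hypothetical event labels mapped to the six categories described above.
CATEGORY = {
    "single": "non_hr_hit",
    "double": "non_hr_hit",
    "triple": "non_hr_hit",
    "stolen_base": "non_hr_hit",      # treated as stretching a walk or single
    "out_in_play": "non_k_out",
    "pickoff": "non_k_out",
    "caught_stealing": "non_k_out",
    "strikeout": "strikeout",
    "walk": "walk_hbp",
    "hit_by_pitch": "walk_hbp",
    "home_run": "home_run",
    "error": "error",
}


def tally_bases(plays):
    """plays: iterable of (event_label, bases_gained). Returns totals per category."""
    totals = defaultdict(int)
    for event, gained in plays:
        totals[CATEGORY[event]] += gained
    return dict(totals)


# Made-up sequence of plays, for illustration only.
plays = [("single", 1), ("stolen_base", 1), ("out_in_play", 2),
         ("caught_stealing", -1), ("walk", 1)]
print(tally_bases(plays))  # {'non_hr_hit': 2, 'non_k_out': 1, 'walk_hbp': 1}
```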
-------------------------
The Details (an example). Note: After this process was written, additional data became available and the process
was slightly tweaked. The numbers may vary but the concept is the same.

Bob Gibson-1968: 304.67 IP (914 outs)

Bases given up via:

1) Non-HR hits: 345 (on 187 hits)
2) Non-Strikeout outs: 19 (on 636 outs -- 914 minus 268 strikeouts)
3) Strikeouts: # of bases is negligible (on 268 strikeouts)
4) Walks/HBP: 81 (on 69 BB/HB)
5) Home Runs: 57 (on 11 HR)
6) Errors: 26 (on 13 errors)

National League-1968: 14,681 IP (44,043 outs). Note: A few play-by-plays were not available
when this was written, so the numbers below vary a little. The park factor (PFOPP) number may also vary.

1) Non-HR hits: 25,559 (on 12,187 hits)
2) Non-Strikeout outs and other events: 2611 (on 33,668 outs and other events -- roughly 44,043 minus 9338 strikeouts)
3) Strikeouts: # of bases is negligible (on 9338 strikeouts)
4) Walks/HBP: 5911 (on 4543 BB/HB)
5) Home Runs: 4682 (on 884 HR)
6) Errors: 1565 (on 774 errors)

Pro-rated for Gibson's 304.67 IP (an average league pitcher with Gibson's IP):

1) Non-HR hits: 530.4 bases (on 252.9 hits)
2) Non-Strikeout outs: 54.2 bases (on 698.7 outs)
3) Strikeouts: negligible
4) Walks/HBP: 94.3 walks/HBP
5) Home Runs: 18.3 home runs
6) Errors: 16.1 errors
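
The pro-rating above is a straight scaling of each league total by Gibson's share of the league's outs. A minimal sketch using the figures listed above (the dictionary keys and the function name are just labels chosen for this example):

```python
def prorate(league_total, pitcher_outs, league_outs):
    """Scale a league total down to one pitcher's share of the league's outs."""
    return league_total * pitcher_outs / league_outs


GIBSON_OUTS = 914        # 304.67 IP
NL_1968_OUTS = 44_043    # 14,681 IP

league_1968 = {
    "non_hr_hit_bases": 25_559,
    "non_hr_hits": 12_187,
    "non_k_out_bases": 2_611,
    "non_k_outs": 33_668,
    "walks_hbp": 4_543,
    "home_runs": 884,
    "errors": 774,
}

average_pitcher = {
    name: round(prorate(total, GIBSON_OUTS, NL_1968_OUTS), 1)
    for name, total in league_1968.items()
}
print(average_pitcher)
# {'non_hr_hit_bases': 530.4, 'non_hr_hits': 252.9, 'non_k_out_bases': 54.2,
#  'non_k_outs': 698.7, 'walks_hbp': 94.3, 'home_runs': 18.3, 'errors': 16.1}
```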
-----------------------------------------------------------------
The Process:

a) The league average pitchers gave up 584.6 bases (530.4 + 54.2) on non-HR hits and non-KO outs.
b) Gibson gave up 364 (345 + 19)

The league average pitcher's 584.6 is based upon 951.6 events (698.7 non-KO outs + 252.9 non-HR hits).
Gibson's is based upon 823 events (636 non-KO outs + 187 non-HR hits).

We need to adjust to bring Gibson's 823 up to 951.6, by adding 128.6 events. How many non-KO outs and non-HR hits do we add?
* Based upon Gibson's breakdown of the two, we add:
- 29.2 non-HR hits (equating to 61.3 bases by league average)
- 99.4 non-KO outs (equating to 7.7 bases by league average)

Then we add 61.3 and 7.7 to Gibson's 364 to get 433. This standardizes Gibson's events to make it comparable to league average.
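
Steps a) and b) can be sketched as follows for the Gibson example. This assumes, as the Gibson numbers do, that the added events are split by the pitcher's own hit/out mix and valued at league-average bases per event. The function and key names are made up for the illustration.

```python
def standardized_bases(pitcher, league, share):
    """Bring a pitcher's non-HR hits and non-K outs up to the league-average
    number of events, then compare bases allowed.

    pitcher, league: dicts with hits, hit_bases, outs, out_bases
    share: pitcher outs recorded / league outs recorded
    Returns (standardized pitcher bases, league bases minus pitcher bases).
    """
    league_events = (league["hits"] + league["outs"]) * share
    pitcher_events = pitcher["hits"] + pitcher["outs"]
    extra = league_events - pitcher_events

    # Split the extra events by the pitcher's own breakdown of hits and outs.
    extra_hits = extra * pitcher["hits"] / pitcher_events
    extra_outs = extra * pitcher["outs"] / pitcher_events

    # Value the extra events at league-average bases per event.
    added = (extra_hits * league["hit_bases"] / league["hits"]
             + extra_outs * league["out_bases"] / league["outs"])

    standardized = pitcher["hit_bases"] + pitcher["out_bases"] + added
    league_bases = (league["hit_bases"] + league["out_bases"]) * share
    return standardized, league_bases - standardized


gibson = {"hits": 187, "hit_bases": 345, "outs": 636, "out_bases": 19}
nl_1968 = {"hits": 12_187, "hit_bases": 25_559, "outs": 33_668, "out_bases": 2_611}

std, diff = standardized_bases(gibson, nl_1968, 914 / 44_043)
print(round(std, 1), round(diff, 1))  # 433.0 151.6
```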

c) 584.6 (league average) minus 433 = 151.6. This is his Base Figure (get it?). Then, we make some adjustments based on misc. factors:

* BB/HBP: League average over 304.67 IP (94.3) minus Gibson's 69 = 25.3. Based on league average, this equates to 32.9 bases. 151.6
minus 32.9 = 118.7.

* HRs: League average over 304.67 IP (18.3) minus Gibson's 11 = 7.3. Based on league average, this equates to 38.9 bases. 118.7 minus
38.9 = 80.7.

* Errors: League average over 304.67 IP (16.1) minus Gibson's team's 13 = 3.1. Based on league average, this equates to 6.1 bases
saved. 80.7 + 6.1 = 86.8 bases.

* Park Factor/Qual of Opp. Bats (PFOPP): I have calculated this to .889, which is quite low. (1.000 is avg.)
86.8 * .889 = 77.1. Had Gibson faced more average conditions, he would have given up more bases, reducing his base figure.
---------------------------------------------------------
What was the league average bases per run?

25,559 (non-HR hits) + 5911 (BB/HBP) + 2611 (non-KO outs) + 4682 (HRs) + 1565 (errors)

= 40,328 divided by 5459 runs = 7.4. Gibson's 77.1 bases divided by 7.4 = 10.4 runs saved by the defense.

We add the 10.4 (as earned runs) to his actual and it moves his ERA from 1.12 to 1.43. (His FIP was 1.77.)
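
The last two steps (league bases per run, then runs added to the ERA) can be sketched as below, starting from the final base figure of 77.1. The function names are made up for this example. The ERA is adjusted directly as ERA + 9 x runs / IP, which is the same as adding the runs to the earned run total.

```python
def bases_per_run(league_bases_by_category, league_runs):
    """League bases (all categories combined) needed to produce one run."""
    return sum(league_bases_by_category) / league_runs


def defense_adjusted_era(era, ip, base_figure, bpr):
    """Convert a base figure to runs and add them to the ERA as earned runs."""
    runs_saved = base_figure / bpr
    return runs_saved, era + 9 * runs_saved / ip


# NL 1968: non-HR hits, BB/HBP, non-KO outs, HRs, errors (in bases), and runs.
bpr = bases_per_run([25_559, 5_911, 2_611, 4_682, 1_565], 5_459)
runs, new_era = defense_adjusted_era(1.12, 304.67, 77.1, bpr)
print(round(bpr, 1), round(runs, 1), round(new_era, 2))  # 7.4 10.4 1.43
```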

_______________________________________________________________

Let's move to Jim Palmer-1973, whose defense we strongly suspect helped significantly:

The Details:

296.33 IP (889 outs)

Bases given up via:

1) Non-HR hits: 436 (on 208 hits)
2) Non-Strikeout outs: 18 (on 718 outs -- 889 minus 158 strikeouts, unsure why 13 outs are missing)
3) Strikeouts: # of bases is negligible (on 158 strikeouts)
4) Walks/HBP: 150 (on 116 BB/HB)
5) Home Runs: 73 (on 16 HR)
6) Errors: 13 (on 11 errors)

American League-1973: 17,397 IP (52,191 outs)

1) Non-HR hits: 34,026 (on 15,641 hits)
2) Non-Strikeout outs and other events: 3104 (on 42,072 outs and other events -- roughly 52,191 minus 9851 strikeouts)
3) Strikeouts: # of bases is negligible (on 9851 strikeouts)
4) Walks/HBP: 9405 (on 7044 BB/HB)
5) Home Runs: 8450 (on 1552 HR)
6) Errors: 2006 (on 975 errors)

Pro-rated for Palmer's 296.33 IP (an average league pitcher with Palmer's IP):

1) Non-HR hits: 579.6 bases (on 266.4 hits)
2) Non-Strikeout outs: 52.9 bases (on 716.6 outs)
3) Strikeouts: n/a
4) Walks/HBP: 120.0 walks/HBP
5) Home Runs: 26.4 home runs
6) Errors: 16.6 errors
-----------------------------------------------------------------
The Process:

a) The league average pitchers gave up 632.5 bases (579.6 + 52.9) on non-HR hits and non-KO outs.
b) Palmer gave up 454 (436 + 18)

The league average pitcher's 632.5 is based upon 983.0 events (716.6 non-KO outs + 266.4 non-HR hits).
Palmer's is based upon 926 events (718 non-KO outs + 208 non-HR hits).

We need to adjust to bring Palmer's 926 up to 983.0, by adding 57.0 events. How many non-KO outs and non-HR hits do we add?
* Based upon Palmer's breakdown of the two, we add:
- 12.8 non-HR hits (equating to 26.9 bases by league average)
- 44.2 non-KO outs (equating to 1.1 bases by league average)

Then we add 26.9 and 1.1 to Palmer's 454 to get 482.

c) 632.5 (league average) minus 482 = 150.5. This is his Base Figure. Then, we make some adjustments based on misc. factors.

* BB/HBP: League average over 296.33 IP (120.0) minus Palmer's 116 = 4.0. Based on league average, this equates to 5.3 bases. 150.5
minus 5.3 = 145.2.

* HRs: League average over 296.33 IP (26.4) minus Palmer's 16 = 10.4. Based on league average, this equates to 47.6 bases. 145.2
minus 47.6 = 97.5.

* Errors: League average over 296.33 IP (16.6) minus Palmer's team's 11 = 5.6. Based on league average, this equates to 6.6 bases
saved. 97.5 + 6.6 = 104.2 bases.

* Park Factor/Qual of Opp. Bats (PFOPP): I have calculated this to 1.026. (1.000 is avg.)
104.2 * 1.026 = 106.9. Had Palmer faced more average conditions, he would have given up fewer bases, increasing his base figure.
---------------------------------------------------------
What is the league average bases per run?

34,026 (non-HR hits) + 9405 (BB/HBP) + 3104 (non-KO outs) + 8450 (HRs) + 2006 (errors)

= 56,991 divided by 8314 runs = 6.9. Palmer's 106.9 bases divided by 6.9 = 15.6 runs saved by the defense.

We add the 15.6 (as earned runs) to his actual and it moves his ERA from 2.40 to 2.87. (His FIP was 3.38.)

__________________________________________________
This method can approximate FIP (Fielding Independent Pitching) but it's not exactly the same.

Note: After this documentation was written, additional data was loaded and minor changes were made.
As of the most recent processing of this method, below is a comparison of it versus WAR 2.0:

Defensive runs saved / Defensive support:

Bob Gibson-1968: 12.5; WAR 2.0: 11
Sandy Koufax-1966: 5.6; WAR 2.0: 5
Jim Palmer-1973: 17.9; WAR 2.0: 18
Nolan Ryan-1973: 1.7; WAR 2.0: -4
Pedro Martinez-1999: 5.7; WAR 2.0: -3
Pedro Martinez-2000: 11.7; WAR 2.0: 9
Walter Johnson-1912: 23.0; WAR 2.0: 22


Using this process for the ~12,000 seasons since 1890**,
the range of runs saved or lost per season is roughly +/- 25.
The total deviation from zero of all ~12,000 seasons is just ~8000 runs.




** Where play-by-play data is not available (1890-1909), two approaches were taken to emulate the above:

1901-1909: Game totals (hits, walks, HRs, etc.) were used to guesstimate the number of bases needed to produce the
runs scored by a team. This was done by using a categorized matrix of details from known games from 1910-1919. The
totals were then pro-rated to starting pitchers based on their career IP per start.

For 1890-1900, the game total details are not even available from Retrosheet. Pitchers' ERAs were used to guesstimate
the number of bases allowed, using similarly high-scoring games from the lively ball era (1920-39) that best matched.
It is assumed all pitchers threw complete games, as was generally the case.

PARK FACTOR


1) For every team season, the park factor is determined by simply dividing the
total runs scored in a given park (by the team and its opponents) by the total
runs scored in away games (by the team and its opponents).

2) In some processes, it is assumed the starting pitcher started exactly half his
games at home and half on the road. Not here. Each starter game entails a different
set of circumstances. Thus, the average PF a starter has experienced in a year is
based upon the average of each of his individual starts, wherever they were.
Where IP vs. each opponent is available, the PF is weighted accordingly.
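
A minimal sketch of the two park factor steps, assuming the weighting in step 2 is a simple IP-weighted average over a starter's individual starts. The run totals and the three starts are made-up numbers, and the function names are chosen only for this example.

```python
def park_factor(runs_in_home_games, runs_in_away_games):
    """Team-season park factor: total runs (both teams) in home games
    divided by total runs (both teams) in away games."""
    return runs_in_home_games / runs_in_away_games


def starter_park_factor(starts):
    """IP-weighted average of the park factor in each individual start.

    starts: list of (park_factor_of_that_game, innings_pitched)
    """
    total_ip = sum(ip for _, ip in starts)
    return sum(pf * ip for pf, ip in starts) / total_ip


# Made-up team season: 700 runs scored in its park, 650 in its road games.
print(round(park_factor(700, 650), 3))            # 1.077

# Made-up starts in three parks with different factors.
starts = [(1.10, 9.0), (0.90, 6.0), (1.00, 9.0)]
print(round(starter_park_factor(starts), 4))      # 1.0125
```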


OPP FACTOR


1) For every team season, the offensive ability for the starter's opponents to
score runs (relative to the teams' own park factors) is determined in order to
determine the approximate strength of the opposing bats. Like park factor, this can
also affect a starter's ERA.

2) Park Factor and OPP Factor are merged into a 'PFOPP' metric. This allows for
gauging how difficult it was in each game for the starter to keep earned runs by
the opponents off the scoreboard.


The Final ERA SCORE is calculated by applying the IP Factor, the Park Factor, and
the OPP Factor to his actual ERA, then comparing it to his League ERA (with the starter's team
factored out). Finally, using the Pythagorean Theorem (PT), this adjusted ERA is converted to a
winning percentage (Adjusted ERA relative to League ERA). The score is then standardized,
representing the number of Standard Deviations he was over an average starter in
the field of nearly 12,000 starter seasons.
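
The conversion and standardization can be sketched as below. Two things here are assumptions, since the text does not state them: the PT is applied with the classic exponent of 2, and the standard deviation is taken over the whole field (population form). The ERAs and the three-season field are made-up values.

```python
from statistics import mean, pstdev


def era_win_pct(adjusted_era, league_era, exponent=2.0):
    """Pythagorean 'winning percentage' of an adjusted ERA relative to league ERA.

    ASSUMPTION: exponent of 2; the exponent actually used is not stated.
    """
    lg = league_era ** exponent
    return lg / (lg + adjusted_era ** exponent)


def standardize(values):
    """Express each value as standard deviations above or below the field's mean."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Made-up example: a 3.00 adjusted ERA in a 4.00 league.
print(round(era_win_pct(3.00, 4.00), 3))   # 0.64

# Made-up field of three seasons; the middle one is exactly average.
field = [era_win_pct(e, 4.00) for e in (3.00, 4.00, 5.00)]
scores = standardize(field)
```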


W% SCORE


DECISION FACTOR


Winning % is significantly more difficult to score because of the many variables
involved in how pitchers win and lose games. This approach attempts to correlate
W% to ERA as strongly as possible, but also to provide a means of comparison across
the ~150 years of MLB history.

Of course, including the factors below, the #1 thing that can affect W% is ERA
(runs given up, actually), and this plays a major factor in this analysis.

1) Generally, throughout MLB history, all pitchers obtain 1 decision for every 9
innings they throw.
Where possible, only each starter's W-L as a starter will be the starting point. 75-100 years ago, it was quite normal for
starters to be used as relievers. In many cases, a pitcher would have more
decisions than starts. In recent times, because starters generally no longer throw
8 or 9 innings, many no-decisions result.

2) For starters where the W-L (as starter) is not easily available (pre-1901), their actual W-L is adjusted downward to equal their number
of starts, using the Pythagorean Theorem (using the team's runs scored and runs given up in the games the starter started).


AVERAGE FACTOR



1) The next step is to calculate the starter's W-L versus what an average pitcher
might have done, if the average pitcher had gotten the same run support as the
starter. Using the PT, the number of runs an average starter might have given up
(adjusted for park factors, since the starter himself experienced these park
factors) is weighed against the runs the team scored -- to derive an average
pitcher's W%.

2) To calculate a starter's expected W%, 3 steps are involved:

a) So many things affect a starter's W-L record. Nevertheless, his actual W-L (as starter) is
still important because it provides a metric that entails actual game by game
situational conditions that go beyond any expectations that math provides. Since an average
pitcher starts at 0-0, it is determined how many games over .500 each starter was,
using his actual W-L.

For example, Steve Carlton was 27-10 in 1972 and starts at +17.

This number is then modified, upward or downward, based upon the relative (to league)
run support he got. Carlton lands at 17.7 wins above average, leaving 19.3 decisions (37 - 17.7).
We use the PT* to determine that in those 19.3 decisions, he 'should have' won 13.8.
An average NL pitcher with Carlton's run support might have won 9.5 of these.
The difference is 13.8 - 9.5 = 4.3.

* When possible, the Pythagorean Theorem uses the runs scored for the starter
while he was in the game and runs attributed against the starter.

b) We add that difference of 4.3 to Carlton's 17.7 and get 22.1.

If this absolute value for expected games over (or under) is lower than the pitcher's
actual absolute value of W-L, we take that difference through another step of pitcher's
expected Wins minus an average pitcher's expected wins. Carlton's 22.1 is higher than
17 and he remains at 22.1 wins over .500. This is the sum score.

c) Then, an efficiency value (based on the usage required to reach 22.1) is calculated:

We take the 22.1 and multiply it by 55 / 37 decisions for Carlton. (Since no one in
this study has 55 decisions, using this number allows us to normalize starters across eras.)

We then divide this number by the value required to place the efficiency score
on the same scale as the sum score, roughly 1.78.

(22.1 * (55/37)) / 1.78 = 18.4.

The average of the sum score 22.1 and efficiency score 18.4 (times the strength of schedule, OPPWL
of .528, and divided by .500) is ~21.4 games over .500. 81 is added to arrive at an expected W-L of ~102-60, or .632.
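
Step c) and the final conversion can be sketched with the Carlton numbers as follows. The function names are made up for the example. Because 1.78 is only a rough scaling value, the unrounded efficiency score comes out near 18.5 rather than the 18.4 quoted above, but the final result is the same to one decimal.

```python
def efficiency_score(sum_score, decisions, norm_decisions=55, scale=1.78):
    """Sum score normalized to 55 decisions, then rescaled by roughly 1.78."""
    return sum_score * norm_decisions / decisions / scale


def expected_record(sum_score, decisions, oppwl, season_games=162):
    """Average the sum and efficiency scores, apply OPPWL, and add the result
    to a .500 record over a full season."""
    eff = efficiency_score(sum_score, decisions)
    games_over = (sum_score + eff) / 2 * (oppwl / 0.500)
    wins = season_games / 2 + games_over
    return eff, games_over, wins / season_games


# Steve Carlton, 1972: sum score 22.1, 37 decisions, OPPWL .528.
eff, games_over, pct = expected_record(22.1, 37, 0.528)
print(round(games_over, 1), round(pct, 3))  # 21.4 0.632
```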

Note: When comparing the starter's expected W-L with an average starter's, the PFOPP is applied to the
average league starter, since the starter himself experienced this.

This method does mitigate pitchers whose W-L was overly inflated due to very high
run support. For example, Whitey Ford-1961 starts at +21 (25-4), but ends at +15.


OPPWL FACTOR



1) To help determine a starter's expected W-L, the relative strength of the
opponents he faced is determined. This certainly contributes to any starter's W%.
For every team and starter he faced in a given year, the actual W-L record of the
opponents (only in the games that each opposing starter started) is calculated.


For example, Billy Pierce went 'only' 15-10 in 1955 even though his ERA
was quite low. In looking at the teams and starters he faced that year, we find
that -- with those starters -- those teams won games at a rate of .548, much higher
than average.

In his 26 starts, the opposing starters he faced, and the W-L records of the teams
when those opposing starters started were:

Early Wynn (x3)...........57-36
   {In games Wynn started, the Indians were 19-12. Pierce faced Wynn 3 times.}
Whitey Ford (x2)..........48-18
Jim Wilson (x2)...........26-36
Bob Porterfield (x2)......22-32
Steve Gromek (x2).........30-20
Bob Turley................20-14
Herb Score................19-13
Ned Garver................13-19
Bob Lemon.................20-11
Mike Garcia...............20-11
Willard Nixon.............17-14
Eddie Lopat...............8-11
Bobby Shantz..............7-10
George Susce..............7-8
Bill Wight................6-8
Bob Feller................5-6
Mel Parnell...............6-3
George Zuverink...........3-3
Rip Coleman...............3-3
Glenn Cox.................0-2
-------------------------------
Total.....................337-278 (.548)

Note: Pursuant to the example above, the process was revised to allot all no-decisions by starters
as 1/2 win and 1/2 loss, rather than use the team's W-L in the game. This is more indicative of
modern starters who have many more no-decisions than starters in the past, with their teams relying
more heavily on relievers for their decision.

The OPPWL is then calculated a second way: The median W% of the 26 starts is
determined. (This helps alleviate skewing due to the vast differences in the number
of starts by all opposing starters.) In the case of Pierce in 1955, the median is
.522. The average of .548 and .522 is about .535 and this is the OPPWL used.

2) The final W% Score, like the ERA Score, is then standardized. The two are
on a similar (but not identical) scale.
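
The OPPWL arithmetic for the Pierce example can be sketched as below, using the records from the table. The aggregate reproduces the .548 total. The median is computed here on one reading of the text (each opposing starter's team W% counted once per start faced), so it need not match the quoted .522 exactly. The final line simply averages the two quoted figures.

```python
from statistics import median

# (opposing starter, starts vs. Pierce, W, L) with W-L as listed in the table,
# i.e. already multiplied by the number of times Pierce faced that starter.
OPPONENTS = [
    ("Wynn", 3, 57, 36), ("Ford", 2, 48, 18), ("Wilson", 2, 26, 36),
    ("Porterfield", 2, 22, 32), ("Gromek", 2, 30, 20), ("Turley", 1, 20, 14),
    ("Score", 1, 19, 13), ("Garver", 1, 13, 19), ("Lemon", 1, 20, 11),
    ("Garcia", 1, 20, 11), ("Nixon", 1, 17, 14), ("Lopat", 1, 8, 11),
    ("Shantz", 1, 7, 10), ("Susce", 1, 7, 8), ("Wight", 1, 6, 8),
    ("Feller", 1, 5, 6), ("Parnell", 1, 6, 3), ("Zuverink", 1, 3, 3),
    ("Coleman", 1, 3, 3), ("Cox", 1, 0, 2),
]

wins = sum(w for _, _, w, _ in OPPONENTS)
losses = sum(l for _, _, _, l in OPPONENTS)
aggregate = wins / (wins + losses)
print(wins, losses, round(aggregate, 3))   # 337 278 0.548

# One reading of "median W% of the 26 starts": one W% entry per start faced.
per_start = [w / (w + l) for _, n, w, l in OPPONENTS for _ in range(n)]
median_pct = median(per_start)


def oppwl(aggregate_pct, median_pct):
    """Average of the aggregate W% and the median W%."""
    return (aggregate_pct + median_pct) / 2


print(round(oppwl(0.548, 0.522), 3))       # 0.535, the figures quoted in the text
```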


The STDEVs for ERA and W% are added to derive the
starter's overall effectiveness that year.
The two values correlate at 82%. Since that is so high, why use both metrics?
Because they are two distinct views of the same data and complement each other.




PLAYOFFS


As of Version 5 (2024), all playoff games have been added. They count no more than a regular season game.
In the ERA part of the analysis, the 'starting ERA' (after being adjusted for defense, park factors, etc.)
is modified by the literal IP and earned runs given up by the starter in the postseason.

In the W-L part of the analysis, the starter's expected wins (runs for versus runs against) over
average in the postseason are added to the expected win total during the regular season.
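
One plausible reading of the ERA part is that the postseason innings and earned runs are simply folded into the adjusted regular-season line, so each inning counts the same wherever it was thrown. The sketch below shows that reading only. It is an assumption about the mechanics, and the numbers are made up.

```python
def era_with_postseason(adjusted_era, regular_ip, post_ip, post_er):
    """Fold literal postseason IP and ER into an already-adjusted regular-season ERA.

    ASSUMPTION: the adjusted ERA is converted back to earned runs, and the
    postseason totals are added without any further adjustment.
    """
    regular_er = adjusted_era * regular_ip / 9
    return 9 * (regular_er + post_er) / (regular_ip + post_ip)


# Made-up example: 2.50 adjusted ERA over 200 IP, then 4 ER in 20 postseason IP.
print(round(era_with_postseason(2.50, 200.0, 20.0, 4), 2))  # 2.44
```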

QUALIFICATIONS



1) The original qualifications were to include any pitcher (starter or
reliever) who had 1 IP / team game. But, since much of the available data for this
study relates solely to starters, this analysis is limited to starters (who may
also have relieved).

2) If a pitcher failed to meet the 1 IP/game requirement then a second
qualification may apply: IF that pitcher had enough decisions to compensate for the
shortfall of IP (assuming 1 decision = 9 IP), then they also qualify. (For example, in a
162-game schedule, an expected number of decisions is 162/9 = 18. If a pitcher had
19 decisions, then the extra '9 IP' could be applied to his actual innings.)
This exception will not apply to the 60-game 2020 season.

3) Starters' IP per game has been drastically falling since 2000. It has become the
new normal. As of Version 5b, all seasons since 1900 of at least 15 starts and at least
100 IP as starter were included. Although these seasons will count toward a starter's
career score, they will not count among the best seasons anywhere else.
For 2020, the standard 1 IP per team decision is used.

4) The addition of playoff games could also cause a change in qualifications.
IP for a pitcher and team decisions in the playoffs will be added in.
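
Rules 1 through 3 can be sketched as simple checks. This is an interpretation of the wording above rather than the actual code behind the study: the function names are made up, rule 4 (playoff innings) is not modeled, and the 2020 handling follows the two sentences about that season.

```python
def qualifies_on_innings(ip, decisions, team_games, season):
    """Rules 1 and 2: 1 IP per team game, or enough extra decisions
    (1 decision = 9 IP) to cover the shortfall. No exception in 2020."""
    if ip >= team_games:
        return True
    if season == 2020:
        return False
    extra_decisions = decisions - team_games / 9
    return extra_decisions > 0 and ip + 9 * extra_decisions >= team_games


def qualifies_for_career_only(starts, ip_as_starter, season):
    """Rule 3 (Version 5b): at least 15 starts and 100 IP as a starter,
    for seasons since 1900 other than 2020."""
    return season >= 1900 and season != 2020 and starts >= 15 and ip_as_starter >= 100


# 162-game schedule: 18 decisions expected, so 19 decisions add '9 IP'.
print(qualifies_on_innings(155.0, 19, 162, 1975))    # True  (155 + 9 >= 162)
print(qualifies_on_innings(150.0, 19, 162, 1975))    # False (150 + 9 < 162)
print(qualifies_for_career_only(20, 110.0, 2019))    # True
```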
--------------------------------------------------------

Version 1 (2020).

Version 2 (2021): Added to the database the entire careers of NL HOF pitchers, even
if some seasons go back to the 1890s. They are: Cy Young, Kid Nichols, Amos Rusie,
Joe McGinnity, Vic Willis, and Jack Chesbro. Various data from this era is not available, however, and needs to be estimated.

Also added were 5 of the best NL seasons of the 1890s: Billy Rhines (1890), Bill
Hoffer (1895), Al Maul (1898), Clark Griffith (1898), and Jay Hughes (1899). All
other seasons from the 1890s have been omitted.

Version 3 (2022): Updated to include the 2020 and 2021 seasons.
Also enhanced the Win% part of the study and refined the way multi-team players
were being considered.

Version 4a (2023): Updated to include the 2022 season.
Added more noteworthy seasons from the 1890s, including those which complete the
careers of several noteworthy starters who pitched primarily after 1900.
Added 157 single and multi-team seasons to the analysis, after expanding the
IP requirement to factor in the reduced usage of starters in recent times.
Also added the Defense Factor.

Version 4b (2023): Loaded all available event (play-by-play) records since 1913
in order to more precisely capture what starters and teams did when the
starter was in the game, rather than pro-rating numbers out to the entire game.
That is, use data by IP rather than by starts.

Version 5 (2024): Added 2023 and all playoff games since 1903 to the analysis.
Corrected data and minor bugs.
Slightly altered part of the OPPWL metric definition.

Version 5b (2024): Added ~1900 previously non-qualifying seasons
in order to include more of starters' careers. The seasons will
not be shown among the Top 150 seasons but will count toward
career scores. The guideline was at least 15 starts in the
regular season and at least 100 IP (as a starter).

Version 6 (2025): Greatly modified and enhanced the defense
piece to extensively use play-by-play data to derive bases
and bases advanced saved or lost.
Added Retrosheet seasons 1912 and 2024.

Version 7 (2026): Added IP per start as an additional means
to gauge pitcher usage and workload.
Added Retrosheet seasons 1910, 1911, and 2025.
Revised and enhanced the web site, turning it from just web pages
into a more contemporary experience, accessing a database and
including a 'Contact Me'. Moved it to statxmanx.com.
Named the final metric ROP (Ranks Over Peers).

--------------------------------------------------------

DEFINITIONS of ERAS Since 1900



1) Dead Ball Era (1900-1919); indicated by very few runs, dominance by 6 teams,
and a low competitiveness among players (more easily dominated by a few)

2) Lively Ball Era / pre-Integration (1920-46); indicated by many runs, dominance
by few teams, but more competitiveness among players

3) Integration/Expansion Era (1947-69); indicated by gradual inclusion of
Black/Latin players, gradually diminishing team and player dominance, 50%
additional teams

4) Balanced Era (1970-1993); indicated by the onset of the DH, very competitive
leagues, players, and teams, but gradually decreasing Black players and increasing
Latins/Asians

5) Slugging Era (1994-2015); indicated by moderate competitiveness, start of PEDs,
increased SLG%, Pct. of Blacks goes (and remains) below 15%, and widespread changes
in how starters are used

6) Home Run Era (2016-); indicated by dominance of few teams, spikes in home runs
and strikeouts, and vastly reduced starter innings pitched







