This problem has been considered before, and I do not aim to shed much new light on it, except to show a couple of graphs based on recent data.
To establish the difference between starter and reliever performance, there is a long, comprehensive article from Steve Treder on THT that often gets cited. It looks at starter/reliever differences throughout baseball history, and concludes that the difference between starter and reliever performance is consistently about 8% of ERA (about 0.3-0.4 on the ERA scale). However, that is not the question I need to answer. Instead, I want to compare a group of pitchers' collective performances, and adjust for the groups' tendencies to throw innings as starters or as relievers. Since starters are, collectively, better pitchers than relievers, the raw 8% gap understates the effect of role on any one pitcher, so the adjustment has to be larger than 8%.
A more useful study is Tom Tango's work on his blog. He looks at the same pitchers as starters and relievers, making several important adjustments. His conclusion is that pitchers have 17% more strikeouts and 17% fewer home runs as relievers than they do as starters. They also have a 17% better BABIP. He concludes that the same pitchers are "about a 1 run per 9IP" RA better as relievers. I did something simpler (although less thorough), and came to much the same conclusions.
Consider the graph below. I mapped IP to FIP by bucketing real pitcher seasons by IP (2005-2009 data). The graph plots the median FIP for each bucket, along with the median start percentage (% of innings thrown by individual pitchers in starting roles) for each bucket.
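For anyone who wants to reproduce the bucketing, here is a minimal sketch of the idea. The 20-inning bucket width is an assumption on my part (it matches the 60-80 and 100-120 IP ranges discussed later), the function name bucket_medians is hypothetical, and the six pitcher seasons are invented for illustration rather than taken from the 2005-2009 data.

```python
from collections import defaultdict
from statistics import median

BUCKET_WIDTH = 20  # assumed bucket width in innings


def bucket_medians(seasons, width=BUCKET_WIDTH):
    """Group (ip, fip, start_pct) pitcher seasons into IP buckets.

    Returns {bucket_floor: (median_fip, median_start_pct, n_seasons)},
    where start_pct is on a 0-1 scale.
    """
    buckets = defaultdict(list)
    for ip, fip, start_pct in seasons:
        floor = int(ip // width) * width  # e.g. 65.0 IP falls in the 60 bucket
        buckets[floor].append((fip, start_pct))
    return {
        floor: (
            median(f for f, _ in rows),
            median(s for _, s in rows),
            len(rows),
        )
        for floor, rows in sorted(buckets.items())
    }


# Invented pitcher seasons (IP, FIP, start %), for illustration only.
seasons = [
    (65.0, 3.60, 0.0),
    (72.0, 4.10, 0.0),
    (78.0, 3.90, 0.1),
    (104.0, 4.60, 0.8),
    (110.0, 4.40, 1.0),
    (118.0, 4.90, 0.6),
]

for floor, (med_fip, med_start, n) in bucket_medians(seasons).items():
    print(f"{floor}-{floor + BUCKET_WIDTH} IP: median FIP {med_fip:.2f}, "
          f"median start % {med_start:.2f}, n={n}")
```

Both medians are taken independently within each bucket, so the median FIP and the median start percentage need not belong to the same pitcher.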
By looking at the median, rather than the average, it is easy to see where the transition from starters to relievers really takes place (as measured by IP). This is like chemistry class. As we move up the IP scale, the average pitcher has higher energy. However, he needs to overcome a state change to move from full-time reliever to full-time starter. The 60-120 IP buckets find him in a state of transition. Supposing that these pitchers are all of roughly the same ability, and that their FIP differs only due to role changes, I use the data to fit the following function (where start % is on a 0-1 scale):
FIP = a + (start %) * b
The best fit is for a = 3.92 and b = 0.81. Adding b back for the share of innings thrown in relief gives a "translated FIP," or trFIP, which I assume to be constant, on average, throughout the transition. Therefore for any pitcher:
trFIP = FIP + 0.81 (1 - start %)
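A quick sketch of how such a fit can be checked. I am assuming here that the line is fitted to raw FIP against start %, and that ordinary least squares is an acceptable stand-in for whatever fitting method was actually used. The function name fit_line is hypothetical, and the five points are made up to lie exactly on the line with a = 3.92 and b = 0.81. They are not the real bucket medians.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up points lying exactly on FIP = 3.92 + 0.81 * (start %).
start_pcts = [0.0, 0.25, 0.5, 0.75, 1.0]
fips = [3.92 + 0.81 * s for s in start_pcts]

a, b = fit_line(start_pcts, fips)
print(f"a = {a:.2f}, b = {b:.2f}")

# Translate each point: trFIP = FIP + b * (1 - start %).
tr_fips = [f + b * (1.0 - s) for s, f in zip(start_pcts, fips)]
print([round(t, 2) for t in tr_fips])
```

On points that sit exactly on the line, every translated value comes out to a + b, the FIP of a full-time starter. That is the sense in which trFIP is constant through the transition.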
Now FIP is meant to reflect the pitcher's skills at striking out hitters and at (not) giving up walks, but it also predicts ERA very well (at least for FIPs near the league average). The translated trFIP does not have this property. As Tom Tango showed, BABIP changes along with SO9 and HR9, so ERA increases more than FIP increases as relievers become starters. A similar exercise for ERA led me to this formula:
trERA = ERA + 1.19 (1 - start %)
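To make the two translations concrete, here is a small worked sketch. The coefficients 0.81 and 1.19 come from the formulas above. The function name translate and the three pitchers are hypothetical, with invented FIP and ERA values.

```python
FIP_PENALTY = 0.81  # coefficient from the trFIP formula
ERA_PENALTY = 1.19  # coefficient from the trERA formula


def translate(value, start_pct, penalty):
    """Add the share of the role penalty that corresponds to relief innings.

    start_pct is the fraction of innings thrown as a starter, on a 0-1 scale.
    """
    if not 0.0 <= start_pct <= 1.0:
        raise ValueError("start_pct must be on a 0-1 scale")
    return value + penalty * (1.0 - start_pct)


# Hypothetical pitchers: (label, FIP, ERA, start %).
pitchers = [
    ("full-time reliever", 3.50, 3.20, 0.0),
    ("swingman", 4.20, 4.30, 0.5),
    ("full-time starter", 4.40, 4.50, 1.0),
]

for label, fip, era, start_pct in pitchers:
    tr_fip = translate(fip, start_pct, FIP_PENALTY)
    tr_era = translate(era, start_pct, ERA_PENALTY)
    print(f"{label}: trFIP {tr_fip:.2f}, trERA {tr_era:.2f}")
```

A full-time starter is left unchanged, a full-time reliever takes the whole penalty, and the penalty is larger on the ERA scale than on the FIP scale.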
However, trERA is not very useful in comparing talent levels between groups of pitchers. I can explain why, but let's get back to comparing groups of pitchers...
Back to trFIP, here is the same graph as before, but also with trFIP buckets by IP:
The graph between 60IP and 120IP is not quite flat. This is not surprising, as I am taking the median of individual trFIP values within the buckets. Some pitchers get the full 0.81 penalty, but not all do. I am taking the median of a combined distribution.
My study is not as thorough as Tom Tango's, but I like my results. My median trFIP buckets imply that your typical 60-80 IP pitcher from 2005-2009 is a little bit better than a typical 100-120 IP pitcher from the same time period. I think this is actually true. A 60-80 IP pitcher is very likely to be a high-IP reliever, such as a closer or setup man. He could also be a swing man/long man, but bear with me. A 100-120 IP pitcher is likely a back-end starter or also a swing man/long man who ended up filling in for an injured regular. Although the former is likely a failed starter, he must have found success in the bullpen to get such heavy use. At the very least, it is plausible to suggest that the top-end relievers are better overall pitchers than low-end starters, if only slightly.
Then again, I might be reading too much into a small matter. Going forward, I will use trFIP as a simple measure of overall performance, which is not biased by changes in starter/reliever usage. It creates a useful benchmark, but is not meant as an absolute definition of differences between starters and relievers.
Here is the same graph of average performance by pitcher type, but using trFIP instead of FIP:
The type 1 (cutter throwers) pitchers are still collectively the best of the groups, but the difference between type 2 (slider throwers with secondary pitches) and type 6 (slider throwers without secondary pitches) is eliminated by the differences in the trFIP adjustment.
Also trFIP shows type 0 pitchers and type 3 pitchers to be largely ineffective. Type 0 pitchers are those whose core pitch is a change-up, but who also throw other pitches, often sliders (as opposed to type 5 pitchers who throw change-ups, but with curveballs as secondary offerings). Type 3 pitchers throw both a slider and a curve, or possibly a slurve.
Type 3 pitchers are similar to both the type 2 slider-throwers and the type 4 curveball-throwers, but are more likely to become type 2 pitchers. Average trFIP seems to suggest that moving from type 3 to type 2 is not a bad move. This may just be confirmation bias, but the trFIP graph seems to suggest that throwing a slurve is, indeed, not an effective way to pitch. The fact that only 4% of pitchers feature both a slider and a curveball (but 70% feature one or the other) might also suggest the same. To be fair, successful type 3 pitchers Chris Carpenter and Adam Wainwright throw both a hard slider and a slow curve. They are not throwing slurves. But most pitchers find more success with one pitch or the other, it seems.
Now that I have a way of comparing pitchers without concern for starter/reliever issues, I will look into what happens when pitchers switch types. Expect more charts, and maybe a confusion matrix.