You have repeatedly made up backstories describing scenarios where one pitcher with a better w-l record, despite having a higher ERA, could reasonably be called the better pitcher. These backstories involve their performance in 'high-leverage' scenarios, and you claim that this difference is reflected in their w-l record.
However, you seem unaware, or simply don't care, that we could just as easily reverse the backstories, so that the guy with the better ERA was the one who was better in the high-leverage scenarios. You don't give much weight to all the factors that could still leave the better guy in high-leverage cases with an L or a no-decision, such as a terrible bullpen, anemic offense, etc.
Whether you realize it or not, your scenarios effectively state that a reasonable person can guess that the guy with a better w-l record was better "when it mattered." The fact that w-l does not correlate strongly with this seems completely lost on you. Did a guy with a good w-l record pitch well when it mattered? Maybe. You don't know. He might have just been a guy who could go 5 innings and had a good offense to back him up. Does a poor w-l record mean a guy sucked "when it mattered"? Maybe. You don't know. He may have actually been one of the best pitchers in the league (in terms of preventing runs, which is his job), but had a terrible offense backing him up.
Does that sound like a particularly useful stat for measuring performance to you? Why would you want to guess when you can look at so many other stats that simply show how many runs he gave up? Why would you use w-l to guess how he did, when w-l has a much stronger correlation to run support than to actual pitching performance (preventing runs)?
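To put rough numbers on the point, here is a small made-up example in Python. Everything in it is hypothetical: the game logs are invented, the names `pitcher_a`, `pitcher_b`, `era`, and `record` are just for illustration, and it assumes every start is a 9-inning complete game with all runs earned, so there are no bullpens and no no-decisions. It is only a sketch of how run support alone can flip w-l, not real data.

```python
# Hypothetical game logs, one tuple per start:
# (runs allowed by the pitcher, runs scored by his own offense).
# Assumption: each start is a 9-inning complete game and every run is earned.
pitcher_a = [(2, 1), (1, 0), (3, 2), (2, 3), (1, 2), (3, 1)]  # weak offense
pitcher_b = [(5, 7), (4, 6), (6, 5), (5, 8), (4, 9), (6, 7)]  # strong offense


def era(games):
    """Earned run average: 9 * earned runs / innings pitched."""
    runs_allowed = sum(allowed for allowed, _ in games)
    innings = 9 * len(games)  # complete-game assumption
    return 9 * runs_allowed / innings


def record(games):
    """Return (wins, losses); a win means his offense outscored what he allowed."""
    wins = sum(1 for allowed, support in games if support > allowed)
    return wins, len(games) - wins


for name, games in (("A", pitcher_a), ("B", pitcher_b)):
    wins, losses = record(games)
    print(f"Pitcher {name}: {wins}-{losses}, ERA {era(games):.2f}")
```

In this invented sample, Pitcher A gives up far fewer runs but ends up with the worse record, purely because of the offense behind him. Looking at w-l alone, you would guess wrong about who pitched better.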