When you say "balanced" defense, I take that to mean a 50/50 split between pass and rush defense. That means you are calling rush defense half the time and pass defense the other half, so the results will only differ from "all rush" defense on about half the plays. This is also why it's difficult to evaluate the results based on aggregate data alone. We really have to look at the individual results as well. It's nearly impossible to break down all the possible combinations of effects within the engine to get strict comparisons of results for ratings and settings. At the end of the day, all any of us can do is evaluate the engine on feel, but my job is to make sure that everything is considered in that evaluation. There may be times when something looks way out of whack until someone points out that another rating or setting, one that wasn't considered, also affects the result. So when I question any of your observations, don't take that as "nah, you're wrong, everything is fine." I just have to make sure everything is considered and look at the results a little more closely than "feel" alone allows.
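To put rough numbers on the 50/50 point, here is a minimal sketch in Python. The yardage figures and the `expected_yards` helper are made up for illustration and are not taken from the engine. The only thing it shows is the arithmetic: if the defensive call is different on only half the plays, the gap you see in the aggregate is only half the size of the real per-play effect.

```python
def expected_yards(rush_call_share, yards_vs_rush_call, yards_vs_pass_call):
    """Average yards allowed per rushing play for a given mix of defensive calls.

    rush_call_share is the fraction of plays on which rush defense is called;
    pass defense is called on the rest.
    """
    pass_call_share = 1.0 - rush_call_share
    return (rush_call_share * yards_vs_rush_call
            + pass_call_share * yards_vs_pass_call)


# Hypothetical per-play averages, NOT values from the engine.
YARDS_VS_RUSH_CALL = 3.5  # offense runs into a rush-defense call
YARDS_VS_PASS_CALL = 5.5  # offense runs into a pass-defense call

all_rush = expected_yards(1.0, YARDS_VS_RUSH_CALL, YARDS_VS_PASS_CALL)
balanced = expected_yards(0.5, YARDS_VS_RUSH_CALL, YARDS_VS_PASS_CALL)

# The real effect of the defensive call on a single play.
per_play_effect = YARDS_VS_PASS_CALL - YARDS_VS_RUSH_CALL

# What shows up when comparing season-level averages.
aggregate_gap = balanced - all_rush

print(f"all rush: {all_rush:.2f} yards per rush")
print(f"balanced: {balanced:.2f} yards per rush")
print(f"per-play effect: {per_play_effect:.2f}, aggregate gap: {aggregate_gap:.2f}")
```

With these assumed numbers the call is worth 2 yards on a given play, but the two settings end up only 1 yard apart in the averages. Stack a few other ratings and settings on top of that and the difference is easy to lose in the aggregate, which is why the individual results matter.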
The biggest issue with 2.0 was how it factored ratings and settings into the results. The way it did so left no room to adjust the results based on those ratings and settings. The 3.0 engine pulls that logic apart into many more moving parts, which allows us to focus on ratings and settings for each of those little pieces of the play. However, this makes it WAY more complicated to hit specific numbers with the results for those ratings and settings. It really comes down to getting everything moving in the right direction, then observing, adjusting, and repeating until we feel it is where it needs to be. It's a long process, and we are closer than we were, but adjustments are still needed.