i don't have evidence but i can give a solid summary.
- obviously 3pt % is better against a negative, as you mention, and the reverse holds for 2pt scoring, especially close to the basket (i consider mid range jumpers somewhere in the middle, and long range 2s hurt by a +, but less than 3pt scoring). but i do want to mention it also impacts attempts in a significant way, which is worth considering. for example, with someone who takes a lot of 3s but sucks at them, you may want to let them keep taking them rather than trying to defend them better.
- rebounding is the next most prominent effect IMO. i'm not sure the effect is as strong at the edges - like, is going from -4 to -5 as beneficial as going from -1 to -2? maybe. i definitely think within +/- 2 or 3 there's a substantial effect; the rebounding difference between a -2 and a +2 is really obvious and very significant. in general, i play the extremes less, so my confidence there is not ultra high - but i feel like for most of the secondary characteristics, if you will, the effect is blunted at the extremes (which may be true for everything, actually). i don't think this varies much by set; it's just that 3-2 basically sucks on the boards and man has a slight edge over the other 2 (probably man, press, 2-3, 3-2, if you really want to split hairs, but i feel like the only major difference is that 3-2 is sucky on the boards).
- turnovers are definitely part of it, but i think it's more about the negative side. the more negative, the fewer TOs. i don't find a +5 to give much benefit over a +1 or +2.
- fouling is a big part of it. the negative side especially increases foul trouble. this is extremely important in high fatigue situations where depth is a factor. zone teams can write this off a bit since they foul less, but not so much as to ignore it completely. for, like, 10 deep press teams and such, i am extremely reluctant to play negative (-) settings. i do it - but i might run a -3 against a 0 3pta team, instead of the automatic -5 most teams should play. against a team where i'd normally go -3, i'm between -1 and -2 depending on how severe the 3pta deficit is. so maybe folks don't consider that extreme, but it feels like a pretty huge factor to me.
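
to put rough numbers on the fouling adjustment, here's a minimal python sketch. the function name, the `foul_sensitive` flag and the `step` size are all made up for illustration (assumptions, not anything from the sim itself). it only encodes the rule of thumb of pulling a negative setting a notch or two back toward 0 when depth and fatigue make foul trouble expensive.

```python
def soften_for_foul_risk(baseline, foul_sensitive, step=2):
    """Pull a negative +/- setting back toward 0 when foul trouble is costly.

    baseline: the setting you'd normally pick, e.g. -5 against a 0 3pta team.
    foul_sensitive: hypothetical flag, True for something like a 10 deep press team.
    step: how many notches to give back (1 or 2 in the examples above).
    """
    # Positive and zero settings are left alone; the concern is the negative side.
    if not foul_sensitive or baseline >= 0:
        return baseline
    # Move toward 0, but never flip a negative setting into a positive one.
    return min(0, baseline + step)


print(soften_for_foul_risk(-5, True))          # -3
print(soften_for_foul_risk(-3, True))          # -1
print(soften_for_foul_risk(-3, True, step=1))  # -2
print(soften_for_foul_risk(-5, False))         # -5
```

how big `step` should be is exactly the part that's situational, so treat it as a knob rather than an answer.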
there's definitely ample room for reasonable folks to disagree about the magnitude of the above, and especially about what it means for +/- implementation, which is inherently very situational to start with. however, i think the claim that these effects exist and are meaningful enough to consider is approaching the territory of fact.