The statistical issue with the Negro Leagues is the same as the issue with the National Association: missing data. For many seasons there are only box scores for half to two-thirds of the games, which makes players' stats fairly unreliable, since the records we do have represent such small samples. It's not like the gaps in less vital data, such as CS records from the dead ball era. In both cases, there are groups that have done fabulous jobs of finding and compiling records, and even of comparing the strength of the leagues and teams to other leagues and teams from the same and other eras. However, there just isn't enough data. If we had more records, I'm sure we could easily norm the stats and fill in some of the gaps on less vital stats like we already do here, but we just don't have enough data to make either subset trustworthy.
The comparison to the National Association is also apt because of the length of the seasons for most Negro League and National Association teams. Taking a cue from greater minds than mine, both would be aided, along with the rest of the seasons in the WIS database, by changing the proration formula from a /162 games basis to a /180 day season basis. The length of the season has been fairly consistent, at roughly 180 days, from the inaugural NA season in 1871 to today. If we prorated innings on this basis instead of on the /162 games basis, the insane innings totals we see here would become much more realistic. Clarkson's '85 season would no longer project to 900+ IP, but to a much more reasonable ~670 IP. Likewise, instead of prorating to 1,390 IP/162, Al Spalding's 1871 season would prorate to roughly 260 IP/180 (I didn't actually do any of the math here, just ballpark guesstimated), and Satchel Paige's 1934 season (from the stats we do have, though we're likely missing roughly 6 starts) would go from 375 IP/162 to roughly 270 IP/180.
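To make the difference between the two bases concrete, here's a rough Python sketch. The formulas are an assumption about how this would work, not anything taken from WIS: per-game proration is taken as IP × 162 / team games, and per-day proration as IP × 180 / days in that season's schedule. The function names and the sample pitcher are made up for illustration and are not a real stat line.

```python
def prorate_per_162_games(ip: float, team_games: int) -> float:
    """Scale innings to a 162-game schedule (the per-game basis)."""
    if team_games <= 0:
        raise ValueError("team_games must be positive")
    return ip * 162 / team_games


def prorate_per_180_days(ip: float, season_days: int) -> float:
    """Scale innings to a 180-day season (assumed form of the per-day basis)."""
    if season_days <= 0:
        raise ValueError("season_days must be positive")
    return ip * 180 / season_days


# Hypothetical pitcher: 250 IP for a team that played 30 games over 175 days.
ip, team_games, season_days = 250.0, 30, 175
per_games = prorate_per_162_games(ip, team_games)
per_days = prorate_per_180_days(ip, season_days)
print(f"/162 games: {per_games:.0f} IP")
print(f"/180 days:  {per_days:.0f} IP")
```

With a short schedule spread over a normal-length calendar, the per-game basis multiplies the innings more than fivefold, while the per-day basis barely moves them.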
Hitting stats aren't adjustable to quite the same extent, as off days typically don't give a hitter the chance to accrue more PA the way they give a pitcher more IP, and the small samples (24 games, 30 games, 60 games with 95, 140, 200 PA, etc.) don't lend themselves well to prorating. As Napolean mentions above, look at Gibson's stats... it's hard to take them too seriously when two of his three best seasons have fewer than 100 PA accounted for... that puts him in the 1871 Levi Meyerle camp. Shoot, even marginal hitters can hit .400/.500/.700 for 100 PA stretches... and this is where the stats break down. Which brings us back full circle: there's just not enough season-by-season data for either set of leagues to do anything useful with the stats (unless we treat them as short-season hitters and use just the stats we have, with no proration at all, and price accordingly, but then we just have a bunch of overpriced bench players that will never be drafted).
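For a rough sense of how noisy samples of this size are, here's a small Python sketch using the binomial standard error, sqrt(p(1-p)/n). It treats every PA as an independent trial with the same chance of reaching base, which is a simplifying assumption. The .330 true OBP, the 650 PA full-season comparison, and the function name are all made up for illustration.

```python
import math


def rate_standard_error(true_rate: float, trials: int) -> float:
    """Binomial standard error of an observed rate over `trials` chances."""
    if not 0.0 <= true_rate <= 1.0:
        raise ValueError("true_rate must be between 0 and 1")
    if trials <= 0:
        raise ValueError("trials must be positive")
    return math.sqrt(true_rate * (1.0 - true_rate) / trials)


# Hypothetical hitter with a true .330 OBP (illustrative, not a real player).
true_obp = 0.330
for pa in (95, 140, 200, 650):
    se = rate_standard_error(true_obp, pa)
    # Under a normal approximation, roughly 95% of samples fall within
    # two standard errors of the true rate.
    low, high = true_obp - 2 * se, true_obp + 2 * se
    print(f"{pa:>3} PA: OBP plausibly anywhere from {low:.3f} to {high:.3f}")
```

The range narrows only with the square root of the sample size, so under these assumptions the 95 PA range comes out more than twice as wide as the full-season one.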