Tree-Infested Forests
Anyone who spins away as a cog in the great engine of science knows that, in one's own work, it's often necessary to get very, very close to a problem in order to understand it. In the long hours and months spent toiling away at gathering a data set, forcing it to bear on a subtle hypothesis with an array of obscure statistical techniques, resigning oneself to the narrow focus of the work, and describing all of it as precisely as possible, one runs the risk of losing sight of the errors that threaten the solution. More specifically, the risk is of committing fundamental, invalidating methodological or logical errors in the attempt to overcome the ordinary difficulties of research.
This appears to be the problem with Tuesday's paper, "Measuring Behavioral Dependency for Improving Change-Proneness Prediction in UML Models" by A-R Han, S-U Jeon, D-H Bae, and J-E Hong, in Journal of Systems and Software, 83: 222-234, 2010. Seeking to improve their model, the authors included in it data collected from later versions of the software under study, even though the model is apparently meant to predict the change-proneness of the earliest version. It's easy to predict the future when it's already known. But when the "future" is already known in your research, it's easy to forget that it won't be known when the results are actually applied.
