Mark Kollasch's Incomparable Software-Making Weblog

Go ahead and try to compare it. You can't.

17 February 2010

Tree-Infested Forests

Anyone who spins away as a cog in the great engine of science has seen that, in one's own work, it's often necessary to get very, very close to the problem in order to understand it. In the long hours and months spent gathering a data set, forcing it to be relevant to a subtle hypothesis with an array of obscure statistical techniques, resigning oneself to the narrow focus of one's work, and describing all of it as precisely as possible, one runs the risk of losing sight of the errors that threaten the solution. Or, more specifically, the risk is of committing basic, invalidating methodological or logical errors in an attempt to overcome the basic difficulties of research.

This appears to be the problem with Tuesday's paper, "Measuring Behavioral Dependency for Improving Change-Proneness Prediction in UML Models" by A-R Han, S-U Jeon, D-H Bae, and J-E Hong, in Journal of Systems and Software, 83: 222-234, 2010. Seeking to improve their model, the authors have included in it data collected from later versions of the software under their study, despite the fact that the model is apparently supposed to predict an attribute of the earliest version. It's easy to predict the future when it's already known. But if you know the "future" in your research, it's easy to forget that it won't be known when you're applying the results.
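
To make the complaint concrete, here is a minimal Python sketch of the discipline I would expect: when predicting something about a given version, only measurements from that version or earlier are fair game. The record layout (`version`, `metric`, `value`), the numbers, and the function name `usable_features` are all my own inventions for illustration, not anything taken from the paper.

```python
def usable_features(measurements, predicted_version):
    """Keep only measurements taken at or before the version being predicted.

    Anything from a later version is "the future" and would not be
    available when the model is actually applied.
    """
    return [m for m in measurements if m["version"] <= predicted_version]


# Hypothetical measurements from three successive versions of a system.
measurements = [
    {"version": 1, "metric": "behavioral_dependency", "value": 0.42},
    {"version": 2, "metric": "behavioral_dependency", "value": 0.57},
    {"version": 3, "metric": "behavioral_dependency", "value": 0.61},
]

# Predicting an attribute of version 1: versions 2 and 3 must be excluded.
allowed = usable_features(measurements, predicted_version=1)
leaked = [m for m in measurements if m not in allowed]
```

If `leaked` is non-empty and those records still find their way into the model, the prediction is being graded with the answer key in hand.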


11 February 2010

Predicting Fault Incidence using Software Change History

...is the title of a paper, published July 2000 in IEEE Transactions on Software Engineering, by Todd L. Graves, Alan F. Karr, J. S. Marron, and Harvey Siy. They propose a method for estimating the number of faults in a program based on data collected about the development process rather than just the program itself, built on the insight that bugs do not arise in a vacuum, but must at some point be added to the program. (They examined a number of common (for the time) metrics of "complexity," most of which correlated well with each other and with size metrics such as lines of code, suggesting circumstantially that process measures are more meaningful predictors than artifact measures.)

Specifically, they studied an enormous telephone-switching program by Bell Labs, written in C and a domain-specific language, and used the source control history as their data set. Seems sensible, right? And in the end, their statistical model did demonstrate more predictive power than just sheer lines-of-code, without even considering any attribute of the software itself.

The most useful predictive measure that they examined was the average age of each change in a module, weighted by the size of the change; the older, the less buggy. Obviously, this doesn't mean that the best way to fix bugs is to let your code gather dust in a repository for a few years. Rather, what it means is that the longer a piece of code has been in use and under development, the more likely it is that any bug in it has been found and corrected. It amounts to an endorsement of thorough testing.
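
As I read it, the measure is just a size-weighted mean of change ages. The following Python sketch shows that arithmetic only; it is not the authors' actual model, and the `Change` record, its field names, and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Change:
    age_years: float     # hypothetical field: time since the change was made
    lines_changed: int   # hypothetical field: size of the change


def weighted_mean_age(changes):
    """Average age of a module's changes, weighted by the size of each change."""
    total_size = sum(c.lines_changed for c in changes)
    if total_size == 0:
        return 0.0  # no recorded changes of any size; nothing to average
    return sum(c.age_years * c.lines_changed for c in changes) / total_size


# Two invented modules with the same total churn (400 lines each).
# In the first, most of the churn is old; in the second, most is recent.
old_module = [Change(5.0, 300), Change(4.0, 100)]
new_module = [Change(5.0, 100), Change(0.5, 300)]

print(weighted_mean_age(old_module))  # 4.75
print(weighted_mean_age(new_module))  # 1.625
```

By the paper's finding, the first module would be expected to be the less buggy of the two, since the bulk of its code has had years to be exercised and repaired.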


A correlation was found between the number of changes made in one version and the number of faults found in the next. It is self-evident that bugs cause changes, but this finding also suggests that changes cause bugs. This is similar to the above finding, but its implications are more circular. Like another strong predictor, that the number of faults in previous versions correlates well with the number of faults in the current version, it seems merely an affirmation of ontological inertia (buggy code tends to remain buggy) or even a tautology (buggy code is buggy).
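
For the curious, a lagged correlation of this kind can be sketched in a few lines of Python. This is only my guess at the general shape of the comparison, using a plain Pearson coefficient; the paper's own statistical model is more involved, and the per-version counts below are invented so that the arithmetic is easy to check.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def lag_by_one(changes_by_version, faults_by_version):
    """Pair the changes made in version i with the faults found in version i + 1."""
    return changes_by_version[:-1], faults_by_version[1:]


# Invented counts for four successive versions of one module.
changes = [10, 20, 30, 40]
faults = [0, 1, 2, 3]

lagged_changes, next_faults = lag_by_one(changes, faults)
r = pearson(lagged_changes, next_faults)
```

With these contrived numbers the lagged series line up perfectly and `r` comes out at 1; real data would be far noisier, and a high coefficient would still say nothing about which way the causation runs.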

Interestingly, and with implications for the open source movement, there was no correlation whatsoever between fault incidence in a module and the number of developers who committed changes to that module (and, in a program of this size and age, this number sometimes exceeded one hundred developers): the authors noted that "too many cooks" did not spoil the proverbial broth, but their data also show that many eyes did not make all bugs shallow.


02 February 2010

Software Evolution

Discussed the paper Software Evolution by Meir Lehman and Juan C. Fernández-Ramil, as published in chapter 1 of Software Evolution and Feedback: Theory and Practice by N. Madhavji, J. Ramil, and D. Perry (editors).

Interesting notions on display here. The authors reuse a classification of programs into S-type and E-type, where S-type programs hew precisely to a formal specification (and, consequently, appear principally in academic contexts or as subsets of E-type programs) and E-type programs are of the more conventional sort, meant to be used by actual users and, accordingly, subject to evolutionary forces over their lifetimes as their developers attempt to satisfy the users.

"Evolution," they say, specifically refers to the effects of maintenance, or to maintenance itself. "Maintenance," then, would be a misnomer. Driven by user feedback, or by a motivation to retain user satisfaction, programs will increase in functionality and other measures, such as size or some notion of complexity, or so the thesis goes. This can be (and, in fact, was) observed in several programs; the article makes reference to a study tracking the size of IBM OS/360 over many releases, demonstrating that the size of the operating system increased steadily for twenty releases, then became unstable. An exploration of the instability ensues; a possible explanation is that the "ripple" resembles a feedback loop: tenuous, but plausible. The original study may provide necessary detail.

Thinking of software development as a self-directing process yields a degree of insight. The authors specifically note that evolutionary behavior is evident in software once more than one level of management is involved, as individual groups' activity may be directed, but the interaction between them is more chaotic.


26 January 2010

Quantification

An open question in software is the matter of obtaining useful quantitative measures describing programs, both artifacts and their creation. Such measures, applied to large real-world data sets, would, it is hoped, lead the way to a sort of empirically-determined Holy Grail of best practices... for given values of "best," at least. Producing concrete, meaningful, complete, and adequate descriptions of phenomena which can be identified (preferably automatically) in code and the development history behind it would permit the discovery, by way of statistical analysis, of predictive models describing software.

I will give an example.

Suppose you want your software to be as good as possible. This is, of course, a tremendous assumption. Well, what do you mean by "good?" Let's start a bit less subjectively. Suppose you want your software to be, to draw an attribute out of a hat, as maintainable as possible, by which you mean you want it to take less effort to maintain. Fortunately, you've got data about projects which can be generalized to apply to your own. So you analyze the data to find which attributes correlate well with the effort used in maintenance, give or take a few caveats, invariants, and sanity checks. Optimize those attributes in your own process, and you'll know exactly how to carry out your project and just how maintainable you can expect it to be, and even how likely it is that it won't turn out that way.

There are, clearly, a number of steps in that procedure which are filled in by what amounts to wishful thinking. The science is not, as they say, in. That's what this class is for. We can identify more informative composite measures than the raw data. We can develop ways to automatically identify data attributes which are not obvious, perhaps most interestingly locating the use of explicit design patterns, or even hitherto-unidentified patterns used unintentionally. We can try to reverse-engineer the design intentions of developers on projects where there are no process records, and also beg open-source developers to begin keeping such records, in hopes that our successors don't have to bother with this dreary guesswork.

And if we succeed? Another thin layer will be peeled away from the vast onion of misinformation, superstition, and ad-hoc stumbling in software engineering. I think I'm going to cry.


21 January 2010

Reboot

I am enrolled in a master's degree program in computer science at Colorado State University. In one of my classes, Advanced Topics in Software Engineering: Measurement, Analysis, & Evaluation, it is a requirement for each student to keep a weblog consisting of comments on the other students' presentations. This seems a convenient excuse for me to throw my hat back in the blogging ring.

So, in addition to the minimal class requirements, I will use this space to comment on related material and concepts, whether within the course's curriculum or without. Topics will generally fall in the realm of computer science, although I make no promises, since digression is half the fun. I will write in a style whose principal aim is to examine and understand the concepts myself, so many posts will be unsuitable for laymen, while others will be unsuitable for experts.
