Analyzing 450 million lines of software code

A new Coverity report details the analysis of more than 450 million lines of software code through the Coverity Scan service. The service began in 2006 as the largest public-private sector research project focused on open source software integrity, a joint initiative of Coverity and the U.S. Department of Homeland Security; it is now managed by Coverity.

Over the past seven years, the Coverity Scan service has analyzed nearly 850 million lines of code from more than 300 open source projects, including Linux, PHP and Apache.

Code quality for open source software continues to mirror that of proprietary software, and both continue to surpass the industry standard for software quality. Defect density (defects per 1,000 lines of code) is a commonly used measure of software quality. The analysis found an average defect density of .69 for open source projects that use the Coverity Scan service, and an average of .68 for proprietary code developed by Coverity enterprise customers.
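To make the metric concrete, here is a minimal C sketch of the defect density calculation. The project figures in the example are hypothetical, chosen so the result matches the open source average above; they are not taken from the report.

```c
#include <stdio.h>

/* Defect density = defects per 1,000 lines of code (KLOC). */
static double defect_density(unsigned long defects, unsigned long lines_of_code)
{
    return (double)defects / ((double)lines_of_code / 1000.0);
}

int main(void)
{
    /* Hypothetical project: 345 defects found in 500,000 lines of code
       yields a density of .69, matching the open source average above. */
    printf("defect density: %.2f\n", defect_density(345, 500000));
    return 0;
}
```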

Both beat the industry average defect density of 1.0 for good quality software. This marks the second consecutive year that both open source code and proprietary code have achieved a defect density below 1.0, an indication that the accepted industry standard for good quality code is improving.

As projects surpass one million lines of code, there is a direct correlation between size and quality for proprietary projects, and an inverse correlation for open source projects. Proprietary projects between 500,000 and 1,000,000 lines of code had an average defect density of .98. For projects with more than one million lines of code, defect density decreased to .66, suggesting that proprietary projects generally see software quality improve once they exceed that size.

Open source projects between 500,000 and 1,000,000 lines of code, by contrast, had an average defect density of .44, while the figure increased to .75 for open source projects with more than one million lines of code, marking a decline in software quality as projects grow larger. This discrepancy can be attributed to differing dynamics within open source and proprietary development teams, as well as the point at which these teams implement formalized development testing processes.

Linux remains a benchmark for quality. Since the original Coverity Scan report in 2008, scanned versions of Linux have consistently achieved a defect density of less than 1.0, and versions scanned in 2011 and 2012 demonstrated a defect density below .7.

In 2011, Coverity scanned more than 6.8 million lines of Linux code and found a defect density of .62. In 2012, it scanned more than 7.4 million lines and found a defect density of .66. At the time of this report, Coverity had scanned 7.6 million lines of code in Linux 3.8 and found a defect density of .59.

High-risk defects persist. 36 percent of the defects fixed in the 2012 Scan were classified as “high-risk,” meaning they could pose a considerable threat to overall software quality and security if left undetected.

Resource leaks, memory corruption and illegal memory access, all of which are considered difficult to detect without automated code analysis, were the most common high-risk defects identified in the report.
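To illustrate what these categories look like in practice, the following C sketches (purely illustrative, not taken from any scanned project) show one minimal instance of each defect class that static analysis tools such as Coverity are designed to flag:

```c
#include <stdio.h>
#include <stdlib.h>

/* Resource leak: the early return on the read-error path
   exits without closing the file handle. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL)
        return -1;              /* BUG: f is never closed on this path */
    fclose(f);
    return 0;
}

/* Memory corruption: writing to memory after it has been freed. */
void use_after_free(void)
{
    char *p = malloc(32);
    if (p == NULL)
        return;
    free(p);
    p[0] = 'x';                 /* BUG: write to freed memory */
}

/* Illegal memory access: the loop bound allows one write
   past the end of the array. */
void off_by_one(void)
{
    char buf[16];
    for (size_t i = 0; i <= sizeof(buf); i++)   /* BUG: should be '<' */
        buf[i] = 0;
}
```

All three compile without warnings under default settings, which is precisely why such defects are hard to catch without automated analysis: the compiler raises no complaint, and the failure may only surface far from the faulty line.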

“We started the Coverity Scan project seven years ago with the U.S. Department of Homeland Security, as a resource for the open source development community to improve the quality of their software,” said Andy Chou, co-founder and CTO for Coverity.

“Each year, driven in part by advances in static analysis technology, the size and scope of the report increases—as do the number of defects identified and fixed. We’re very proud to see how the Coverity Scan service has evolved to become a key indicator of code quality for both open source and proprietary software, and look forward to continuing this work in the years to come,” Chou added.
