Background and Related Work



Static Analysis

Automated static analysis tools can be used to identify, early in the software process, potential source code anomalies that could lead to field failures; we call these reports alerts [6]. Alerts generated by automated static analysis tools require inspection by a developer to determine whether the alert indicates an anomaly important enough for the developer to fix, called an actionable alert [2, 12]. When an alert is not an indication of an actual code anomaly or is deemed unimportant by the developer (e.g., the alert indicates a programming anomaly inconsequential to the program's functionality), the alert is called unactionable [2, 12]. Static analysis tools may generate an overwhelming number of alerts [8], the majority of which are likely to be unactionable [6]. To mitigate the cost of false positive (FP) alerts when using static analysis, we want to build project-specific models that predict or prioritize which alerts are actionable.

The goal of this research is to decrease inspection latency and increase the rate of anomaly removal when using automated static analysis tools by creating and validating an adaptive FP mitigation model that prioritizes automated static analysis alerts by the likelihood that an alert is actionable. We hypothesize that FP mitigation models can be built to predict which alerts developers will act on, and that these models can be used to prioritize alerts for developer inspection. FP mitigation models are built by observing patterns in the characteristics of alerts that have been fixed or suppressed by a team or developer in the past, and using these patterns to predict which alerts are likely to be actionable or unactionable in the future. Additionally, the developer remains ignorant of an injected anomaly until automated static analysis issues an alert that the developer chooses to inspect. Using static analysis during development reduces the time between the anomaly's injection and the alert's creation, which reduces the amount of time the developer is unaware of the potential problem.
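As a minimal illustration of this idea (the alert fields, the single characteristic used, and the example data below are our own simplifications, not a prescribed feature set), past developer decisions can be turned into a labeled history, and open alerts can then be ranked by the observed fix rate of alerts sharing the same characteristic:

```python
from collections import defaultdict

# Each historical alert carries a characteristic (here, just its type) and the
# developer's eventual decision: fixed (actionable) or suppressed (unactionable).
history = [
    {"type": "NULL_DEREF", "fixed": True},
    {"type": "NULL_DEREF", "fixed": True},
    {"type": "NULL_DEREF", "fixed": False},
    {"type": "UNUSED_FIELD", "fixed": False},
    {"type": "UNUSED_FIELD", "fixed": False},
]

def actionability_by_type(history):
    """Estimate P(actionable | alert type) from past fixes and suppressions."""
    fixed, total = defaultdict(int), defaultdict(int)
    for alert in history:
        total[alert["type"]] += 1
        fixed[alert["type"]] += alert["fixed"]
    return {t: fixed[t] / total[t] for t in total}

def rank_open_alerts(open_alerts, scores, default=0.5):
    """Order open alerts so the most likely actionable ones are inspected first."""
    return sorted(open_alerts, key=lambda a: scores.get(a["type"], default), reverse=True)

scores = actionability_by_type(history)
open_alerts = [{"id": 1, "type": "UNUSED_FIELD"}, {"id": 2, "type": "NULL_DEREF"}]
print(rank_open_alerts(open_alerts, scores))  # NULL_DEREF ranks first (2/3 vs. 0/2 fixed)
```

Richer models replace the single per-type fix rate with many alert characteristics and a machine learner, but the workflow of learning from past developer actions and re-ranking the open alerts is the same.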

False Positive Mitigation Models

Our prior research [4, 5] proposed a project-specific, in-process FP mitigation technique that prioritizes alerts using the alert's type and its location at the source folder, class, and method levels. The model, aware-apm [4, 5], uses developer feedback in the form of alert suppressions and alert closures. Suppressing an alert is an explicit developer action indicating that the alert is unactionable. Closure is determined by comparing subsequent static analysis runs: if an alert from an earlier run is not present in a later run, the alert is closed. After a developer inspects an alert and takes an action on it, the prioritization of the remaining alerts is adjusted based on that feedback. We evaluated three versions of the aware-apm model on the faultbench benchmark subject programs and found an average accuracy of 67-76% [4]. The precision and recall were in the 16-19% and 25-42% ranges, respectively, for the benchmark programs. These results suggest that while the models may work well for some programs, they do not work well for others, and that alert type and alert location, together or in isolation, may not be the best predictors of actionable alerts.
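The following sketch illustrates this feedback loop in a much-simplified form; it is not the aware-apm ranking formula, and the uniform ±1 adjustments, the two granularities used, and the identifiers are illustrative assumptions only:

```python
from collections import defaultdict

class AdaptiveRanker:
    """Simplified adaptive ranking loop in the spirit of aware-apm: feedback on
    one alert adjusts the scores of alerts sharing its type or enclosing method
    (the real model combines more granularities and a different weighting)."""

    def __init__(self):
        self.weight = defaultdict(float)  # keyed by (kind, value), e.g. ("type", "NULL_DEREF")

    def keys(self, alert):
        return [("type", alert["type"]), ("method", alert["method"])]

    def observe(self, alert, actionable):
        # Suppression => unactionable; closure (alert fixed) => actionable.
        delta = 1.0 if actionable else -1.0
        for key in self.keys(alert):
            self.weight[key] += delta

    def score(self, alert):
        return sum(self.weight[key] for key in self.keys(alert))

    def rank(self, alerts):
        return sorted(alerts, key=self.score, reverse=True)

def closed_alerts(previous_run, current_run):
    """Alerts present in the previous analysis run but absent now are closed."""
    current_ids = {a["id"] for a in current_run}
    return [a for a in previous_run if a["id"] not in current_ids]
```

In this sketch, a suppression maps to observe(alert, actionable=False), each alert returned by closed_alerts maps to observe(alert, actionable=True), and the remaining alerts are re-ranked after every such observation.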

Ruthruff et al. [12] screened 33 alert characteristics (ACs) from 1,652 alerts sampled from Google's code base to develop logistic regression models for predicting actionable and unactionable alerts. Ruthruff et al. describe a screening process whereby ACs were selected for the model. The generated models contained 9-15 ACs and had accuracies ranging from 71-87%. Ruthruff et al. [12] compared their generated models to a linear regression model containing all ACs and to models developed by Bell et al. [3, 11] for predicting the number of faults. The models generated by Ruthruff et al. generally had higher accuracy than the other models. Additionally, the time to gather the data for the generated models was substantially shorter than the time to build the model containing all ACs. Many of the ACs suggested by Ruthruff et al. are used in our research, in addition to other project-specific metrics, and we consider additional machine learners.
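A minimal sketch of this style of model is shown below, assuming scikit-learn; the three ACs in the feature matrix are invented placeholders, not the characteristics actually screened by Ruthruff et al.:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical ACs as columns: [file_age_in_days, tool_priority, recent_edits].
# Labels: 1 = the developer acted on the alert, 0 = the alert was ignored.
X_train = np.array([[400, 1, 0], [12, 1, 5], [700, 3, 1],
                    [30, 2, 7], [250, 3, 0], [5, 1, 9]])
y_train = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression().fit(X_train, y_train)

# Probability that each new alert is actionable; inspect the highest first.
X_new = np.array([[20, 1, 6], [600, 3, 0]])
print(model.predict_proba(X_new)[:, 1])
```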

Kim and Ernst [7, 8] describe two static analysis alert prioritization techniques that utilize data mined from source code repositories. The first technique prioritizes alert types by the average lifetime of alerts sharing the same type [7]. The lifetime of an alert is the time (in days) between the alert's creation and its closure. Kim and Ernst assume that alert types with shorter lifetimes deserve a higher ranking (i.e., alerts that are fixed quickly are likely important).
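A sketch of this lifetime-based ranking, with invented alert records and dates, follows; alert types are ordered by ascending average lifetime so that quickly fixed types are inspected first:

```python
from datetime import date
from statistics import mean

# Hypothetical alert records: type plus creation and closure dates
# (closure == None means the alert is still open and contributes no lifetime).
alerts = [
    {"type": "NULL_DEREF", "created": date(2007, 1, 2), "closed": date(2007, 1, 5)},
    {"type": "NULL_DEREF", "created": date(2007, 2, 1), "closed": date(2007, 2, 4)},
    {"type": "UNUSED_FIELD", "created": date(2007, 1, 2), "closed": date(2007, 6, 1)},
    {"type": "UNUSED_FIELD", "created": date(2007, 3, 1), "closed": None},
]

def rank_types_by_lifetime(alerts):
    """Rank alert types by average lifetime in days; shorter lifetimes rank higher."""
    lifetimes = {}
    for a in alerts:
        if a["closed"] is not None:
            days = (a["closed"] - a["created"]).days
            lifetimes.setdefault(a["type"], []).append(days)
    return sorted(lifetimes, key=lambda t: mean(lifetimes[t]))

print(rank_types_by_lifetime(alerts))  # ['NULL_DEREF', 'UNUSED_FIELD']
```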

The second technique is a history-based alert prioritization that weights alert types by the number of alerts closed by fault-fixes and non-fault-fixes. A fault-fix is a source code change in which the developer fixes a fault or problem, and a non-fault-fix is a change in which no fault is fixed, such as a feature addition [8]. Alerts may be closed during any code modification and are therefore considered actionable, but Kim and Ernst expect that alerts closed during fault-fixes are more important when predicting actionable alerts.
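The sketch below captures the spirit of such a weighting, using an illustrative fractional weight for non-fault-fix closures; the actual weighting scheme and constants used by Kim and Ernst differ:

```python
from collections import defaultdict

def history_based_weights(closures, non_fault_weight=0.1):
    """Weight each alert type by its closure history: closures during fault-fix
    commits count fully, closures during other commits count only fractionally
    (the fractional weight is an illustrative parameter, not Kim and Ernst's)."""
    weights = defaultdict(float)
    for c in closures:
        weights[c["type"]] += 1.0 if c["fault_fix"] else non_fault_weight
    return dict(weights)

closures = [
    {"type": "NULL_DEREF", "fault_fix": True},
    {"type": "NULL_DEREF", "fault_fix": True},
    {"type": "UNUSED_FIELD", "fault_fix": False},
    {"type": "UNUSED_FIELD", "fault_fix": False},
]
print(history_based_weights(closures))  # {'NULL_DEREF': 2.0, 'UNUSED_FIELD': 0.2}
```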

The history-based alert prioritization presented by Kim and Ernst [8] improves alert precision by over 100% compared to prioritizing alerts by tool severity. However, the precision ranged from 17-67%, which may be because alert closures do not always have a causal relationship with the root cause of an anomaly-fix. We include the alert lifetime, measured in revisions instead of days, as a candidate AC, and we also mine the source code repository for other ACs. Unlike Kim and Ernst, we are interested in prioritizing or classifying individual alerts rather than alert types.

Williams and Hollingsworth [13] created a static analysis tool that evaluates how often the return values of method calls are checked in source code. A method is flagged with an alert when its return value is inconsistently checked by calling methods. Williams and Hollingsworth use their history-aware prioritization technique to prioritize methods by the percentage of call sites, in the software repository and the current version of the code, at which the methods' return values are checked. The results show FP rates of 70% and 76% when using the history-aware prioritization technique in two case studies involving the httpd and Wine applications, respectively. The history-aware technique mines data from the source code repository, as we do, but for different ACs. Instead of using alert-type-specific information to identify actionable alerts, we use ACs that can prioritize or classify many alert types.
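The ratio computation at the core of this technique can be sketched as follows; the call-site records and method names are invented, whereas the real tool derives them from the repository history and the current code:

```python
from collections import defaultdict

# Hypothetical call-site records: each says whether a call's return value was checked.
call_sites = [
    {"callee": "read_config", "checked": True},
    {"callee": "read_config", "checked": True},
    {"callee": "read_config", "checked": False},
    {"callee": "log_message", "checked": False},
]

def check_ratios(call_sites):
    """Fraction of call sites that check each method's return value; methods that
    are usually, but not always, checked are the most suspicious."""
    checked, total = defaultdict(int), defaultdict(int)
    for site in call_sites:
        total[site["callee"]] += 1
        checked[site["callee"]] += site["checked"]
    return {m: checked[m] / total[m] for m in total}

# Prioritize inconsistently checked methods (ratio strictly between 0 and 1),
# most-often-checked first; never-checked methods are assumed uninteresting.
ratios = check_ratios(call_sites)
inconsistent = {m: r for m, r in ratios.items() if 0 < r < 1}
print(sorted(inconsistent, key=inconsistent.get, reverse=True))  # ['read_config']
```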

Kremenek et al. [9] show that static analysis alerts in similar locations tend to be homogeneous. On average, 88% of methods, 52% of files, and 13% of directories with two or more alerts contained homogeneous alerts. Kremenek et al. created a feedback-rank algorithm whereby the developer's feedback on inspected alerts is used to prioritize the remaining alerts. The static analysis tools used by Kremenek et al. take advantage of knowing where a static analysis tool checked for an alert but did not find a potential anomaly [10]. Kremenek et al. [9] prioritize the alerts via a Bayesian network [14].
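The sketch below conveys only the intuition of feedback-driven re-ranking, using fixed multiplicative boosts for co-located alerts; it is not the Bayesian network formulation used by Kremenek et al., and the boost and penalty factors are arbitrary illustrative values:

```python
class FeedbackRanker:
    """Intuition-only sketch of feedback-driven re-ranking: inspecting one alert
    raises or lowers the scores of alerts sharing its method or file, because
    co-located alerts tend to be homogeneous [9]. The real technique propagates
    feedback through a Bayesian network rather than fixed factors."""

    def __init__(self, alerts, boost=2.0, penalty=0.5):
        self.alerts = {a["id"]: dict(a, score=1.0) for a in alerts}
        self.boost, self.penalty = boost, penalty

    def give_feedback(self, alert_id, true_positive):
        inspected = self.alerts.pop(alert_id)
        factor = self.boost if true_positive else self.penalty
        for a in self.alerts.values():
            if a["method"] == inspected["method"] or a["file"] == inspected["file"]:
                a["score"] *= factor

    def next_alert(self):
        return max(self.alerts.values(), key=lambda a: a["score"])

alerts = [
    {"id": 1, "file": "A.java", "method": "A.parse"},
    {"id": 2, "file": "A.java", "method": "A.parse"},
    {"id": 3, "file": "B.java", "method": "B.run"},
]
ranker = FeedbackRanker(alerts)
ranker.give_feedback(1, true_positive=True)  # alert 2 is now favored over alert 3
print(ranker.next_alert()["id"])             # 2
```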

[1] N. Ayewah, D. Hovemeyer, J. D. Morgenthaler, J. Penix, and W. Pugh, "Using Static Analysis to Find Bugs," IEEE Software, vol. 25, no. 5, pp. 22-29, 2008.

[2] N. Ayewah, W. Pugh, J. D. Morgenthaler, J. Penix, and Y. Zhou, "Evaluating Static Analysis Defect Warnings On Production Software," 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, San Diego, CA, USA, June 13-14, 2007, pp. 1-8.

[3] R. M. Bell, T. J. Ostrand, and E. J. Weyuker, "Looking for Bugs in All the Right Places," International Symposium on Software Testing and Analysis, 2006, pp. 61-71.

[4] S. Heckman and L. Williams, "On Establishing a Benchmark for Evaluating Static Analysis Alert Prioritization and Classification Techniques," 2nd International Symposium on Empirical Software Engineering and Measurement, Kaiserslautern, Germany, October 9-10, 2008, to appear.

[5] S. S. Heckman, "Adaptively Ranking Alerts Generated from Automated Static Analysis," ACM Crossroads, vol. 14, no. 1, pp. 16-20, 2007.

[6] D. Hovemeyer and W. Pugh, "Finding Bugs is Easy," 19th ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, Vancouver, British Columbia, Canada, October 24-28, 2004, pp. 132-136.

[7] S. Kim and M. D. Ernst, "Prioritizing Warning Categories by Analyzing Software History," International Workshop on Mining Software Repositories, Minneapolis, MN, USA, May 19-20, 2007, p. 27.

[8] S. Kim and M. D. Ernst, "Which Warnings Should I Fix First?," 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Dubrovnik, Croatia, September 3-7, 2007, pp. 45-54.

[9] T. Kremenek, K. Ashcraft, J. Yang, and D. Engler, "Correlation Exploitation in Error Ranking," 12th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Newport Beach, CA, USA, 2004, pp. 83-93.

[10] T. Kremenek and D. Engler, "Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations," 10th International Static Analysis Symposium, San Diego, CA, USA, 2003, pp. 295-315.

[11] T. J. Ostrand, E. J. Weyuker, and R. M. Bell, "Where the Bugs Are," International Symposium on Software Testing and Analysis, 2004, pp. 86-96.

[12] J. R. Ruthruff, J. Penix, J. D. Morgenthaler, S. Elbaum, and G. Rothermel, "Predicting Accurate and Actionable Static Analysis Warnings: An Experimental Approach," 30th International Conference on Software Engineering, Leipzig, Germany, May 10-18, 2008, pp. 341-350.

[13] C. C. Williams and J. K. Hollingsworth, "Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques," IEEE Transactions on Software Engineering, vol. 31, no. 6, pp. 466-480, 2005.

[14] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Amsterdam: Morgan Kaufmann, 2005.