J. King, B. Smith, L. Williams, “Modifying Without a Trace: General Audit Guidelines are Inadequate for Electronic Health Record Audit Mechanisms”, Proceedings of the International Health Informatics Symposium (IHI 2012), to appear, 2012.
Without adequate audit mechanisms, electronic health record (EHR) systems remain vulnerable to undetected misuse. Users could modify or delete protected health information without these actions being traceable. The objective of this paper is to assess electronic health record audit mechanisms to determine the current degree of auditing for non-repudiation and to assess whether general audit guidelines adequately address non-repudiation. We derived 16 general auditable event types that affect non-repudiation based upon four publications. We qualitatively assessed three open-source EHR systems to determine whether the systems log these 16 event types. We found that the systems log an average of 12.5% of these event types. We also generated 58 black-box test cases based on specific auditable events derived from Certification Commission for Health Information Technology criteria. We found that only 4.02% of these tests pass. Additionally, 20% of tests fail in all three EHR systems. As a result, actions including the modification of patient demographics and the assignment of user privileges can be executed without a trace of the user performing the action. The ambiguous nature of general auditable events may explain the inadequacy of auditing for non-repudiation. EHR system developers should focus on specific auditable events for managing protected health information instead of general events derived from guidelines.
B. Smith, “Systematizing Security Test Case Planning Using Functional Requirements Phrases”, Proceedings of the International Conference on Software Engineering Doctoral Symposium (ICSE Doctoral Symposium), Honolulu, Hawaii, pp. 1136-1137, 2011.
Security experts use their knowledge to attempt attacks on an application in an exploratory and opportunistic way in a process known as penetration testing. However, building security into a product is the responsibility of the whole team, not just the security experts, who are often involved only in the final phases of testing. Through the development of a black box security test plan, software testers who are not necessarily security experts can work proactively with the developers early in the software development lifecycle. The team can then establish how security will be evaluated so that the product can be designed and implemented with security in mind. The goal of this research is to improve the security of applications by introducing a methodology that uses the software system’s requirements specification statements to systematically generate a set of black box security tests. We used our methodology on a public requirements specification to create 137 tests and executed these tests on five electronic health record systems. The tests revealed 253 successful attacks on these five systems, which are collectively used to manage the clinical records of approximately 59 million patients. If non-expert testers can surface the more common vulnerabilities present in an application, security experts can focus on more devious, novel attacks.
B. Smith, L. Williams, “Using SQL Hotspots in a Prioritization Heuristic for Detecting All Types of Web Application Vulnerabilities”, Proceedings of the International Conference on Software Testing, Verification and Validation (ICST 2011), Berlin, Germany, pp. 220-229, 2011.
Development organizations often do not have time to perform security fortification on every file in a product before release. One way of prioritizing security efforts is to use metrics to identify core business logic that could contain vulnerabilities, such as database interaction code. Database code is a source of SQL injection vulnerabilities but, importantly, may also contain unrelated vulnerabilities. The goal of this research is to improve the prioritization of security fortification efforts by investigating whether SQL hotspots can serve as the basis for a heuristic that predicts all vulnerability types. We performed empirical case studies of 15 releases of two open source PHP web applications: WordPress, a blogging application, and WikkaWiki, a wiki management engine. Using statistical analysis, we show that the more SQL hotspots a file contains per line of code, the higher the probability that the file will contain any type of vulnerability.
B. Smith, A. Austin, M. Brown, J. King, J. Lankford, A. Meneely, L. Williams, “Challenges for Protecting the Privacy of Health Information: Required Certification Can Leave Common Vulnerabilities Undetected”, Proceedings of the Security and Privacy in Medical and Home-care Systems (SPIMACS 2010) Workshop, co-located with CCS, Chicago, IL, pp. 1-12, 2010.
The use of electronic health record (EHR) systems by medical professionals enables the electronic exchange of patient data, yielding cost and quality of care benefits. The American Recovery and Reinvestment Act (ARRA) of 2009 in the United States provides up to $34 billion for meaningful use of certified EHR systems. But will these certified EHR systems provide the infrastructure for secure patient data exchange? As a window into the ability of current and emerging certification criteria to expose security vulnerabilities, we performed exploratory security analysis on a proprietary and an open source EHR. We were able to exploit a range of common code-level and design-level vulnerabilities. These common vulnerabilities would have remained undetected by the 2011 security certification test scripts from the Certification Commission for Health Information Technology, the most widely used certification process for EHR systems. The consequences of these exploits included, but were not limited to, exposing all users’ login information, allowing any user to view or edit the health records of any patient, and creating a denial of service for all users. Based upon our results, we suggest that an enhanced set of security test scripts be used as entry criteria to the EHR certification process. Before certification bodies spend the time to certify that an EHR application is functionally complete, they should have confidence that the software system meets a basic level of security competence.
Y. Shin, A. Meneely, L. Williams, J. Osborne, “Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities”, IEEE Transactions on Software Engineering (to appear), 2010.
Security inspection and testing require experts in security who think like an attacker. Security experts need to know code locations on which to focus their testing and inspection efforts. Since vulnerabilities are rare occurrences, locating vulnerable code locations can be a challenging task. We investigated whether software metrics obtained from source code and development history are discriminative and predictive of vulnerable code locations. If so, security experts can use this prediction to prioritize security inspection and testing efforts. The metrics we investigated fall into three categories: complexity, code churn, and developer activity metrics. We performed two empirical case studies on large, widely-used open source projects: the Mozilla Firefox web browser and the Red Hat Enterprise Linux kernel. The results indicate that 24 of the 28 metrics collected are discriminative of vulnerabilities for both projects. The models using all three types of metrics together predicted over 80% of the known vulnerable files with less than 25% false positives for both projects. Compared to a random selection of files for inspection and testing, these models would have reduced the number of files and the number of lines of code to inspect or test by over 71% and 28%, respectively, for both projects.
A. Austin, B. Smith, and L. Williams, “Towards Improved Security Criteria for Certification of Electronic Health Record Systems”, Proceedings of the Second Workshop on Software Engineering in Healthcare (SEHC), Cape Town, South Africa, 2010.
The Certification Commission for Health Information Technology (CCHIT) is an electronic health record certification organization in the United States. In 2009, CCHIT’s comprehensive criteria were augmented with security criteria that define additional functional security requirements. The goal of this research is to illustrate the importance of requiring misuse cases in certification standards, such as CCHIT, by demonstrating implementation bugs in an open source healthcare IT application. We performed an initial evaluation of an open source electronic health record system, OpenEMR, using an automated static analysis tool and a penetration-testing tool. We were able to discover implementation bugs latent in the application, ranging from cross-site scripting to insecure cryptographic algorithms. Our findings stress that certification security criteria should focus on implementation bugs as well as design flaws. Based upon our findings, we recommend that the CCHIT criteria be augmented with a set of misuse cases that check for specific threats against EHR systems and thereby improve one aspect of the certification process.
A. Meneely, L. Williams, “Strengthening the Empirical Analysis of the Relationship between Linus’ Law and Software Security”, Empirical Software Engineering and Measurement (ESEM 2010), 2010.
Open source software is often considered to be secure because large developer communities can be leveraged to find and fix security vulnerabilities. Eric Raymond states Linus’ Law as “many eyes make all bugs shallow”, reasoning that a diverse set of perspectives improves the quality of a software product. However, at what point does the multitude of developers become “too many cooks in the kitchen”, causing the system’s security to suffer as a result? In a previous study, we quantified Linus’ Law and “too many cooks in the kitchen” with developer activity metrics and found a statistical association between these metrics and security vulnerabilities in the Linux kernel. In the replication study reported in this paper, we performed our analysis on two additional projects: the PHP programming language and the Wireshark network protocol analyzer. We also updated our Linux kernel case study with 18 additional months of newly-discovered vulnerabilities. In all three case studies, files changed by six developers or more were at least four times more likely to have a vulnerability than files changed by fewer than six developers. Furthermore, we found that our predictive models improved on average when combining data from multiple projects, indicating that models can be transferred from one project to another.
A. Meneely, M. Corcoran, L. Williams, “Improving Developer Activity Metrics using Issue Tracking Annotations”, Workshop on Emerging Trends in Software Metrics (WETSoM ’10), to appear.
Understanding and measuring how groups of developers collaborate on software projects can provide valuable insight into software quality and the software development process. Current practices of measuring developer collaboration (e.g., with social network analysis) usually employ metrics based on version control change log data to determine who is working on which part of the system. Version control change logs, however, do not tell the whole story. Information about the collaborative problem-solving process is also documented in the issue tracking systems that record solutions to failures, feature requests, or other development tasks. To enrich the data gained from version control change logs, we propose two annotations to be used in issue tracking systems: solution originator and solution approver. We examined the online discussions of 602 issues from the OpenMRS healthcare web application, annotating which developers were the originators of the solution to the issue, or were the approvers of the solution. We used these annotations to augment the version control change logs and found 47 more contributors to the OpenMRS project than the original 40 found in the version control change logs. Applying social network analysis to the data, we found that central developers in a developer network have a high likelihood of being approvers. These results indicate that our two issue tracking annotations identify project collaborators that version control change logs miss. However, in the absence of our annotations, developer network centrality can be used as an estimate of the project’s solution approvers. This improvement in developer activity metrics provides a valuable connection between what we can measure in the project development artifacts and the team’s problem-solving process.
L. Williams, A. Meneely, G. Shipley, “Protection Poker: The New Software Security ‘Game’”, IEEE Security & Privacy, to appear, 2010.
Tracking organizations such as the US CERT show a continuing rise in security vulnerabilities in software, increasing awareness of insecure coding practices. Not all discovered vulnerabilities are equal – some have the potential to cause much more damage to organizations and individuals than others. In the inevitable absence of infinite resources, software development teams need to prioritize security fortification efforts to prevent the most damaging attacks. We propose the Protection Poker “game” as a collaborative means of guiding this prioritization. A case study of a Red Hat IT software maintenance team demonstrated the potential of Protection Poker for improving software security practices and team software security knowledge.
R.A. Syed, B. Robinson, L. Williams, “Does Hardware Configuration and Processor Load Impact Software Fault Observability?”, Proceedings of the Third International Conference on Software Testing, Verification and Validation (ICST 2010), to appear.
Intermittent failures and nondeterministic behavior complicate and compromise the effectiveness of software testing and debugging. To increase the observability of software faults, we explore the effect that hardware configurations and processor load have on intermittent failures and the nondeterministic behavior of software systems. We conducted a case study on Mozilla Firefox with a selected set of reported field failures. We replicated the conditions that caused the reported failures ten times on each of nine hardware configurations by varying processor speed, memory, hard drive capacity, and processor load. Using several observability tools, we found that hardware configurations with slower processors and less memory exhibited more failures than the others. Our results also show that by manipulating processor load, we can influence the observability of some faults.