Speaker Name Dr. Veena Mendiratta and Dr. Swapna S. Gokhale
Title A Methodology for Architecture-Based Software Reliability Analysis

Biography

Dr. Swapna S. Gokhale is an Associate Professor in the Dept. of Computer Science and Engineering at the University of Connecticut. She received her M.S. and Ph.D. in Electrical and Computer Engineering from Duke University in 1996 and 1998 respectively, and her B.E. (Hons.) in Electrical and Electronics Engineering and Computer Science from Birla Institute of Technology and Science, Pilani, India, in 1994. Prior to UConn, she was a Research Scientist at Telcordia Technologies and a Post Graduate Researcher at the University of California, Riverside. Her research interests lie in performance and dependability analysis of computer systems, mining of social network and web log data, and software engineering education. She has published over 150 conference and journal papers on these topics. She has been elected a Senior Member of the IEEE and has received Best Paper awards at several international conferences. She received the National Science Foundation CAREER award to support her research in architecture-based software reliability analysis.

Biography

Dr. Veena Mendiratta is the Practice Lead for Network Reliability and Analytics in Bell Labs at Alcatel-Lucent in Naperville, Illinois. She began her career at AT&T Bell Labs in 1984. Her work has focused on the reliability and performance analysis of telecommunications systems, products, networks, and services to guide system architecture solutions, and on telecom data analytics. She has led projects to develop anomaly prediction algorithms for wireless networks and customer experience analytics using data mining and social network analysis techniques. Her current work includes data mining methods for improving the performance of wireless networks and cloud reliability engineering for telecom applications. She is a member of INFORMS and has been elected a Senior Member of the IEEE. Dr. Mendiratta received a B.Tech in engineering from the Indian Institute of Technology, New Delhi, India and a PhD in operations research from Northwestern University, USA. She is an Adjunct Professor in the MS in Analytics program at Northwestern University.

Abstract

The critical dependence of our society on the services offered by software systems places a heavy premium on their reliability. An important step in achieving high reliability in a software system is systematic reliability analysis at the architectural level. Such analysis should consider customer usage patterns (operational profile), component reliability, system architecture, and deployment of the components across hardware hosts. While one of the outcomes of this analysis is a prediction of the system reliability, the more important outcomes are an assessment of the sensitivity of the system reliability to its components’ attributes and the identification of components that are critical from a reliability perspective. These components can then be targeted for reliability enhancement, so that the desired system reliability targets can be achieved in a cost-effective manner.

In this tutorial we will first present an overview of the different types of software reliability models, organized by the phase of the development cycle in which each model is developed and used, to set the context for the architecture-based reliability models. The main part of the tutorial will focus on a hierarchical, two-tier methodology to analyze the reliability of a software system. The methodology partitions the analysis into two steps. In the first step, the reliability of a service offered by the system is obtained by composing the reliabilities of its components within the context of its architecture. Service reliability will also consider the co-location and deployment configurations of the system components. Service reliability analysis will be conducted by mapping the message flow that occurs among the different components of a system to a Markov model. In developing the service reliability analysis methodology, we draw and build upon our extensive recent work in the area of architecture-based software reliability analysis. In the second step, the system-level reliability is obtained by composing the service reliabilities obtained from the first step in conjunction with the customer usage patterns and service distributions. The methodology thus considers, in an integrated manner, the impact of several diverse aspects that influence system reliability, namely component failures, component interactions, deployment configurations, and customer usage scenarios. Finally, we will discuss how the methodology can be used to allocate reliabilities to system components, based on the expected end-to-end service reliability targets, considering the architectural characteristics of the services.
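As a rough illustration of the two steps, the sketch below treats the message flow of a service as a discrete-time Markov chain over components and then weights the resulting service reliabilities by usage. It follows a classic textbook formulation (in the spirit of Cheung's user-oriented model) and is not necessarily the exact model presented in the tutorial. The component names, reliabilities, transition probabilities, and usage fractions are all made-up assumptions, and deployment and co-location effects are ignored.

```python
def service_reliability(reliability, transitions, start, end,
                        tol=1e-12, max_iter=100_000):
    """Step 1 (sketch): probability that a service run starting at `start`
    reaches `end` with every visited component executing correctly.

    reliability: {component: probability it executes without failure}
    transitions: {component: {next_component: probability of that message}}
    Solves success[c] = R_c * sum_j P_cj * success[j] by fixed-point iteration.
    """
    success = {c: 0.0 for c in reliability}
    for _ in range(max_iter):
        updated = {}
        for comp, r in reliability.items():
            if comp == end:
                updated[comp] = r  # last component only has to succeed itself
            else:
                onward = sum(p * success[nxt]
                             for nxt, p in transitions.get(comp, {}).items())
                updated[comp] = r * onward
        change = max(abs(updated[c] - success[c]) for c in reliability)
        success = updated
        if change < tol:
            break
    return success[start]


def system_reliability(service_reliabilities, usage):
    """Step 2 (sketch): usage-weighted sum of the service reliabilities."""
    if abs(sum(usage.values()) - 1.0) > 1e-9:
        raise ValueError("usage fractions must sum to 1")
    return sum(usage[s] * service_reliabilities[s] for s in usage)


# Hypothetical components and numbers, for illustration only.
components = {"proxy": 0.999, "app_server": 0.99, "database": 0.995}

# Service A: the proxy sometimes goes straight to the database.
flow_a = {"proxy": {"app_server": 0.6, "database": 0.4},
          "app_server": {"database": 1.0}}
# Service B: every request passes through the application server.
flow_b = {"proxy": {"app_server": 1.0},
          "app_server": {"database": 1.0}}

services = {
    "service_a": service_reliability(components, flow_a, "proxy", "database"),
    "service_b": service_reliability(components, flow_b, "proxy", "database"),
}
usage = {"service_a": 0.7, "service_b": 0.3}  # assumed operational profile

for name, value in services.items():
    print(f"{name}: {value:.6f}")
print(f"system: {system_reliability(services, usage):.6f}")
```

The iteration is used here only to keep the sketch dependency-free; the same quantities can be obtained by solving the corresponding linear system directly.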

Once the reliability budget of the components is determined, the next step is to determine how the target component reliabilities can be achieved. To understand the factors that influence component reliabilities, we partition each component into three sub-components, namely, hardware, middleware and application software. Since the hardware and middleware are typically expected to be highly reliable, we will focus on the factors that affect the reliability of the software sub-component. Subsequently, we will discuss how the reliability allocated to the software sub-component can be used to guide the selection of a combination of testing, restart and repair strategies. Finally, we will discuss approaches for obtaining reliability data (model inputs) for the software components.
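To make the sub-component view concrete, the sketch below assumes that a component works only if its hardware, middleware, and application software all work, and that the three fail independently, so their reliabilities multiply. Under that assumption, which is ours and not stated in the abstract, the reliability the software must reach follows from the component's allocated reliability. The function name and the numbers are hypothetical.

```python
def required_software_reliability(target, hardware, middleware):
    """Reliability the application software must reach so that
    hardware * middleware * software meets the component target.

    Assumes a series structure with independent failures.
    """
    for name, value in (("target", target), ("hardware", hardware),
                        ("middleware", middleware)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    needed = target / (hardware * middleware)
    if needed > 1.0:
        # Even perfect software cannot compensate for the platform.
        raise ValueError("target exceeds what hardware and middleware allow")
    return needed


# Hypothetical allocation: component budget 0.999 on a highly reliable platform.
needed = required_software_reliability(target=0.999,
                                       hardware=0.9999,
                                       middleware=0.9998)
print(f"software must reach at least {needed:.6f}")
```

A value close to the component target confirms the point made above: when hardware and middleware are highly reliable, almost the entire reliability budget falls on the software sub-component.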

We will illustrate the use of the methodology to gain insights into the influence of different parameters on system reliability using two examples. The first example is the IP Multimedia Subsystem (IMS), a standardized Next Generation Networking (NGN) architecture that telecom providers use to offer mobile and fixed multimedia services. The second example will involve a virtualized application deployed on a cloud platform. Examples of implementing such models in the SHARPE modeling tool will also be shown.

Outline

Introduction and Motivation

Examples of major software failures

Overview of various software reliability models

Basics of architecture-based analysis and modeling

Impact of uncertain parameters

Example 1: Reliability analysis of IMS application

Example 2: Reliability analysis of virtualized application