Analysis and testing of Web applications


PhD student: Filippo Ricca, DISI, Università degli Studi di Genova.

Advisor: Dott. Bruno Caprile, ITC-irst, Centro per la ricerca scientifica e tecnologica, Povo (Trento).

Supervisor: Prof. Gerardo Costa, DISI, Università degli Studi di Genova.



1. Aim of the Thesis

Producing high-quality Web applications is a challenge. Different methodologies and techniques are currently under study to improve quality factors of Web applications such as correctness, reliability, maintainability, usability and conformance to standards. Some of them focus on the forward engineering step, proposing models and formalisms aimed at supporting the design of Web applications. Others assume that a Web application already exists and support its analysis and testing. The goal of analysis and testing is to assess and improve the quality of Web applications as they are produced during development and evolved during the subsequent modification phases.

The aim of this thesis is to investigate, define and apply to Web systems a variety of analysis and testing techniques inspired by those used with traditional software. In order to provide empirical support to the proposed approach, a prototype tool implementing such analyses and supporting developers in testing activities will be developed and field-tested.

2. Progress

In this section we illustrate the progress made with reference to the research plan proposed in the PhD thesis proposal. A summary of our research on the analysis and testing of Web applications is available in ricca2001e (a book chapter). The main topics are grouped below.

a. Web application modeling

The first step toward analysis and testing is the definition of a set of models representing the various entities involved in Web applications and their mutual relationships (see figure below).

The model describing only static Web sites (see appendix i for the distinction between static Web sites and dynamic Web applications), presented in ricca2000a, was taken as the starting point for the thesis. Entities such as forms, dynamic links and dynamic pages were added to this initial model (ricca2001a) with the aim of extending known analysis and testing techniques to dynamic Web applications. This new model, represented in the standard Unified Modeling Language (UML), is not adequate to support Web application testing when the same server program behaves differently according to the interaction state (see appendix ii for further explanation). For this reason another model, the explicit-state model, was devised (ricca2001b). It differs from the previous one in that it unrolls server programs and dynamic pages with different behaviors into distinct entities.

b. Static analysis and transformation

Analysis results can be exploited to detect possible defects and anomalies, while transformation rules can be useful to improve the structure of Web applications and to help designers in the restructuring phase. To the structural analyses (reaching frames, shortest path, ...) and the historical analysis initially proposed in ricca2000a, we have added further analyses (slicing and consistency of multilingual Web sites) and a set of transformations.

In ricca2001c an initial notion of slicing for Web applications was introduced. This technique, well known for traditional software systems, can be exploited during several activities such as testing, debugging and understanding. The result of slicing a Web application is a "smaller" Web application which exhibits the same behavior as the initial one with respect to the information of interest.

In tonella2001a and tonella2001b the problem of verifying the consistency between Web site portions devoted to different languages was investigated. Anomalies typically occurring in multilingual Web sites include the absence of pages in some languages, differences in page structure across languages, missing information and untranslated parts. These papers propose algorithms for highlighting such anomalies, along with a technique for restructuring multilingual Web sites.

A set of automatic and semi-automatic transformation rules for improving Web applications was described in ricca2001d. These rules work at two different levels: inter-page transformation and intra-page transformation. In the first case the rules exploit the model of the Web application to locate changes, and update the model when transformations are applied. In the second case, the rules transform a single HTML page and the result is a new HTML page.
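As an illustration of the structural analyses mentioned above, the following minimal sketch computes shortest navigation paths and unreachable pages on a page graph. It is not the ReWeb implementation: the page names, the dictionary representation and the function names are assumptions made only for this example.

```python
from collections import deque

# Hypothetical navigation graph: page -> pages reachable through one hyperlink.
SITE = {
    "index.html": ["products.html", "about.html"],
    "products.html": ["item1.html", "index.html"],
    "about.html": ["index.html"],
    "item1.html": ["products.html"],
    "orphan.html": ["index.html"],  # no other page links to this one
}


def shortest_paths(graph, start):
    """Return {page: shortest link path from start}, using breadth-first search."""
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in paths:  # first visit is the shortest in BFS
                paths[target] = paths[page] + [target]
                queue.append(target)
    return paths


def unreachable_pages(graph, start):
    """Return the pages that cannot be reached from start by following links."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    return sorted(pages - set(shortest_paths(graph, start)))
```

A long shortest path to an important page, or a non-empty list of unreachable pages, is the kind of anomaly such an analysis is meant to bring to the attention of the developer.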
c. Testing

An incremental/iterative process model for the development of Web applications and a testing process for Web applications were proposed (ricca2001b). The most distinctive feature of the development process presented in this paper is the central role of the Web application model. Such a model is useful for different purposes: it helps in understanding the architecture of the existing system when a requirement/design increment has to be introduced; it is the reference for analysis; and it allows the automation of several activities related to white box testing, such as test case production, execution and coverage measurement. The proposed testing technique was successfully applied to several real world Web applications, among which Wordnet (www.cogsci.princeton.edu), Amazon (www.amazon.com) and FS-online (orario.fs-on-line.com/orario.it.html).
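Coverage measurement against the model can be sketched as follows. This is only an illustration under stated assumptions, not the TestWeb implementation: the model is reduced to a set of edges between pages, a test case is reduced to the sequence of pages it visits, and all names are hypothetical.

```python
# Hypothetical model edges (hyperlinks and form submissions) of a small application.
MODEL_EDGES = {
    ("home", "search"),
    ("search", "results"),
    ("results", "detail"),
    ("results", "search"),
    ("detail", "home"),
}


def edges_of(path):
    """Return the set of consecutive (source, target) pairs traversed by a path."""
    return set(zip(path, path[1:]))


def coverage(model_edges, test_paths):
    """Return (page coverage, link coverage) as fractions between 0 and 1."""
    pages = {page for edge in model_edges for page in edge}
    visited_pages = {page for path in test_paths for page in path} & pages
    visited_edges = set().union(*(edges_of(p) for p in test_paths)) & model_edges
    return len(visited_pages) / len(pages), len(visited_edges) / len(model_edges)
```

For instance, the single test path home, search, results, detail visits every page but only three of the five edges, so further test cases would be needed to satisfy a link coverage criterion.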
d. Tools

ReWeb and TestWeb, two research prototypes implementing the extraction of the model from existing Web applications (reverse engineering), providing the proposed static analyses and supporting the white box testing technique, are described in detail in ricca2001. In ricca2000b a real Web application (www.ubicum.it) with structural problems was restructured using the analyses provided by ReWeb.

3. Next year work plan

All tasks of the first year outlined in the work plan of the thesis proposal have been performed: extension of the model to dynamic sites, representation of the model in UML, adaptation of static analyses to the model, investigation of selected aspects of Web site testing (white box testing) and a real example of Web site restructuring using ReWeb. The extension of the tool to download dynamic Web applications and the implementation of white box testing have been partially performed. The tools need more testing and tuning to the model proposed in ricca2001b. For year 2002 the remaining tasks are:

A few points may deserve clarification: (1) The model proposed in ricca2001b, while already well developed, cannot be considered final at this stage of the work. In particular, valuable input is expected from more extensive field testing and from the application of the model to statistical testing. (2) The investigation of connections between access logs and static analysis (point 3, second year of the work plan presented in the thesis proposal) and pattern matching (point 1 of the work plan) have been removed from the work plan, as suggested by the evaluation board, in order to concentrate on the problem of Web testing. (3) Point 2 of the second year, i.e. the investigation of additional static analyses derived from software, has been partially performed in the first year of the thesis (Web slicing and Web site consistency). This research will be continued only if compatible with the final work plan described above.

4. Structure of the thesis

The structure of the thesis is very similar to that of the book chapter ricca2001e, which will be used as a starting point. The idea is to refine the theoretical part of the book chapter and to expand it with more examples and case studies. The part devoted to Web application testing will also be expanded. The thesis will be structured as follows:

Every chapter (except the introduction and the conclusions) will be organized into three sections. The first, the most theoretical, will introduce the problems, the possible solutions and the techniques used. The second will present a case study related to the first; the last will explain in detail the algorithms and heuristics used. At the end of the thesis some appendices will contain brief explanations of important concepts used in the thesis.

References

Appendix i: Web sites and Web applications

Static Web sites are composed only of static pages, while dynamic Web applications are composed of both static and dynamic pages. The content of a static page is fixed, usually written in HTML and stored in a repository on the server, while the content of a dynamic page is computed at run time by a server program (script) and may depend on inputs provided by the user.

Appendix ii: Models

Consider the following example (already considered in ricca2001b), in which unnecessary details have been omitted for simplicity. It shows clearly why the model proposed in ricca2001 is not adequate to support Web application testing when the same server program behaves differently according to the interaction state.

The Figure shows an example of a Web application for which both the 'old' (left) and the 'new' (right) models are given. The application consists of an initial static page H from which the user can navigate to a server program S through a link associated with the parameter state = 1. The server program (script) S builds a dynamic HTML page, the content of which depends on the value of the variable 'state' received by S. In particular, with state = 1, S builds a page containing one form which collects the values of variables x and y and transmits a value of state equal to 2 as a hidden variable. This is represented in the model as a submit link guarded by the condition (state=1). Then the server program S is invoked for the second time, now with state = 2. The behavior in this situation is different from the previous one: the output page contains two new forms, devoted respectively to collecting the value of a and the values of b and c, while it does not contain the previous form. This is the reason for the two submit links guarded by the condition (state=2). Finally, the server program S is executed again, either from the first active form (gathering a as input) or from the second one (gathering b and c). The result of this execution is again different, and the dynamic page D that is built now does not contain any form (state is equal to 3 and therefore all conditions are false). Its content also differs between the two cases, in which either a or b and c are filled in by the user.

The new model (ricca2001b) of this example Web application is provided on the right of the Figure. Actual values of parameters, inputs and hidden variables are shown within square brackets. The server program S and the dynamic page D have been split into 4 pages, associated with the 4 different behaviors that may occur during an interaction, corresponding respectively to state = 1, state = 2, state = 3 with a gathered, and state = 3 with b and c gathered.
No condition is attached to the edges of this model, since all condition-dependent behaviors have been separated explicitly. The values of input and hidden variables and of link parameters are sufficient to traverse a particular navigation path, which is feasible by construction, since specific values (or more generally equivalence classes of values) to be assigned to variables and parameters are determined during downloading and are stored with the model edges. All paths in this model (right) are feasible, while many paths in the other model are infeasible (e.g., every path going from H to S and then following any of the submit links with state=3). Since our technique for Web testing automatically selects a set of paths in the model (satisfying a given criterion), the requirement that such paths be feasible is an essential one.
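The explicit-state model of this example can be sketched as a small data structure. The encoding below is our reading of the textual description, since the Figure is not reproduced here: the node names (S1, D1, S3a, ...) and the placeholder values such as "<x>" are hypothetical, and each edge carries the values of link parameters, inputs and hidden variables needed to traverse it.

```python
# Explicit-state model: node -> list of (target, values sent when following the edge).
# Node names and placeholder values are assumptions made for this illustration.
EDGES = {
    "H": [("S1", {"state": 1})],
    "S1": [("D1", {})],
    "D1": [("S2", {"state": 2, "x": "<x>", "y": "<y>"})],
    "S2": [("D2", {})],
    "D2": [
        ("S3a", {"state": 3, "a": "<a>"}),
        ("S3bc", {"state": 3, "b": "<b>", "c": "<c>"}),
    ],
    "S3a": [("D3a", {})],
    "S3bc": [("D3bc", {})],
}


def all_paths(edges, node, prefix=()):
    """Yield every path from node to a node without outgoing edges.

    The example model is acyclic; the membership check below only prevents
    infinite recursion if a cyclic model is supplied.
    """
    prefix = prefix + (node,)
    successors = edges.get(node, [])
    if not successors:
        yield prefix
    for target, _values in successors:
        if target not in prefix:
            yield from all_paths(edges, target, prefix)
```

Enumerating the paths from H yields exactly two navigation sequences, one ending in the page built after a is gathered and one ending in the page built after b and c are gathered. No edge needs a guard condition, because every edge already stores the values that make it traversable.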

Appendix iii: Statistical Testing

Statistical testing (chang2000, kallepalli2001) is a useful complement to structural testing, in that it accounts for the typical interactions with the Web site, rather than its structure and data flows. In order to apply statistical testing to a Web application, it is necessary to build its usage model. The usage model is a representation of the statistics involved in the executions of a given application and in the input values provided. A natural choice of usage model for a Web application is a Markov chain. Such a choice was inspired by the work of Whittaker and Thomason (whittaker1994) on traditional software systems. In fact, each HTML page can be seen as a state, and the hyperlinks in the page can be regarded as Markov chain edges leading to other states. The usage model of a Web application can therefore be obtained from its UML model. The only information missing to make it a Markov chain is an estimate of the transition probabilities to be associated with the edges. Values for such probabilities can be computed by using historical information, such as that contained in the log file. Each value represents the (conditional) probability of navigating, at the next step, from the current page toward a given other page.

The UML model enriched with transition probabilities can be exploited for statistical testing (see Figure). Test cases can be automatically generated according to the statistics encoded in the Markov chain. This is easily achieved by stochastically visiting the chain, i.e. by choosing which edge to traverse in accordance with the transition probabilities of the outgoing edges. The resulting test suite complies with the statistics of the usage patterns and simulates real usage of the Web application. When test cases are executed and failures occur, the classical measures of reliability can be computed, by determining the Mean Time Between Failures (MTBF) and by estimating the probability of correct behavior within a given time interval (reliability models). Such measures can be useful to decide when to stop testing.
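The construction of the usage model and the stochastic generation of test cases can be sketched as follows. This is a minimal illustration under stated assumptions: the sessions are hypothetical page sequences assumed to have been reconstructed from a log file, the 'exit' state is an artificial terminal state added for the example, and the function names are not taken from any existing tool.

```python
import random
from collections import Counter, defaultdict

# Hypothetical sessions reconstructed from an access log (one page sequence per visit).
SESSIONS = [
    ["home", "search", "results", "exit"],
    ["home", "search", "results", "detail", "exit"],
    ["home", "about", "exit"],
    ["home", "search", "exit"],
]


def transition_probabilities(sessions):
    """Estimate P(next page | current page) as relative frequencies in the sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for source, target in zip(session, session[1:]):
            counts[source][target] += 1
    return {
        source: {t: n / sum(targets.values()) for t, n in targets.items()}
        for source, targets in counts.items()
    }


def generate_test_case(chain, start="home", end="exit", rng=random, max_steps=50):
    """Random walk on the chain: each step picks an outgoing edge by its probability."""
    path = [start]
    while path[-1] != end and path[-1] in chain and len(path) < max_steps:
        targets = list(chain[path[-1]])
        weights = [chain[path[-1]][t] for t in targets]
        path.append(rng.choices(targets, weights=weights)[0])
    return path
```

Calling generate_test_case repeatedly on the estimated chain produces navigation sequences whose frequencies approximate those observed in the sessions. Passing a seeded random.Random instance as rng makes the generated test suite reproducible.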