Measuring the Readability of Software Requirement Specifications: An Empirical Study 

 

Requirements and specifications documents have become the grounding force of the modern software project. Their importance has grown as companies see them both as instruments for capturing business rules and as direct links to financial success or liability. Good requirements not only educate, they also keep all parties informed, bound and obligated throughout the project life cycle. Poor requirements, or inconsistent elicitation processes, often contribute to the development of undesirable products or to delivered functionality that does not meet the business need. A flawed system can create significant financial exposure for the provider, the client or both. This reality has sparked a united call for IT-specific controls that ensure requirements are gathered, created, maintained, validated, distributed, stored and versioned consistently by all involved parties.

As IT-specific controls have become critical, the role of the IT organization's compliance executive has grown in importance and in the number of responsibilities assigned to it. It is the compliance leader's job to eliminate inconsistent practices, to rein in potentially reckless power mongers, and to discourage fraud and criminal intent across the organization. Making control changes sooner rather than later can save a company many legal headaches and internal struggles. If quality analysis and assurance are to be part of the audit process, they must be made an integral part of the system being audited. This requires that everyone involved in design and implementation fully understand the requirements of the system.

This article looks at the issue of readability of specification documents. It introduces three common measures of readability. These measures have been applied to real-world test case documents, and the results are documented here, to:

  • Analyze the ability of the measures to appropriately score the reading levels of the documents
  • Analyze how well the three measures agree on a document's readability
  • Compare each measure's readability score to readers' qualitative assessments of the readability and usability of the documents tested

Software Requirements Specification

A software requirements specification (SRS) is essentially an organization's written understanding of a customer's or potential client's system requirements and dependencies at a particular point in time, usually prior to any actual design or development work. It is a two-way insurance policy that ensures that the client and the organization each understand the other's requirements from a given perspective at a given time. The SRS states in precise and explicit language the functions and capabilities that a software system (e.g., a software application or an e-commerce web site) must provide, as well as any constraints by which the system must abide. The SRS is often referred to as the "parent" document because all subsequent project management documents, such as design specifications, statements of work, software architecture specifications, testing and validation plans, and documentation plans, are related to it.1

The Auditor's Role

IT environments have continuously increased in complexity while organizations place ever greater reliance on the information produced by IT systems and processes. The recent emergence of regulations aiming to restore investor confidence has placed a greater emphasis on internal controls and often requires independent assessments of the effectiveness of internal controls.

Internal IT controls are categorized as general controls or application controls. Application controls are specific to a given application, while general controls define the environment in which IT system development occurs.

General controls seek to provide reasonable assurance regarding the integrity, reliability and quality of those systems and ensure that they are developed, configured and implemented to achieve management's objectives. These objectives are linked to application requirements during the requirements phase of the systems development life cycle (SDLC) through the creation of an SRS document.

SRS documents must clearly and unambiguously communicate to a variety of audiences. If this communication does not occur, the probability increases that the IT system developed will not achieve its desired results. One way to reduce the risk of misunderstanding is to measure the readability of the SRS and determine if its level of readability is consistent with its purpose.

Even though the leading software development/requirement management methodologies have strict guidelines, processes and artifact templates, there is no universally accepted technique that auditors, reviewers and clients can use to test the quality of the software requirements themselves. Further, since development methodologies do not include a readability review step, there may be no expectation that the SRSs will be tested for validity, usability or readability. Thus, the critical link between objectives and requirements may not be made, thereby exposing the organization to both financial and nonfinancial risk.

What Is Readability?

For the last 50 years, the education and journalism sectors have been bombarded by methods that claim to measure the efficiency and readability of documented text. Tests of readability are measurements of the ease or complexity associated with the reading of a text document. The tests are actually mathematical formulae that attempt to calculate the grade level or ease level of blocks of text, news articles, books and other writings.

Since readability scores are now easy to calculate by using computerized grammar programs, there is new enthusiasm surrounding their application. Some authors use them as measurement guidelines to help simplify their writing; others, however, are strongly opposed to their use and to their claim of legitimacy.

The International Reading Association and the US National Council of Teachers of English began an initiative 10 years ago to discourage members from using readability tests to assess educational materials. During the same period, two government reports in England affirmed the accuracy and reliability of the tests.

Disputes about readability tests have arisen primarily because people use the results for different purposes and sometimes misuse them for self-serving ends. Before readability formulae existed, individuals were appointed to read works and pass judgment on them. That approach proved unscientific, because discrimination and personal taste often swayed reviewers, and such efforts have since been characterized as censorship or attacks on creativity and literary expression. The question remains: Can writing be measured mathematically, or is the human element needed to truly gauge the effectiveness of written text?

Readability Measures (Models) Used in the Study

The Fogg Index

The Fogg Index (more widely known as the Gunning Fog Index) is a method of analyzing written material to determine how easy it is to read and understand. The resulting score indicates the grade level a reader needs to comprehend the writing. The steps used to calculate the index are simple. In a document of at least 100 words, count the words, the sentences and the complex words (words of three or more syllables). Divide the number of words by the number of sentences to get the average sentence length, and divide the number of complex words by the total number of words, multiplied by 100, to get the percentage of complex words. These two results are added, and the sum is multiplied by 0.4.

Ideally, the score should be between 7 and 8 (grade level). Scores above 12 are considered too complex for most readers.
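
To make the calculation concrete, the following short Python sketch applies the Fogg (Gunning Fog) formula described above to a block of text. The count_syllables heuristic, the fog_index helper and the sample requirement text are illustrative assumptions for this article, not artifacts from the study; production readability tools count syllables and sentences more carefully, so their scores will differ somewhat.

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    # Real readability tools use dictionary-based syllable counts.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text):
    # Fogg (Gunning Fog) Index:
    #   0.4 x (average sentence length + percentage of complex words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    asl = len(words) / len(sentences)                  # average sentence length
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (asl + pct_complex)

# Hypothetical requirement text used only to exercise the formula.
sample = ("The system shall validate each claim record before posting. "
          "Invalid records are routed to the exception queue for review.")
print(round(fog_index(sample), 1))  # approximate grade level of the sample
```

Ideally such a sample would be at least 100 words long, as the procedure above specifies; the snippet simply shows the arithmetic.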

The F.K. Readability Tests

These tests were designed to indicate how difficult a written passage is to understand. There are two tests: the Flesch Reading Ease (referred to here as the F.K. Reading Ease) and the Flesch-Kincaid (F.K.) Grade Level. Both are built from the same two measures, average sentence length and average syllables per word, but they weight these measures differently and report on different scales, so the rankings of two documents under the two tests do not always agree (e.g., a newspaper article that scores as easier than another on the F.K. Reading Ease test may still come out at a higher grade level).

The F.K. Reading Ease scores text on a scale of 0-100. Higher scores indicate material that is easier to read, and lower scores indicate text that is more difficult to understand. The steps used to calculate the F.K. Reading Ease Score (FRES) are as follows (a short code sketch follows the list):

  • Ensure that the document is at least 100 words in length.
  • Count the numbers of syllables, words and sentences.
  • Divide the number of words by the number of sentences to get the average sentence length (ASL).
  • Divide the number of syllables by the number of words to get the average syllables per word (ASW).
  • The FRES equals 206.835 – (1.015 x ASL) – (84.6 x ASW).
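
As a worked illustration of the formula in the last step, the Python sketch below computes the FRES for a text sample. The syllable counter is a rough heuristic assumed here for illustration; the Microsoft Word implementation mentioned below counts syllables differently, so its scores will not match exactly.

```python
import re

def count_syllables(word):
    # Rough heuristic: one syllable per group of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    asl = len(words) / len(sentences)   # average sentence length
    asw = syllables / len(words)        # average syllables per word
    # FRES = 206.835 - (1.015 x ASL) - (84.6 x ASW)
    return 206.835 - 1.015 * asl - 84.6 * asw

# Hypothetical requirement sentence used only to exercise the formula.
sample = ("The system shall reject any claim whose policy number is not "
          "found in the master policy table.")
print(round(flesch_reading_ease(sample), 1))  # higher = easier to read
```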

In an ideal test scenario, scores of 90 to 100 are considered easily understandable by an average fifth grader. Eighth and ninth grade students could easily understand passages with a score of 60-70, and passages with results of 0-30 are best understood by college graduates. Reader's Digest magazine has a readability index of about 65, Time magazine scores about 52, and the Harvard Law Review has a general readability score in the low 30s. Given the factors in the formula, one can see that the length of the words has a significant effect on the score that results from this test.

Many US government agencies, including the Department of Defense, use the FRES as a standard and require documents or forms to meet specific readability levels. Some states even require that all insurance forms score 40-50 on the Reading Ease test. The test has become so widely used that it is now incorporated into the Microsoft Office suite.2

F.K. Grade Level

This measure rates text as a US grade level,3 rather than on a 0-100 scale, making it easier for teachers, parents, librarians and others to judge the readability level of various books and texts. The grade level is calculated as follows.

In a sample document of at least 100 words:

  • Count the number of syllables, words and sentences in the sample
  • Divide the number of words by the number of sentences to get the ASL
  • Divide the number of syllables by the number of words to get the ASW
  • The F.K. Grade Level equals 0.39 x (ASL) + 11.8 x (ASW) – 15.59

A score of 9.2 indicates that the text is understandable by an average American ninth grader.
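
The arithmetic behind such a grade-level figure is easy to check. The brief sketch below uses hypothetical counts (100 words, 5 sentences, 144 syllables), chosen only to show the formula producing a value near the 9.2 cited above; they are not measurements from the study.

```python
def fk_grade_level(words, sentences, syllables):
    asl = words / sentences      # average sentence length
    asw = syllables / words      # average syllables per word
    # F.K. Grade Level = 0.39 x ASL + 11.8 x ASW - 15.59
    return 0.39 * asl + 11.8 * asw - 15.59

# Hypothetical counts for a 100-word sample.
print(round(fk_grade_level(100, 5, 144), 1))  # -> 9.2
```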

The F.K. Reading Ease score and F.K. Grade Level score are both available as tools in Microsoft Word.

The Research Study

Research Design

The following test case was constructed to determine whether readability tests such as those described can help control and audit the quality and effectiveness of software requirements.

Several software requirements documents of varying complexity were chosen for the test, drawn from official insurance software projects and from NASA. Some of the documents had been acclaimed by peer groups; others had been dismissed as poor descriptions of the required functionality. In addition, the formatting of the documents was varied to determine whether formatting affects the scores.

The following readability measures were used to assess and score the documents:

  • Fogg Index
  • F.K. Grade Level
  • F.K. Reading Ease

Readers were also asked to give a qualitative assessment of overall readability.

Research Results

Figure 1 provides a table of the automated reading-level measurements for all 10 test cases. Each case consisted of a requirements document that was analyzed using three methods: the Fogg Index, F.K. Grade Level and F.K. Reading Ease.

Figure 1

The Fogg Index and F.K. Grade Level results presented in figure 2 indicate the grade level required for reading comprehension of each test case document. The F.K. Reading Ease, by contrast, yields a score rather than a grade level; higher Reading Ease scores are supposed to correspond to text readable at lower literacy levels.

Figure 2

Test subjects (readers/analysts) also read the documents and commented qualitatively on their readability and comprehensibility. Subjects included managers, quality assurance (QA) staff, developers and business users.

Figure 3 summarizes how well the reading measure scores agreed with the readers' comments. The researchers were interested in whether "readability," as determined by a standard test, is indicative of the readability and utility of the document to its readers or users.

Figure 3

Figure 3 has the same structure as figure 2, but its cells contain qualitative indicators comparing the readability readers actually perceived with the quantitative readability produced by mechanical application of the reading measure algorithms:

  • + indicates substantial agreement between the reading measure score and readers' evaluation of the text.
  • +/- indicates agreement between the reading measure score and readers' evaluation of the text for some reader types but not for others.
  • - indicates substantial disagreement between the reading measure score and readers' evaluation of the text.

Figure 4 summarizes the reader analysts' qualitative responses on how well the readability levels assigned by the reading measures matched the actual comprehensibility and utility of the documents to their readers.

Figure 4

Summary and Conclusions

As the test cases clearly show, readability formulae can be skewed by many factors, including document formatting, sentence structure and other issues. Nonetheless, if used in appropriate areas, they can provide clues to poor documentation. Readability formulae measure certain features of text that lend themselves to mathematical calculation. The results also show that not all features that promote readability can be measured mathematically. These include comprehension as well as:

  • The choice of language that is simple, direct, economical and familiar
  • The omission of needless words
  • The use of sentence structures that are evident and unambiguous
  • The organization and structure of material in an orderly and logical way

Test case 7, which had the highest grade-level score under the Fogg and F.K. indices, was also considered the model document by the readers. The qualitative results displayed in figure 3 also point to a common mismatch between the usability assessments made by managers and developers and those made by business users.

Thus, readability formulae can predict reading ease, but they are not sufficient by themselves to determine readability and usability, and they do not assess how well the reader will understand the ideas in the text. Software requirements must be evaluated with a stronger set of tools. The comprehensibility/usability of a document and its general readability are not always viewed in the same light by readers; a fast and easy read is not always a useful read.

Audrey Owen, a renowned readability expert, states that when writing for her web site pages, she always uses the scores as guidelines and "writes with the lowest readability numbers possible to get the job done." She also points out that:

The low readability score of that page probably doesn't even occur to anyone who reads it easily. If the reader thinks about it at all, she probably just feels smart and competent because the material is relatively easy to understand when explained in simple terms.

However, when asked about the dilemma of writing critical software requirements for chosen audiences, Owen responded with the following:

One technique that works for some technical situations is to have sidebars with the information that only one group would need. In your case, make sidebars for either the developers or the clients, depending on the context. Another option is to create appendices or separate documents for either to up the score or lower it. As to how you test it, you just test it on real humans and ask comprehension questions to see if they really got it. It's a kind of focus group.4

In this study, more often than not, readability scores were found not to be realistic measures of the overall quality of a software specification document. These automated measures also do not yet work well with PDF documents, and readability scores cannot account for missing or misconceived ideas in a text.

Discussion Topic: Alternative Solutions?

Readability tests alone will not satisfactorily provide the control needed to protect the quality of the SRS. Owen suggests that, in this circumstance, additional help is needed, such as tailoring the same content for specific audiences or human/group critiquing. Each suggestion requires a great deal of staff effort and time, and those are the two primary commodities that software developers usually run short of during development. However, given the potential risks and exposures of poorly documented requirements, including costs, scheduling woes, quality failures, low customer satisfaction and diminished return on investment (ROI), developers may decide that a few more reviewers of requirements, or enhanced artifacts, are indeed warranted.

Solutions that mix mechanization with human intelligence have as many critics as readability tests. However, modern options, commonly referred to as computer-aided software engineering (CASE) tools, include artificial intelligence components and allow a great deal of contextual manipulation.

In a 2001 edition of its widely viewed web-based project management reference, The Chaos Report, The Standish Group referred to requirements tools as follows:

These tools seem to have the biggest impact on the success of a project and if used as a platform for communication (among) all stakeholders, such as executive stakeholders and users, they can provide enormous benefits.

Critics of these tools often cite their inadequacies or the complexities involved as barriers to a project's success and timely delivery. Many developers have dismissed them as useless, asserting that just because a sentence is structured correctly does not mean it adds credibility or increases the communicative value of a requirement. Also, some tools require a great deal of object-oriented knowledge and application, a level of knowledge that is often far too advanced for the typical users assigned to vendor software projects.

The Answer

The actual solution likely depends on the nature of the provider (industry domain, knowledge capacity of staff, size of staff, resource allocations, timing, costs, etc.). ROI calculations must be obtained before a company can make an appropriate decision of this magnitude. But this is a decision that can no longer be taken lightly. It is time for software solution providers to make serious strides toward discovering and adhering to a culturally suitable strategy that is repeatable and auditable and that cost-effectively mixes human collaboration with mechanization to validate and control the quality of requirements.

The test cases indicate that the strategy needs to account for comprehension and contextual nuances, which stretch beyond readability measurements. The chosen method must also be understood by, or intuitive to, the population charged with employing it. One must keep in mind that intelligent systems need intelligent input and design; thus, consultants may be required to help with the selection task. Most important, the chosen strategy must be championed by stakeholders and clients. Recent court battles and fines will likely convince stakeholders to seek effective compliance measures. The initial effort may involve large capital costs, a significant time investment, a proof-of-concept exploration, advanced training and the help of industry consultants, but the end result could be well worth the effort.

Endnotes

1 Le Vie, Donn Jr.; Writing Software Requirements Specifications, www.techwr-l.com

2 Johnston, Michelle; "Executing an IT Audit for Sarbanes-Oxley Compliance," Financial Times, FT Press, 17 September 2004

3 Readers from other countries should note that these US grade levels may not correlate to their countries' own systems.

4 IT Audit Services, PricewaterhouseCoopers, www.pwc.com


Howard A. Kanter, Ed.D., CPA, CITP
is an associate professor in the School of Accountancy & Management Information Systems and the Kellstadt Graduate School of Business, DePaul University, where he is codirector of the Laboratory for Software Metrics. He has been published in a variety of academic, practitioner and online journals and is an ISACA academic advocate. In addition to his academic responsibilities, Kanter actively consults and does research on various issues related to the utilization of information technology in business and government, including the audit of computer-based accounting information systems.

Thomas J. Muscarello, Ph.D.
is an associate professor in DePaul University's School of Computer Science, Telecommunications and Information Systems, where he is codirector of the Laboratory for Software Metrics. He has an extensive background in business, government and academia, and he specializes in the areas of health care systems, knowledge management and data mining. He also has an extensive management background in strategic planning and business system methods, enterprise system metrics, and enterprise application integration. He founded the DePaul Technology Incubator facility and served as its executive director during its three-year run.

Christopher Ralston
is a senior product manager with CS STARS/MARSH. Ralston has more than 10 years of experience in the insurance industry. His various roles have provided exposure to system requirements elicitation strategies, SDLC methodologies, risk management information systems, human interface design, project management and operations.

Authors' Note:

The authors would like to thank Kevin P. Stevens, D.B.A., CPA, director of the School of Accountancy and Management Information Systems, DePaul University (Chicago, Illinois, USA), for his guidance and support.

