Beginning COBOL for Programmers - Introduction to COBOL
Beginning COBOL for Programmers - authored by Michael Coughlan
Beginning COBOL for Programmers, 10.1007/978-1-4302-6254-1_1, © Michael Coughlan, 2014
(1) PA, Ireland
When, in 1975, Edsger Dijkstra made his comment that “The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence,”1 he gave voice to, and solidified, the opposition to COBOL in academia. That opposition has resulted in fewer and fewer academic institutions teaching COBOL so that now it has become difficult to find young programmers to replace the aging COBOL workforce 2-3. This scarcity is leading to an impending COBOL crisis. Despite Dijkstra’s comments and the claims regarding COBOL’s imminent death, COBOL remains a dominant force in the world of enterprise computing, and attempts to replace legacy COBOL systems have been shown to be difficult, dangerous, and expensive.
In this chapter, I discuss some of the reasons for COBOL’s longevity. You’re introduced to the notion of an application domain and shown the suitability of COBOL for its target domain. COBOL is one of the oldest computer languages, and the chapter gives a brief history of the language and its four official versions. Later, the chapter presents the evidence for COBOL’s dominance in enterprise computing and discusses the enigma of its relatively low profile.
An obvious solution to the scarcity of COBOL programmers is to replace COBOL with a more fashionable programming language. This chapter exposes the problems with this approach and reveals the benefits of retaining, renovating, and migrating the COBOL code.
Finally, I discuss why learning COBOL and having COBOL on your résumé could be useful additions to your armory in an increasingly competitive job market.
COBOL is a high-level programming language like C, C#, Java, Pascal, or BASIC, but it is one with a particular focus and a long history.
COBOL’s Target Application Domain
The name COBOL is an acronym that stands for Common Business Oriented Language, and this expanded acronym clearly indicates the target domain of the language. Whereas most other high-level programming languages are general-purpose, domain-independent languages, COBOL is focused on business, or enterprise, computing. You would not use COBOL to write a computer game or a compiler or an operating system. With no low-level access, no dynamic memory allocation, and no recursion, COBOL does not have the constructs that facilitate the creation of these kinds of programs. This is one of the reasons most universities do not teach COBOL. Because it cannot be used to create data structures such as linked lists, queues, or stacks or to develop algorithms like Quicksort, some other programming language has to be taught to allow instruction in these computer science concepts. The curriculum is so crowded nowadays that there is often no room to introduce two programming languages, especially when one of them seems to offer little educational benefit.
Although COBOL’s design may preclude it from being used as a general-purpose programming language, it is well suited for developing long-lived, data-oriented business applications. COBOL’s forte is the processing of data transactions, especially those involving money, and this focus puts it at the heart of the mission-critical systems that run the world. COBOL is found in insurance systems, banking systems, finance systems, stock dealing systems, government systems, military systems, telephony systems, hospital systems, airline systems, traffic systems, and many, many others. It may be only a slight exaggeration to say that the world runs on COBOL.
COBOL’s Fitness for Its Application Domain
What does it mean to say that a language is well suited for developing business applications? What are the requirements of a language working in the business applications domain? In a series of articles on the topic, Professor Robert Glass4-7 concludes that such a programming language should exhibit the following characteristics:
It should be able to declare and manipulate heterogeneous data. Unlike other application domains, which mainly manipulate floating-point or integer numbers, business data is a heterogeneous mix of fixed and variable-length character strings as well as integer, cardinal, and decimal numbers.
It should be able to declare and manipulate decimal data as a native data type. In accounting, banking, taxation, and other financial applications, there is a requirement that computed calculations produce exactly the same result as those produced by manual calculations. The floating-point calculations commonly used in other application domains often contain minute rounding errors, which, taken over millions of calculations, give rise to serious accounting discrepancies.
The requirement for decimal data, and the problems caused by using floating-point numbers to represent money values, is explored more fully later in this book.
It should have the capability to conveniently generate reports and create a GUI. Just as calculating money values correctly is important for a business application, so is outputting the results in the format normally used for such business output. GUI screens, with their interactive charts and graphs, although a welcome addition to business applications, have not entirely eliminated the need for traditional reports consisting of column headings, columns of figures, and a hierarchy of subtotals, totals, and final totals.
It should be able to access and manipulate record-oriented data masses such as files and databases. An important characteristic of a business application programming language is that it should have an external, rather than internal, focus. It should concentrate on processing data held externally in files and databases rather than on manipulating data in memory through linked lists, trees, stacks, and other sophisticated data structures.
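The rounding problem that the second characteristic describes for floating-point money values can be seen in a few lines of Java. What follows is a minimal sketch and is not taken from the book: the class name DecimalVersusDouble and its two helper methods are hypothetical names chosen for illustration. It adds ten cents to a running total ten times, once with binary floating-point (double) and once with the decimal BigDecimal class.

```java
import java.math.BigDecimal;

public class DecimalVersusDouble {

    // Adds 0.10 to a running total `count` times using binary floating point.
    static double sumAsDouble(int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += 0.10;
        }
        return total;
    }

    // Adds 0.10 to a running total `count` times using decimal arithmetic.
    static BigDecimal sumAsDecimal(int count) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal tenCents = new BigDecimal("0.10");
        for (int i = 0; i < count; i++) {
            total = total.add(tenCents);
        }
        return total;
    }

    public static void main(String[] args) {
        // 0.10 has no exact binary representation, so the double total drifts
        // away from 1.0 and this comparison prints false.
        System.out.println("double total equals 1.0?   " + (sumAsDouble(10) == 1.0));
        // The decimal total is exact; compareTo ignores scale (1.00 versus 1),
        // so this comparison prints true.
        System.out.println("decimal total equals 1.00? "
                + (sumAsDecimal(10).compareTo(BigDecimal.ONE) == 0));
    }
}
```

A discrepancy this small is harmless in a single calculation, but it is the kind of error that accumulates over millions of transactions. In Java the exact result depends on remembering to use a library class; in COBOL, decimal items are declared directly in the language.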
In an analysis of several programming languages with regard to these characteristics, Professor Glass6 finds that COBOL is either strong or adequate in all four of these characteristics, whereas the more fashionable domain-independent languages like Visual Basic, Java, and C++ are not. This finding is hardly a great surprise. With the exception of GUIs and databases, these characteristics were designed into COBOL from the outset.
Advocates of domain-independent languages claim that the inadequacies of such a language for a particular application domain can be overcome by the use of function or class libraries. This is partly true. But programs written using bolted-on capabilities are never quite as readable, understandable, or maintainable as programs where these capabilities are an intrinsic part of the base language. As an illustration of this, consider the following two programs: one program is written in COBOL (Listing 1-1), and the other is written in Java (Listing 1-2).
Listing 1-1. COBOL Version
IDENTIFICATION DIVISION.
PROGRAM-ID. SalesTax.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 beforeTax    PIC 999V99 VALUE 123.45.
01 salesTaxRate PIC V999   VALUE .065.
01 afterTax     PIC 999.99.
PROCEDURE DIVISION.
Begin.
    COMPUTE afterTax ROUNDED = beforeTax + (beforeTax * salesTaxRate)
    DISPLAY "After tax amount is " afterTax
    STOP RUN.
Listing 1-2. Java Version (from http://caliberdt.com/tips/May03_Java_BigDecimal_Class.htm )
import java.math.BigDecimal;
import java.math.RoundingMode;

public class SalesTaxWithBigDecimal
{
    public static void main(java.lang.String[] args)
    {
        BigDecimal beforeTax = BigDecimal.valueOf(12345, 2);
        BigDecimal salesTaxRate = BigDecimal.valueOf(65, 3);
        BigDecimal ratePlusOne = salesTaxRate.add(BigDecimal.valueOf(1));
        BigDecimal afterTax = beforeTax.multiply(ratePlusOne);
        afterTax = afterTax.setScale(2, RoundingMode.HALF_UP);
        System.out.println("After tax amount is " + afterTax);
    }
}
The programs do the same job. The COBOL program uses native decimal data, and the Java program creates data-items using the bolted-on BigDecimal class (itself an acknowledgement of the importance of decimal data for this application domain). The programs are presented without explanation (we’ll revisit them in Chapter 12; and, if you need it, you can find an explanation there). I hope that, in the course of trying to discover what the programs do, you can agree that the COBOL version is easier to understand—even though you do not, at present, know any COBOL but are probably at least somewhat familiar with syntactic elements of the Java program.
Detailed histories of COBOL are available elsewhere. The purpose of this section is to give you some understanding of the foundations of COBOL, to introduce some of the major players, and to briefly describe the development of the language through the various COBOL standards.
The history of COBOL starts in April 1959 with a meeting involving computer people, academics, users, and manufacturers to discuss the creation of a common, problem-oriented, machine-independent language specifically designed to address the needs of business8. The US Department of Defense was persuaded to sponsor and organize the project. A number of existing languages influenced the design of COBOL. The most significant of these were AIMACO (US Air Force designed), FLOW-MATIC (developed under Rear Admiral Grace Hopper) and COMTRAN (IBM’s COMmercial TRANslator).
The first definition of COBOL was produced by the Conference on Data Systems Languages (CODASYL) Committee in 1960. Two of the manufacturer members of the CODASYL Committee, RCA and Remington-Rand-Univac, raced to produce the first COBOL compiler. On December 6 and 7, 1960, the same COBOL program (with minor changes) ran on both the RCA and Remington-Rand-Univac computers.8
After the initial definition of the language by the CODASYL Committee, responsibility for developing new COBOL standards was assumed by the American National Standards Institute (ANSI), which produced the next three standards: American National Standard (ANS) 68, ANS 74, and ANS 85. Responsibility for developing new COBOL standards has now been assumed by the International Organization for Standardization (ISO). ISO 2002, the first COBOL standard produced by this body, defines the object-oriented version of COBOL.
Four standards for COBOL have been produced, in 1968, 1974, 1985, and 2002. As just mentioned, the most recent standard (ISO 2002) introduced object orientation to COBOL. This book mainly adheres to the ANS 85 standard; but where this standard departs from previous standards, or where there is an improvement made in the ISO 2002 standard, a note is provided.
The final chapter of the book previews ISO 2002 COBOL. In that chapter, I discuss why object orientation is desirable and what new language elements make it possible to create object-oriented COBOL programs.
The 1968 standard resolved incompatibilities between the different COBOL versions that had been introduced by various producers of COBOL compilers since the language’s creation in 1960. This standard reemphasized the common part of the COBOL acronym. The idea, contained in the 1960 language definition, was that the language would be the same across a range of machines.
COBOL ANS 74 (External Subprograms)
The major development of the 1974 standard was the introduction of the CALL verb and external subprograms. Before ANS 74 COBOL, there was no real way to partition a program into separate parts, and this resulted in the huge monolithic programs that have given COBOL such a bad reputation. In these programs, which could be many tens of thousands of lines long, there was no modularization, no functional partitioning, and totally unrestricted access to any variable in the Data Division (more on divisions in Chapter 2).
COBOL ANS 85 (Structured Programming Constructs)
The 1985 standard introduced structured programming to COBOL. The most notable features were the introduction of explicit scope delimiters such as END-IF and END-READ, and contained subprograms. In previous versions of COBOL, the period (full stop) was used to delimit scope. Periods had a visibility problem that, taken along with the fact that they delimited all open scopes, was the cause of many program bugs. Contained subprograms allowed something approaching procedures to be used in COBOL programs for the first time.
COBOL ISO 2002 (OO Constructs)
Object orientation was introduced to COBOL in the ISO 2002 standard. Whereas previous additions had significantly increased the huge COBOL reserved word list, object orientation was introduced with very few additions.
The Argument for COBOL (Why COBOL?)
As you’ve seen, COBOL is a language with a 50-year history. Many people regard it as a language that has passed its sell-by date—an obsolete language with no relevance to the modern world. In the succeeding pages, I show why, despite its age, programmers should take the time to learn COBOL.
Dominance of COBOL in Enterprise Computing
One reason for learning COBOL is its importance in enterprise computing. Although the death of COBOL has been predicted time and time again, COBOL remains a dominant force at the heart of enterprise computing. In 1997, the Gartner group published a widely reported estimate that of the 300 billion lines of code in the world, 240 billion (80%) were written in COBOL.9 Around the same time, Capers Jones10 identified COBOL as the major programming language in the United States, with a software portfolio of 12 million applications and 605 million function points. To put this in perspective, in the same study he estimated that the combined total for C and C++ was 4 million software applications and 261 million function points. According to Jones, each function point requires about 107 lines of COBOL; so, in 1996, the software inventory for the United States contained about 64 billion lines of COBOL code. Extrapolating for the world, the Gartner estimate does not seem outside the realms of possibility.
Of course, the 1990s were a long time ago, and in 1996/97, Java had just been created. You might have expected that as Java came to the fore, COBOL would be eclipsed. This did not happen to any significant extent. Much new development has been done in Java, but the existing inventory of COBOL applications has largely remained unaffected. In an OVUM report in 2005,11 Gary Barnett noted, “Cobol remains the most widely deployed programming language in big business, accounting for 75% of all computer transactions” and “90% of all financial transactions.” In that report, Barnett estimated that there “are over 200 billion lines of COBOL in production today, and this number continues to grow by between three and five percent a year.”
Even today, COBOL’s position in the domain of business computing does not seem to be greatly eroded. ComputerWorld surveyed 357 IT professionals in 2012.2, 12 In that survey, 54% of respondents said that more than half of all their internal business application code was written in COBOL. When asked to quantify the extent to which languages were used in their organization, 48% said COBOL was used frequently, while only 39% said the same of Java. And as the 2005 OVUM report11 predicted, new COBOL development is still occurring; 53% of respondents said that COBOL was still being used for new development in their organization. Asked what proportion of new code was written in COBOL, 27% said that it was used for more than half of their new development.
Although only tangentially relevant to the issue of COBOL’s importance in business computing, one other item of interest came out of the ComputerWorld survey.2, 12 Respondents were asked to compare Visual Basic, C#, C++, and Java to COBOL for characteristics such as batch processing, transaction processing, handling of business-oriented features, runtime efficiency, security, reporting, development cost, maintenance cost, availability of programmers, and agility. In every instance except the last two, COBOL scored higher than its more recent counterparts.
Finally, in a May 2013 press release, IBM noted that nearly 15% of all new enterprise application functionality is written in COBOL and that there are more than “200 billion lines of COBOL code being used.13”
Danger, Difficulty, and Expense of Replacing Legacy COBOL Applications
The custodians of legacy systems come under a lot of pressure to replace their legacy COBOL code with a more modern alternative. The high cost of maintenance, obsolete hardware, obsolete software, the scarcity of COBOL programmers, the need to integrate with newer software and hardware technologies, the relentless hype surrounding more modern languages—these are all pressures that drive legacy system modernization in general and language replacement in particular. How is it then that the COBOL software inventory seems largely unchanged?
When a legacy system is considered for modernization, a number of alternatives might be considered:
Replacement with a commercial off-the-shelf (COTS) package
Complete rewrite
Wrapping the legacy system to present a more modern interface
Code renovation
Migration to commodity hardware and software
The problem is, experience shows that most modernization attempts that involve replacing the COBOL code fail. Some organizations have spent millions of dollars in repeated attempts to replace their COBOL legacy systems, only to have each attempt fail spectacularly.
Replacement with a COTS Package
Replacement is much harder than it seems. Many legacy COBOL systems implement functionality such as payroll, stock control, and accounting that today would be done by a COTS system. Replacing such a legacy system with a standard COTS package might seem like an attractive option, and in some cases it might be successful; but in many legacy systems, so many proprietary extensions have been added to the standard functionality that replacement is no longer a viable option. Attempting to replace such a legacy system with a COTS package will fail—either completely, causing the replacement attempt to be abandoned; or partially, leading to cost and time overruns and failures in functionality fit.
I know of one instance where a university attempted to replace a COBOL-based Student Record System with a bought-in package as a solution to the Y2K problem. Around September 1999, the school realized that, due to database migration difficulties, the package solution would not be ready in time for the millennium changeover. A successful Y2K remediation of the existing COBOL legacy system was then done, and this bought sufficient time for the new package to be brought on line. Even then, the package only implemented about 80% of the functionality formerly provided by the legacy system.
Complete Rewrite
A complete rewrite in another language is often seen as a viable modernization option. Again, in a restricted set of circumstances, this might be the case. When the documentation created for the original legacy system is still available, there is no reason the rewritten replacement should not be as successful as the original. Unfortunately, this happy circumstance is not the case with most legacy systems.
These systems often represent the first parts of the organization to be computerized. They embody the core functionality of the organization; its mission-critical operations; its beating heart. When these systems were created, they replaced the existing manual systems. In the intervening years, the requirements, system architecture, and other documentation have long since been lost. The people who operated the manual system and knew how it worked have either retired or moved on. The rewrite cannot be treated as a greenfield site would be treated, where the requirements could be elicited from stakeholders. For all sorts of legal, customer, and employee reasons, the functionality of the new system must match that of the old. The only source of information about how the system works is embedded in the COBOL code itself. Extracting the business rules from existing legacy code, in order to specify the requirements of the new system, is a very difficult task. The failure rates for most legacy system rewrites are very high.
Automatic language conversion is often touted as a solution to the lack of architectural and functional documentation in legacy systems. You don’t have to know how the system works, goes the mantra; you can just automatically convert it into a more modern language. But converting legacy COBOL code is a much more difficult task than people realize.14 Even if the functionality can be reproduced (and this is highly problematic),3 the resulting code is likely to be an unmaintainable, unreadable mess. It is likely to consist of many more lines of code than the original15 and to retain the idiom or flavor of COBOL. Although such converted software may be written in the syntax of the target language, it will not look like any kind of a program that a programmer in that language would normally produce. Such automatically produced programs14 will be so foreign to those who have to maintain them that they are likely to be received with some hostility.
Some organizations advertise their ability to convert legacy COBOL to another language. This is a given; the questions are: how faithful is the conversion and how maintainable is the converted code? Few if any case studies (where they exist at all) mentioned by these organizations address the maintainability problems that may be expected of code produced by automatic language conversion. Although such conversions may alleviate the shortage of COBOL programmers, they probably cause an increase in maintenance costs. It is doubtful if any of these conversions can be deemed a success.
Approaches to legacy system modernization that involve replacing the COBOL code have not been very successful. They either fail completely and have to be abandoned, fail in terms of cost and deadline overruns, or fail in terms of not delivering on maintainability promises.
Wrapping the Legacy System
Most successful modernization efforts retain the COBOL code. Wrapping the legacy code solves interfacing problems but does not address the cost of maintenance, or hardware or software obsolescence problems. On the other hand, it is cheap, it is safe, and it provides an obvious, and immediate, return on investment (ROI).
Code Renovation
Code renovation addresses the cost-of-maintenance problem but none of the others. It is safe and has very good tool support from both COBOL vendors and third parties, but it does not provide an obvious ROI.
Migration to Commodity Hardware and Software
Migration involves moving the legacy COBOL code to modern commodity hardware and software. This approach has some risks, because the COBOL code may have to be changed to accommodate the new hardware and software. However, there is significant tool support to assist migration, and this greatly mitigates the risk of failure. Many case studies point to the success of the migration approach, as borne out by a 2010 report from the Standish Group.16 This report found that migration and enhancement “stands out as having the highest chance of success and the lowest chance of failure” with the new software development project “six times more likely” and the package replacement project “twice as likely” to fail as migration and enhancement.
Migration solves many of the problems with legacy systems. Obsolescence is addressed by moving to more modern hardware and software. General costs are addressed through the elimination of licensing fees and other costs (in one case study, replacing printed reports with online versions saved $22,000 per year).17-18 Maintenance costs are often also addressed because code renovation usually precedes a migration. However, interfacing with modern technologies might still be a problem, and there remains the problem of the scarcity of COBOL programmers.
Shortage of COBOL Programmers: Crisis and Opportunity
A major issue that prompts companies to attempt replacement of their legacy COBOL with some other alternative is the perceived scarcity of COBOL programmers. Harry Sneed states this baldly: “The reason for this conversion is that there are no COBOL programmers available. Otherwise the whole system could have been left in COBOL.3” He comments that COBOL “is no longer taught in the technical high schools and universities. Therefore, it is very difficult to recruit COBOL programmers. In Austria it is almost impossible to find programmers with knowledge of COBOL. Those few that are left are all close to retirement.” Because of their seniority, they are also more expensive than cheap, young Java programmers.
However, the problem is not that there are no COBOL programmers. Capers Jones estimated that there were 550,000 COBOL programmers in the United States to deal with the Y2K problem.10 Even now, Scott Searle of IBM estimates that the current worldwide population of COBOL programmers is about two million, with about 50,000 of these in India.19 The real problem is that most COBOL programmers are nearing retirement age. This is a crisis in the making. As already discussed, it is dangerous and expensive to attempt to replace COBOL legacy systems; but when these COBOL programmers retire, who will maintain the legacy systems?
Legacy system stakeholders are gradually waking up to the problem. Since 2008, there has been a gradual increase in awareness of the need to do something about it. COBOL vendors have encouraged academic training of a new crop of COBOL developers. Micro Focus does this through its Micro Focus Academic Program and Academic Alliance programs, and an IBM initiative in this area has resulted in COBOL being taught in 400 colleges and universities around the world.19 In addition, the training companies and in-house training groups that traditionally were the main source of COBOL developers are once more starting to take up the strain. For example, the US Postal Service will start its own COBOL training program as its COBOL programmers retire,20 and the Social Security Administration (SSA)20 in the United States is going the same route. Manta Technologies is reported to be developing a COBOL training series consisting of nine or ten courses.21 The company hopes to complete the series by the end of 2013. Some COBOL vendors like Veryant22 are also providing training courses.
Motivational speakers are often heard to say that the Chinese word for crisis is composed of two characters that represent danger and opportunity. Although there seems to be some doubt about the veracity of this claim, there is no doubt that in the coming years the crisis caused by the tsunami of retiring programmers represents a golden opportunity for those who can grasp it. The number of students earning computing degrees fell sharply after the year 2000, and this led to a programmer shortfall that has made it a seller’s market for computer skills. But student numbers are recovering; and as the job market gets more competitive, having COBOL on your résumé may be a very useful differentiating skill—especially if it is combined with knowledge of Java.
COBOL: The Hidden Asset
The numbers supporting the dominance of COBOL in the business application domain sound incredible. Certainly, a lot of skepticism has been voiced about them on the Internet and elsewhere. But much of the skepticism comes from those who have little or no knowledge of the mainframe arena, an area in which COBOL is strong, if not supreme. You can gain an appreciation for the opposing points of view by reading Jeff Atwood’s post “COBOL: Everywhere and nowhere” and the associated comments. His comment that “I have never, in my entire so-called ‘professional’ programming career, met anyone who was actively writing COBOL code23” is indicative of the problem programmers often have when presented statistics regarding the importance of COBOL. Many of the comments that followed Atwood’s post reflected that disbelief; but as one commentator remarked, “You want to see COBOL? Go look at a company that processes payroll, or handles trucking, food delivery, or shipping. Look at companies that handle book purchase orders or government disbursements or checking account reconciliation. There’s a huge ecosystem of code out there that’s truly invisible to those of us who work in and around the Internet.24”
Many programmers with a conspiracy-theory bent attempt to prove the impossibility of the COBOL statistics by pointing to the number of lines of code that could be produced by programmers in the given time frame, or by pointing to the impossibility of maintaining the claimed number of lines with the estimated number of COBOL programmers. There are a number of answers to these points.
One answer is that the COBOL code inventory has been hugely bulked out by fourth-generation languages (4GLs) and other COBOL-generating software.25 4GLs were all the rage between the 1970s and 1990s, and many produced COBOL code instead of machine code. This was done to give buyers confidence that if the 4GL vendor failed, they would not be left high and dry. In many cases, the vendors did fail, and only the COBOL code was left. In other cases, the programmers took to maintaining the COBOL code directly, and it is now so divorced from the 4GL that there is no point in trying to return to the 4GL code.
Another answer is that programmer productivity seems high because many programs are simply near-copies of existing work. In a legacy system, the enterprise data is often trapped in a variety of storage technologies, from various kinds of database to direct access files and flat files. Nearly every user request to get at that data requires a COBOL program to be written. But these programs are not written from scratch. A programmer creates the program by using the copy, paste, and amend method. The programmer simply copies a similar program, makes a few changes, and voilà: a new COBOL program and a big boost to apparent programmer productivity.
If the number of bugs found in legacy systems approached that found in newly minted systems, 2 million programmers might find it very difficult to maintain upwards of 200 billion lines of code. The fact is, though, that unless an environmental change or a user request forces a modification of a legacy system, not much maintenance is required. When a system has been in production for many tens of years, only the blue-moon bugs remain. There is an old joke that goes, “What’s the difference between computer hardware and computer software?” The answer is, “If you use hardware long enough, it breaks. But if you use software long enough, it works.” A real-world manifestation of David Brin’s26 practice effect, perhaps?
Blue-moon bugs are bugs that manifest themselves only as a result of the coincidence of an unusual set of circumstances.
A considerable amount of evidence points to the relatively bug-free status of legacy systems. For instance, when an inventory of software systems was taken in preparation for the Y2K conversion, it was discovered that it had been so long since some of the programs in the inventory had been modified that the source code had been lost. In the opinion of Chris Verhoef, “about 5% of the object code lacks its source code.27”
In his paper “Migrating from COBOL to Java,15” Harry Sneed mentions that 5 COBOL programmers were responsible for 15,486 function points of legacy COBOL whereas 25 Java developers were responsible for 13,207 function points of Java code. Although it might suit COBOL advocates to believe that COBOL developers are five times more efficient than Java developers, a more realistic explanation is that the legacy system had settled into a largely bug-free equilibrium while the newly minted Java code was still awash with them.
COBOL definitely has a visibility problem. The hype that surrounds some computer languages would have you believe that most of the production business applications in the world are written in Java, C, C++, or Visual Basic and that only a small percentage are written in COBOL. In reality, COBOL is arguably the major programming language for business applications.
One reason for COBOL’s low profile lies in the difference between the vertical and horizontal software markets. To use a clothing analogy, an application created for the vertical software market is like a tailored, bespoke suit, whereas an application created for the horizontal software market is like a commodity, off-the-rack suit.
Advantages of Bespoke Software
Why should a company spend millions of dollars to create a bespoke application when it could buy a COTS package? One reason is that because a bespoke application is specifically designed for an organization’s particular requirements, it can be tailored to fit in exactly with the way the business or organization operates. Another reason is that it can be customized to interface with other software the company operates, providing a fully integrated IT infrastructure across the whole organization. Yet another reason is that because the company “owns” the software, the company has control over it. But the primary reason for creating a bespoke application is that it can offer an enterprise a competitive advantage over its rivals. Because a bespoke application can incorporate the business processes and business rules that are specific to the company and that do not exist in any packaged solution, it can offer a considerable advantage over competing companies. Owens and Minor28-29 refer to the specific business rules and processes embedded in their bespoke applications as their “secret sauce.”
An example of the effectiveness of bespoke software is the software that first allowed an airline to offer a frequent-flyer program (air miles). That software conferred such an advantage on the airline that competitors were forced to catch up, and frequent-flyer programs are now almost ubiquitous.
Characteristics of COBOL Applications
Software produced for the vertical software market has characteristics that distinguish it from the commodity software you are probably more familiar with. This section examines some characteristics of COBOL applications that you may find surprising.
COBOL Applications Can Be Very Large
Many COBOL applications consist of more than 1 million lines of code, and applications consisting of 6 million lines or more are not considered unusually large in many programming shops:
In “Revitalizing modifiability of legacy assets,30” Niels Veerman mentions a banking company that had “one large system of 2.6 million LOC in almost 1000 programs.”
The Irish Life Group, Ireland’s leading life and pensions company, is reported31 to have completed a legacy system migration project to rehost 3 million lines of COBOL code.
A Microsoft case study reported that Simon & Schuster had a code inventory of some 5 million lines of COBOL code.32
The Owens and Minor case study mentioned earlier reported that “the company ran its business on 10 million lines of custom COBOL/CICS code.29”
In his paper “A Pilot Project for Migrating COBOL Code to Web Services,” Harry Sneed reported a “legacy life insurance system with more than 20 million lines of COBOL code running under IMS on the IBM mainframe.33”
The authors of “Industrial Applications of ASF+SDF” talk about a large suite of mainframe-based COBOL applications that consist of 25,000 programs and 30 million lines of code.34
An audit report by the Office of the Inspector General in 2012 noted that as of June 2010, the US SSA had a COBOL code inventory of “over 60 million lines of COBOL code.35”
The Bank of New York Mellon is quoted as having a software inventory of 112,500 Cobol programs consisting of 343 million lines of code.2
Kwiatkowski and Verhoef report a case study where “a Cobol software portfolio of a large organization operating in the financial sector” consisted of over “18.2 million physical lines of code (LOC).25”
COBOL Applications Are Very Long-Lived
The huge investment in creating a software application consisting of millions of lines of COBOL code means the application cannot simply be discarded when a new programming language or technology appears. As a consequence, business applications between 10 and 30 years old are common, and some have been in existence for around 50 years.
A Microsoft case study on the Swedish company Stockholmshem noted that its computer system “was created in 1963 and had been expanded over the years to include roughly 170 online Customer Information Control System (CICS)/COBOL programs and 370 batch COBOL programs.36”
Kwiatkowski and Verhoef25 published a version log (reproduced in Figure 1-1) for a module in the software portfolio of a large financial organization that illustrates the longevity of COBOL programs. Each line of the log is a comment that shows a version number, the name of a programmer, and the date the software was modified. The log shows that maintenance of this module started in 1975. Nor was this the oldest module found. That honor belonged to a program that had been written in 1967. For some readers of this book, the software in this portfolio started life long before they were born.
Figure 1-1. COBOL module version log. Published in “Recovering Management Information from Source Code,” Kwiatkowski and Verhoef 25
The longevity of COBOL applications can also be held largely accountable for the predominance of COBOL programs in the Y2K problem (12,000,000 COBOL applications versus 1,400,000 C++ applications in the United States alone).10 Many years ago, when programmers were writing these applications, they just did not anticipate that the software would last into this millennium.
COBOL Applications Often Run in Critical Areas of Business
COBOL is used for mission-critical applications running in vital areas of the economy. Datamonitor reports that 75% of business data and 90% of financial transactions are processed in COBOL.37 The serious financial and legal consequences that can result from an application failure are one of the reasons for the near panic over the Y2K problem.
COBOL Applications Often Deal with Enormous Volumes of Data
COBOL’s forte is file and record processing. Single files or databases measured in terabytes are not uncommon. The SSA system mentioned earlier, for instance, manages over 1 petabyte (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes) of data,38 and “Terabytes of new data come in daily.39”
Although COBOL is a high-level programming language, it is probably quite unlike any language you have ever used. A genealogical tree of programming languages usually places COBOL by itself with no antecedents and no descendants. Occasionally a tree might include FLOW-MATIC and COMTRAN or might show a connection to PL/I (because that language incorporated some COBOL elements). By and large though, COBOL is unique. So even though COBOL supports the familiar elements of a programming language such as variables, arrays, procedures, and selection and iteration control structures, these familiar elements are implemented in an unfamiliar way. It’s like going to a foreign country and finding that your rental car uses a stick shift and people drive on the other side of the road: disconcerting.
This section examines some of the general characteristics of COBOL that distinguish it from languages with which you might be more familiar.
The most obvious characteristic of COBOL programs is their textual, rather than mathematical, orientation. One of the design goals for COBOL was to make it possible for non-programmers such as supervisors, managers, and users to read and understand COBOL code. As a result, COBOL contains such English-like structural elements as verbs, clauses, sentences, sections, and divisions. As it happens, this design goal was not realized. Managers and users nowadays do not read COBOL programs. Computer programs are just too complex for most nonprofessionals to understand them, however familiar the syntactic elements. But the design goal and its effect on COBOL syntax had one important side effect: it made COBOL the most readable, understandable, and self-documenting programming language in use today. It also made it the most verbose.
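To make the English-like flavor concrete, here is a minimal sketch of a complete program. It is an illustration only: the program name, the data names (HoursWorked, HourlyRate, GrossPay, PrnGrossPay), and the values are all hypothetical, and the layout assumes a compiler that accepts free-format source (for example, GnuCOBOL invoked with its -free option).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. PayDemo.
*> Hypothetical example: every name and value here is invented
*> purely to show the verb/clause/sentence style of COBOL.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 HoursWorked    PIC 9(3)     VALUE 40.
01 HourlyRate     PIC 9(3)V99  VALUE 12.50.
01 GrossPay       PIC 9(5)V99  VALUE ZEROS.
*> Edited picture used only for display (zero suppression).
01 PrnGrossPay    PIC ZZ,ZZ9.99.

PROCEDURE DIVISION.
Begin.
    *> A verb (MULTIPLY) followed by its clauses (BY, GIVING).
    MULTIPLY HoursWorked BY HourlyRate GIVING GrossPay
    IF GrossPay IS GREATER THAN 400
        DISPLAY "Overtime review required"
    END-IF
    MOVE GrossPay TO PrnGrossPay
    DISPLAY "Gross pay: " PrnGrossPay
    STOP RUN.
```

Even without knowing the language, a reader can probably follow the arithmetic and the test. The cost is length: the same logic would take two or three lines in most other languages.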
It is easy for programmers unused to the business programming paradigm, where programming with a view to ease of maintenance is very important, to dismiss the advantage of COBOL’s readability. Not only does this readability generally assist the maintenance process, but the older a program gets, the more valuable readability becomes.
When programs are new, both the in-program comments and the external documentation accurately reflect the program code. But over time, as more and more revisions are applied to the code, it gets out of step with the documentation until the documentation is actually a hindrance to maintenance rather than a help. The self-documenting nature of COBOL means this problem is not as severe with COBOL as it is with other languages.
Readers who are familiar with C, C++, or Java might want to consider how difficult it becomes to maintain programs written in these languages. C programs you wrote yourself are difficult enough to understand when you return to them six months later. Consider how much more difficult it would be to understand a program that was written 15 years previously, by someone else, and which had since been amended and added to by so many others that the documentation no longer accurately reflected the program code. This is a nightmare awaiting maintenance programmers of the future, and it is already peeking over the horizon.
As a computer language, COBOL evolves with near-glacial slowness. The designers of COBOL do not jump on the bandwagon of every new, popular fad. Changes incorporating new ideas are made to the language only when the new idea has proven itself.
Since its creation in 1960, only four COBOL standards have been produced:
ANS 68 COBOL: Resolved incompatibilities between different COBOL versions
ANS 74 COBOL: Introduced the CALL verb and external subprograms
ANS 85 COBOL: Introduced structured programming and internal subprograms
ISO 2002 COBOL: Introduced object orientation to COBOL
Enterprises running mission-critical applications are unsurprisingly suspicious of change. Many of these organizations stay one version behind the very slow leading edge of COBOL. It is only now that the 2002 version of COBOL has been specified that many will start to move to the 1985 standard. This is one reason this book mainly adheres to the ANS 85 standard.
Conscious of the long life of COBOL applications, the ANSI COBOL Committee has made backward compatibility a major concern. Very few language elements have been dropped from the language. As a result, programs I wrote in the 1980s for the DEC VAX using VAX COBOL compile, with little or no alteration, on the Micro Focus Visual COBOL compiler. Java, although only created in 1995, is now on its seventh version and already has a very long list of obsolete, deprecated, and removed features. In the years since its creation, Java has removed more language features than COBOL has in the whole of its 50-year history.
COBOL is a simple language (until the most recent version, it had no pointers, no user-defined functions, and no user-defined types). It encourages a simple, straightforward programming style. Curiously enough, though, despite its limitations, COBOL has proven itself well suited to its target problem domain (business computing). Most COBOL programs operate in a domain where the program complexity lies in the business rules that have to be encoded rather than in the sophistication of the data structures or algorithms required. In cases where sophisticated algorithms are needed, COBOL usually meets the need with an appropriate verb such as SORT or SEARCH.
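As a rough sketch of what "meeting the need with a verb" looks like, the following program uses SEARCH to perform a linear lookup in a small table. The table contents and all names (StateValues, StateTable, StateEntry, StateIdx, WantedCode) are hypothetical, and free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. SearchDemo.
*> Hypothetical lookup table; names and data are illustrative only.

DATA DIVISION.
WORKING-STORAGE SECTION.
*> Each 12-character value is a 2-character code plus a
*> 10-character name, padded with spaces.
01 StateValues.
   02 FILLER PIC X(12) VALUE "CACalifornia".
   02 FILLER PIC X(12) VALUE "NYNew York  ".
   02 FILLER PIC X(12) VALUE "TXTexas     ".
*> The same storage viewed as a table of three entries.
01 StateTable REDEFINES StateValues.
   02 StateEntry OCCURS 3 TIMES INDEXED BY StateIdx.
      03 StateCode PIC XX.
      03 StateName PIC X(10).
01 WantedCode PIC XX VALUE "NY".

PROCEDURE DIVISION.
Begin.
    *> SEARCH starts from the current index value, so set it first.
    SET StateIdx TO 1
    SEARCH StateEntry
        AT END DISPLAY "State code not found"
        WHEN StateCode(StateIdx) = WantedCode
            DISPLAY "Found: " StateName(StateIdx)
    END-SEARCH
    STOP RUN.
```

The loop, the index increment, and the end-of-table test are all implied by the verb. The programmer states only the match condition and what to do in each outcome.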
Earlier in this book, I noted that the limitations of COBOL meant it could not be used to teach computer science concepts. And in the previous paragraph, I noted that COBOL is a simple language with a limited scope of function. These comments pertain to versions of COBOL prior to the ANS 2002 version. With the introduction of OO COBOL, everything has changed. OO COBOL retains all the advantages of previous versions but now includes the following:
User-defined functions
Cultural adaptability (locales)
Dynamic memory allocation (pointers)
Data validation using the new VALIDATE verb
Binary and floating-point data types
COBOL Is Nonproprietary
The COBOL standard does not belong to any particular vendor. It was originally designed to be a “machine independent common language8” and to be ported to a wide range of machines. This capability was demonstrated by the first COBOL compilers when the same program was compiled and executed on both the RCA and the Remington-Rand-Univac computers.8 The ANSI COBOL committee, and now the ISO, define the non-vendor-specific syntax and semantic language standards. COBOL has been ported to virtually every operating system, from every flavor of Windows to every flavor of Unix; from IBM’s VM, z/OS, and z/VSE operating systems, to MPE, MPE/iX, and HP-UX on HP machines; from the Wang VS to GCOS on Bull machines. COBOL runs on computers you have probably never heard of, such as the Data General Nova, SuperNova, and Eclipse MV series; the DEC PDP-11/70 and VAX; the Univac 9000s and the Unisys 2200s; and the Hitachi EX33 and the Bull DPX/20.
COBOL has a 50-year proven track record for application production, maintenance, and enhancement. The indications from the Y2K problem that COBOL applications were cheaper to fix than applications written in more recent languages ($28 per function point versus $35 for C++ and $65 for PL/1) have been supported by the 2012 ComputerWorld survey12 and the 2011/12 CRASH Report.40 When comparing COBOL maintenance costs to those of Visual Basic, C#, C++, and Java, the ComputerWorld survey reported that 72% of respondents found that COBOL was just as good (29%) as these languages or better (43%). Similarly, the CRASH Report found that COBOL had the lowest technical debt (defined in the report as “the effort required to fix problems that remain in the code when an application is released”) of any mainstream language, whereas Java-EE, averaging $5.42 per LOC, had the highest.
One reason for the maintainability of COBOL programs was mentioned earlier: the readability of COBOL code. Another reason is COBOL’s rigid hierarchical structure. In COBOL programs, all external references, such as references to devices, files, command sequences, collating sequences, the currency symbol, and the decimal point symbol, are defined in the Environment Division.
When a COBOL program is moved to a new machine, has new peripheral devices attached, or is required to work in a different country, COBOL programmers know that the parts of the program that will have to be altered to accommodate these changes will be isolated in the Environment Division. In other programming languages, programmer discipline might ensure that the references liable to change are restricted to one part of the program but they could just as easily be spread throughout the program. In COBOL programs, programmers have no choice. COBOL’s rigid hierarchical structure ensures that these items are restricted to the Environment Division.
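The isolation described above can be sketched with a small program in which the external file name and the decimal-point convention appear only in the Environment Division. Everything named here (EnvDemo, CustomerFile, "CUSTOMER.DAT", Amount, PrnAmount) is hypothetical, the file is declared but never opened, and free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. EnvDemo.
*> Hypothetical example: names, file, and values are illustrative.

ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
SPECIAL-NAMES.
    *> Country-specific convention: comma is the decimal point.
    DECIMAL-POINT IS COMMA.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
    *> The only place the external file name is mentioned.
    SELECT CustomerFile ASSIGN TO "CUSTOMER.DAT".

DATA DIVISION.
FILE SECTION.
FD CustomerFile.
01 CustomerRec PIC X(40).
WORKING-STORAGE SECTION.
*> With DECIMAL-POINT IS COMMA, literals and edited pictures
*> swap the roles of the comma and the period.
01 Amount     PIC 9(5)V99 VALUE 1234,56.
01 PrnAmount  PIC ZZ.ZZ9,99.

PROCEDURE DIVISION.
Begin.
    MOVE Amount TO PrnAmount
    DISPLAY "Amount: " PrnAmount
    STOP RUN.
```

Pointing the program at a different file means changing only the ASSIGN clause. Note that in this sketch the numeric literal and the edited picture in the Data Division are written to match the comma convention, so switching conventions would touch those as well.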
Unfortunately, the leaders of the computer science community have taken a very negative view of COBOL from its very inception and therefore have not looked carefully enough to see what good ideas are in there which could be further enlarged, expanded or generalized.
Jean Sammet, “The Early History of COBOL,”
ACM Sigplan Notices 13(8), August 1978
The problem with being such an old language is that COBOL suffers from 50 years of accumulated opprobrium. Criticism of COBOL is often based—if it is based on direct experience at all—on programs written 30 to 50 years ago. The huge monolithic programs, the tangled masses of spaghetti code, and the global data are all hallmarks of COBOL programs written long before programmers knew better. They are not characteristic of programs written using more modern versions of COBOL.
Critics also forget that COBOL is a domain-specific language and criticize it for shortcomings that have little relevance to its target domain. There is little acknowledgement of how well suited COBOL is for that domain. The performance of COBOL compared to other languages in recent surveys underlines its suitability. The 2012 ComputerWorld survey12 compared COBOL with Visual Basic, C#, C++, and Java and reported that, among other things, respondents found it better in terms of batch processing, transaction processing, handling business-oriented features, and maintenance costs. Nor is this a one-off: similar results have been reported by other surveys.
There is enormous pressure to replace COBOL legacy systems with systems written in one of the more fashionable languages. The many failures that have attended replacement attempts, however, have given legacy system stakeholders pause for thought. The well-documented dangers of the replacement approach and the relative success of COBOL system migration are leading to a growing reassessment of options. Keeping the COBOL codebase is now seen as a more viable, safer, cheaper alternative to replacement. But this reassessment reveals a problem. Keeping, and even growing, the COBOL codebase requires COBOL programmers, and the COBOL workforce is aging and nearing retirement.
For some years now, programmers have luxuriated in a seller’s market. The demand for programmers has been far in advance of the supply. But student numbers in computer science courses around the world are recovering from the Y2K downturn. As these graduates enter the job market, it will become more and more competitive. In a competitive environment, programmers may find that having a résumé that includes COBOL is a useful differentiator.
1. Dijkstra EW. How do we tell truths that might hurt? ACM SIGPLAN Notices. 1982; 17(5): 13–15. doi: 10.1145/947923.947924. Originally issued as Memo EWD 498. 1975 Jun.
2. Mitchell RL. Brain drain: where Cobol systems go from here. ComputerWorld. 2012 Mar 14. www.computerworld.com/s/article/9225079/Brain_drain_Where_Cobol_systems_go_from_here_
3. Sneed HM, Erdoes K. Migrating AS400-COBOL to Java: a report from the field. CSMR 2013. Proceedings of the 17th European Conference on Software Maintenance and Reengineering; 2013; Genova, Italy. CSMR; 231–240.
4. Glass R. Cobol—a contradiction and an enigma. Commun ACM. 1997; 40(9): 11–13.
5. Glass R. How best to provide the services IS programmers need. Commun ACM. 1997; 40(12): 17–19.
6. Glass R. COBOL: is it dying—or thriving? Data Base Adv Inf Sy. 1999; 30(1).
7. Glass R. One giant step backward. Commun ACM. 2003; 46(5): 21–23.
8. Sammet J. The early history of COBOL. ACM SIGPLAN Notices. 1978; 13(8): 121–161.
9. Brown GDeW. COBOL: the failure that wasn’t. COBOL Report; 1999. CobolReport.com (now defunct)
10. Jones C. The global economic impact of the Year 2000 software problem. Capers Jones. 1996; version 4.
11. Barnett G. The future of the mainframe. Ovum Report. 2005.
12. ComputerWorld. COBOL brain drain: survey results. 2012 Mar 14. www.computerworld.com/s/article/9225099/Cobol_brain_drain_Survey_results
13. Topolski E. IBM unveils new software to enable mainframe applications on cloud, mobile devices. IBM News Room. 2012 May 17. www-03.ibm.com/press/us/en/pressrelease/41095.wss
14. Terekhov AA, Verhoef C. The realities of language conversions. IEEE Software. 2000; 17(6): 111–124.
15. Sneed HM. Migrating from COBOL to Java. ICSM 2010. Proceedings of International Conference on Software Maintenance; 2010; Timisoara, Romania. IEEE; 1-7.
16. The Standish Group. Modernization: clearing a pathway to success. Report. Boston: The Group; 2010.
17. Organizational tool manufacturer cuts costs by 94 percent with NetCOBOL and NeoTools. Microsoft. 2011. www.gtsoftware.com/resource/organizational-tool-manufacturer-cuts-costs-by-94-percent-with-netcobol-and-neotools/
18. Productivity tools maker cuts costs 94% with move from mainframe to Windows. Microsoft. 2009 Jul. www.docstoc.com/docs/81151637/Daytimer_MainframeMigration
19. Waters J. Testing mainframe code on your laptop. WatersWorks blog, Application Development Trends (ADT). 2010 Jul 27. http://adtmag.com/blogs/watersworks/2010/07/ibm-mainframes-cobol-recruits.aspx
20. Robinson B. COBOL remains old standby at agencies despite showing its age. Federal Computer Week. 2009 Jul 9. www.fcw.com/Articles/2009/07/13/TECH-COBOL-turns-50.aspx
21. Thomas J. Manta’s IBM i COBOL training trifecta. IT Jungle. 2012 Oct 22. www.itjungle.com/tfh/tfh102212-story10.html
22. Veryant announces new COBOL training class. Veryant. 2012 Apr. www.veryant.com/about/news/cobol-training-class.php
23. Atwood J. COBOL everywhere and nowhere. Coding Horror. 2009 Aug 9. www.codinghorror.com/blog/2009/08/cobol-everywhere-and-nowhere.html
24. Campbell G. 2009 Aug 10. Comment on Atwood J. COBOL everywhere and nowhere. Coding Horror. 2009 Aug 9. www.codinghorror.com/blog/2009/08/cobol-everywhere-and-nowhere.html
25. Kwiatkowski ŁM, Verhoef C. Recovering management information from source code. Sci Comput Program. 2013; 78(9): 1368-1406.
26. Brin D. The practice effect. 1984. Reprint, New York: Bantam Spectra; 1995.
27. Verhoef C. The realities of large software portfolios. 2000 Feb 24. www.cs.vu.nl/~x/lsp/lsp.html
28. Case study: Owens & Minor. Robocom. 2011. www.robocom.com/Portals/0/Images/PDF/Owens%20&%20Minor%20Case%20Study.pdf
29. Medical supply distributor avoids costly ERP replacement with migration to Windows Server and SQL Server. Microsoft. 2010 Feb. www.docstoc.com/docs/88231164/Medical-Supply-Distributor-Avoids-Costly-ERP-Replacement-with
30. Veerman N. Revitalizing modifiability of legacy assets. J Softw Maint Evol-R. 2004; 16: 219–254.
31. Holloway N. Micro Focus International plc: Irish Life delivers cost savings and productivity gains through application modernization program with Micro Focus. 4-Traders.com. 2013 May 30. www.4-traders.com/MICRO-FOCUS-INTERNATIONAL-12467060/news/Micro-Focus-International-plc-Irish-Life-Delivers-Cost-Savings-and-Productivity-Gains-through-Appl-16916097/
32. Mainframe-to-Windows move speeds agility up to 300 percent for global publisher. Microsoft. 2007 Sep. www.platformmodernization.org/microsoft/Lists/SuccessStories/DispForm.aspx?ID=6&RootFolder=%2Fmicrosoft%2FLists%2FSuccessStories
33. Sneed H. A pilot project for migrating COBOL code to web services. Int J Softw Tools Tech Transf. 2009; 11(6): 441–451.
34. Brand M, Deursen A, Klint P, Klusener AS, Meulen E. Industrial applications of ASF+SDF. Amsterdam, The Netherlands: CWI; 1996. Technical report. Also Wirsing M, editor. AMAST’96. Proceedings of the Conference on Algebraic Methodology and Software Technology; 1996; Munich, Germany. Springer-Verlag; 1996.
35. Social Security Administration. The Social Security Administration’s software modernization and use of common business oriented language. Audit Report. Office of the Inspector General, Social Security Administration. 2012 May. http://oig.ssa.gov/sites/default/files/audit/full/pdf/A-14-11-11132_0.pdf
36. Property firm migrates from mainframe to Windows, cuts costs 60 percent, ups speed. Microsoft. 2006 Jul. http://cloud.alchemysolutions.com/case-studies/Watch-Stockholmshem-describe-the-modernization-experience
Or www.gtsoftware.com/resource/property-management-firm-migrates-from-mainframe-to-windows-cuts-costs-60-percent-ups-speed/
Or http://download.microsoft.com/documents/customerevidence/27759_Stockholmshem_migration_case_study.doc
37. Datamonitor. COBOL—continuing to drive value in the 21st century. Datamonitor; 2008 Nov. Reference code CYBT0006.
38. National Council of Social Security Management Associations Transition White Paper. 2008 Dec. http://otrans.3cdn.net/bfb27060430522c5ae_n0m6iyt3y.pdf
39. Hoover JN. Stimulus funds will go toward new data center for Social Security Administration. InformationWeekUK. 2009 Feb 28. www.informationweek.co.uk/internet/ebusiness/stimulus-funds-will-go-toward-new-data-c/214700005
40. Executive Summary—The CRASH report, 2011/12. CAST. 2012. www.castsoftware.com/research-labs/crash-reports