Unbreakable: computer software

Embedded-computing pioneer QNX makes software systems that keep going, and going, and going.

In the computer marketplace, the future is anywhere but your desktop. Embedded systems, the computers in cellphones, cars, elevators, televisions, medical equipment — pretty much anything with an On switch, these days — far outnumber general-purpose computers. According to the Semiconductor Industry Association, based in San Jose, Calif., 96% of all processors made today are destined for embedded devices. Over seven billion embedded processors — more than there are humans on earth — were sold last year.

As they accumulate, it seems less and less like they're embedded in our world and more and more like we're embedded in theirs. Get into a new car and you'll be surrounded by at least 30 (in luxury models, more like 100) embedded systems controlling everything from airbags and brakes to navigation and in-car multimedia. Embedded systems control the robots that assemble that car and the generating stations that power the robots. They control traffic lights on city streets and freeway monitors that warn you of congestion ahead. They're making “ubiquitous computing,” a term coined in 1988 to describe a future in which computers are seamlessly integrated into almost everything around us, a reality. As the devices have become more powerful, the processing has evolved from simple monitoring and switching (think thermostat) into true computing: running multiple processes simultaneously, updating themselves, communicating with other devices (think GPS unit). All of that activity must be overseen by an operating system, or OS.

Like Windows? Yes, except that in the middle of surgery, rebooting your anesthesia monitor is not an option (and gives “blue screen of death,” the popular term for a Windows crash, a whole new meaning). It is possible, however — Microsoft users, brace yourselves — for complex operating systems to run non-stop for years, even decades, without a single crash, lock-up or reboot. And when it comes to reliability, nothing outperforms Neutrino, an embedded operating system developed by QNX Software Systems of Kanata, Ont.

One customer has run QNX for more than 20 years without a reboot, and 10 years of non-stop operation is common. “QNX systems just stay on, never needing rebooting, never failing,” says Richard Chylinski, a software development manager with Delcan Corp., an engineering firm in Markham, Ont. Delcan uses QNX in computers deployed along highways, bridges and tunnels to monitor and control traffic. “When they put our systems in, people know they'll work.”

QNX was about reliability from the start. Dan Dodge and Gordon Bell met as undergrads at the University of Waterloo when a mutual friend, noting that both were building computers in their dorm rooms, introduced them. They became friends. In 1980, betting that the future was in embedded systems, the new graduates started QNX. “We said if we're going to go after the machines that run the industrial automation process of a country, run the transportation systems, are used in medical instrumentation, then failure had to be our No. 1 concern,” says CEO Dodge, 52. “What we did differently from just about everybody else is we designed an operating system which has the capability to recover from faults. We call it self-healing technology.”

The pair did it by basing their OS on a radical design. An operating system's heart is its kernel, the software that manages resources such as the CPU, memory and applications. Most operating systems, then as now, were based on a “monolithic” kernel that contains code for file-handling, network access and the various device drivers that interface with disks, screens, mice and other hardware. Operating systems crash when something goes wrong in the kernel, and in a monolithic kernel there is an awful lot to go wrong.

The QNX approach was to write a “microkernel” that handled only the bare minimum of tasks and turn all those device drivers and file handlers into completely separate, external tasks. Pulling everything possible out of the kernel meant that if one of the now-external processes died, the kernel would not only keep running, but it could restart the failed software in a few microseconds, the feature Dodge calls self-healing. “Cars, for example, are very hostile environments for computer systems,” says Paul Leroux, a former technology analyst at QNX and now its public relations manager. “You have all kinds of electromagnetic interference, and when the driver goes out of its mind” — he's referring to a piece of software, not the person behind the wheel — “and it's playing an MP3, the operating system can just restart it. You may hear a slight click, but it keeps running.” Leroux tells of QNX customers who thought their systems were performing flawlessly until it occurred to someone to glance at error logs. Only then did they realize programs had actually been failing continually. Each failure would have crashed a monolithic OS, but the QNX microkernel simply restarted the culprit without a hiccup.
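To make the restart idea concrete, here is a minimal Python sketch of the pattern described above. It is a toy simulation, not QNX code: the names `Supervisor`, `DriverCrash` and `make_flaky_audio_driver` are hypothetical, the "drivers" are ordinary functions rather than separate processes, and the fault is injected deliberately on every third call. The point is only that the supervising core contains no driver logic, so a driver fault is logged and the driver is replaced with a fresh instance while everything else keeps running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class DriverCrash(Exception):
    """Raised by a simulated driver to stand in for a fault."""


@dataclass
class Supervisor:
    """Toy stand-in for a microkernel: it holds no driver logic, it only
    starts drivers, passes messages to them and restarts them on failure."""
    factories: Dict[str, Callable[[], Callable[[str], str]]] = field(default_factory=dict)
    running: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    error_log: List[str] = field(default_factory=list)

    def register(self, name, factory):
        # Remember how to build the driver so it can be rebuilt later.
        self.factories[name] = factory
        self.running[name] = factory()

    def send(self, name, message):
        """Deliver a message; on a crash, log it, restart the driver, retry once."""
        try:
            return self.running[name](message)
        except DriverCrash as exc:
            self.error_log.append(f"{name}: {exc}")
            self.running[name] = self.factories[name]()  # fresh instance
            return self.running[name](message)


def make_flaky_audio_driver():
    """Hypothetical driver that faults on the third call of each instance."""
    state = {"calls": 0}

    def handle(message):
        state["calls"] += 1
        if state["calls"] == 3:
            raise DriverCrash("corrupted state")
        return f"played {message}"

    return handle


if __name__ == "__main__":
    kernel = Supervisor()
    kernel.register("audio", make_flaky_audio_driver)
    for frame in range(1, 6):
        print(kernel.send("audio", f"frame {frame}"))
    print("faults recorded:", len(kernel.error_log))
```

In this sketch every frame is played even though the driver faults along the way; the only trace of trouble is in `error_log`, which is roughly the situation Leroux describes when customers finally looked at their logs. A real system would isolate each driver in its own process with its own memory, which this simulation does not attempt.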

A microkernel has other advantages. Software components that aren't needed can be “unplugged” to save memory. Drivers, since they're separate from the kernel, can be updated on-the-fly, without stopping and restarting the system. That can be significant in industrial automation, for example, where shutting down a production line is expensive. Finally, there's a consistency factor. “When you're working with any other approach, like Linux or Windows, the kernel's always changing. As soon as anyone starts a new driver, they've changed the kernel — it's never stable,” Leroux says. “When you're using our microkernel, you're using the same binary we're running and testing here all the time, and the same one all the other customers are using.”

Throughout the 1980s and 1990s, QNX quietly grew, along with its reputation for handling hard real-time applications. “Hard” means that although the OS might be receiving a billion signals a second from the device in which it is embedded, missing a single one would lead to catastrophic failure. In other words, the system must perform with predictable faultlessness, and millions of QNX systems around the world are doing just that. They keep French high-speed trains from derailing on tight bends, they monitor nuclear reactors, they're at the heart of air-traffic control systems, laser eye-surgery units, and electronic gambling machines in Las Vegas. (OK, so not all catastrophic failures are equally catastrophic.) QNX controls the Cisco CRS-1, the world's highest-capacity Internet router, camera systems on the space shuttle and the international space station, and GMount, the world's most southerly telescope, located 500 metres from the South Pole.

QNX has 265 employees and offices in Britain, France, Germany, Japan, Korea and the U.S. Annual revenue is estimated at about $30 million (the company doesn't disclose financials), which puts it, Leroux says, among the top five vendors of embedded operating systems. Bell retired in 2005, but Dodge stays close to his roots: software design. “I am involved in every major technical initiative,” says Dodge, who is not only CEO but also chief technical officer. “I still write code, sometimes significant pieces of code within the operating system. I love to do it. It keeps me sane. I don't want to become one of those companies where the people at the top don't really understand what they sell.”

QNX's largest user base is in industrial automation, which still supplies most of its revenue. But the fast-growing networking and automotive markets will soon change that. By 2015, an astonishing 35% of the value of a new car will be in its electronics. QNX would like to dominate that market — but so would Microsoft. “The thing you learn when you've been in business this long is that Microsoft — never — gives — up,” Dodge emphasizes. “They have the financial resources. They can fail again and again, and they keep coming back. The first OS they brought into the automotive market, it was awful. What did they do? They rewrote it and brought a second version in. It was bad, but it wasn't as terrible. So they scrapped that and brought out a third.”

Over the years, Dodge and Bell turned down a parade of suitors that Leroux calls “a who's who of the IT business.” The threat from Microsoft forced them to reconsider. In October 2004, QNX accepted a US$138-million buyout from Harman International Industries Inc., an audio (Harman Kardon, JBL) and electronics giant based in Washington, D.C., that was looking for a software partner in the embedded automotive market. “Now I've got the dollars to compete with Microsoft,” says Dodge.

So far, the move has worked out. “We went head-to-head against Microsoft in automotive and they pretty much laughed at us,” says Dodge. “At the end of the day, we've bested them. We're actually shipping more car models than they are and we have tremendous traction out there.”

With Neutrino in 140 models, QNX is currently the acknowledged market leader for high-end automotive applications, according to Robert Cavin, an analyst with Frost & Sullivan, a consultancy in San Antonio, Texas. “These things have to operate no matter what,” Cavin says. “The reliability of QNX's operating system is the driving force for their penetration into various vehicle systems.”

Neutrino's reliability, its ability to self-heal and its performance in hard real-time applications are direct outcomes of Dodge and Bell's faith in a still-experimental OS design. Their decision, taken at a time when the term “microkernel” had not yet been coined, led to the world's first commercial microkernel operating system. But why does it remain the only one? “The big OS vendors are so totally invested in their old architectures, they would have to start from scratch, and none of them to date have been willing to make that investment,” Dodge says. If he were Bill Gates, he adds, “I would have made that gamble and gone for it.”

Bill Gates, on the other hand, is probably content to rely on what experience has proven time and again: having the best technology, even for a technology company, is not the only path to success.