NEC Research Institute: An Industrial Lab with a Basic Mission

December 16, 1998

C. William Gear, a former president of SIAM, has been president of the NEC Research Institute since 1992. Institute researchers, he says, "work on what they choose; they're scientists." The lack of direction notwithstanding, mainstream and "blue-sky" focus areas have been identified for both of the institute's divisions: computer science and physics.
In 1989, Japan's NEC Corporation established a basic research facility---the NEC Research Institute---just outside Princeton, New Jersey. SIAM, in nearby Philadelphia, followed developments with particular interest. The new institute's mission---to provide fundamental research to its giant computer and communications parent company---made it one of a (diminishing) handful of corporate research labs in the U.S. dedicated to unfettered basic research.

One of the key players at NEC Research from the beginning, moreover, was C. William Gear, SIAM president in 1987-88, a former SIAM vice president for publications and editor-in-chief of SIAM Journal on Scientific Computing, and (in 1971) developer of DIFSUB, the automatic integration program for ODEs described in the issue of SISC dedicated to Gear on his 60th birthday as "undoubtedly the most important ODE program ever written and [possibly] the most significant single numerical program ever written." In January 1989, having retired from the University of Illinois at Urbana-Champaign after 28 years, five as chair of the computer science department, Gear moved to NEC Research as vice president for computer science.

Interest in NEC Research has been widespread, and intense, from the beginning. Most obviously, a major corporation was announcing its intention to build, from scratch, a world-class basic research lab. The goal was a steady state of 128 employees, with the technical members of the staff evenly distributed between Gear's computer science division and the physical science division. With the NEC Research Institute, some extremely attractive research positions had been created, with a key word in the job description being "freedom."

Seeking a first-hand glimpse of the research under way at the now nearly ten-year-old institute, which has been a corporate member of SIAM since 1990, SIAM News paid a visit one day in early July.

Day-After-Tomorrow Research
The day begins in Gear's office with a brief overview of the institute. As an independent research company that is wholly owned by NEC, Gear explains, NEC Research has a single contract, to provide basic research for its parent company. "NEC is big on research for today, tomorrow, and the day after tomorrow," he says; "we're for the day after tomorrow." NEC funds the lab "to be told what they want," Gear says; "basic research protects the company from surprises." Later in the day, computer scientist Peter Yianilos will elaborate: "NEC wants at least to have a small part in things if the world changes. . . . It's a low-probability event, but they're interested in being in on it."

NEC puts about 10% of its revenue into R&D, most of which is done within the business groups. A central R&D group accounts for about 1% of corporate revenue, of which about one tenth is devoted to basic research; the NEC Research Institute, Gear estimates, receives about two thirds of the portion earmarked for basic research.
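Chaining those percentages together shows just how thin a sliver of corporate revenue reaches the institute. A back-of-the-envelope calculation, using a hypothetical revenue figure rather than NEC's actual numbers:

```python
# Back-of-the-envelope share of corporate revenue reaching the institute,
# using a hypothetical revenue of 100 units (not NEC's actual figures).
revenue = 100.0
central_rnd = 0.01 * revenue        # central R&D group: ~1% of revenue
basic_research = central_rnd / 10   # ~one tenth of central R&D
institute = basic_research * 2 / 3  # institute gets ~two thirds of that

# The institute thus receives roughly 0.067% of corporate revenue.
share = institute / revenue
```

For a company with tens of billions of dollars in revenue, even that fraction of a percent funds a substantial laboratory.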

Although there is no single model for a corporate research lab, NEC Research is clearly patterned to some extent on the former AT&T Bell Laboratories. As long-time Bell Labs researchers, two of the prime motivators, Dawon Kahng and Michiyuki Uenohara, were well aware of how basic research had paid off for AT&T. Uenohara left Bell Labs for NEC, in Japan, where he eventually became head of R&D; he was able to convince NEC that a basic research lab was important to some of its long-term goals, in particular more intelligent computing and communications. He enlisted the help of his friend Dawon Kahng, by then retired from Bell Labs, to plan the new laboratory. Given the traditional U.S. strength in basic research, the institute was established in the U.S., with Kahng as president. Kahng died in 1992, and Gear became president, the position he holds today.

The institute operates on a "rolling five-year plan," Gear says; he meets with the parent company every fall to decide on the current plan. Asked whether he speaks Japanese, he replies that he is studying the language, but only for personal reasons. The parent company is adamant, he says, that NEC Research be an American company.

Today, NEC Research employs approximately 40 principal scientists and an equal number of collaborating scientists, divided just about equally between computer science and physical science. Virtually all of the scientists have PhDs. Physicists tend to come in at a more senior level than the computer scientists, many of whom are hired just after receiving their doctorates and, Gear complains good-naturedly, "in some cases don't know how good they have it."

And how good is that? Institute researchers "work on what they choose," Gear says; "they're scientists." Each principal scientist receives a budget, depending on his or her level. Management gives additional funds to areas it considers especially promising. Researchers get travel budgets; they are urged but not required to collaborate with scientists at the parent company in Japan and at other institutions. All results, Gear emphasizes, can be published in the open scientific literature.

The hope, Gear says, is that the work of NEC Research will contribute in the long term to NEC, with the contributions mainly taking the form of new core computer and communications technologies rather than new products. Two broad areas, one fairly mainstream and the other in the "blue-sky, wishful-thinking" category, have been identified for each of the two divisions of NEC Research; the goal is for the institute "to be the best there is" in those areas. In computer science, the core area is distributed and parallel computing, including languages and systems; the blue-sky area is intelligence. "AI has moved a lot in the last few years," Gear says; in fact, the current vice president for computer science, David Waltz, is president of the American Association for Artificial Intelligence. In physics, the core and far-out areas of emphasis are nanoscience and biophysics (largely sensory systems of insects), respectively.

For SIAM News, Gear has scheduled visits with researchers working in a variety of areas, both in computer science and in physics.

High-profile Web Analysis
As requested by SIAM News, the visit includes conversations with computer scientists Steve Lawrence and Lee Giles, whose recent work has drawn unexpected media attention, from, among many others, The New York Times, The Wall Street Journal, and National Public Radio. In a rigorous statistical study of the Web, Lawrence and Giles analyzed how much of the Web was indexed by six commonly used search engines. In the process, they found the Web to contain far more pages---about 320 million---than commonly believed. (The largest previous estimate was about 200 million pages, although, Lawrence points out, providers of search engines obviously have a vested interest in lower estimates, which make their coverage look better.)

NEC Research computer scientists Steve Lawrence and Lee Giles assessed the performance of common Web search engines and in the process estimated the size of the Web at about 320 million pages (as compared with the largest previous estimate of 200 million pages). They are also the developers of an autonomous version of the Science Citation Index.

Lawrence, one of the computer scientists hired at the very beginning of his career (and, Gear's comments notwithstanding, highly appreciative of the benefits of working in a long-term research environment), received his PhD from the University of Queensland in 1996 and has been at NEC Research ever since. Interested in machine learning, neural networks, and information retrieval, he tells SIAM News that he has "always wanted to study the Web."

Lawrence and Giles proceeded by taking a large number of queries posed by NEC researchers to the internal NEC search engine Inquirus (itself an earlier project, for which Lawrence, drawing on his background in machine learning, had written the code); applying a consistent relevance measure and normalizing URLs, they evaluated the six search engines: HotBot, Alta Vista, Northern Light, Excite, Infoseek, and Lycos. "We found an order of magnitude difference in what the engines covered," Lawrence reports; HotBot, at one end of the spectrum, indexed 34% of indexable pages, and Lycos, at the other, only 3%.

By combining all six engines, a user can retrieve 3.5 times as many pages as with the best single engine, but still only 60% of the indexable Web. Because the Web is constantly growing, the engines face a gargantuan task. Lawrence and Giles also found that the indexing patterns of the engines vary widely over time, suggesting that the best engine for searching for recent information also varies over time. Users, moreover, complain more frequently about being inundated with information, and about being provided with invalid links, than about missed material.
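Estimates of this kind rest on overlap statistics: if two engines sample the Web roughly independently, the size of the overlap between their indexes indicates the size of the whole. A minimal capture-recapture (Lincoln-Petersen) sketch of that idea; this is an illustration of the principle, not Lawrence and Giles's exact methodology:

```python
def estimate_total(pages_a, pages_b):
    """Capture-recapture estimate of population size from two roughly
    independent samples: total ~= |A| * |B| / |A intersect B|."""
    a, b = set(pages_a), set(pages_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap: estimate undefined")
    return len(a) * len(b) / overlap

# Toy example: two "engines" that each index 60 pages of a 120-page "Web".
engine_a = range(0, 60)
engine_b = range(30, 90)
```

With 30 pages in common, the estimate is 60 * 60 / 30 = 120, recovering the true size; a small overlap between large indexes implies a large total population.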

Inquirus, the internal NEC program, is actually a metasearch engine. As such, it goes to engines, downloads the pages that match a query, and analyzes the context in terms of a readily computable "relevance measure," involving the proximity of the various search terms within the pages, to rank the pages found by the various search engines. Inquirus, moreover, can answer queries phrased in certain ways, e.g., "Where do rainbows come from?" and "What does NASDAQ stand for?" Lawrence has hand-coded about ten such questions, to which Inquirus, as demonstrated to SIAM News, provides rapid, specific answers.
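Proximity measures of this kind are readily computable from downloaded page text. A toy version (an illustration of the idea, not the Inquirus code) scores a page by the smallest window of words containing every query term:

```python
def min_window(tokens, terms):
    """Length of the smallest span of tokens containing every query term
    (None if some term is absent); smaller spans mean higher relevance."""
    terms = set(terms)
    have, best, left, missing = {}, None, 0, len(terms)
    for right, tok in enumerate(tokens):
        if tok in terms:
            have[tok] = have.get(tok, 0) + 1
            if have[tok] == 1:
                missing -= 1
        while missing == 0:                     # all terms present: shrink from the left
            width = right - left + 1
            best = width if best is None or width < best else best
            t = tokens[left]
            if t in terms:
                have[t] -= 1
                if have[t] == 0:
                    missing += 1
            left += 1
    return best

def relevance(page_text, query):
    """Score a page inversely by the tightest window covering the query terms."""
    w = min_window(page_text.lower().split(), query.lower().split())
    return 0.0 if w is None else 1.0 / w
```

A page in which the search terms appear side by side thus outranks one in which they are scattered, which is the intuition behind proximity-based ranking.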

Isn't this a commercializable product? Yes, says Lawrence, although it might be quite involved; agreements would have to be made, for example, with search engines. Another project on which he and Giles have been working, however, seems to have even greater commercial potential.

Called CiteSeer, the new "product" is an autonomous version of the Science Citation Index; it goes to the Web and downloads pdf and PostScript files. It can search for an author, and it can go into citations, getting the actual context from the article, the paragraph in which the citation is made. Compared with the Science Citation Index, Lawrence points out, CiteSeer is cheap and, through agreements with publishers of journals and conference proceedings, could be more timely and more comprehensive; it can give authors better feedback on their publications. Although, again, it would seem to be a commercializable technology, the plan is to give it away: There's a limited market, Lawrence explains; it's really interesting only to scientists, and charging for it would limit its impact. A recent paper on CiteSeer, published in the proceedings of the 1998 ACM Conference on Digital Libraries, was one of six papers shortlisted for the conference's best-paper award. "Web research and development is contemporary and popular," says Lee Giles, Lawrence's partner in this work, who in a separate visit with SIAM News briefly describes some of his other research, much of it done in collaboration with people at Princeton University, Rensselaer Polytechnic Institute, the University of Pittsburgh, and the University of Maryland.
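The core operation, locating each citation in a downloaded article and extracting its surrounding text, can be caricatured in a few lines. The sketch below is an illustration of the idea, not the actual CiteSeer system, which must also parse PostScript and match variant citation formats:

```python
import re

def citation_contexts(article_text, marker):
    """Return the sentences in which a citation marker (e.g. '[3]')
    appears, approximating the 'context of citation' feature: the
    text an author wrote about the work being cited."""
    sentences = re.split(r'(?<=[.!?])\s+', article_text)
    return [s.strip() for s in sentences if marker in s]
```

Aggregating these contexts across many citing articles gives an author a picture of how a paper is actually being used.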

Summarizing the attitude at NEC as "do what works for you," he says that he has also worked a lot recently with Italians, at the Universities of Florence and Pisa, and with Australians, at the Universities of Queensland and Sydney and the Queensland University of Technology. The commonality, says Giles, is an interest in learning.

Giles has a background in physics and electrical engineering and has taught electrical engineering and computer science (most recently at Princeton and Pisa); immediately before moving to NEC Research, he managed programs in neural networks and AI and in optics in computing at the Air Force Office of Scientific Research.

As a senior scientist, Giles is willing to comment on the motives of his current employer: "What NEC really wants," he tells SIAM News, "is papers, patents, prizes; tech transfer would also be very nice." If anything is wrong with the way the lab is set up, he continues, it's in intellectual property rights---it would be nice for researchers to share in any financial rewards from patents.

Giles has delivered both papers and prizes during his nine years at NEC. He and collaborators at Pittsburgh received the 1996 International Neural Networks Conference best paper award for a paper on prediction of multiprocessor memory-access patterns. In addition, he and co-authors at Princeton and Slovak Technical University recently received the IEEE Transactions on Neural Networks paper of the year award for 1996 for the paper "Learning long-term dependencies in NARX recurrent neural networks." That work, he explains, is concerned with basic issues in memory and temporal modeling and their effects on computing performance, especially for neural networks.

Building on the work of H.T. Siegelmann and Eduardo Sontag, who showed that networks with feedback have at least Turing equivalence, the prize-winning paper shows that putting in memory really helps in training with gradient-based methods---"instead of just following the gradient at each place, you hold previous gradients in memory." "We're investigating temporal questions, trying to learn long-term temporal dependencies," Giles says. His conclusion: "Use as much memory as you can afford"; the storage of past gradients gives "jump-ahead connections, which let you skip steps in your search." Giles also does some "optics in computing" research and has worked on time-series predictions in financial markets; usually this work takes the form of trying out new models with relatively simple time series to see whether they will be beneficial. "We're in the methods rather than the financial prediction business," he comments; learning is an approach that will allow successful predictions.
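In a NARX model, the output is fed back through several explicit delay taps, and those taps are what give gradients the jump-ahead paths Giles describes. A minimal scalar version of the recurrence (illustrative only, with arbitrary fixed weights, not the network from the paper):

```python
import numpy as np

def narx(u, weights, n_delays=3):
    """Scalar NARX recurrence: y(t) = tanh(sum_k w[k] * y(t-1-k) + u(t)).
    The output delay taps y(t-1), ..., y(t-n) are the 'jump-ahead'
    connections: each one is a direct path back n steps in time."""
    y = np.zeros(len(u))
    for t in range(len(u)):
        taps = [y[t - 1 - k] if t - 1 - k >= 0 else 0.0
                for k in range(n_delays)]
        y[t] = np.tanh(np.dot(weights, taps) + u[t])
    return y
```

With zero input the state stays at zero, while an impulse keeps echoing through the delayed feedback for several steps; in training, a gradient can likewise travel n steps back through a single tap instead of being attenuated at every intermediate step.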

Giles, who clearly has "a lot of freedom at NEC," thinks he would have been a little more focused at a university, where research can often be determined by the grants awarded.

Blue-sky Computer Science
Warren D. Smith, an applied mathematician (his Princeton dissertation was in computational geometry), has been at NEC since the early '90s. A theme of his work has been the ways in which the laws of physics are affected by the foundations of computing and, conversely, the ways in which the laws of physics limit (or enhance) the performance of computers. Smith is now teaching a course on this subject at Princeton.

He tells SIAM News about his work on Church's thesis, beginning by pointing out that N mutually gravitating point masses in a plane can be regarded as a "natural computer," accepting as input its own initial position in phase space and generating as output trajectories in that space. Building on work of Joseph Gerver, Smith showed that any one of uncountably many topologically distinct trajectories can emerge after a fixed amount of time, while a given Turing machine can perform only finitely many alternative computations in a finite amount of time. Turing machines would thus seem to be less versatile.

Smith argues, however, that this conclusion is premature, for two reasons: (1) The centers of mass of physical bodies cannot really be brought into arbitrarily close proximity, and (2) Newton's laws of gravity and motion are not correct; in particular, it is impossible to exceed the speed of light. If the classical N-body problem is altered in such a way that speeds are limited by the speed of light, while masses coalesce whenever they approach to within half of their Schwarzschild radii, then obviously the number of topologically distinct trajectories attainable in finite time is actually finite. In such a model, Smith confirms Church's "strong" thesis by demonstrating that a conventional computer can simulate the motion with only "polynomial slowdown." This work was done by combining some new and old ideas about rigorous numerical solutions of systems of differential equations.

Smith has also worked on the traveling salesman problem and a computer science perspective on synthetic chemistry.

Computer scientist Eric Baum, originally a physicist, has been at NEC Research for nine years. "Even here," he says, "I'm probably at the blue-sky end of things; the whole place is blue sky, although maybe less than it used to be." Over the last three or four years, Baum has developed a new "economic" approach to machine learning, designed to obtain results similar to those obtained with John Holland's original "genetic algorithms" and more recent "classifier systems" with significantly less computational effort.

Unlike many NEC researchers, computer scientist Peter Yianilos has plenty of real-world ties, having joined NEC about seven years ago from a career in industry. Franklin, the company he founded with Elwyn Berlekamp, is "a strange group of mathematicians" in the business of designing and producing hand-held spellers and dictionaries; most spell checkers, Yianilos thinks, still use his algorithm. "Being in business makes you appreciate research a lot more," he points out.

Happy at the institute because he "can work on many different projects," Yianilos tells SIAM News about an in-house R&D exhibition at NEC in Japan (held mainly to let the corporation's business groups know about new research developments) in which he participated a few years ago. For a similar exhibition this summer, he is planning to present intermemory, a design for a distributed process that can run on machines anywhere. With intermemory, Yianilos explains, data are smeared out over many processors; the data are encoded and scrambled and then packetized, but "if you can get half back, you can retrieve all of it." Intermemory provides "incredibly survivable, secure storage," he says; "an adversary would have a hard time wiping it out." People---publishers, for example---could band together to take advantage of it.
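The "get half back, retrieve all of it" property is characteristic of erasure codes of the Reed-Solomon family. A toy sketch using polynomial interpolation over a small prime field illustrates the principle (it is not the intermemory design, which also addresses scrambling, security, and distribution):

```python
P = 257  # small prime field; production codes use GF(2^8) or larger

def _interp_at(x, pts):
    """Lagrange interpolation mod P: value at x of the unique polynomial
    passing through the given (xi, yi) points."""
    total = 0
    for xi, yi in pts:
        num = den = 1
        for xj, _ in pts:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return total

def encode(data, n):
    """Spread k data symbols over n shares; any k shares suffice to decode."""
    pts = list(enumerate(data))            # data = polynomial values at x = 0..k-1
    return [(x, _interp_at(x, pts)) for x in range(n)]

def decode(shares, k):
    """Recover the original k symbols from any k surviving shares."""
    pts = list(shares)[:k]
    return [_interp_at(i, pts) for i in range(k)]
```

Encoding 4 symbols into 8 shares, any 4 of the 8 (half of them, in any combination) reconstruct the data exactly, so an adversary must destroy more than half the shares to cause any loss.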

Another of his current interests is DNA, specifically the extent to which sequences of natural DNA can be modeled and compressed. DNA turns out to be relatively incompressible, close to being a random signal. Yianilos and a student achieved the best available compression; their baseline, counting the frequencies of the four nucleotides, or bases, that make up DNA, gained about 2.5%. With the addition of text-compression techniques, they got up to 5%; in the end, they reached 15%, a result he assesses as small, but not meaningless.
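The frequency-counting baseline corresponds to zeroth-order entropy coding: a uniformly random sequence of four bases costs 2 bits per base, and any skew in the frequencies can be cashed in as compression. A sketch of the calculation (illustrative of the baseline, not the actual study's coder):

```python
import math
from collections import Counter

def zeroth_order_savings(seq):
    """Fraction of the naive 2-bits-per-base cost that zeroth-order
    (frequency-based) coding saves: 1 - H/2, where H is the Shannon
    entropy in bits of the base frequencies in seq."""
    n = len(seq)
    H = -sum(c / n * math.log2(c / n) for c in Counter(seq).values())
    return 1 - H / 2
```

For a perfectly uniform sequence the savings are zero; natural DNA sits close to that limit, which is why the measured gains are only a few percent.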

The ability to compress something corresponds directly to our understanding of it, Yianilos says; if we know what's coming next, we can code the knowledge in very few bits. "Either we're ignorant and the signal is really ordered," he suggests, "or it really is a highly random signal." The answer, he believes, is far closer to the second possibility; the question is: Why? Why is this healthy for organisms?

If DNA is really as incompressible as it seems, it may be because large segments of it are indeed random, and therefore unimportant. In that case, substantial portions of the human genome---the whole of which consists of about 3 billion of the bases A, C, T, and G---could be replaced by other random sequences without altering the salient characteristics of the offspring.

It's possible, Yianilos suggests, that the proteins coded for by DNA are just scaffolding---that very few little parts are really important. His hypothesis is that "the string is incompressible because DNA is highly random [with the sequences that code for proteins actually being closer to random than the sequences known as "junk" DNA], which in turn is a result of the small amount of a protein that's actually important."

NEC Physicists Look at Biological Systems
Also looking at proteins are NEC Research physicists Ned Wingreen and Chao Tang. Using DNA analysis, they point out, researchers can easily determine the sequence of amino acids in a given protein (three DNA bases code for a specific amino acid). How the amino acid chain folds into three-dimensional shapes, however, is not known.

At NEC, the two physicists, who knew each other slightly when they arrived at NEC seven years ago, have been applying their training as physicists to the protein-folding problem, taking an approach quite different from the experimental approach of biologists. What are some of the (approximately 5000) shapes that have been observed? Tang and Wingreen ask. Why are the shapes so regular? Why are certain shapes preferred? Why aren't there 100,000 (the approximate number of known proteins) shapes?

Tang and Wingreen have built a model in which amino acids are divided into two groups: the hydrophobic and the polar (less hydrophobic). A typical protein, they explain, contains 100-200 amino acids, with the most hydrophobic located in the interior of the molecule, as far as possible from the watery environment.

Tang and Wingreen elected to study protein-like structures consisting of a "mere" 27 = 3^3 amino acids. Each string of 27 polar and hydrophobic amino acids can be "folded" into the shape of a self-avoiding random walk on the cubic (3D) lattice, and (with all but nearest-neighbor interactions ignored) a potential energy can be associated with each string/shape pair. If symmetries are taken into account, there are 51,704 distinct self-avoiding random walks that do not escape from a (compact) 3 × 3 × 3 cube. By complete enumeration, Tang and Wingreen concluded that only 4.75% of the 2^{27} = 134,217,728 such strings correspond to unique minimum-energy shapes, and elected to ignore all others on the ground that the corresponding molecules would prove thermodynamically unstable.

By counting the number N(S) of strings that correspond to a particular "ground state" S, they then discovered that some admissible shapes S correspond to far more sequences than others. Indeed, in contrast to the 4256 shapes S for which N(S) = 0, there is one S for which N(S) = 3794. Finally, they conjectured that the same hydrophobic forces would cause the longer sequences that occur in nature to assume shapes similar in certain respects to those preferred by 27-element sequences.
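The same enumeration can be carried out in seconds for a toy two-dimensional version of the model. The sketch below (an illustration of the HP-model procedure, not Tang and Wingreen's 3D code) folds short hydrophobic/polar chains on the square lattice, scores each fold by its non-consecutive H-H contacts, and counts the sequences whose lowest-energy fold is unique:

```python
from itertools import product

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def folds(n):
    """Self-avoiding walks of n monomers on the square lattice, reduced by
    symmetry: first step east, first vertical excursion (if any) north."""
    out = []
    def grow(path):
        if len(path) == n:
            out.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                grow(path)
                path.pop()
    grow([(0, 0), (1, 0)])
    def north_first(walk):
        for _, y in walk:
            if y != 0:
                return y > 0
        return True
    return [w for w in out if north_first(w)]

def energy(walk, seq):
    """-1 per lattice contact between non-consecutive hydrophobic monomers."""
    pos = {p: i for i, p in enumerate(walk)}
    e = 0
    for i, (x, y) in enumerate(walk):
        for dx, dy in ((1, 0), (0, 1)):      # each contact counted once
            j = pos.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[i] == seq[j] == "H":
                e -= 1
    return e

def unique_ground_fraction(n):
    """Fraction of H/P sequences of length n whose minimum-energy fold is
    unique: the 2D analogue of Tang and Wingreen's 4.75% figure."""
    shapes = folds(n)
    hits = 0
    for seq in product("HP", repeat=n):
        energies = [energy(w, seq) for w in shapes]
        m = min(energies)
        if m < 0 and energies.count(m) == 1:
            hits += 1
    return hits / 2 ** n
```

Tabulating, for each admissible shape, how many sequences fold to it reproduces the qualitative finding above: a few shapes attract far more sequences than the rest.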

Study of the way information coded in one dimension (in DNA and proteins) is related to its expression in a three-dimensional structure (a folded protein) may be relevant to computation and communication, Tang and Wingreen point out; hence NEC's interest. The two physicists, who doubt very much that they would have been able to work on protein folding in an academic environment, are now ready to drop the lattice and look at real proteins.

Next on the SIAM News tour is an NEC Research physicist who works on the real side of biophysics: vision in the fly. Rob de Ruyter, who has been at the institute for five and a half years, also briefly mentions a few of the other biophysics projects under way at NEC Research: olfaction in locusts and the escape response in cockroaches (how they encode patterns of airflow that warn them of impending danger).

Working with a theorist, de Ruyter is exploring neural coding and neural computation to learn how living organisms cope with noise. De Ruyter's contributions begin with a plastic screen-covered bucket on the windowsill of his office, which turns out to contain a collection of New Jersey flies.

By inserting a probe into the visual brain of a living fly and recording signals from a motion-sensitive neuron, de Ruyter has learned how the fly's brain "distills" motion information from the signals entering the photoreceptors of the fly's eye. And by decoding the neuron's messages, he has been able to show that the fly performs this computation in a statistically efficient way.

Finally, by mounting a "wired" fly on the front of an ordinary bicycle helmet, he has made it possible for the wearer to walk around outdoors and record the response of the fly while it sees natural scenes. A demonstration is unavailable on the occasion of SIAM's visit, due to the recent and untimely death of the current subject.

Ten Years, At Least Ten Research Successes
Looking ahead, Gear is planning to present the "top ten" research successes at NEC Research's tenth-anniversary open house, to be held on May 3, 1999, at the institute. While these projects have not yet been selected, they will range from significant theoretical results in physics and computer science (the sort that doesn't interest the popular press, such as results on the traveling salesman problem and protein folding) to results that have attracted a lot of attention or have potential for company impact. One item sure to be featured is the Othello program that defeated the human world champion in a match held at NEC Research; although it received almost no attention in the U.S., the match was a featured event in Japan. The money for the match, Gear says, "was the best ten grand I ever spent." Gear will undoubtedly also highlight Signafy, the spinoff company that has commercialized digital still-image and video watermarking methods developed at NEC Research.

Meanwhile, summarizing the researchers' point of view, Peter Yianilos views NEC Research as a "good organization, from a human standpoint. . . . Starting with Bill Gear, they appreciate research; that's the tone."

Commenting, like almost everyone SIAM News has met, on the Japanese economy and the implications of the current crisis for the institute, Yianilos says that people "are hearing 'no' for the first time," on requesting a piece of equipment, for example, "and that's due only in part to the Japanese economy." NEC Research is a very well run place, with a lot of flexibility built into the budget. What's happened, he says, is that "We've finally grown up."
