UC San Diego professor Yuanyuan (YY) Zhou specializes in making computers less vulnerable to attacks and malfunctions.
She specializes in making computers safer and more reliable.
Yet Yuanyuan (YY) Zhou is also a maven of reliability in another sense:
securing grant funding for the University of California, San Diego.
Although she has been on the UC San Diego faculty for only a little more than a year,
she has won National Science Foundation (NSF) support as principal
investigator (PI) on four projects, and co-PI on a fifth. And three of
the projects kicked off in just the past six weeks.
The recent projects aim to make computer systems more reliable by
detecting software bugs more efficiently, creating automated logs to
diagnose software issues and using software and system components to
adapt to the variability in manufactured computing systems.
“People always ask why their systems fail, and the computer industry is
starting to pay attention to this reliability issue,” explains Zhou,
who joined the UC San Diego Department of Computer Science and Engineering in
the summer of 2009.
“Fundamentally, my research is about making computer systems less
vulnerable to attacks so they crash less. When Windows crashes, it’s
more of an inconvenience, but when an e-commerce site crashes it can
really be a problem.”
Zhou is the first holder of the Qualcomm Endowed Chair in Mobile
Computing. The chair is one of four established in the Jacobs School of
Engineering through Qualcomm’s original $15 million commitment to the
California Institute for Telecommunications and Information Technology
(Calit2).
After earning her Ph.D. from Princeton University, Zhou worked for two
years as a research scientist at NEC. She then taught at the University
of Illinois at Urbana-Champaign (UIUC) from 2002 to 2009.
While at UIUC, Zhou co-founded her second startup company, Pattern
Insight, with several members of her research team. Now based in
Mountain View, Calif., the company has already begun shipping its first
product, a search-and-analysis suite for source code that helps
software development teams with the challenges of managing large code
bases. Its solutions, based on Zhou’s research at UIUC, have
been deployed in large companies including Cisco, Qualcomm, Juniper
and Tellabs. In 2008, Intel licensed an innovation related to
multi-core processors developed by Zhou and her students.
Zhou remains chief technology officer at Pattern Insight in her spare
time — and with all of her work at UC San Diego to keep her busy, there's not
much of it. Her UC San Diego grants as solo investigator total more than $1.6
million, and Zhou will be responsible for a portion of a $10 million,
Calit2-based project just getting underway this month.
Bug detector
Detecting computer bugs is crucial in the fight for system reliability,
Zhou says, which is why she was granted $430,000 from NSF to study
ways that software and hardware can be used to detect bugs, especially
those in parallel and distributed programs.
“Right now,” Zhou says, “cell phones, laptops and desktops have
multicore processors, but to take advantage of this kind of processing,
programs need to be concurrent. Writing these programs is difficult and
error-prone, and this has been a major headache for industry.
Detecting and preventing these bugs from doing damage has become an
increasingly important and urgent issue.”
To improve the correctness of parallel and distributed software, Zhou
proposes a novel and widely applicable invariance, called data-flow
invariance, which can be used to detect various types of software bugs
and make software more reliable and secure.
“I strongly believe that this research can effectively improve our
understanding of this challenge, provide substantial tool support for
software development and greatly improve software quality and system
robustness,” she adds.
As the recipient of the Committee on the Status of Women in Computing
Research (CRA-W) Anita Borg Career Award for her contribution to women
in computer science, Zhou notes that her project also incorporates
various educational and outreach activities for students, especially for
women in computer science programs.
Troubleshooter
Another strategy Zhou proposes for coping with computer crashes is to
diagnose the problem at the source through automatic log inference and
informative logging. NSF granted Zhou another $470,000 to research ways
to enable developers to quickly troubleshoot production-run failures
and shorten system downtime.
“When a crash happens, you don’t want to have to send your cell phone
or computer back to the manufacturer, because that takes valuable time
and might compromise private data,” notes Zhou. Not to mention, she
adds, the vendor “might not be able to replicate the problem in-house,”
just as a patient often cannot reproduce a health problem on demand to
help a doctor understand its root cause.
“And customer support is expensive for the cell phone companies as
well,” she adds. “With Motorola, every support call I make to them costs
$300 on average, and if they can’t figure out the problem, they have
to send a complete replacement.”
Instead, Zhou proposes a method for quickly identifying root causes of
the system malfunction and releasing patches to fix it, consequently
reducing the amount of time the system is down, and ideally sparing the
consumer any hassle whatsoever.
She says that industry leaders like Motorola, Dell, Sony and Cisco
Systems have already started to push so-called “call-home capability,”
which equips cell phones, laptops, and desktop computers with the
means to call support centers automatically when a problem arises.
“But right now, the support center still has to call the user, diagnose
the problem, and then fix it,” she says. “Eventually, we hope the
final step will be automated to the point where the computer
predicts that a failure is happening, ‘calls home’ and then automatically
self-heals without the user even noticing that anything was wrong at
all.”
Expeditions in computing
Zhou is also part of a large ensemble of researchers at UC San Diego
and five other universities on a third new NSF grant. The $10 million
project is part of the foundation’s Expeditions in Computing program,
and it is directed by CSE professor (and chair) Rajesh Gupta, with
Zhou and four other co-PIs at UC San Diego and eight co-PIs
distributed across UCLA, UC Irvine, Stanford, University of Michigan
and UIUC. The so-called Variability Expedition proposes to re-think and
enhance the role that software can play in a new class of computing
machines that are adaptive and highly energy efficient. The idea is to
use system components — led by proactive software — to routinely
monitor, predict and adapt to the variability in manufactured computer
systems.
Says Zhou: “It represents not only a way to deal with hardware
reliability, but a chance to rethink software architecture. If software
is designed in a way where the software can automatically adapt to the
changing execution environment including the underlying hardware, the
software itself is more reliable, and is robust to errors and
variations in not only hardware but also software itself. Cell phones,
for example, need to adapt to constantly changing environments — not
just to the physical environment like extreme heat or cold, but to
various applications and devices manufactured by different companies. For this reason, it would be useful for the software stack to be adaptive.”
Zhou predicts that as a result of current research in the field,
computer systems will become markedly more adaptive and reliable in as
little as five years — and consequently, the nature of information
technology support staffs will evolve as well.
“In the past, people focused a lot on the features and performance of
computers, but it’s gotten to the point where the performance isn’t that
bad for most apps,” she continues. “I think IT staffs will be
consolidated because with automation, we’ll need fewer people to do
basic level calls. The people doing the support side of things will
become more expert.”
She notes that since all the apps will be running ‘in the cloud’
instead of on individual cell phones, “there will still be a need for
planning for resource allocation, only these companies won’t be dealing
with separate individuals, but with datacenters.”
In addition to the three NSF grants awarded in the last six weeks, Zhou
is currently working on two projects funded by NSF in summer 2009 after
she arrived at UC San Diego. A $569,000 award is allowing her to work
on a novel approach to automatically perform on-site diagnosis of a
software failure right at the moment of the failure, and provide
programmers a detailed diagnosis report. Also launched last summer: a
project to improve storage system performance, dependability and
manageability using system mining techniques.
Zhou will give a talk on cloud computing this month at the National
Academy of Engineering’s “Frontiers of Engineering” conference, where
she’ll discuss the impact of cloud computing on transparency.
“With cloud computing, the data is no longer on the device itself, so there is less transparency, and these abstractions make application testing, diagnosing and troubleshooting much harder because it’s harder to see the physical layers.
“Although the benefit of the cloud is elasticity — you can scale down
or scale up and pay for the bandwidth you use — many apps are not
designed for this and can easily break,” she continues. “We need to
begin asking: What is the difference between traditional app development
and what is needed now?
“Maybe the cloud infrastructure provider will need to start building in development, testing, deployment and diagnostics to enable more applications in clouds.”


