A numbers guy
Beep. Beep. Beep. The student's arm flails toward the nightstand, trying to whack quiet the annoying sound of the alarm. It's Saturday. No time to roll over or snooze. Need coffee, soda, anything with a large jolt of caffeine to help ease into what's ahead. Get dressed. Bolt out the door. Sprint to a room filled with computers in the building phonetically known as "eye-ack" or IACC or its more formal title of Industrial Agriculture and Communications Center at North Dakota State University.
At the same time, someone else greets this morning with graceful anticipation. He's been planning for this event all week, spending the last seven days figuring out what to say. It's something he and his students never miss, these Saturday research sessions from 9 a.m. to 2 p.m., even if students receive no college credit for it. They show up. Just as they have on every Saturday for about the last 25 years. The names of students may have changed from year to year. But one thing is constant. "I had some very bright students. Just that interaction, knowing it's coming on Saturday, was just very exciting and a lot of fun. I looked forward to those. That was the highlight of my week," says Bill Perrizo, Distinguished Professor of Computer Science at NDSU, smiling.
Students remember the sessions too. Taufik Abidin, now a senior software engineer at Ask.com, says Perrizo always brought a dozen donuts for students who attended. Elizabeth Wang, now an assistant professor at Waynesburg University in Waynesburg, Pennsylvania, recalls his commitment. "In order to expose us with some frontier research areas, he could think about them over and over again before Saturday. We don't know how much sleep he lost while he was working on the new problems. And we don't know how many holidays he spent in his office doing research." Fei Pan at the University of Southern California says Perrizo trains his students to brew ingenious ideas and be creative in their research.
Perrizo has advised at least 25 Ph.D. students and about 60 master's degree students from around the world. Add that to the literally thousands of undergraduates he's taught in 36 years at NDSU and Bill Perrizo has touched a lot of lives.
Though he's had to curtail the Saturday research sessions during the past year due to health issues, it hasn't stopped this numbers guy from pursuing other research opportunities, like the one with many zeros after it. His current research focuses on pursuing the $1,000,000 Netflix Prize.
The nearly 11-year-old online DVD rental service began its five-year contest in 2006. "In the summer, I bet I put in 70 hours a week working on it. Sometimes I am here at four in the morning," says Perrizo. "I get an idea. I'm in data mining, which is what this is. This is going to be the benchmark for all data mining research for the next 20 years."
The problem, which is being chased by about 25,000 teams worldwide, involves creating a computer algorithm that will accurately suggest movies you may want to watch, based on your ratings of previous movies you've rented. Netflix already uses a program called Cinematch to do this, but it seeks a system that will beat the accuracy of its current system by 10 percent. Perrizo admires the company's contest approach. "That probably is 50,000 scientists worldwide working on this problem for five years. Now, you do the math. How much is that an hour? About a penny an hour? You can't buy scientists for a penny an hour," he chuckles. "That's smart."
Netflix has provided contestants with a data set of 100 million movie ratings by its customers. If you are a data miner, being awash in that amount of data might be close to achieving nirvana. "It's the only data set in data mining we've ever had that's real life, massive and as challenging as you want it to be," says Perrizo. When there are a bazillion numbers, finding the patterns and useful information in all those digits becomes problematic. Think of trying to find your car in the Metrodome parking lot when you don't remember where it's parked, or locating a group of specific grains of sand in the Sahara Desert.
"Data mining is what we call ad hoc querying. It's not so crystal clear what you're after. You have a feeling that there's valuable information in this data set. You want to find it, but it's not so clear what it is. That would be data mining," says Perrizo. Maybe it could be characterized as a technologically superior approach to detective work, like a gumshoe with a hunch who looks for details and patterns that provide useful information to solve a case.
Experts such as Perrizo use the sets of numbers and write algorithms that are almost like global positioning systems to find information in treasure troves of data. In everyday life, a recipe, for example, is just an algorithm for producing a food product. And an algorithm is simply a recipe for how you want to do something in a computer program.
In the Netflix contest, thousands of computer supercoders and others work to write algorithms that will boost the success of Netflix's Cinematch by 10 percent. So far, a team called BellKor from AT&T Labs boosted it by 8.43 percent. Netflix also awards a $50,000 annual progress prize to the team leading the pack to solve this business intelligence problem. For the ultimate winner of the contest, Netflix requires a royalty-free but non-exclusive license to use the software.
Suppose somebody does win the prize over the five-year period. According to contest rules, everyone else gets 30 days to beat them, which, Perrizo says, would lead to a feeding frenzy of science.
"I would be a fool not to be working on this because this is where it's at for data mining researchers right now," says Perrizo. "I've gotten into it, I'm afraid. I can't let it go now."
Perrizo's unique approach to data mining involves vertically structuring data, then writing computer programs that will efficiently, accurately and elegantly mine the data for useful information. His name for the vertically structured data is P-Trees, which stands for predicate trees - not for Perrizo. "But if people make that mistake, that's OK with me," he says with a wry smile.
He points out that today all data is horizontally structured. Think of a spreadsheet with rows of names, addresses, numbers and other data stretching from left to right. For a computer to process such data, it painstakingly looks at the first piece of data, then the next and the next and the next until it finds what it's looking for. "That works fine unless the depth of it is like it is with Netflix - a hundred million or a billion records deep," says Perrizo. "Well, you have to look at every one of them one at a time. It takes forever."
With Perrizo's method, the data is turned on its head, sliced up to change it into a vertical structure resulting in long, skinny pieces of data that are then compressed into a predicate tree to better manage it. "That wouldn't be all that useful if you had to uncompress every time you wanted to process the information. But we don't. We can process the compressed trees."
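For readers who want to see the idea of "turning the data on its head" in concrete terms, here is a minimal Python sketch. It is not Perrizo's P-Tree code: the function names and the tiny list of ratings are made up for illustration, and the step that compresses each slice into a predicate tree is left out. It shows only the vertical slicing, in which one column of numbers becomes a few long bit vectors, and a count that is answered with bitwise operations on those slices rather than by inspecting each record in turn.

```python
# Illustrative sketch only; not Perrizo's P-Tree implementation.
# The tree compression described in the article is omitted here.

def to_bit_slices(values, width):
    """Split a column of small non-negative integers into vertical bit slices.

    Slice b is a Python int used as a bit vector: bit i of the slice
    is 1 when bit b of values[i] is 1.
    """
    slices = [0] * width
    for i, v in enumerate(values):
        for b in range(width):
            if (v >> b) & 1:
                slices[b] |= 1 << i
    return slices


def count_equal(slices, n_rows, target):
    """Count rows equal to `target` using only bitwise operations on the slices."""
    all_rows = (1 << n_rows) - 1      # bit vector with one bit set per row
    match = all_rows
    for b, s in enumerate(slices):
        if (target >> b) & 1:
            match &= s                # rows where bit b is 1
        else:
            match &= ~s & all_rows    # rows where bit b is 0
    return bin(match).count("1")


# Hypothetical ratings on a 1-to-5 scale (made-up data).
ratings = [5, 3, 4, 5, 1, 5, 2, 4]
slices = to_bit_slices(ratings, width=3)

horizontal = sum(1 for r in ratings if r == 5)    # look at every record in turn
vertical = count_equal(slices, len(ratings), 5)   # combine three long bit vectors
print(horizontal, vertical)
```

Both counts agree. The horizontal version touches every record, while the vertical version combines three bit vectors no matter how many records there are, which hints at why the layout might pay off at a hundred million ratings.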
He thinks his two years of work can keep his team, called P-Tree Code Monkeys, in the Netflix race. He remains undaunted by the David vs. Goliath nature of his quest. Others participating in the contest have room upon room filled with computer servers. Although Perrizo uses a personal computer as well as NDSU's Center for High Performance Computing, it sometimes takes four days for an operation to process. He is nothing short of gleeful about the upcoming expansion of the high performance computing center, which will increase the number and power of processors available. "Now I'll be able to do something," he says, with a broad smile. "It'll be 600 times faster than what I've been able to do. What would have taken me six days will now take me a hundredth of a day."
He also mentions that he doesn't think he'd have a shot at the Netflix prize if it weren't for colleague Greg Wettstein, systems administrator of the computing center. "In my opinion, he's one of the best systems programmers in the world. It's a rare talent to be a systems programmer at the level that he can do it. He can set up an environment for me to do my applications programming that's absolutely world class. I call him a coding savant."
He asked Wettstein to take a look at a computer program he'd written that was giving him some trouble. "I think in five minutes he said, 'Have you got some GO TOs in here?' And I had a couple. It's 10,000 lines of code. I don't know how you can look for a few minutes and figure that out," says Perrizo with admiration. Wettstein appreciates Perrizo's approach to personal and professional challenges, saying it's based on highly reasoned and analytical assessments of a situation.
So Perrizo continues to work on what it may take to win the Netflix Prize. He is already a University Distinguished Professor. He has more than 200 refereed publications. Early in his career, he received multiple grants to work on an Air Force project designing a worldwide information system for the U.S. Department of Defense and its allies. "It was a pretty ambitious project which actually failed - not because of me," he laughs. His accolades include winning the 2006 Knowledge Discovery and Data Mining Cup, a contest once characterized as the "Holy Grail" of Computer Aided Detection to find pulmonary embolisms or blood clots from radiological imagery.
As for any "aha!" moments or major discoveries in his research career, "I get one every week. But 99 percent of them don't work out," he says. Some did. Perrizo holds a patent for his vertically structured data approach to database and data mining. He holds another patent for concurrency control, which, in a database, is like a traffic cop, making sure that one computer user's activity doesn't affect another's. "Maybe at this point, you have a little flexibility to look at 'what would I do to top things off here?' " says Perrizo. Thus, the quest for the Netflix Prize. "The person who wins is going to be the renowned data mining researcher for a long time. And the million dollars probably doesn't hurt either."
How Perrizo got to this point in his career - with humor and dedication - seems to parallel the rest of his life. His mother was a teacher and his father a farmer in southern Minnesota. Somehow, he became a mathematician. "I just always liked numbers. Give me a problem that's hard to do in mathematics. I just love to go after it." He had some catching up to do academically since he didn't have a senior algebra course with his 15-member high school graduating class. But he made an impression at the University of Minnesota, where a member of his Ph.D. committee, Len Shapiro, remembers him well.
Perrizo later encouraged Shapiro to join NDSU as chair of the computer science department. Although Shapiro would later move on to Portland State University, he clearly outlines Perrizo's achievements. "Bill's contributions in the areas of transaction processing, query processing, data mining, distributed databases and bioinformatics are outstanding and have advanced the cause of science in many significant ways." There's another aspect Shapiro appreciates. "I most admire Bill's ability to juggle his impressive professional life and still maintain deep and loving family relationships."
Perrizo speaks with obvious pride about his wife, who is a part-time teacher at a private school, and his three grown kids. One daughter manages all the domestic violence homeless shelters in New York City. Another daughter is a professional actress who's appeared in Broadway shows and now lives on the West Coast. His son in Minneapolis is a massage therapist. Not a computer scientist among them. "No, not even a hint of anything scientific!" he replies with mock exasperation.
He'll debate with his daughters, both of whom are in the humanities. "They say, 'You can't use numbers to do everything.' And I say, 'Au contraire.' We always use numbers. We always end up with the absolute quantification of yes or no. That's 0 or 1. We make decisions. We say yes or we say no. That's absolute quantification. So why are you saying I can't get from this massive accumulation of data to that ultimate quantification through numbers? I should if I can. If I can't, I'll use art to decide," he says with passion. "So it's an argument between artists and scientists. In my opinion, scientists are right because everything we do is a decision." But even this self-professed numbers guy will concede one point to his kids. "Now that doesn't mean that sometimes I'll wake up in the morning and the entire solution to a problem will be there, like a work of art. It would take me weeks to sequence it or write it down."
That passion for numbers is something Perrizo clearly passes on to his students. He looks for students with an innate drive to see a problem, solve that problem and make a contribution. "People remember their teachers and generally, their careers are shaped by not only the choice of what career they go into, but the quality of their career is shaped by their teachers," he says. Mementos from former students line the bookshelves in his cramped office, like an international travelogue - wooden owls and tea from China, sandalwood from India, and folk art from Sri Lanka and Bangladesh are just a few. Like endless lines of computer code swimming with speed and elegance across a screen, invisible strings still connect him with former students, now on their own research quests.
His former students are in Lebanon, China, Bangladesh, Sri Lanka, India, Alaska, Arizona, Minnesota, Pennsylvania, Arkansas, Washington and elsewhere. Imad Rahal, now an assistant professor at St. John's University, Collegeville, Minn., had Perrizo as his master's degree and Ph.D. adviser. Rahal says he admires "his strong belief in himself and his advisees, his patience and willingness to go the extra mile for his advisees, his love for life and the support that he provides for his students looooong after they graduate. He is the person who made a researcher out of me. He instilled the love of research in me and showed me that I can do it."
Wang notes that Perrizo challenged her as her adviser and expected hard work. But she also remembers his compassion. When stranded in China in 2003 due to international travel restrictions, she contacted her adviser. "Dr. Perrizo wrote a very touching reference letter for me and also asked a senator to write a letter to the U.S. Consulate in Beijing. As a result, not only I, but also my son, were able to come back to Fargo so that I was able to continue and finish my Ph.D. I don't know what would have happened without Dr. Perrizo's help."
As his students continue to build successful research careers, Perrizo continues his pursuit of the Netflix Prize. If you tried to characterize Perrizo's work as a movie in the Netflix collection, the classic "It's a Wonderful Life" springs to mind. With emphasis on the main character, played by actor Jimmy Stewart, who is at once approachable, smart, humorous, compassionate and determined, that might offer a glimpse of Bill Perrizo. The 65-year-old will continue the research quest posed by the Netflix five-year contest into 2011, even as he works around his schedule of teaching, chemotherapy and doctor appointments.
Whether Perrizo wins the million-dollar Netflix Prize - well, it would be nice - but maybe the journey matters more than the destination. Mathematical proof of his success already exists. There's proof in his legacy of distinguished research, of successful students, and of a loving family.
Over the past year, Perrizo curtailed his marathon running. He used to do some woodworking and remodeled nearly every square inch of his old house. Now he's becoming a coffee connoisseur. But in self-effacing fashion he points out, "I guess my life's pretty boring, actually."
As German colleague Walter Dosch at the University of Luebeck notes, "Despite his scientific success, he remains a modest and open-minded person with a good sense of humor." Dosch served as a board member with Perrizo in the International Society for Computers and their Applications. "I have no particular story to tell about Dr. Perrizo. His lifework is a story by itself."
-- Carol Renner