Interview with Professor Irving Biederman
Professor Irving Biederman is a Professor of Neuroscience at the University of Southern California. His work deals with determining how the human brain interprets the information it receives, or how we "get mind from brain," as he likes to put it. Further information on Professor Biederman and his work can be found at the website for the Image Understanding Lab.
E: To start out with, what are your day to day responsibilities? Do you spend most of your time teaching, or do you spend most of your time doing actual research?
B: Probably more of my time is spent doing research, or thinking about research -- and a distressingly large percentage of my time is involved in administrative activities.
E: Your work deals with object recognition?
B: Shape recognition in general, but objects, seeing, and face recognition in particular.
E: When did you first get started in doing work with images?
B: When [my brother] was eleven, he got me a birthday present -- and I was eight, he's three years older than me, my brother -- a subscription to Scientific American, which he really wanted for himself. So we would fight when it would come, as to who would read it first. But I got interested in the articles -- some of it I couldn't understand -- in the articles on perception where you could try out some of the illusions and some of the effects on yourself, and so it was an interest I just retained. I majored in psychology as an undergraduate, and it was just the thought of trying to understand how you got mind from brain seemed like such a delicious problem that it was easy to follow that. I've been fortunate things have gone well.
E: I was reading through some of the stuff on your website, and it was talking about geons, and talking about edge perception to recognize objects...
B: Yeah, so if you think about what you could do with a good line drawing of an object, which is, in fact, you could recognize it very well. This line drawing simply captures the edges in an image that correspond to discontinuity -- the sharp differences in depth -- so that if you look at my arm here, there's a jump at this -- if you're drawing it you draw an edge over here -- there's a sharp jump in depth from where your viewpoint grazes this tangent surface of my arm to the background. You also have one other type of discontinuity, it's a discontinuity in surface orientation, so here we're at one point over here, and it's not like there's a big jump in depth but a big jump in orientation. So those are the only two types of edges that are really important. We have other types of edges that could be coloring, or texture on the surface, or luminance hotspots -- so if you shine a flashlight or a spotlight on the object you get a hotspot -- but the brain seems to know what is the actual structure of the object in terms of the rotation and depth discontinuities and what is just surface variation. Anyway, from those two types of edges you can activate these geons which are shape primitives, much the way phonemes are speech primitives for speech recognition. So we have these shape primitives like cylinders and bricks and curved cylinders and wedges and so on -- about fifty of those can account for virtually all aspects of object recognition other than the surface differences like color and texture.
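[Ed. - The two edge types described above can be illustrated with a toy sketch. It is not Professor Biederman's method or his lab's code; it only assumes a hypothetical one-dimensional depth profile (distance from the viewer along a single scanline) and two made-up thresholds. A large jump between neighboring samples is labeled a depth discontinuity, and a sharp change in slope without a jump is labeled an orientation discontinuity.]

```python
def classify_edges(depth, jump_threshold=1.0, crease_threshold=0.5):
    """Label edges along a 1-D depth profile (illustrative assumption only).

    Returns a sorted list of (index, kind), where kind is "depth" for a
    jump between samples index and index + 1, or "orientation" for a
    sharp change in slope at sample index.
    """
    # Slope of the surface between each pair of neighboring samples.
    slopes = [b - a for a, b in zip(depth, depth[1:])]

    # A large slope means the surface jumps toward or away from the viewer.
    jump_at = [abs(s) > jump_threshold for s in slopes]
    edges = [(i, "depth") for i, is_jump in enumerate(jump_at) if is_jump]

    # A crease is a sharp change in slope where neither side is a jump.
    for i in range(1, len(depth) - 1):
        if jump_at[i - 1] or jump_at[i]:
            continue
        if abs(slopes[i] - slopes[i - 1]) > crease_threshold:
            edges.append((i, "orientation"))

    return sorted(edges)


# Hypothetical scanline: flat background, a jump to a nearer flat
# surface, then a surface that starts slanting away from the viewer.
profile = [10, 10, 10, 4, 4, 4, 5, 6, 7]
edges = classify_edges(profile)
print(edges)  # [(2, 'depth'), (5, 'orientation')]
```

[Ed. - Surface markings such as color, texture, or highlights leave the depth profile unchanged, so in this sketch they produce no edges at all, which loosely mirrors the distinction drawn in the answer above.]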
E: Now, this work, are you approaching it in sort of a theoretical science approach, or are you looking to it for a specific technological end? Are you looking for an implementation or just pursuing it for the science?
B: To understand how we get mind from brain is so interesting that, yes, that's my passion. We're pursuing it on many different levels, that is: a behavioral level so we do experiments on psychophysics, we do what are called single neuron recordings in monkeys so as they are looking at different types of images designed to test aspects of the theory, we record from these neurons, and have now a pretty good idea for how the brain codes images. We also do some work with patients, individuals who have specific types of lesions -- so for example we're studying an individual now who is a prosopagnosic, that means he cannot recognize faces though he has no trouble recognizing objects -- and then we're starting some experiments in functional imaging, so using fMRI we get differences in the signal depending on the type of perceptual or cognitive task someone is doing, and from that we can infer how the brain is doing these various types of tasks or processing the information in different ways. So we have essentially a multi-faceted attack on this problem, but the great motivation comes from simply understanding it, though there are enormous applications to it if we could solve the problems.
E: Right. There are a ton of applications that could benefit from computers actually being able to recognize what it is they're seeing.
B: Exactly. For face recognition, actually, there's a system developed here at USC that won a national competition for best face recognizer. It's done by a colleague, Christoph von der Malsburg, and we've been using it to see whether the theory upon which that was based is a good theory of human face recognition, and it is, at least to our best accounting. And so an application of that, which is currently already being used, would be to detect potential undesirables at points of entry to the US.
E: I know there was a system like that used in Tampa Bay at the Super Bowl, do you know if that was...
B: I think that might have been another system, a competing one but somewhat similar.
E: Also, I was wondering -- and this is sort of irrelevant to the interview, but I was curious -- once you know how the brain stores an image, is there a chance that will be a good way to not only recognize images, but reconstruct them. For instance, instead of, say, bitmap image formats, could we be able to store edges and such and be able to recreate an image from that?
B: That's an interesting and very sophisticated question. It turns out that our memory is not nearly as detailed as we think subjectively. When people imagine something, it turns out there's much less detail there than they think. Let me give you an example, imagine the last party you were at. Can you get a picture of the scene there and some people there? Pick out a single person and get an image of that person. What color shirt are they wearing?
E: I have no idea.
B: It's interesting, until that time the person probably wasn't bare chested, nor was... It's not like if you don't get the shirt it's in tatters, or all ripped up the way you might think of a photograph being randomly mutilated -- it's gone completely. Now you might have remembered the shirt or blouse the person was wearing, but if you don't it's gone completely. That indicates -- that and of course there are a lot of experiments, but just a subjective experience indicates -- that to a large extent our memories are categorized. This is quite unlike a bitmap. If we had a bitmap we could expect, yeah, we'll lose some of the pixels, but we'll see a lot of it, so it'll be like we took some of the pixels off the shirt, but we'll be able to see some of them and know it's red, for example, but instead we lose it all. So that suggests that we're not really coding the information in enough detail to reconstruct the image, even though subjectively we might think we have a complete image of the event or the object. So we code only some of it and not all.
E: So it's like the brain is storing a framework of the image, and then selectively filling in detail.
B: Or maybe when you image it, it may fill in detail in terms of what it knows about the world, so it assumes certain things. So I can't see your feet now, but I assume that you have them, and that they're resting on the floor. It's the same way that you can't see the bottom of my chair, but you assume it's resting on the floor and not floating in midair or something like that. There are some assumptions we make, and often they're accurate, but we don't seem to have detailed images, so if we played back what we do remember, we wouldn't be able to reconstruct the scene. Instead we seem to remember what's important, and the rest we don't code. And we could fill in... Imagine, for example, that person was wearing a green shirt that's at the party that you were imaging, or a red shirt, and you could do that. Knowing now that you're just kind of adding to what you know, you could do it, but it's not really in your memory.
E: Switching gears a little, how would you define science?
B: Well actually I don't. There are enormous numbers of possible definitions, and we don't have one kind of formal one, but the ones that are reasonable, I think, extend from "science is what scientists do" to "the process by which you try to make inferences about the world under controlled observations using rational analysis and inferential machinery." So that's probably the best one to describe what it is that scientists are doing.
E: What do you consider to be the difference between science and technology?
B: Well, actually, often they blur. Technology, though, is often concerned with meeting some practical end and not so concerned with what the principles on which let's say that product, or whatever the device is, have been made. Science is devoted to understanding just what it is that could have led to the phenomenon in question. Often science is experimental in a sense that you're comparing under controlled conditions, and technology for the most part takes what's known and tries to apply it without necessarily getting new knowledge. Though, there is the kind of practical knowledge you get when you're trying to build something -- what works, what doesn't work -- so these become kind of informal experiments that engineers at companies are doing as they get experience. One of the things that is often done if a design is unsuccessful -- like a bridge collapse -- is there's a failure analysis and engineers try to understand why the bridge failed, what happened, and that way they're learning. So they have an experiment, maybe, that wasn't meant to be an experiment, but doing it there's often intense scrutiny and analysis about what it was, just trying to understand why it didn't work, and then you make it better. You saw that when the Challenger blew up, that there was this intense scrutiny of the O-Rings, and everything else, and they learned from that.
E: What is the process you use in doing science? How does science as you do it relate to, for instance, the scientific process they teach in high schools -- the sort of problem, research, hypothesis?
B: Well, let me start with the second as you specified it in more detail. Actually, the way it goes is -- who is it, that science fiction writer, I'm blanking on his name. He said, "Most often, great scientific discoveries are preceded not by 'Eureka!', but by 'Gee, that's funny.'" [Ed. - Quote is by Isaac Asimov.] You get something, just a "what about that?", a kind of pique of curiosity. Or it may be you're listening to something, or thinking about something, and you wonder, "What happens if you did this and this," and it starts with that, an intuition. And then you set up the machinery they describe in junior high schools, so you then set it up to test in a rigorous way, but the real action comes in many respects from that first insight, that creativity. And often then you know when you're right, and you use the rest of it to confirm it. It's almost, often, the ideas are too elegant not to be right, as Einstein said something about that, that the universe is going for beauty essentially. So that's often how it proceeds. It's not that you have the whole field at your hand and say, "Oh, this needs to be proved here. Let me do it this way," as if you had some machine that could generate hypotheses. There are an infinite number of things it could generate, only some of which are interesting. There is this highly subjective creative component that often precedes what you think of as the inferential machinery, though mastery of the field is really vital, so you don't get people doing great science on the get-go. You get people who are immersed in the field and have the expertise, and then you see what needs to be done.