Robots Suck
Why is AI at the human equivalent of a fully-grown (literate) adult linguistically, but a toddler physically?
AI can do all types of wonderful things. From creating art, to writing (bad) poems, to making super-human predictions. But it can’t perform basic physical tasks. And when it can, it’s torturous or fragile or over-fit to those tasks.
Moravec’s paradox: counter-intuitively, reasoning requires relatively small amounts of compute. Perception and mobility require lots.
More generally: the earlier some process evolved, the harder it is to train computers to do it. This could be associated with training time. We’ve been walking, bipedal style, for ~6 million years, but only writing code for ~100. Hence computers are much better at writing code than at walking. Example: it is much harder to develop a robot to hunt a warthog in southern Sudan than to drive a car around Seoul.
More biology: we aren’t anywhere close to understanding the human body and how it works. We’re still just throwing darts, trying to reverse engineer the processes. That might not be the most promising route, as reverse engineering seems to be incredibly hard to do for subconscious systems.
And then these complex, highly-evolved, constantly-adapting systems are interacting with an open, itself-complex, non-stationary system: The Real World. Consider walking across a park. It might be raining. It might be muddier than usual. Stepping on a soft bit of ground will change that bit of ground. There could be dogs running around. Etc. And that’s just a roughly 2-D problem; don’t get me started on operating in three dimensions. Compare walking with the finite number of words we use. You can literally slap all of those words (or tokens derived from them) into a big ol’ vector. Done. Whereas try this: how many ways are there of moving your right arm within a cubic metre of space? Which is why, of course, we have to settle for getting things roughly right: heavily discretise time and space, do as much reductio dimensiono as possible, etc.
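To put rough numbers on that gap, here’s a toy sketch. The figures (a ~50k-token vocabulary, a 7-joint arm, 100 bins per joint) are made-up assumptions purely to illustrate the combinatorics, not measurements of any real system:

```python
# Toy comparison of action spaces: next-token prediction vs. a robot arm.
# All numbers here are illustrative assumptions.

vocab_size = 50_000                     # every "action" a language model can take per step
print(f"Language-model actions per step: {vocab_size:,}")

# Crudely discretise a continuous 7-joint arm: each joint angle gets 100 bins.
joints = 7
bins_per_joint = 100
arm_actions = bins_per_joint ** joints  # 100**7 = 10^14 joint configurations
print(f"Arm actions per step (even after heavy discretisation): {arm_actions:,}")
```

And that’s one arm, at one time step, on a very coarse grid.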
Another problem with these robot<>environment interactions is the reward function. First: non-monotonic loss. This isn’t a new problem, but it’s amplified in this domain. Consider again walking across a park - simply moving forward isn’t necessarily a good thing (collapsing forward will give you that); what you really want is the combination of movements that drives you forward the most, and in a sustainable way. That takes a lot of search-space exploration. And that’s just for a simple task. Imagine cooking a pizza from scratch - how exactly does one gradient descend? Hmmm.
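To make the “moving forward isn’t enough” point concrete, here’s a minimal sketch of two reward functions for a walking agent. The state and action fields (forward velocity, torso height, joint torques) and the coefficients are assumptions for illustration, roughly the shape used in common locomotion benchmarks rather than any specific system:

```python
# Minimal sketch: a naive reward vs. a "sustainable walking" reward.
# State/action fields and coefficients are illustrative assumptions.

def naive_reward(state):
    # Pure forward progress: collapsing forward scores just as well as walking.
    return state["forward_velocity"]

def walking_reward(state, action, min_torso_height=0.8):
    if state["torso_height"] < min_torso_height:
        return 0.0                       # no credit for face-planting forward
    alive_bonus = 1.0                    # reward staying up, i.e. sustainability
    effort_penalty = 0.001 * sum(torque ** 2 for torque in action)
    return state["forward_velocity"] + alive_bonus - effort_penalty
```

Even this hand-tuned version is brittle: nudge the coefficients and the agent can learn to stand still or hop in place instead, which is exactly the reward-specification problem.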
There are signs of improvement. MuZero lays out a general-purpose algorithm that learns a model of the environment, capturing only the features that matter for planning. Their more recent Robotics Transformers also look promising (although with robotics it can be hard to tell how real things are; demos can be deceiving). And there’s obviously other cool stuff from Tesla, with Optimus Gen 2, and Boston Dynamics with Atlas.
RL is lagging behind other methods. We haven’t figured out a good way to program self-improvement in complex environments without human feedback. There might be a sense in which we’re optimising too much: in some cases it’s easier to train a model to optimise a loss function than to define that loss function in the first place.
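One illustration of that asymmetry: instead of writing the reward down, you can fit it from human comparisons of behaviour (the idea behind learning from preferences). Below is a minimal, self-contained sketch with a linear reward model over made-up trajectory features; the data, feature size, and learning rate are all assumptions, not a real pipeline:

```python
# Minimal sketch of "learn the loss instead of writing it": fit a reward model
# from pairwise preferences via a Bradley-Terry model. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Each pair: (features of the trajectory a human preferred, features of the other).
feature_dim = 4
preferences = [(rng.normal(size=feature_dim), rng.normal(size=feature_dim))
               for _ in range(100)]

w = np.zeros(feature_dim)               # linear reward model: reward(f) = f . w

for _ in range(200):                    # gradient ascent on the preference log-likelihood
    grad = np.zeros_like(w)
    for preferred, rejected in preferences:
        # P(preferred beats rejected) under Bradley-Terry = sigmoid(r_pref - r_rej)
        p = 1.0 / (1.0 + np.exp(rejected @ w - preferred @ w))
        grad += (1.0 - p) * (preferred - rejected)
    w += 0.01 * grad / len(preferences)

print("Learned reward weights:", w)
```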
Even this whole loss-function framing might be wrong. How, or, more importantly, why do humans learn? It’s not via pain, or loss. We explore. We mess around. We play. We are attracted to what interests us. We don’t really optimise, at any level of biological complexity. There’s a reason you have 2 lungs (and 2 kidneys). Engineers would point out that we could greatly improve the efficiency of the human body by removing this “superfluous” duplication. But optimisation = loss of robustness = exit gene pool (eventually). And we over-fit, making quick-and-dirty inferences left, right and Chelsea. Because, again, we are not “designed” to be correct/efficient/optimised/the best, just to survive (and pass on genes). How this relates to learning, though, which is where this point started (oops), is not clear.
But I get the feeling this still doesn’t fully explain the fact that my fridge is more likely to be able to accurately predict what food I want to buy next week, and order it for me, than some robot is to boil me an egg.
There are also socioeconomic factors, i.e. the concentration of talent working on these respective technologies. Cast your mind back to the hunting vs. car-driving problem. If we had lots and lots of engineers working on the former, we might be getting somewhere.
We are more interested in bits than atoms. People with engineering and science degrees aren’t working in labs or building machines. They’re writing code. Things that only require code to build are far more likely to get built.