With a training technique commonly used to teach dogs to sit and stay, computer scientists from Johns Hopkins University showed a robot how to learn several new tricks, including stacking blocks. With this method, the robot, named Spot, was able to learn in days what usually takes a month.
By using positive reinforcement, an approach familiar to anyone who has used treats to change a dog’s behavior, the team improved the robot’s skills dramatically, and did so quickly enough to make training robots for real-world work a more feasible undertaking. The results were published in a paper titled “Good Robot!”
“The question here was how do we get the robot to learn a skill?” said lead author Andrew Hundt, a graduate student who works in the Johns Hopkins Computational Interaction and Robotics Laboratory. “I’ve had dogs, so I know rewards work, and that was the inspiration behind the development of the learning algorithm.”
Unlike humans and animals, which are born with highly intuitive brains, computers are blank slates and have to learn everything from scratch. But true learning is often achieved through trial and error, and roboticists are still figuring out how robots can learn efficiently from their mistakes.
The team achieved this by developing a reward system that works for a robot the way treats work for a dog. Where a dog might get a cookie for a job well done, the robot earned numerical points.
Hundt recalled once teaching his terrier mix puppy, Leah, the command “leave it” so she would ignore squirrels on walks. He used two kinds of treats: ordinary training treats and something even better, like cheese. When Leah got excited and sniffed at the treats, she got nothing. But when she calmed down and looked away, she got the good stuff. “So I gave her the cheese and said, ‘Leave it! Good Leah!’”
Similarly, in order to stack blocks, Spot the robot needed to learn how to focus on constructive actions. As the robot explored the blocks, it quickly found that correct stacking behavior earned high scores, while incorrect behavior earned nothing. Reach out but fail to grasp a block? No points. Knock over a stack? Definitely no points. Spot earned the most points by placing the last block on top of a four-block stack.
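The scoring idea can be illustrated with a minimal Python sketch. This is not the team's actual algorithm: the function name `reward`, its parameters, and the specific point values are assumptions made for illustration. The article states only that constructive actions earned points, that failed grasps and toppled stacks earned none, and that completing the stack earned the most.

```python
def reward(grasped: bool, height_before: int, height_after: int) -> float:
    """Score a single action (hypothetical scheme, for illustration only).

    grasped       -- whether the gripper actually picked up a block
    height_before -- stack height before the action
    height_after  -- stack height after the action
    """
    if not grasped:
        return 0.0  # reached out but failed to grasp: no points
    if height_after <= height_before:
        return 0.0  # knocked the stack over or made no progress: no points
    # Assumed scoring: a taller resulting stack is worth more,
    # so completing the stack earns the highest score.
    return float(height_after)


if __name__ == "__main__":
    print(reward(False, 1, 1))  # failed grasp -> 0.0
    print(reward(True, 3, 1))   # toppled the stack -> 0.0
    print(reward(True, 1, 2))   # placed a second block -> 2.0
    print(reward(True, 3, 4))   # completed a four-block stack -> 4.0
```

Because only constructive actions score, a learner that tries to maximize its total points is steered toward building the stack rather than merely touching or disturbing the blocks.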
Not only did the training tactic work, it took only a few days to teach the robot what used to take weeks. The team was able to reduce practice time by first training a simulated robot, which is much like playing a video game, and then running tests with Spot.
“The robot wants the higher number of points,” said Hundt. “It quickly learns the right behavior to get the best reward. In fact, it used to take the robot a month of practice to reach 100% accuracy. We were able to do it in two days.”
Positive reinforcement didn’t just help the robot learn how to stack blocks. With the scoring system, the robot learned several other tasks just as quickly – even playing a simulated navigation game. The ability to learn from mistakes in all kinds of situations is critical to developing a robot that can adapt to new environments.
“At the beginning the robot has no idea what it is doing, but it gets better and better with every practice run. It never gives up, keeps trying to stack, and can do the task 100% of the time,” said Hundt.
The team envisions that these findings could help train household robots to do laundry and wash dishes – tasks that could be popular in the open market and help seniors live independently. The findings could also help design improved self-driving cars.
“Our goal is ultimately to develop robots that can perform complex tasks in the real world – such as product assembly, geriatric care and surgery,” said Hager. “We don’t currently know how to program such tasks – the world is too complex. But this work shows us that the idea that robots can learn how to do such real-world tasks safely and efficiently is promising.”
The co-authors also included Johns Hopkins alumni Benjamin Killeen, Nicholas Greene, Heeyeon Kwon, and Hongtao Wu; former PhD student Chris Paxton; and Gregory D. Hager, professor of computer science.
This story was originally published by Johns Hopkins University. Reprinted with permission.