The concept was proven viable by a DARPA (Defense Advanced Research Projects Agency)-funded project (1) unveiled in January 2015 at the 29th conference of the Association for the Advancement of Artificial Intelligence (2).
A scientific team from the University of Maryland presented their paper, “Robot Learning Manipulation Action Plans by ‘Watching’ Unconstrained Videos from the World Wide Web” (3).
Robots Learn by Watching YouTube
Historically, the challenge for robot development has been the inability of AI (artificial intelligence) to consistently and effectively duplicate human movements. For decades, perception and control for grasping have been stumbling blocks to furthering robotic development.
But now, it seems as though the University of Maryland’s team has found a solution to this problem. The goal of the project was to advance the robots’ “ability to sense visual information and turn it into action”.
This was achieved by having the robots “view” videos of certain tasks. This learning method proved to be a more cost-effective, efficient, and faster way for robots to learn how to perform tasks.
The team discovered that watching YouTube videos helped the machines achieve “advance action generation”. This meant they were able to determine the necessary action needed to perform a specific task.
In addition, the robots were able to build on what they learned and advance beyond what they were shown and taught. This is a major step forward in robot technology. While in the past robots have been able to recognize such things as objects and patterns through visual input, they’ve been unable to interpret what they were seeing.
They couldn’t connect those patterns and objects to a needed action, or determine a course of action involving the objects or how to use them.
Learning via video, however, seems to have bridged that gap. Reza Ghanadan, program manager in DARPA’s Defense Sciences Office, said,
“We’ve now taken the next step to execution, where a robot processes visual cues through a manipulation action-grammar module and translates them into actions.” (4)
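As a rough illustration of what such an action-grammar module does, here is a minimal sketch in Python that maps a recognized grasp type and object to a candidate manipulation action. All rule entries, grasp labels, and object names below are hypothetical; the module described in the paper is far more sophisticated.

```python
# Hypothetical rules: (grasp type, object category) -> candidate action.
# These pairings are illustrative only, not taken from the paper.
ACTION_RULES = {
    ("power", "knife"): "cut",
    ("power", "bowl"): "pour",
    ("precision", "spoon"): "stir",
    ("precision", "bottle_cap"): "twist",
}

def predict_action(grasp_type, obj):
    """Translate visual cues (a grasp type and an object) into an action string."""
    action = ACTION_RULES.get((grasp_type, obj))
    if action is None:
        return "unknown"
    return f"{action}({obj})"

print(predict_action("power", "knife"))      # -> cut(knife)
print(predict_action("precision", "spoon"))  # -> stir(spoon)
```

The point of the sketch is the translation step itself: once the vision system has committed to a grasp type and an object, turning that pair into an executable action is a grammar-style lookup rather than another perception problem.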
Robots Learn to Cook Watching YouTube Videos
The scientists explained that the robots “need computational tools” that allow them to “automatically interpret and represent human actions”. The YouTube videos provided this and more.
The robots learned by “processing unconstrained videos from the World Wide Web”. YouTube cooking videos were chosen because they show longer, sustained actions, from which the system could “robustly” generate the actions it observed. Those longer actions gave the robots the knowledge needed to imitate them. By using the cooking videos as a learning tool, the robots advanced faster and more efficiently.
The team selected “88 open-source YouTube cooking videos with unconstrained third-person view”. The paper states, “Our ultimate goal is to build a self-learning robot that is able to enrich its knowledge about fine grained manipulation actions by “watching” demo videos.” (3)
The “top 10 most common actions in cooking scenarios” served as the criteria for selecting the cooking videos.
Using grasp classification and object recognition along with a Convolutional Neural Network (CNN)-based method, the team developed a system that allowed the robots to learn.
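One small piece of such a system is deciding, from many noisy per-frame classifier outputs, which grasp or object label a video segment actually shows. The sketch below (class names and confidence scores are invented for illustration; the paper's actual CNN architecture is not shown) averages hypothetical frame-level confidences and picks the top class:

```python
# Hypothetical per-frame confidence scores, such as a CNN classifier
# might emit for each video frame (labels and values are illustrative).
frames = [
    {"power_grasp": 0.7, "precision_grasp": 0.3},
    {"power_grasp": 0.6, "precision_grasp": 0.4},
    {"power_grasp": 0.8, "precision_grasp": 0.2},
]

def most_likely_class(frame_scores):
    """Sum confidences across frames and return the highest-scoring class."""
    totals = {}
    for scores in frame_scores:
        for label, p in scores.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)

print(most_likely_class(frames))  # -> power_grasp
```

Aggregating over many frames is what lets a system stay robust when any single frame is ambiguous, which is one reason longer, sustained video actions are useful training material.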
The range of grasps ran from “grasping a knife to cut” to being able to “pinch a needle”. The distinction between large and small objects was taught, as well as power grasps.
It wasn’t enough for the robots to recognize objects and then act on them in various tasks; they needed to understand the categories involved and how those related to the tasks and the necessary actions. That’s where the videos played a vital role, since these actions were sustained and repeated.
Observing hands during the video demonstrations, how they grasped and how those actions related to the objects, helped the robots build up these movements.
The results included:
–> 91% accuracy in recognizing grasp types
–> 79% accuracy in recognizing objects
–> 83% accuracy in predicting the correct actions
After viewing the videos, the “robots were able to recognize, grab and manipulate the correct kitchen utensil or object and perform the demonstrated task with high accuracy—without additional human input or programming” (4).
This cognitive learning was aided by one of the most significant advancements to come from this research: “the robots’ ability to accumulate and share knowledge with others.”
Not only did the robots succeed at a high rate in performing the tasks shown in the videos; they were also able to build upon the knowledge they acquired. This cognitive ability means that the robots can continue to learn on their own. While this technology is being developed for the military to use in logistics and repairs, it has far-reaching possibilities.