Design World

  • Home
  • Technologies
    • ELECTRONICS • ELECTRICAL
    • Fastening • joining
    • FLUID POWER
    • LINEAR MOTION
    • MOTION CONTROL
    • SENSORS
    • TEST & MEASUREMENT
    • Factory automation
    • Warehouse automation
    • DIGITAL TRANSFORMATION
  • Learn
    • Tech Toolboxes
    • Learning center
    • eBooks • Tech Tips
    • Podcasts
    • Videos
    • Webinars • general engineering
    • Webinars • Automated warehousing
    • Voices
  • LEAP Awards
  • 2025 Leadership
    • 2024 Winners
    • 2023 Winners
    • 2022 Winners
    • 2021 Winners
  • Design Guides
  • Resources
    • Subscribe
    • 3D Cad Models
      • PARTsolutions
      • TraceParts
    • Digital Issues
      • Design World
      • EE World
    • Educational Assets
    • Engineering diversity
    • Trends
  • Supplier Listings
  • Advertise
  • Subscribe

Algorithm Enables Computers to Identify Actions More Efficiently

By atesmeh | May 14, 2014

With the commodification of digital cameras, digital video has become so easy to produce that human beings can have trouble keeping up with it. Among the tools that computer scientists are developing to make the profusion of video more useful are algorithms for activity recognition — or determining what the people on camera are doing when.

At the Conference on Computer Vision and Pattern Recognition in June, Hamed Pirsiavash, a postdoc at MIT, and his former thesis advisor, Deva Ramanan of the University of California at Irvine, will present a new activity-recognition algorithm that has several advantages over its predecessors.

One is that the algorithm’s execution time scales linearly with the size of the video file it’s searching. That means that if one file is 10 times the size of another, the new algorithm will take 10 times as long to search it — not 1,000 times as long, as some earlier algorithms would.

Another is that the algorithm is able to make good guesses about partially completed actions, so it can handle streaming video. Partway through an action, it will issue a probability that the action is of the type that it’s looking for. It may revise that probability as the video continues, but it doesn’t have to wait until the action is complete to assess it.

Finally, the amount of memory the algorithm requires is fixed, regardless of how many frames of video it’s already reviewed. That means that, unlike many of its predecessors, it can handle video streams of any length (or files of any size).

The grammar of action

Enabling all of these advances is the appropriation of a type of algorithm used in natural language processing, the computer science discipline that seeks techniques for interpreting sentences written in natural language.

“One of the challenging problems they try to solve is, if you have a sentence, you want to basically parse the sentence, saying what is the subject, what is the verb, what is the adverb,” Pirsiavash says. “We see an analogy here, which is, if you have a complex action — like making tea or making coffee — that has some subactions, we can basically stitch together these subactions and look at each one as something like verb, adjective, and adverb.”

On that analogy, the rules defining relationships between subactions are like rules of grammar. When you make tea, for instance, it doesn’t matter whether you first put the teabag in the cup or put the kettle on the stove. But it’s essential that you put the kettle on the stove before pouring the water into the cup. Similarly, in a given language, it could be the case that nouns can either precede or follow verbs, but that adjectives must always precede nouns.

For any given action, Pirsiavash and Ramanan’s algorithm must thus learn a new “grammar.” And the mechanism that it uses is the one that many natural-language-processing systems rely on: machine learning. Pirsiavash and Ramanan feed their algorithm training examples of videos depicting a particular action, and specify the number of subactions that the algorithm should look for. But they don’t give it any information about what those subactions are, or what the transitions between them look like.

Pruning possibilities

The rules relating subactions are the key to the algorithm’s efficiency. As a video plays, the algorithm constructs a set of hypotheses about which subactions are being depicted where, and it ranks them according to probability. It can’t limit itself to a single hypothesis, as each new frame could require it to revise its probabilities. But it can eliminate hypotheses that don’t conform to its grammatical rules, which dramatically limits the number of possibilities it has to canvass.

The researchers tested their algorithm on eight different types of athletic endeavor — such as weightlifting and bowling — with training videos culled from YouTube. They found that, according to metrics standard in the field of computer vision, their algorithm identified new instances of the same activities more accurately than its predecessors.

Pirsiavash is particularly interested in possible medical applications of action detection. The proper execution of physical-therapy exercises, for instance, could have a grammar that’s distinct from improper execution; similarly, the return of motor function in patients with neurological damage could be identified by its unique grammar. Action-detection algorithms could also help determine whether, for instance, elderly patients remembered to take their medication — and issue alerts if they didn’t.

“We’ve known for a very long time that the things that people do are made up of subactivities,” says David Forsyth, a professor of computer science at the University of Illinois at Urbana-Champaign. “The problem is we don’t know what the pieces are, and nobody has given us labeled training data saying, ‘There are two pieces in a dive and seven pieces in a weightlifting and 21 pieces in a hammer throw and these are their names.’ What we need is methods that can look at a whole bunch of examples of, say, weightlifting, and say, ‘Well, this piece occurs again and again in about this place,’ where ‘this place’ doesn’t so much mean three minutes after the start of the video as before this and after that.”

“There’s a fairly substantial open problem here,” Forsyth adds. “I wouldn’t have said that the open problem has completely gone away, but the method itself is very powerful, and there’s been a strong sense in the community for some time that something of this sort should work very well. This is certainly the best of this sort that I’ve seen.”

You Might Also Like


Filed Under: Rapid prototyping

 

LEARNING CENTER

Design World Learning Center
“dw
EXPAND YOUR KNOWLEDGE AND STAY CONNECTED
Get the latest info on technologies, tools and strategies for Design Engineering Professionals.
Motor University

Design World Digital Edition

cover

Browse the most current issue of Design World and back issues in an easy to use high quality format. Clip, share and download with the leading design engineering magazine today.

EDABoard the Forum for Electronics

Top global problem solving EE forum covering Microcontrollers, DSP, Networking, Analog and Digital Design, RF, Power Electronics, PCB Routing and much more

EDABoard: Forum for electronics

Sponsored Content

  • Widening the scope for machine tool designers with FORTiS™ enclosed encoder
  • Sustainability, Innovation and Safety, Central to Our Approach
  • Why off-highway is the sweet spot for AC electrification technology
  • Looking to 2025: Past Success Guides Future Achievements
  • North American Companies Seek Stronger Ties with Italian OEMs
  • Adapt and Evolve
View More >>
Engineering Exchange

The Engineering Exchange is a global educational networking community for engineers.

Connect, share, and learn today »

Design World
  • About us
  • Contact
  • Manage your Design World Subscription
  • Subscribe
  • Design World Digital Network
  • Control Engineering
  • Consulting-Specifying Engineer
  • Plant Engineering
  • Engineering White Papers
  • Leap Awards

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media
Privacy Policy | Advertising | About Us

Search Design World

  • Home
  • Technologies
    • ELECTRONICS • ELECTRICAL
    • Fastening • joining
    • FLUID POWER
    • LINEAR MOTION
    • MOTION CONTROL
    • SENSORS
    • TEST & MEASUREMENT
    • Factory automation
    • Warehouse automation
    • DIGITAL TRANSFORMATION
  • Learn
    • Tech Toolboxes
    • Learning center
    • eBooks • Tech Tips
    • Podcasts
    • Videos
    • Webinars • general engineering
    • Webinars • Automated warehousing
    • Voices
  • LEAP Awards
  • 2025 Leadership
    • 2024 Winners
    • 2023 Winners
    • 2022 Winners
    • 2021 Winners
  • Design Guides
  • Resources
    • Subscribe
    • 3D Cad Models
      • PARTsolutions
      • TraceParts
    • Digital Issues
      • Design World
      • EE World
    • Educational Assets
    • Engineering diversity
    • Trends
  • Supplier Listings
  • Advertise
  • Subscribe
We use cookies to personalize content and ads, to provide social media features, and to analyze our traffic. We share information about your use of our site with our social media, advertising, and analytics partners who may combine it with other information you’ve provided to them or that they’ve collected from your use of their services. You consent to our cookies if you continue to use this website.