What is the Mobile Vision API, and what are its limitations?

Shweta

“Mobile Vision” is an interesting framework from Google that can detect objects in photos and videos. Yes, you read that correctly: you can detect objects in videos too!

What can you detect in photos and videos?
Currently, the Android API can detect:
-Faces
-Barcodes
-Text

The iOS API can detect only two things: faces and barcodes.

There are a few cloud-based vision APIs for mobile, but Google’s Mobile Vision works without the cloud. Faces, barcodes, and text in images or video are detected entirely on the device, so no network call is required. Cool!

Let’s talk about the available detectors. The Mobile Vision API includes detectors that locate visual objects in images or video frames and return the position of each object. Interesting, right? 🙂 Even more interesting, you can run multiple detectors at the same time to detect all supported object types in the same frames. It works in real time with the device camera, as well as on photos already taken and videos already recorded.
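The idea of running several detectors on the same frame can be sketched without the real library: each detector takes a frame and returns positioned results, and a combined detector runs them all and merges the output. `Detection`, `SimpleDetector`, and `CombinedDetector` are hypothetical names (the actual Mobile Vision library ships its own detector classes), and the frame is assumed to be a plain grayscale pixel array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one detected object: what it is and where it is.
final class Detection {
    final String kind;                  // e.g. "face", "barcode", "text"
    final int left, top, width, height; // bounding box in frame pixels

    Detection(String kind, int left, int top, int width, int height) {
        this.kind = kind;
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }
}

// Hypothetical detector interface: one frame in, positioned detections out.
// The frame is assumed to be a grayscale pixel array [rows][columns].
interface SimpleDetector {
    List<Detection> detect(int[][] frame);
}

// Runs several detectors on the same frame and merges their results
// in the order the detectors were supplied.
final class CombinedDetector implements SimpleDetector {
    private final List<SimpleDetector> detectors;

    CombinedDetector(List<SimpleDetector> detectors) {
        this.detectors = new ArrayList<>(detectors);
    }

    @Override
    public List<Detection> detect(int[][] frame) {
        List<Detection> all = new ArrayList<>();
        for (SimpleDetector d : detectors) {
            all.addAll(d.detect(frame));
        }
        return all;
    }
}

class CombinedDetectorDemo {
    public static void main(String[] args) {
        // Fake detectors returning fixed results, standing in for real ones.
        SimpleDetector faces = frame -> List.of(new Detection("face", 10, 20, 50, 50));
        SimpleDetector barcodes = frame -> List.of(new Detection("barcode", 100, 40, 80, 30));
        SimpleDetector both = new CombinedDetector(List.of(faces, barcodes));
        for (Detection d : both.detect(new int[480][640])) {
            System.out.println(d.kind + " at (" + d.left + ", " + d.top + ")");
        }
    }
}
```

Running `CombinedDetectorDemo` prints one line per detection, the face first, because results are merged in the order the detectors were added.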

3 detectors are available:
-Face detector
-Barcode detector
-Text detector

Face Detector:
-This API detects human faces in still images and tracks them in recorded video and the live mobile camera feed.
-It detects facial landmarks such as the eyes and nose.
-It classifies the state of a face. Currently, the Android face API supports only two classifications: eyes open and smiling.
-It can track multiple faces in the same frame.
You can try face detection by following the steps in Google’s face detection codelab.

[Figure: FaceDetector. Image source: Google Mobile Vision documentation]

This API can be used to create hands-free controls for games and apps, e.g., reacting when a person smiles or winks.
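Here is a sketch of the smile-or-wink idea. The face API reports each classification as a probability (the Android `Face` class exposes methods such as `getIsSmilingProbability()`; check the reference for the exact names). The `FaceGestures` helper below is hypothetical: it takes those probabilities as plain floats, assumes -1 marks a classification that was not computed, and uses threshold values that are assumptions to tune, not documented constants.

```java
// Hypothetical helper that turns face-classification probabilities into
// gesture events. All thresholds below are assumptions, not values from
// the Mobile Vision documentation.
final class FaceGestures {
    static final float UNCOMPUTED = -1f;        // assumed marker: classification unavailable
    static final float OPEN_THRESHOLD = 0.7f;   // eye counts as open at or above this
    static final float CLOSED_THRESHOLD = 0.3f; // eye counts as closed at or below this
    static final float SMILE_THRESHOLD = 0.7f;  // face counts as smiling at or above this

    static boolean isSmiling(float smilingProbability) {
        return smilingProbability != UNCOMPUTED && smilingProbability >= SMILE_THRESHOLD;
    }

    // A wink: one eye clearly open while the other is clearly closed.
    static boolean isWinking(float leftEyeOpen, float rightEyeOpen) {
        // Without this check, an uncomputed (-1) eye would look "closed".
        if (leftEyeOpen == UNCOMPUTED || rightEyeOpen == UNCOMPUTED) {
            return false;
        }
        boolean leftWink = leftEyeOpen <= CLOSED_THRESHOLD && rightEyeOpen >= OPEN_THRESHOLD;
        boolean rightWink = rightEyeOpen <= CLOSED_THRESHOLD && leftEyeOpen >= OPEN_THRESHOLD;
        return leftWink || rightWink;
    }

    public static void main(String[] args) {
        System.out.println(isSmiling(0.9f));        // true
        System.out.println(isWinking(0.1f, 0.95f)); // true
        System.out.println(isWinking(0.9f, 0.95f)); // false: both eyes open
    }
}
```

Using two separate thresholds for "open" and "closed" leaves a gap for uncertain readings, so a half-open eye does not trigger a wink.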

Note: This API does not support face recognition; it cannot determine whether two faces are likely to belong to the same person.

Barcode Detector:
This API can detect barcodes in the live mobile camera feed as well as in a picture. It supports the following barcode formats:

-1D barcodes: EAN-13, EAN-8, Code-39, Code-93, Code-128, UPC-A, UPC-E, ITF, Codabar
-2D barcodes: PDF-417, AZTEC, QR Code, Data Matrix
Most importantly, it can detect multiple barcodes at once, in any orientation. You can try the barcode codelab to integrate it into your app.

Text Detector:
This API can detect text in any Latin-based language (French, German, English, etc.). Developers get the recognized text in segments: blocks, lines, and words. The image below illustrates the block, line, and word concepts.

[Figure: Text detector. Image source: Google Mobile Vision documentation]
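To make the block, line, and word hierarchy concrete, here is a minimal sketch of it as plain Java classes. `Block`, `Line`, and `Word` are hypothetical stand-ins (the real API defines its own classes), and joining words with spaces and lines with newlines is an assumption about how you might flatten the result.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the block -> line -> word hierarchy.
final class Word {
    final String text;

    Word(String text) { this.text = text; }
}

final class Line {
    final List<Word> words;

    Line(List<Word> words) { this.words = words; }

    // A line's text: its words joined by single spaces (assumed convention).
    String text() {
        return words.stream().map(w -> w.text).collect(Collectors.joining(" "));
    }
}

final class Block {
    final List<Line> lines;

    Block(List<Line> lines) { this.lines = lines; }

    // A block's text: its lines joined by newlines (assumed convention).
    String text() {
        return lines.stream().map(Line::text).collect(Collectors.joining("\n"));
    }
}

class TextHierarchyDemo {
    public static void main(String[] args) {
        // A two-line block, as might be read from a business card.
        Block card = new Block(List.of(
                new Line(List.of(new Word("Jane"), new Word("Doe"))),
                new Line(List.of(new Word("Mobile"), new Word("Developer")))));
        System.out.println(card.text());
    }
}
```

Keeping the hierarchy lets an app choose its granularity: whole blocks for paragraphs, lines for form fields, or single words for lookups.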

This API can help with reading business cards, data entry jobs, converting an image of text into a text file, etc. It also works on the live camera. Note that the recognized text may be returned in any order.

Limitations:
-The face detector does not support face recognition: it cannot tell whether two faces belong to the same person.
-The face detector supports only two classifications:
  -Eyes open
  -Smiling
-The text detector does not guarantee reading order. It may return text in a different sequence from the one in the frame. However, it also returns the position of each piece of text, which can be used to put the pieces back in order.
-The text detector is not 100% accurate.
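Because the text detector returns positions, reading order can be restored by sorting. The sketch below shows one simple approach, assuming each piece of text is reduced to a hypothetical `TextItem` holding its top-left corner: items whose tops are within a tolerance are grouped into a row, rows are ordered top to bottom, and items within a row left to right. The `rowTolerance` value is an assumption to tune, and the sketch does not handle rotated text or multi-column layouts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical recognized text item with its bounding-box position.
final class TextItem {
    final String text;
    final int left, top; // top-left corner in image pixels

    TextItem(String text, int left, int top) {
        this.text = text;
        this.left = left;
        this.top = top;
    }
}

final class ReadingOrder {
    // Reorders items top-to-bottom, then left-to-right. Items whose tops are
    // within rowTolerance pixels of the row's first item share a row.
    static List<TextItem> sort(List<TextItem> items, int rowTolerance) {
        List<TextItem> byTop = new ArrayList<>(items);
        byTop.sort(Comparator.comparingInt((TextItem t) -> t.top));

        List<TextItem> result = new ArrayList<>();
        List<TextItem> row = new ArrayList<>();
        int rowTop = 0;
        for (TextItem item : byTop) {
            if (!row.isEmpty() && item.top - rowTop > rowTolerance) {
                flushRow(row, result); // item starts a new row
            }
            if (row.isEmpty()) {
                rowTop = item.top; // first item fixes the row's reference top
            }
            row.add(item);
        }
        flushRow(row, result);
        return result;
    }

    // Sorts the current row left-to-right, appends it, and clears it.
    private static void flushRow(List<TextItem> row, List<TextItem> result) {
        row.sort(Comparator.comparingInt((TextItem t) -> t.left));
        result.addAll(row);
        row.clear();
    }

    public static void main(String[] args) {
        // Detector output in arbitrary order.
        List<TextItem> detected = List.of(
                new TextItem("world", 60, 12),
                new TextItem("Second", 0, 40),
                new TextItem("Hello", 0, 10),
                new TextItem("line", 70, 41));
        List<String> words = new ArrayList<>();
        for (TextItem t : sort(detected, 5)) {
            words.add(t.text);
        }
        System.out.println(String.join(" ", words)); // Hello world Second line
    }
}
```

Comparing against the row's first item rather than the previous item keeps a slowly sloping baseline from chaining every line into a single row.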

So now anyone can get started with machine learning and computer vision on mobile, regardless of prior experience. Google keeps expanding the learning materials it provides to developers, including codelabs for learning the Mobile Vision API for each type of object. Try these codelabs and have fun!

References:

https://android-developers.googleblog.com/2016/06/android-mobile-vision-text-api.html
https://developers.google.com/vision/
