Today’s AI Computer Vision Landscape: Popular ML Platforms

By Sergei Zheleznov

Human perception is predominantly visual: by some estimates, sight accounts for as much as 97% of the external information we receive, with hearing, touch and the other senses contributing the remaining 3%. Recognizing the impact of visual information, major tech players such as Meta (Facebook), Alphabet (Google) and Microsoft have invested heavily in advancing computer vision technologies.

And these investments have paid off. For instance, Facebook's funding and computational resources went into DeepFace, its deep-learning facial recognition software, which reached a then-unprecedented 97.3% accuracy in face verification, close to human-level performance.

Other AI market leaders include the cloud providers Microsoft (Azure), Amazon (AWS) and Google (Google Cloud). There are also standalone contenders, such as IBM (Watson) and OpenAI, whose biggest stakeholder is Microsoft, with a reported $13 billion investment and a 49% stake.

There are, of course, smaller AI companies as well, but they make less noise in the marketplace. Although they offer commercial off-the-shelf (COTS) computer vision solutions, they typically lack full-fledged cloud services, so they cannot cover the full spectrum of computer vision tasks.

The most central computer vision tasks include:

  • Training custom computer vision models, such as object detectors, and exporting the trained machine learning (ML) models to run on edge devices
  • Streaming video in real time with spatial analysis
  • Automatically captioning and classifying images
  • Deconstructing a given scene with specific notations
  • Reading text from images with optical character recognition (OCR) technology
  • Verifying identities with facial recognition

Training Machine Learning Models

MLOps engineers use cloud-based computing clusters that combine CPUs, GPUs and FPGAs to train machine learning models. On these clusters, engineers build ML pipelines: cloud-based services that help train, deploy, automate, manage and track ML models. Commonly used platforms for MLOps training include GCP Vertex AI, AWS SageMaker, and Azure AI Vision, which we’ll explore in a moment.

But first, I want to briefly mention TensorFlow and PyTorch, two open-source frameworks for building and training deep-learning models on large datasets. There is also Keras, a high-level deep-learning API developed at Google. Written in Python, Keras makes neural networks easier to implement.

Making ML Tech Accessible to More Developers

To make machine learning technologies more accessible to software engineers lacking sophisticated ML knowledge, cloud providers offer AI development studios – user-friendly visual interfaces for ML design. The most popular are GCP Vertex AI, AWS SageMaker, and Azure AI Vision.

GCP Vertex AI is a powerful and unified machine learning (ML) platform offered by Google Cloud. It provides a streamlined and scalable solution to develop, deploy, and manage ML models.

Amazon SageMaker is a fully managed machine learning service that allows developers to quickly build and train machine learning models, and then deploy them directly into a production-ready hosted environment.

Azure AI Vision, another easy-to-use studio, is a unified Microsoft AI SaaS solution that supports both manual and API-based MLOps processes. Because the studio ships with pre-trained ML models, a new training run finishes quickly: it builds on those pre-trained models rather than training a new model from scratch.
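Reusing a pre-trained model and training only a small new part on top of it is the general idea usually called transfer learning. Here is a minimal, pure-Python sketch of that idea. It is not how Azure AI Vision is implemented: the "pre-trained" feature extractor is a hypothetical stand-in (mean brightness and contrast of a tiny image), and the function names and toy data are assumptions made up for illustration.

```python
import math


def pretrained_features(pixels):
    """Hypothetical stand-in for a frozen, pre-trained backbone.

    Maps a tiny grayscale "image" (floats in [0, 1]) to two features:
    mean brightness and contrast (max - min).
    """
    mean = sum(pixels) / len(pixels)
    contrast = max(pixels) - min(pixels)
    return [mean, contrast]


def train_head(samples, labels, epochs=500, lr=0.5):
    """Train only a small logistic-regression head on the frozen features."""
    # The backbone is applied once and never updated during training.
    feats = [pretrained_features(s) for s in samples]
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(pixels, weights, bias):
    """Return 1 for "bright" and 0 for "dark" using the trained head."""
    x = pretrained_features(pixels)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Made-up toy data: label 1 = bright image, label 0 = dark image.
samples = [
    [0.9, 0.8, 1.0, 0.9],
    [0.8, 0.9, 0.7, 0.9],
    [0.1, 0.2, 0.0, 0.1],
    [0.2, 0.1, 0.3, 0.2],
]
labels = [1, 1, 0, 0]

weights, bias = train_head(samples, labels)
print(predict([1.0, 0.8, 0.9, 0.9], weights, bias))
print(predict([0.0, 0.2, 0.1, 0.1], weights, bias))
```

Only three numbers (two weights and a bias) are learned here, which is why training on top of an existing model tends to need less data and time than training everything from scratch.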

Azure AI Vision lets apps analyze images, read text, and detect and recognize faces through prebuilt image tagging and text extraction with optical character recognition (OCR). The trained ML models can be exported and run on edge devices, helping businesses bring Azure AI Vision capabilities into their value delivery streams.

Hugging Face is an ML and data science platform that developers can use to build, demo, run and deploy AI in live applications. With Hugging Face, developers can share resources, models and research, and test their work openly.

A Reliable Training Dataset Is Essential

The training dataset is a crucial element when establishing your machine learning system or evaluating existing ones. Whether you are developing your own ML system or testing the performance of pre-existing models, having a reliable dataset is essential for meaningful comparisons. 

For instance, if your goal is to identify faces, it's imperative to ensure that your system accurately recognizes the presence of a face in an image and correctly identifies its absence when there is no face. To achieve this, a well-curated dataset is indispensable. Fortunately, there are numerous free datasets available, covering a variety of subjects. Many of these datasets focus on faces, as face recognition represents a highly sought-after capability in the market.
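The check described above, finding a face when one is present and reporting none when there isn't, can be scored against a labeled dataset. The sketch below is one simple way to do it, assuming the detector's output has already been reduced to one true/false answer per image. The function name and the sample labels are made up for illustration.

```python
def evaluate_face_detector(predictions, ground_truth):
    """Compare boolean "face present" predictions with ground-truth labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")

    pairs = list(zip(predictions, ground_truth))
    tp = sum(1 for p, t in pairs if p and t)          # face found, face present
    tn = sum(1 for p, t in pairs if not p and not t)  # no face found, none present
    fp = sum(1 for p, t in pairs if p and not t)      # face "found" in a faceless image
    fn = sum(1 for p, t in pairs if not p and t)      # face missed

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up labels for eight images: True means the image contains a face.
ground_truth = [True, True, True, False, False, False, True, False]
predictions = [True, True, False, False, True, False, True, False]

metrics = evaluate_face_detector(predictions, ground_truth)
print(metrics)
```

Running two different models against the same ground-truth labels gives numbers that can be compared directly, which is the point of keeping one reliable dataset.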

One example of a face dataset is the CelebFaces Attributes Dataset (CelebA), a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.
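Attribute annotations like these are handy for picking out a test subset, for example only smiling faces. The sketch below parses a small annotation text in the layout I assume the CelebA attribute file uses: a count line, a header line of attribute names, then one row per image with 1 or -1 flags. Check that assumption against your own copy. The three rows of sample values are made up, and only three of the 40 attributes are shown.

```python
def parse_attribute_file(text):
    """Parse attribute annotations into {filename: {attribute: bool}}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    count = int(lines[0])     # first line: number of annotated images
    names = lines[1].split()  # second line: attribute names
    records = {}
    for row in lines[2:]:
        parts = row.split()
        filename, flags = parts[0], parts[1:]
        if len(flags) != len(names):
            raise ValueError(f"unexpected number of flags for {filename}")
        # A flag of "1" means the attribute is present, "-1" means absent.
        records[filename] = {n: f == "1" for n, f in zip(names, flags)}
    if len(records) != count:
        raise ValueError("row count does not match the declared count")
    return names, records


# Made-up sample in the assumed layout (not real CelebA values).
SAMPLE = """3
Eyeglasses Smiling Young
000001.jpg -1  1  1
000002.jpg  1 -1  1
000003.jpg -1  1 -1
"""

names, records = parse_attribute_file(SAMPLE)
smiling = sorted(f for f, attrs in records.items() if attrs["Smiling"])
print(smiling)
```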

Enhancing Your ML and Computer Vision Skills

Software engineers with limited ML expertise need a combination of educational resources, user-friendly tools, practical use cases, and community support to build foundational knowledge. The goal is to create an environment where developers can easily dip their toes into machine learning, gradually build expertise, and contribute meaningfully to ML-driven projects.
