What is human-in-the-loop machine learning? Better data and more effective models
Human-in-the-loop machine learning uses human feedback to eliminate errors in the training data and increase the accuracy of the model.
Machine learning models are often far from perfect. When using a model’s predictions for purposes that affect people’s lives, such as a credit approval decision, a human should review at least some of the predictions: those with low confidence, those that fall out of range, and a random quality-control sample.
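As a minimal illustration of that triage, the Python sketch below routes predictions to human review based on three rules. The threshold, the valid range, and the sampling rate are hypothetical values, not taken from any particular product.

```python
# A minimal sketch of routing model predictions to human review.
# All three rule parameters below are illustrative assumptions.
import random

CONFIDENCE_THRESHOLD = 0.80   # below this, send to a human
VALID_RANGE = (300, 850)      # e.g., a plausible credit-score range
QC_SAMPLE_RATE = 0.02         # random 2% quality-control sample

def needs_human_review(prediction: float, confidence: float) -> bool:
    """Flag low-confidence, out-of-range, or randomly sampled predictions."""
    if confidence < CONFIDENCE_THRESHOLD:
        return True
    if not (VALID_RANGE[0] <= prediction <= VALID_RANGE[1]):
        return True
    if random.random() < QC_SAMPLE_RATE:
        return True
    return False
```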
Additionally, the lack of good tagged (annotated) data often makes supervised learning difficult (unless you’re a professor with plenty of students who have nothing better to do). One way to implement semi-supervised learning on untagged data is to have people label some data to seed a model, use the interim model’s high-confidence predictions (or those of a transfer-learning model) to tag more data (self-labeling), and send the low-confidence predictions to humans for review (active learning). The process can be repeated, and in practice it tends to improve from pass to pass.
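That seed, self-label, and active-learning loop can be sketched with scikit-learn. The confidence thresholds and the ask_human() placeholder below are illustrative assumptions, not a prescribed recipe.

```python
# A sketch of the seed -> self-label -> active-learning loop, using
# scikit-learn. Thresholds and ask_human() are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ask_human(samples):
    """Placeholder: in practice, queue these samples for human annotators."""
    raise NotImplementedError("hook up your annotation workflow here")

def hitl_loop(X_seed, y_seed, X_unlabeled, rounds=3, hi=0.95, lo=0.60):
    X_train, y_train = X_seed, y_seed
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    for _ in range(rounds):
        if len(X_unlabeled) == 0:
            break
        proba = model.predict_proba(X_unlabeled)
        conf = proba.max(axis=1)
        sure = conf >= hi              # self-labeling candidates
        unsure = conf <= lo            # active-learning candidates
        y_auto = model.classes_[proba[sure].argmax(axis=1)]
        y_human = ask_human(X_unlabeled[unsure])
        X_train = np.vstack([X_train, X_unlabeled[sure], X_unlabeled[unsure]])
        y_train = np.concatenate([y_train, y_auto, y_human])
        X_unlabeled = X_unlabeled[~(sure | unsure)]
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return model
```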
In short, human-in-the-loop machine learning relies on human feedback to improve the quality of the data used to train machine learning models. In general, it is all about sampling good data for humans to tag (annotation), using that data to train a model, and using the model to sample more data for annotation. There are many services available to manage this process.
Amazon SageMaker Ground Truth
Amazon SageMaker offers two data labeling services: Amazon SageMaker Ground Truth Plus and Amazon SageMaker Ground Truth. Both options let you identify raw data, such as images, text, and videos, and add informative labels to create high-quality training data sets for machine learning models. With Ground Truth Plus, Amazon experts set up the data labeling workflows for you, and the process applies pre-labeling and machine validation of human labeling.
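For a sense of how a Ground Truth job is launched programmatically, here is a hedged boto3 sketch using the create_labeling_job API. Every ARN, bucket name, and template path is a placeholder to replace with your own resources; see the AWS documentation for the full set of options.

```python
# A hedged sketch of starting a Ground Truth labeling job with boto3.
# All names, ARNs, and S3 paths below are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_labeling_job(
    LabelingJobName="image-classification-demo",        # placeholder
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {"ManifestS3Uri": "s3://my-bucket/manifest.json"}
        }
    },
    OutputConfig={"S3OutputPath": "s3://my-bucket/output/"},
    RoleArn="arn:aws:iam::123456789012:role/GroundTruthRole",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/my-team",
        "UiConfig": {"UiTemplateS3Uri": "s3://my-bucket/template.liquid"},
        "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:pre-label",
        "TaskTitle": "Image classification",
        "TaskDescription": "Classify each image",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 300,
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:consolidate"
        },
    },
)
```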
Amazon Augmented AI
While Amazon SageMaker Ground Truth handles data labeling before deployment, Amazon Augmented AI (Amazon A2I) provides human review of low-confidence predictions or random prediction samples from deployed models. Augmented AI manages both the creation of the review workflows and the human reviewers. It integrates with AWS AI and machine learning services, as well as with models deployed to an Amazon SageMaker endpoint.
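A minimal sketch of triggering an A2I review from application code might look like the following, using the sagemaker-a2i-runtime client’s start_human_loop call. The flow definition ARN, loop name, confidence threshold, and input payload schema are all assumptions; the real schema depends on how your review workflow is defined.

```python
# A hedged sketch of sending a low-confidence prediction to an
# Amazon A2I human review loop with boto3. ARNs are placeholders.
import json
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

prediction = {"label": "approved", "confidence": 0.42}  # example model output

if prediction["confidence"] < 0.80:                     # assumed threshold
    a2i.start_human_loop(
        HumanLoopName="loan-review-0001",               # placeholder
        FlowDefinitionArn="arn:aws:sagemaker:us-east-1:123456789012:flow-definition/loan-review",
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )
```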
DataRobot Humble AI
DataRobot has a Humble AI feature that lets you set rules to detect uncertain predictions, outlying inputs, and low-observation regions. These rules can trigger three possible actions: no operation (monitor only); override the prediction (typically with a “safe” value); or return an error (discard the prediction). DataRobot has written documentation about humans in the loop, but I couldn’t find any implementation on its site other than the humility rules.
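Since the article found only documentation for the feature, the sketch below is a generic, hypothetical illustration in plain Python of the three humility actions just described; it is not DataRobot code or its API.

```python
# A generic illustration (not the DataRobot API) of the three
# humility actions: monitor only, override with a "safe" value,
# or raise an error to discard the prediction.
SAFE_VALUE = 0.0  # assumed "safe" fallback value

def apply_humility_rule(prediction, confidence, action="monitor"):
    """Apply one of three actions when a prediction looks uncertain."""
    if confidence >= 0.80:                  # assumed trigger threshold
        return prediction                   # rule not triggered
    if action == "monitor":
        print(f"Uncertain prediction logged: {prediction!r}")
        return prediction                   # no operation, just monitor
    if action == "override":
        return SAFE_VALUE                   # replace with the safe value
    if action == "error":
        raise RuntimeError("prediction discarded by humility rule")
    raise ValueError(f"unknown action: {action}")
```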
Google Cloud Human-in-the-Loop
Google Cloud offers Human-in-the-Loop (HITL) processing built into its Document AI services, but as of this writing, nothing for image or video processing. Currently, Google supports HITL review workflows for the following processors (a hedged usage sketch follows the list):
Procurement processors:
Invoice parser
Receipt parser
Lending processors:
1003 parser
1040 parser
1040 Schedule C parser
1040 Schedule E parser
1099-DIV parser
1099-G parser
1099-INT parser
1099-MISC parser
Bank statement parser
HOA statement parser
Mortgage statement parser
Pay slip parser
Retirement/investment statement parser
W2 parser
W9 parser
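Here is the usage sketch promised above: a hedged example of sending a PDF through a Document AI processor with the google-cloud-documentai client. The project, location, and processor IDs are placeholders, and the HITL review step itself is configured on the processor in the Cloud console rather than in this call.

```python
# A hedged sketch of processing a document with Google Document AI.
# Project, location, and processor IDs below are placeholders.
from google.cloud import documentai

client = documentai.DocumentProcessorServiceClient()

# Fully qualified resource name of an (assumed) invoice processor
name = client.processor_path("my-project", "us", "my-invoice-processor-id")

with open("invoice.pdf", "rb") as f:
    raw_document = documentai.RawDocument(
        content=f.read(), mime_type="application/pdf"
    )

result = client.process_document(
    request=documentai.ProcessRequest(name=name, raw_document=raw_document)
)

# Inspect extracted entities and their confidence scores
for entity in result.document.entities:
    print(entity.type_, entity.mention_text, entity.confidence)
```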
Human-in-the-loop labeling software
Setting up human annotation of images, such as for image classification, object detection, and semantic segmentation, can be difficult when labeling datasets. Fortunately, there are many good commercial and open source tools that labelers can use.
Humans in the Loop, a company that describes itself as a “social enterprise that delivers ethical workforce solutions to power the AI industry,” blogs periodically about its favorite annotation tools. In one of its most recent posts, it listed 10 open source computer vision annotation tools: Label Studio, Diffgram, LabelImg, CVAT, ImageTagger, LabelMe, VIA, Make Sense, COCO Annotator, and DataTurks. These tools are generally used to annotate training sets, and some can also manage the annotation workflow.
For example, Humans in the Loop says of the Computer Vision Annotation Tool (CVAT): “CVAT is powerful, up-to-date, and works in Chrome. It’s still one of the main tools that we and our clients use for labeling, because it’s much faster than many tools on the market.”
The CVAT README on GitHub says, “CVAT is a free, web-based, interactive image and video annotation tool for computer vision. It is used by our team to annotate millions of objects with different properties. Many UI and UX decisions are based on feedback from professional data annotation teams. Try it online at cvat.org.” You need to create a login to run the demo.
CVAT is released as open source under the MIT license. Most of its active contributors work for Intel in Nizhny Novgorod, Russia. The CVAT introductory video shows how the labeling process works.
As you can see, human-in-the-loop processing can contribute to the machine learning process at two points: the initial creation of labeled datasets for supervised learning, and the review and correction of potentially problematic predictions from a deployed model. The first use case helps to bootstrap the model, and the second helps to tune the model.
Source: InfoWorld