What Is AutoML?

Automated machine learning, or AutoML, makes ML accessible to
non-experts by enabling them to build, validate, iterate, and explore ML models through an automated experience.
AutoML automatically prepares and cleans data, creates and selects features, chooses a suitable model family, tunes
hyperparameters, and analyzes results. It also helps with data visualization, insight generation, model
explainability, and model deployment.

Why Is AutoML Important?

ML models provide businesses with valuable insights, yet the
responsibility to create the models often falls to those without extensive ML expertise. While AutoML doesn’t
replace the data scientist, it makes them more productive and enables them (and others) to automate the
code-intensive steps and focus on model testing and insights. Less experienced users (aka citizen data scientists)
often use AutoML to generate insights and as a quick way to learn about data science.

How AutoML Works

AutoML usually includes the following:

 

Data evaluation and pre-processing: Data is prepared, cleansed, and transformed to create a useful model-training dataset.

 

Feature engineering: New columns are derived from the existing model-training data. These may better capture the predictors of the
phenomenon the data describes, or simply work better with the ML algorithms.

 

Feature selection: After new features are built, AutoML picks only those that are useful in generating a model.

 

Algorithm selection: Competing candidate models are compared, and the one that performs best on the desired metric (e.g.,
accuracy, recall, or balanced accuracy) is selected.

 

Hyperparameter tuning: The learning algorithm's hyperparameters (settings fixed before training) are searched to find the best-performing combination.
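
The last three steps can be pictured as a small search loop. The Python sketch below is a toy illustration under stated assumptions, not how any particular AutoML product works: it generates synthetic data, ranks features by correlation with the label, and then compares a majority-class baseline against k-nearest-neighbors models for a few values of k on a holdout split. All names (make_data, select_features, automl_search) are hypothetical.

```python
import random
from statistics import mean, pstdev

def make_data(n=300, seed=0):
    """Synthetic binary data (assumed for illustration): two useful columns, one noise column."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, noise = (rng.uniform(-1, 1) for _ in range(3))
        rows.append(([x1, x2, noise], 1 if x1 + x2 > 0 else 0))
    return rows

def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def select_features(rows, keep):
    """Feature selection: keep the `keep` columns most correlated with the label."""
    labels = [y for _, y in rows]
    n_cols = len(rows[0][0])
    scores = [abs(correlation([x[j] for x, _ in rows], labels)) for j in range(n_cols)]
    return sorted(sorted(range(n_cols), key=lambda j: -scores[j])[:keep])

def project(rows, cols):
    """Keep only the selected columns."""
    return [([x[j] for j in cols], y) for x, y in rows]

def knn_predict(train, x, k):
    """k-nearest-neighbors vote using squared Euclidean distance."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0

def majority_predict(train, x):
    """Baseline: always predict the most common training label."""
    return 1 if 2 * sum(y for _, y in train) > len(train) else 0

def accuracy(predict, train, valid):
    return mean(1 if predict(train, x) == y else 0 for x, y in valid)

def automl_search(rows, keep=2, ks=(1, 5, 15)):
    """Select features, then pick the best model/hyperparameter pair on a holdout split."""
    split = len(rows) * 3 // 4
    train, valid = rows[:split], rows[split:]
    cols = select_features(train, keep)
    train, valid = project(train, cols), project(valid, cols)
    # Algorithm selection: a baseline versus k-NN; hyperparameter tuning: the value of k.
    candidates = {"majority": majority_predict}
    for k in ks:
        candidates[f"knn(k={k})"] = lambda tr, x, k=k: knn_predict(tr, x, k)
    scores = {name: accuracy(fn, train, valid) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return cols, best, scores

cols, best, scores = automl_search(make_data())
print("selected columns:", cols)
print("best candidate:", best, "holdout accuracy:", scores[best])
```

Real tools typically rely on cross-validation rather than a single holdout split and search far larger spaces of models and hyperparameters, but the overall shape (generate candidates, score them on held-out data, keep the best) is the same.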

AutoML Examples

AutoML can help solve many business challenges, including:

Personalization

Talking to a consumer base is
no longer enough. To succeed, a business needs to be able to address each customer individually. AutoML
makes personalization more scalable by learning individual preferences and behaviors, which allows companies to
serve up personalized recommendations and content. The result is a more engaged consumer base and better sales.

Cleaning Customer Records

Spelling errors, updates, and
inconsistent information can create duplicates in a company database. AutoML makes it easier to find and correct those
duplicate records so data is clean, accurate, and usable.
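
As a rough illustration of how near-duplicate records can be flagged, the Python sketch below uses simple fuzzy string matching from the standard library. This is an assumption made for the example: AutoML products may use learned matching models instead, and the sample records, the 0.85 threshold, and the function names are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(record):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(record.lower().split())

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_likely_duplicates(records, threshold=0.85):
    """Return (index_a, index_b, score) for every pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs

# Hypothetical customer records; the first two differ by one letter.
customers = [
    "Jon Smith, 12 Oak Street",
    "John Smith, 12 Oak Street",
    "Maria Garcia, 48 Pine Avenue",
]
print(find_likely_duplicates(customers))
```

Comparing every pair is quadratic in the number of records, so this approach only suits small tables; flagged pairs would still need review before records are merged.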

Customer Churn

Attracting new customers is
essential to any business, but so is keeping existing ones. AutoML can find patterns in customer
activity to predict which customers are likely to switch to competitors. This information allows for targeted retention
efforts that can grow profits and brand value.

Fraud Detection

Fraud costs the U.S.
government about $80 billion a year. Nearly every federal agency is targeted, and there aren’t enough resources to
investigate each claim. As criminals have become more sophisticated, so have the solutions. AutoML integrates into existing systems and
uses data from past fraud cases to help find red flags and address issues quickly.

Getting Started with AutoML

Alteryx offers an accessible AutoML experience using a guided, educational approach that maintains the powerful
technical capabilities used by traditional data scientists. With Alteryx Machine Learning, AutoML is integrated into
every step of the data analysis process including preparation, blending, and enrichment.

At its most basic level, Alteryx Machine Learning can:

  • Automate steps of the data science and ML process
  • Train a number of predictive models on that data
  • Provide metrics about the performance of those models (e.g., receiver operating characteristic (ROC), precision,
    recall, accuracy, balanced accuracy)
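
For readers unfamiliar with these metrics, the Python sketch below computes several of them for a binary classifier from true and predicted labels. It is a generic illustration, not Alteryx's implementation; the function name and sample labels are hypothetical, and ROC is left out because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary-classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        # Balanced accuracy averages recall over both classes,
        # which makes it more informative on imbalanced data.
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Hypothetical labels: 4 positives and 6 negatives, with one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
for name, value in classification_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```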

Beyond those functions, Alteryx features:

  • Interactive visualizations
  • Clear reporting for business stakeholders
  • The ability to deploy models to an operationalization system
  • Integrated lessons and glossaries
  • Automated training data evaluation
  • Suggestions to improve training data or automatically adjust that data