Machine Learning Overview

joel.larin — Thu, 11 Jun 2026 21:48:36 GMT

Current Revision posted to Article by joel.larin on 6/11/2026 9:48:36 PM

Appian is integration agnostic and can connect to any machine learning offering that exposes a web API. This article provides background to help you understand machine learning technology in general, regardless of the specific tool being used. For Amazon machine learning integrations, which are covered in detail in Appian's documentation, refer to the article on that topic.

What is Machine Learning?

Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and mathematical models that enable computers to learn from data and make decisions. These models generate probabilistic predictions by finding patterns in historical data. They can be thought of as black boxes that are created by processing many observations, using supervised or semi-supervised learning. Once trained, a model can take in one or more observations without a known outcome and produce possible outcomes along with their probabilities.

There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the target. Within the context of Appian, we’ll dive into the practical implementation of AI features that integrate with applications.

Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as IBM's Watson and Google AI respectively.

Appian ML/AI Capabilities

Appian AI Skills facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.

Classification models can be custom-built: you train and test them using data that accurately reflects your use case. The Document Extraction AI skill identifies data in PDF documents and extracts it into key-value pairs that can be used within the application or saved to a database.

Appian AI Skills offer pre-trained models that use built-in document extraction capabilities.

Pre-trained models in Appian are designed for general use cases and are intended for documents that contain similar information and labeled values (e.g., structured or semi-structured documents). Incorporating Google AI functionality into your Appian application gives you access to additional features, including but not limited to natural language processing, translation services, and cloud-based storage. See Using Google AI Services for a full list of features available.

Note that starting from January 23, 2024, Appian is no longer selling Appian-provisioned Google credentials to customers. Customers have to purchase the license directly through Google and add their Google credentials to their Appian Admin console.

Appian AI Copilot is a starting point for further AI capabilities in Appian. AI Copilot uses generative AI to create a functional initial interface from the fields in your form through a simple PDF upload. Once the initial interface is generated, you can customize it further according to your specific requirements. AI Copilot is integrated with Azure OpenAI to enable this functionality in your application. Azure OpenAI leverages generative AI models (e.g., GPT-3, Codex, DALL-E, ChatGPT) to provide writing assistance, content generation, etc.

Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Like other machine learning models, generative AI models are trained on large amounts of data, which helps them identify inherent patterns. The generative AI model is fine-tuned and enhanced as more data is introduced over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Which type you use depends on the target attribute you want to predict and your overall objective in creating the model. The sections below describe the purpose of each model type and give examples of appropriate uses for each one.

Regression

Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can't be used in cases where binary, categorical, or non-numeric values are required without additional processing to the model’s output.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). The RMSE is the square root of the average squared difference between predicted and actual values, so it can be read as the typical size of a prediction error in the same units as the target. A good RMSE is therefore relative to the range of values you are trying to predict. A perfect model would have an RMSE of 0.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.
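The RMSE calculation and the handling of out-of-range predictions described above can be sketched in a few lines of Python. This is a minimal illustration rather than any tool's actual implementation; the function names `rmse` and `clamp`, the sample prices, and the acceptable range are all assumptions made for the example.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def clamp(value, lower, upper):
    """Force a prediction into an acceptable range (one possible policy)."""
    return max(lower, min(upper, value))


# Hypothetical home sale prices: each prediction is off by exactly 10,000.
actual_prices = [200_000, 310_000, 150_000]
predicted_prices = [210_000, 300_000, 140_000]
error = rmse(actual_prices, predicted_prices)  # 10000.0

# A regression model may return a negative price; clamp it to the allowed range.
safe_price = clamp(-5_000, 0, 1_000_000)  # 0
```

Clamping is only one way to deal with out-of-range values. Depending on the application, rejecting the prediction or routing the case to a person for review may be more appropriate.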

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Binary classification models predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if a predicted score is greater than 0.5, the predicted value will be true. However, machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC), where the curve is typically the receiver operating characteristic (ROC) curve. The AUC is a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate that the model has learned correct patterns but is using them to make inverse predictions.
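To make the score threshold and the AUC more concrete, here is a small Python sketch. It assumes the AUC refers to the ROC curve and computes it through its ranking interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function names and sample data are hypothetical, and real tools use more efficient algorithms.

```python
def classify(score, threshold=0.5):
    """Turn a predicted score into a true/false value using a threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return score > threshold


def auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative examples."""
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


scores = [0.2, 0.55, 0.7, 0.95]

# Default threshold of 0.5 versus a stricter threshold of 0.9.
default_values = [classify(s) for s in scores]                # [False, True, True, True]
strict_values = [classify(s, threshold=0.9) for s in scores]  # [False, False, False, True]

# Known outcomes for the same four observations (made up for illustration).
labels = [False, True, False, True]
model_auc = auc(labels, scores)  # 0.75
```

Raising the threshold trades false positives for false negatives, which is why the right setting depends on which kind of mistake is more costly in your use case. The AUC does not depend on the threshold at all, since it only looks at how the scores rank the examples.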

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Multiclass models predict for a categorical value from a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To make predictions from a group of possibilities that is larger than a machine learning tool's limit, consider using a series of different models. For example, to classify animal species from an image, better results can be achieved by first training the model for a more general classification (e.g. feline, canine, rodent). Additional models can be trained to identify specific species.
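The following Python sketch illustrates two ideas from this section: picking the best prediction from a list of class probabilities, and computing an F1 score from precision and recall. How a tool combines per-class F1 scores into a single number varies; the unweighted average across classes used here (often called macro averaging) is an assumption, as are the function names and sample data.

```python
def best_prediction(probabilities):
    """Return the class with the highest probability."""
    return max(probabilities, key=probabilities.get)


def f1_for_class(actual, predicted, cls):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == cls and p == cls)
    false_pos = sum(1 for a, p in pairs if a != cls and p == cls)
    false_neg = sum(1 for a, p in pairs if a == cls and p != cls)
    if true_pos == 0:
        return 0.0  # no correct predictions for this class
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def macro_f1(actual, predicted):
    """Unweighted average of the per-class F1 scores."""
    classes = sorted(set(actual))
    return sum(f1_for_class(actual, predicted, c) for c in classes) / len(classes)


# The support-tier example from the text.
tiers = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
routed_to = best_prediction(tiers)  # "Tier 1"

# Hypothetical evaluation data: one "A" case was misclassified as "B".
actual = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
score = macro_f1(actual, predicted)  # roughly 0.73
```

In the evaluation data, class A has perfect precision but only half its cases are found, while class B finds all its cases at the cost of one false positive. The harmonic mean penalizes both kinds of error, which is why neither class reaches a score of 1.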

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Appian AI Skills Use Case: Email Classification

The client receives thousands of customer support emails every day. Employees manually forward these emails to the appropriate departments and locations based on a review of the email description and the customer's location. This process is time-consuming and prone to human error. The client can automate this process using the Email Classification AI Skill, which combines machine learning and automation. For the new model to be effective, the client must upload a "training set" consisting of a diverse set of emails that includes multiple examples for every desired email routing option. Once the model is trained and tested, the client can publish the model to make it available for use through the Classify Emails smart service.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (e.g., true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

To create a model, you must supply the machine learning tool with training data that it will use to learn the associations between the attribute values of the input data and the target attribute. The model ultimately applies the associations and patterns it found in the training data to make predictions for novel input data. There is a common adage that a model is "only as good as its training data". If the training data is not a representative sample of the data against which the model will be making predictions, the model's performance will suffer.

Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car. In this use case, the column marked "Sale Price" would be identified to the model as the target attribute to predict for.

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool's documentation for specific information about appropriately presenting data.
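As a rough illustration of how the used-car table above separates into input attributes and a target, the Python sketch below parses the same rows from tab-separated text using only the standard library. The tab-separated layout and the helper name `load_training_rows` are assumptions made for the example; your tool will define its own upload format.

```python
import csv
import io

# The sample table from the text, rebuilt as tab-separated text.
ROWS = [
    ["Year", "Make", "Model", "Color", "Transmission", "Mileage", "Previous Owners", "Sale Price"],
    ["1997", "Ford", "Mustang", "Silver", "Automatic", "201,298", "3", "1,499"],
    ["2013", "Mazda", "3", "Black", "Automatic", "60,588", "1", "8,100"],
    ["2005", "Honda", "Element", "Red", "Automatic", "160,378", "2", "4,760"],
    ["2009", "Toyota", "Camry", "Blue", "Manual", "87,380", "1", "7,290"],
]
RAW = "\n".join("\t".join(row) for row in ROWS)


def load_training_rows(text, target_column):
    """Split each row into its input attributes and its numeric target value."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    features, targets = [], []
    for row in reader:
        # Remove the target from the row and strip thousands separators.
        target = int(row.pop(target_column).replace(",", ""))
        features.append(row)
        targets.append(target)
    return features, targets


features, targets = load_training_rows(RAW, "Sale Price")
# targets is [1499, 8100, 4760, 7290]; each features entry holds the other columns.
```

Because the target is numeric, this data set would be used to train a regression model. The same split applies to classification, except that the target column would hold category labels instead of numbers.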

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) you provide during training, the more accurate the final model will generally be. This holds when the training data is diverse and balanced (e.g., the data is split equally between class A and class B), which helps avoid bias when making predictions. Google has documentation and a video regarding bias and machine learning that is helpful for learning more about this topic.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or to specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
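A quick way to check the balance mentioned in the first tip is to count how often each target value appears before uploading the data. The Python sketch below is one possible check; the function names and the `tolerance` parameter are hypothetical, and the right threshold for "balanced enough" depends on your tool and use case.

```python
from collections import Counter


def class_shares(targets):
    """Return the fraction of training rows that belong to each class."""
    if not targets:
        raise ValueError("targets must not be empty")
    counts = Counter(targets)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_balanced(targets, tolerance=0.1):
    """True if every class is within `tolerance` of an equal share."""
    shares = class_shares(targets)
    expected = 1 / len(shares)
    return all(abs(share - expected) <= tolerance for share in shares.values())


even_split = ["A", "B", "A", "B"]
skewed = ["A"] * 9 + ["B"]

even_ok = is_balanced(even_split)  # True
skewed_ok = is_balanced(skewed)    # False: class A makes up 90% of the rows
```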

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Machine Learning, Architecture

Machine Learning Overview

joel.larin — Fri, 14 Jun 2024 18:53:20 GMT

Revision 15 posted to Article by joel.larin on 6/14/2024 6:53:20 PM

Machine Learning Overview

joel.larin — Fri, 14 Jun 2024 18:51:38 GMT

Revision 14 posted to Article by joel.larin on 6/14/2024 6:51:38 PM

Machine Learning Overview

joel.larin — Fri, 14 Jun 2024 17:54:55 GMT

Revision 13 posted to Article by joel.larin on 6/14/2024 5:54:55 PM

What is Machine Learning?

Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as IBM's Watson and Google AI respectively.

Appian ML/AI Capabilities

Appian AI Skills offer pre-trained models that use built-in documentation extraction capabilities.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can't be used in cases where binary, categorical, or non-numeric values are required without additional processing to the model’s output.
The main metric used to determine accuracy of a regression model is the root mean square error (RMSE). The RMSE represents the standard deviation between predicted and actual values; thus a good RMSE is relative to the range of values you are trying to predict. A perfect model would have a RMSE of 0.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Binary classification models predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 to 1). By default if a predicted score is greater than 0.5, then the predicted value will be true. However, machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate performance of a binary classification model is Area Under the Curve (AUC). The AUC is represented as a number between 0 and 1. A number closer 1 indicates a highly accurate model. Values near 0.5 represent the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Multiclass models predict a categorical value from a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to evaluate a multiclass model is the F1 score. The F1 score is the harmonic mean of precision and recall; for multiclass models it is typically computed for each class and then averaged. It ranges from 0 to 1, and the closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To make predictions from a group of possibilities that is larger than a machine learning tool's limit, consider using a series of different models. For example, to classify animal species from an image, better results can be achieved by first training the model for a more general classification (e.g. feline, canine, rodent). Additional models can be trained to identify specific species.
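
To make the multiclass output and the F1 score more concrete, here is a small sketch. It assumes the model's response can be represented as a mapping from label to probability, and it uses macro averaging (every class weighted equally), which is only one of several averaging schemes a tool might report. All names and sample data are hypothetical.

```python
def best_prediction(probabilities):
    """Return the (label, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])

def macro_f1(actual, predicted):
    """Average of per-class F1 scores, each class weighted equally."""
    classes = sorted(set(actual) | set(predicted))
    f1_scores = []
    for c in classes:
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
        fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# The support-tier example from the text, as a label -> probability mapping.
tier_probabilities = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
label, probability = best_prediction(tier_probabilities)

# Hypothetical evaluation data: actual tiers versus the model's best predictions.
actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3", "Tier 2"]
predicted = ["Tier 1", "Tier 2", "Tier 2", "Tier 3", "Tier 1", "Tier 2"]
score = macro_f1(actual, predicted)
```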

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Appian AI Skills Use Case: Email Classification

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (e.g., true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

In general, the more training observations (i.e., rows of data) you provide during training, the more accurate the final model will be. This holds only when the training data is diverse and balanced (e.g., observations are split roughly equally between class A and class B); otherwise the model's predictions can be biased toward the over-represented cases. Google has documentation and a video regarding bias and machine learning that is helpful for learning more about this topic.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or to specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
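
One simple way to check the balance of a training set before uploading it is to measure each target value's share of the rows. The sketch below reuses a few columns from the sample table above and treats "transmission" as the target purely for illustration; the 30% cutoff for flagging an under-represented value is an arbitrary assumption, not a recommendation from any tool.

```python
from collections import Counter

def class_balance(rows, target):
    """Return each target value's share of the training rows."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# A few columns from the sample training data, one dict per row.
rows = [
    {"year": 1997, "make": "Ford", "transmission": "Automatic"},
    {"year": 2013, "make": "Mazda", "transmission": "Automatic"},
    {"year": 2005, "make": "Honda", "transmission": "Automatic"},
    {"year": 2009, "make": "Toyota", "transmission": "Manual"},
]

shares = class_balance(rows, "transmission")

# Hypothetical cutoff: flag any value that makes up less than 30% of rows.
underrepresented = [value for value, share in shares.items() if share < 0.30]
```

If a value is flagged, collecting more observations for it is usually preferable to discarding rows from the larger classes.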

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

joel.larin — Fri, 14 Jun 2024 17:54:20 GMT

Revision 12 posted to Article by joel.larin on 6/14/2024 5:54:20 PM

This is a test for informational boxes

Machine Learning Overview

Appian Max Team — Fri, 31 May 2024 14:37:35 GMT

Revision 11 posted to Article by Appian Max Team on 5/31/2024 2:37:35 PM

Machine Learning Overview

matt.cosenza — Fri, 10 May 2024 19:43:36 GMT

Revision 10 posted to Article by matt.cosenza on 5/10/2024 7:43:36 PM

Machine Learning Overview

matt.cosenza — Fri, 10 May 2024 19:28:57 GMT

Revision 9 posted to Article by matt.cosenza on 5/10/2024 7:28:57 PM

What is Machine Learning?

Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as IBM's Watson and Google AI respectively.

Appian ML/AI Capabilities

Appian AI Skills offer pre-trained models that use built-in documentation extraction capabilities.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can't be used in cases where binary, categorical, or non-numeric values are required without additional processing to the model’s output.
The main metric used to determine accuracy of a regression model is the root mean square error (RMSE). The RMSE represents the standard deviation between predicted and actual values; thus a good RMSE is relative to the range of values you are trying to predict. A perfect model would have a RMSE of 0.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Binary classification models predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 to 1). By default if a predicted score is greater than 0.5, then the predicted value will be true. However, machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate performance of a binary classification model is Area Under the Curve (AUC). The AUC is represented as a number between 0 and 1. A number closer 1 indicates a highly accurate model. Values near 0.5 represent the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Multiclass models predict for a categorical value from a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To make predictions from a group of possibilities that is larger than a machine learning tool's limit, consider using a series of different models. For example, to classify animal species from an image, better results can be achieved by first training the model for a more general classification (e.g. feline, canine, rodent). Additional models can be trained to identify specific species.

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Appian AI Skills Use Case: Email Classification

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (ex. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre
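For reference, the two regression metrics named in the table, root mean square error and mean absolute error, can be computed as in the sketch below. The function names and sample values are illustrative assumptions; most machine learning tools report these metrics for you.

```python
def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    absolute_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute_errors) / len(absolute_errors)


# Made-up sale prices, purely for illustration.
actual_prices = [200000, 310000, 150000]
predicted_prices = [210000, 300000, 150000]
print(rmse(actual_prices, predicted_prices))
print(mae(actual_prices, predicted_prices))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE does.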

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) that you provide during training, the more accurate the final model will generally be. This holds when the training data is diverse and balanced (e.g., the data is split equally between class A and class B), which helps avoid bias when making predictions. Google has documentation and a video regarding bias and machine learning that is helpful for learning more about this topic.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

Appian Max Team — Tue, 23 Apr 2024 13:08:37 GMT

Revision 8 posted to Article by Appian Max Team on 4/23/2024 1:08:37 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian's documentation, refer to the articles below:

What is Machine Learning?

Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.

There are many different uses and applications for machine learning, but this article currently focuses on machine learning technology that analyzes structured data—such as rows of an Excel spreadsheet or an Appian CDT—and delivers a prediction for a specific field or column in the data. This feature, value or attribute that is being predicted for is often referred to as the target.

Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as IBM's Watson and Google's AutoML Vision, respectively.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the differences between predicted and actual values, so what counts as a good value depends on the range of the values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.
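One possible plan for out-of-range predictions is to clamp them to the bounds your application accepts. The sketch below assumes a hypothetical helper and made-up bounds; rejecting the prediction or routing it for manual review are equally valid choices.

```python
def clamp_prediction(value, minimum, maximum):
    """Constrain a predicted value to the range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(maximum, value))


# A sale price can never be negative, so a negative prediction is raised to 0.
print(clamp_prediction(-250.0, 0.0, 50000.0))
# A prediction already inside the range is returned unchanged.
print(clamp_prediction(12000.0, 0.0, 50000.0))
```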

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is represented as a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.
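The effect of the score threshold can be illustrated with a short sketch. The function name, the sample scores, and the stricter threshold of 0.9 are assumptions for illustration only.

```python
def classify(score, threshold=0.5):
    """Return True when the predicted score is greater than the threshold."""
    return score > threshold


scores = [0.15, 0.48, 0.52, 0.91]

# Default threshold of 0.5.
default_labels = [classify(score) for score in scores]
# A stricter threshold produces fewer true values.
strict_labels = [classify(score, threshold=0.9) for score in scores]

print(default_labels)
print(strict_labels)
```

Raising the threshold trades false positives for false negatives, which may suit use cases such as fraud detection where a true prediction triggers a costly action.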

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean of precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.
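The F1 score described above can be sketched for a single class as follows. The function name and the counts are illustrative assumptions; for a multiclass model, tools typically compute a score per class and then average them, and the averaging method varies by tool.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for one class."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: precision is 8 / 10 = 0.8 and recall is 8 / 16 = 0.5.
print(f1_score(true_positives=8, false_positives=2, false_negatives=8))
```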

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (ex. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

To create a model, you must supply the machine learning tool with training data that it will use to learn about associations between different attribute values and the target attribute. This training data is the means by which the model understands and recognizes patterns about the data for which you ask it to make predictions. Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car. In this use case, the column marked "Sale Price" would be identified to the model as the target attribute to predict for.

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290
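As a rough sketch of how the target attribute relates to the other columns, the snippet below loads the example rows and separates "Sale Price" from the remaining attributes. The CSV layout and the helper name are assumptions; most tools only ask you to name the target column and perform this split internally.

```python
import csv
import io

# The example rows from the table above, with thousands separators removed.
TRAINING_CSV = """Year,Make,Model,Color,Transmission,Mileage,Previous Owners,Sale Price
1997,Ford,Mustang,Silver,Automatic,201298,3,1499
2013,Mazda,3,Black,Automatic,60588,1,8100
2005,Honda,Element,Red,Automatic,160378,2,4760
2009,Toyota,Camry,Blue,Manual,87380,1,7290
"""


def split_features_and_target(rows, target):
    """Separate the target column from the other attributes of each row."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [float(row[target]) for row in rows]
    return features, targets


rows = list(csv.DictReader(io.StringIO(TRAINING_CSV)))
features, targets = split_features_and_target(rows, "Sale Price")
print(features[0])
print(targets)
```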

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) that you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that is helpful for learning more about this topic.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

Appian Max Team — Thu, 22 Feb 2024 18:39:22 GMT

Revision 7 posted to Article by Appian Max Team on 2/22/2024 6:39:22 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian's documentation, refer to the articles below:

What is Machine Learning?

Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the differences between predicted and actual values, so what counts as a good value depends on the range of the values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is represented as a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean of precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.
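The series-of-models approach from the dealership example could be wired together roughly as shown below. Every model here is a hypothetical stand-in function with made-up logic and car names; in practice each one would be a call to a model trained in your machine learning tool.

```python
def predict_two_stage(customer, type_model, models_by_type):
    """Predict a car type first, then the specific car within that type."""
    car_type = type_model(customer)
    specific_model = models_by_type[car_type]
    return car_type, specific_model(customer)


# Hypothetical stand-in for the first model (minivan, convertible, or sedan).
def type_model(customer):
    return "minivan" if customer["household_size"] >= 4 else "sedan"


# Hypothetical stand-ins for the per-type models.
models_by_type = {
    "minivan": lambda customer: "Minivan A",
    "convertible": lambda customer: "Convertible A",
    "sedan": lambda customer: "Sedan A",
}

print(predict_two_stage({"household_size": 5}, type_model, models_by_type))
```

Each model in the chain only has to choose among a small set of values, which keeps every individual model within a tool's limit.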

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (ex. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) that you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that is helpful for learning more about this topic.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

Appian Max Team — Thu, 22 Feb 2024 18:35:37 GMT

Revision 6 posted to Article by Appian Max Team on 2/22/2024 6:35:37 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For Amazon machine learning integrations, refer to Integrating with Amazon Machine Learning.

What is Machine Learning?

There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the target. Within the context of Appian, we’ll dive into the practical implementation of AI features that integrate with applications.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the differences between predicted and actual values, so what counts as a good value depends on the range of the values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is represented as a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean of precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (ex. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) that you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that is helpful for learning more about this topic.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

Appian Max Team — Thu, 22 Feb 2024 18:35:07 GMT

Revision 5 posted to Article by Appian Max Team on 2/22/2024 6:35:07 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For Amazon machine learning integrations, refer to Integrating with Amazon Machine Learning.

What is Machine Learning?

There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the target. Within the context of Appian, we’ll dive into the practical implementation of AI features that integrate with applications.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the differences between predicted and actual values, so what counts as a good value depends on the range of the values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is represented as a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.
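
The two ideas above, picking the value with the highest probability and chaining a series of models, can be sketched as follows. This is a minimal sketch under stated assumptions: `best_prediction`, `predict_car`, `type_model`, and `models_by_type` are hypothetical names, and the stand-in models return fixed probabilities rather than calling a real machine learning service.

```python
def best_prediction(probabilities):
    """Return the value with the highest probability."""
    return max(probabilities, key=probabilities.get)


def predict_car(customer, type_model, models_by_type):
    """Stage 1 predicts the car type; stage 2 predicts the car within that type."""
    car_type = best_prediction(type_model(customer))
    car = best_prediction(models_by_type[car_type](customer))
    return car_type, car


# Hypothetical stand-ins for trained models (fixed outputs, for illustration only).
def type_model(customer):
    return {"minivan": 0.70, "convertible": 0.10, "sedan": 0.20}


models_by_type = {
    "minivan": lambda customer: {"Minivan A": 0.55, "Minivan B": 0.45},
    "convertible": lambda customer: {"Convertible A": 1.0},
    "sedan": lambda customer: {"Sedan A": 0.5, "Sedan B": 0.5},
}

# The support-tier example from the text.
tiers = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
print(best_prediction(tiers))

print(predict_car({"household_size": 5}, type_model, models_by_type))
```

Each second-stage model only has to choose among the cars of one type, which keeps every model under the tool's limit on predictable values.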

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (e.g. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre
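
For readers who want to see how some of the metrics in the table are calculated, here is a small sketch of RMSE, MAE, and a macro-averaged F1 score. The function names, the predicted prices, and the genre labels are invented for illustration; averaging the per-class F1 scores equally (macro averaging) is one common choice and is an assumption here, since tools may average differently.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def f1_for_class(actual, predicted, cls):
    """Harmonic mean of precision and recall for a single class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)
    fp = sum(1 for a, p in zip(actual, predicted) if a != cls and p == cls)
    fn = sum(1 for a, p in zip(actual, predicted) if a == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(actual, predicted):
    """Average of the per-class F1 scores, weighting each class equally."""
    classes = sorted(set(actual))
    return sum(f1_for_class(actual, predicted, c) for c in classes) / len(classes)


# Hypothetical regression results (prices).
actual_prices = [1499, 8100, 4760, 7290]
predicted_prices = [1999, 8100, 4260, 7290]
print(rmse(actual_prices, predicted_prices), mae(actual_prices, predicted_prices))

# Hypothetical multiclass results (book genres).
actual_genres = ["mystery", "mystery", "romance", "romance"]
predicted_genres = ["mystery", "romance", "romance", "romance"]
print(macro_f1(actual_genres, predicted_genres))
```

Because RMSE squares each error before averaging, it penalizes a few large misses more heavily than MAE does.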

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290
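
A table like the one above is usually split into input columns and a target column before training. The sketch below assumes, for illustration only, that Sale Price is the target; `load_observations` is a hypothetical helper and the data is copied from the sample rows.

```python
import csv
import io

# The sample rows above as tab-separated text.
RAW = (
    "Year\tMake\tModel\tColor\tTransmission\tMileage\tPrevious Owners\tSale Price\n"
    "1997\tFord\tMustang\tSilver\tAutomatic\t201,298\t3\t1,499\n"
    "2013\tMazda\t3\tBlack\tAutomatic\t60,588\t1\t8,100\n"
    "2005\tHonda\tElement\tRed\tAutomatic\t160,378\t2\t4,760\n"
    "2009\tToyota\tCamry\tBlue\tManual\t87,380\t1\t7,290\n"
)


def load_observations(text, target):
    """Split each row into its input columns and the numeric target value."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        # Remove the target from the row and strip thousands separators.
        targets.append(int(row.pop(target).replace(",", "")))
        features.append(row)
    return features, targets


features, targets = load_observations(RAW, target="Sale Price")
print(targets)
print(features[0])
```

Each dictionary in `features` is one observation; the matching entry in `targets` is its known outcome.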

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that are helpful for learning more about this topic.
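
One simple way to spot skewed training data before training is to look at how often each target value occurs. The sketch below is illustrative only: `class_shares` and `underrepresented` are hypothetical helpers, and the 10% cutoff is an arbitrary assumption rather than a recommended threshold.

```python
from collections import Counter


def class_shares(targets):
    """Return the share of training rows that carry each target value."""
    counts = Counter(targets)
    total = len(targets)
    return {value: count / total for value, count in counts.items()}


def underrepresented(targets, min_share=0.10):
    """List target values whose share of the rows falls below min_share."""
    return sorted(v for v, share in class_shares(targets).items() if share < min_share)


# Hypothetical loan-approval targets: 19 approvals and a single rejection.
loan_targets = ["approved"] * 19 + ["rejected"]
print(class_shares(loan_targets))
print(underrepresented(loan_targets))
```

A value that is flagged here is not necessarily a problem, but it is a prompt to check whether production data will look the same.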

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

joel.larin — Tue, 31 Oct 2023 19:44:53 GMT

Revision 4 posted to Article by joel.larin on 10/31/2023 7:44:53 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian's documentation, refer to the articles below:

What is Machine Learning?

Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the prediction errors (the differences between predicted and actual values); thus what counts as a good value is relative to the range of values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if a predicted score is greater than 0.5 then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (e.g. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that are helpful for learning more about this topic.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

joel.larin — Tue, 31 Oct 2023 19:38:42 GMT

Revision 3 posted to Article by joel.larin on 10/31/2023 7:38:42 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian's documentation, refer to the articles below:

What is Machine Learning?

Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the prediction errors (the differences between predicted and actual values); thus what counts as a good value is relative to the range of values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if a predicted score is greater than 0.5 then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (e.g. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that are helpful for learning more about this topic.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Tags: Platform, Machine Learning, Architecture

Machine Learning Overview

joel.larin — Tue, 31 Oct 2023 19:34:06 GMT

Revision 2 posted to Article by joel.larin on 10/31/2023 7:34:06 PM

Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian's documentation, refer to the articles below:

What is Machine Learning?

Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.

Common Model Types

There are two major categories of model types that are used for making machine learning predictions on structured data:

Regression: predicts a numeric value.
Classification: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
1. Binary classification: the model has only two prediction values to choose from (ex. true and false).
2. Multiclassification: the model has more than two prediction values to choose from.

Regression

Use a regression model when you want to predict for a numerical value that is not constrained to a finite or particular list of values.
The main metric used to determine the accuracy of a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE represents the standard deviation of the prediction errors (the differences between predicted and actual values); thus what counts as a good value is relative to the range of values you are trying to predict.
When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.

Regression models can be used to predict:

The sale price of a home, given information about the home's size, number of bedrooms, zip code, etc.
The appropriate salary for a job posting, given information about that job's difficulty and expected characteristics of qualified candidates.
The number of viewers who will watch the premiere of a new TV series, given information about the show's genre and cast.

Binary Classification

Use a binary classification model when you want to predict for a value that has only two possible outcomes.
A binary classification model will return a value (true or false) and a predicted score (a number between 0 and 1). By default, if a predicted score is greater than 0.5 then the predicted value will be true, but machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.
The main metric used to evaluate the performance of a binary classification model is Area Under the Curve (AUC). The AUC is a number between 0 and 1. A number closer to 1 indicates a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.

Binary classification models can be used to predict:

Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.
Whether a loan application should be approved or rejected, given credit details about the applicant.
Whether someone will sign up for a service, given their demographics.
Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account's typical usage patterns.

Multiclassification

Use a multiclass model when you want to predict for a value that can take on a single categorical value from among a list of three or more discrete, finite possibilities.
A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model's best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.
Since the target attribute's possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.
The main metric used to determine the accuracy of a multiclass model is called an F1 score. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.
Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.
To use machine learning to make predictions from a group of possibilities that is larger than a tool's limit, consider using a series of different models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model to predict one of the 225 cars, but you could create a model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model for each type of car to predict the particular minivan, convertible, or sedan.

Multiclass classification models can be used to predict:

Which category of car—sedan, truck or SUV—someone is likely to purchase, given their demographics.
A book's genre, given information about the book's author, length, characters, storyline, etc.

Model Types Summary

Model	Prediction Type	Common Performance Metrics	Example
Regression	Predicts a numeric value	Root Mean Square Error (RMSE), Mean Absolute Error (MAE)	Predicting a home's sale price
Binary Classification	Predicts binary values (e.g. true or false)	Area Under the Curve (AUC)	Predicting whether a job candidate should be offered employment
Multiclass Classification	Predicts values that belong to a limited, predefined set of permissible values	F1 Score, Log Loss	Predicting a book's genre

Training Data

Year	Make	Model	Color	Transmission	Mileage	Previous Owners	Sale Price
1997	Ford	Mustang	Silver	Automatic	201,298	3	1,499
2013	Mazda	3	Black	Automatic	60,588	1	8,100
2005	Honda	Element	Red	Automatic	160,378	2	4,760
2009	Toyota	Camry	Blue	Manual	87,380	1	7,290

Best Practices and Tips for Training Data

The more training observations (i.e., rows of data) you provide during training, the more accurate the final model will generally be.
To the greatest extent possible, provide training data that resembles the data you expect to see in production.
Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool's documentation for more details.
Some tools allow you to modify the weight given to specific columns during training, or specify a "time" column if training data values are influenced by time. Read your tool's documentation for more details.
Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has documentation and a video regarding bias and machine learning that are helpful for learning more about this topic.

See Also

Websites:

Best Practices for Creating Training Data

Videos:

Machine Learning Overview

joel.larin — Wed, 18 Oct 2023 13:43:05 GMT

Revision 1 posted to Article by joel.larin on 10/18/2023 1:43:05 PM

fdsa