<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.appian.com/cfs-file/__key/system/syndication/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview</link><description /><dc:language>en-US</dc:language><generator>Telligent Community 12</generator><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview</link><pubDate>Fri, 14 Jun 2024 18:53:20 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Current Revision posted to Article by joel.larin on 6/14/2024 6:53:20 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. Refer to the &lt;/span&gt;&lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for Amazon machine learning integrations that have been written about in detail in Appian&amp;#39;s documentation.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk0"&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning is a subset of AI that focuses on the development of algorithms and models that enable computers to learn and make decisions based on data. The models can be thought of as black boxes that are created by processing many observations both supervised and semi-supervised. These machine learning models are then able to take in one or many observations without a known outcome and produce possible outcomes based on their probabilities.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/document-ai"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; respectively.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk1"&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models can be custom-built, including being trained and tested using data that will accurately reflect your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data from PDF documents, extracting and saving data into key-value pairs that can be used within the application or saved within a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills offer pre-trained models that use built-in documentation extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&amp;nbsp;Pre-trained models in Appian are designed for general use cases and are used in documents that have similar information and labeled values (e.g. structured or semi-structured documents). Incorporating Google AI functionalities into your Appian application enables the integration of various features, including but not limited to natural language processing, translation services, cloud-based storage, and mor&lt;/span&gt;&lt;span style="font-weight:400;"&gt;e&lt;/span&gt;&lt;span style="font-weight:400;"&gt;. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of features available.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that starting from January 23, 2024, Appian is no longer selling Appian-provisioned Google credentials to customers. Customers have to purchase the license directly through Google and add their Google credentials to their Appian Admin console&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point to further AI capabilities using Appian. AI Copilot utilizes generative AI to create functional interfaces by generating an initial interface from the fields in your form through a simple pdf upload. AI Copilot is integrated with Azure OpenAI to enable this functionally in your application. Azure OpenAI leverages generative AI models (e.g. gpt-3, codex, dall-e, chatgpt) to provide writing assistance, content generation, etc. You can use AI Copilot to build interfaces directly from a pdf, resulting in a personalized product that can be further customized according to your specific requirements once the initial interface is generated.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Similarly to other machine learning models, generative AI models undergo training with large amounts of data that aids in identifying inherent patterns. The generative AI model is fine-tuned and enhanced with the introduction of more data over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk2"&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (ex. true and false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclassification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Which type you utilize is dependent on the target attribute you want to predict for and your overall objective in creating the model. Read the sections below to learn more about the purpose of each model type and see examples describing appropriate uses of each one.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can&amp;#39;t be used in cases where binary, categorical, or non-numeric values are required without additional processing to the model&amp;rsquo;s output.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to determine accuracy of a regression model is the root mean square error (RMSE). The RMSE represents the standard deviation between predicted and actual values; thus a good RMSE is&amp;nbsp; relative to the range of values&amp;nbsp; you are trying to predict. A perfect model would have a RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
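&lt;p&gt;&lt;span style="font-weight:400;"&gt;You can compute the regression metrics described above by hand. The Python sketch below computes RMSE for a small set of hypothetical home sale price predictions. It also computes mean absolute error (MAE), which appears in the summary table below. Finally, it shows one simple way to handle an out-of-range output: clamping it to bounds your application accepts. The function names and bounds are illustrative assumptions. Clamping is only one option; for some applications, flagging the value for human review is a better fit.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average size of the residuals, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def clamp(prediction, low, high):
    """Force a raw regression output into the range [low, high]."""
    return max(low, min(high, prediction))


if __name__ == "__main__":
    # Hypothetical actual vs. predicted home sale prices.
    actual = [250_000, 310_000, 190_000]
    predicted = [240_000, 330_000, 185_000]
    print(f"RMSE: {rmse(actual, predicted):,.0f}")
    print(f"MAE:  {mae(actual, predicted):,.0f}")
    # A negative price is impossible, so clamp it to the accepted range.
    print(clamp(-12_500, 0, 5_000_000))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Because residuals are squared before averaging, RMSE penalizes large errors more heavily than MAE. A large gap between the two suggests that a few predictions miss by a wide margin.&lt;/span&gt;&lt;/p&gt;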
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict for a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model will return a value (true or false) and a predicted score (a number between 0 to 1). By default if a predicted score is greater than 0.5, then the predicted value will be true. However, machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate performance of a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The AUC is represented as a number between 0 and 1. A number closer 1 indicates a highly accurate model. Values near 0.5 represent the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
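&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Python sketch below illustrates the two binary classification ideas above. First, it converts scores into true/false predictions with an adjustable threshold. Second, it computes the AUC using its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. This quantity equals the area under the ROC curve. The function names and sample data are hypothetical, and real tools compute AUC more efficiently than this pairwise loop.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def apply_threshold(scores, threshold=0.5):
    """Turn predicted scores into labels: True when score is above threshold."""
    return [score > threshold for score in scores]


def auc(labels, scores):
    """AUC via its rank interpretation: the fraction of (positive, negative)
    pairs in which the positive example gets the higher score (ties = 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical fraud labels (True = fraudulent) and model scores.
    labels = [True, True, False, False, True]
    scores = [0.91, 0.42, 0.55, 0.08, 0.77]
    print(apply_threshold(scores))       # default 0.5 cutoff
    print(apply_threshold(scores, 0.4))  # a lower cutoff flags more transactions
    print(auc(labels, scores))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Lowering the threshold catches more true positives at the cost of more false positives. The AUC does not depend on the threshold, which is why it is useful for comparing models.&lt;/span&gt;&lt;/p&gt;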
&lt;p&gt;&lt;b&gt;Multiclass Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict a categorical value from a list of three or more discrete, finite possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model returns a list of possible values and their probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Because the target attribute&amp;#39;s possible values are derived from the training data, the model can never predict a value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate a multiclass model is the &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean of precision and recall and ranges from 0 to 1. The closer the value is to 1, the better the model.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To make predictions from a group of possibilities that is larger than a machine learning tool&amp;#39;s limit, consider using a series of different models. For example, to classify animal species from an image, better results can be achieved by first training the model for a more general classification (e.g. feline, canine, rodent). Additional models can be trained to identify specific species.&amp;nbsp;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
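&lt;p&gt;&lt;span style="font-weight:400;"&gt;As a sketch of the multiclass ideas above, the Python below picks the best prediction from a set of class probabilities, using the support-tier example. It then computes a macro-averaged F1 score, one common way to extend F1 to more than two classes: compute F1 for each class, then take the unweighted mean. Some tools report micro- or weighted-averaged F1 instead, so check which variant yours uses. The function names and routing data here are hypothetical.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def best_prediction(probabilities):
    """Return the class with the highest probability, plus that probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def macro_f1(actual, predicted):
    """Macro-averaged F1: per-class F1 (harmonic mean of precision and
    recall), then the unweighted mean across all classes."""
    classes = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    f1_scores = []
    for c in classes:
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))
    # Hypothetical routing results: true tier vs. predicted tier for five cases.
    actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3"]
    predicted = ["Tier 1", "Tier 2", "Tier 2", "Tier 3", "Tier 1"]
    print(round(macro_f1(actual, predicted), 3))
```
&lt;/code&gt;&lt;/pre&gt;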
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of emails everyday for customer support. Employees manually forward these emails to appropriate departments and locations based on a review of the email description and the customer&amp;#39;s location. This process is time consuming and prone to human error. The client can automate this process using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI Skill that combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot; consisting of a diverse set of emails which includes multiple examples for all desired email routing options.Once the model is trained and tested, the client can publish the model to make it available for use through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service&lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#rmse"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#mae"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts binary values (ex. true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
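&lt;p&gt;&lt;span style="font-weight:400;"&gt;The summary table also lists log loss as a metric for multiclass models. The minimal Python sketch below assumes each prediction is a dictionary that maps class names to probabilities (a hypothetical format). Log loss is then the average negative natural logarithm of the probability the model assigned to the true class. Unlike the F1 score, lower is better, and confident wrong predictions are penalized heavily. Probabilities are clipped away from 0 and 1 so the logarithm stays finite; the clipping constant is an assumption.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def log_loss(actual, predicted_probs, eps=1e-15):
    """Multiclass log loss (cross-entropy): mean of -ln(p), where p is the
    probability assigned to the true class. Lower is better; 0 would be perfect."""
    total = 0.0
    for label, probs in zip(actual, predicted_probs):
        # Clip so a probability of exactly 0 never reaches -ln(0).
        p = min(max(probs.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(actual)


if __name__ == "__main__":
    # Hypothetical genre predictions for two books.
    actual = ["Mystery", "Romance"]
    predicted = [
        {"Mystery": 0.7, "Romance": 0.2, "Sci-Fi": 0.1},
        {"Mystery": 0.6, "Romance": 0.3, "Sci-Fi": 0.1},
    ]
    print(round(log_loss(actual, predicted), 4))
```
&lt;/code&gt;&lt;/pre&gt;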
&lt;h2 id="mcetoc_1hthvnjgk3"&gt;&lt;b&gt;Training Data&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data that it will use to learn about associations between different attribute values of input data and the target attribute. The model ultimately applies the associations and patterns it found in the training data to make predictions for novel input data. There is a common adage that a model is &amp;ldquo;only as good as its training data&amp;#39;&amp;#39;. If the training data is not a representative sample of the data against which it will be making predictions, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict for.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight:400;"&gt;The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool&amp;#39;s documentation for specific information about appropriately presenting data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The more training observations (ie. rows of data) that you provide during training, the more accurate the final model will be. This is applicable when a diverse and balanced (e.g. data between class A and B are split equally) set of training data is provided to avoid bias when making predictions. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; regarding bias and machine learning that is helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you modify the weight given to specific columns during training, or specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
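&lt;p&gt;&lt;span style="font-weight:400;"&gt;You can check two of the tips above before uploading data. The Python sketch below reports each class&amp;#39;s share of the training labels, to help you spot imbalance. It also holds out a random fraction of rows so that you can test the trained model on data it has not seen. The 20% test fraction and fixed random seed are arbitrary assumptions, and many tools can perform this split for you; check your tool&amp;#39;s documentation.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import random
from collections import Counter


def class_balance(labels):
    """Fraction of the training labels that belongs to each class."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and return (training rows, test rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # Hypothetical loan decisions, heavily skewed toward "approved".
    decisions = ["approved"] * 90 + ["rejected"] * 10
    print(class_balance(decisions))
    train, test = holdout_split(decisions)
    print(len(train), len(test))
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;If one class dominates, as in this example, consider gathering more examples of the minority class before training.&lt;/span&gt;&lt;/p&gt;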
&lt;p&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div id="gtx-trans" style="left:611px;position:absolute;top:1695.23px;"&gt;
&lt;div class="gtx-trans-icon"&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/14</link><pubDate>Fri, 14 Jun 2024 18:51:38 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 14 posted to Article by joel.larin on 6/14/2024 6:51:38 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. Refer to the &lt;/span&gt;&lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for Amazon machine learning integrations that have been written about in detail in Appian&amp;#39;s documentation.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;div&gt;
&lt;div class="callout-box callout-info"&gt;This is a test for informational boxes&lt;/div&gt;
&lt;h2 id="mcetoc_1hthvnjgk0"&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning is a subset of AI that focuses on the development of algorithms and models that enable computers to learn and make decisions based on data. The models can be thought of as black boxes that are created by processing many observations both supervised and semi-supervised. These machine learning models are then able to take in one or many observations without a known outcome and produce possible outcomes based on their probabilities.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/document-ai"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; respectively.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk1"&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models can be custom-built, including being trained and tested using data that will accurately reflect your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data from PDF documents, extracting and saving data into key-value pairs that can be used within the application or saved within a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills offer pre-trained models that use built-in documentation extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&amp;nbsp;Pre-trained models in Appian are designed for general use cases and are used in documents that have similar information and labeled values (e.g. structured or semi-structured documents). Incorporating Google AI functionalities into your Appian application enables the integration of various features, including but not limited to natural language processing, translation services, cloud-based storage, and mor&lt;/span&gt;&lt;span style="font-weight:400;"&gt;e&lt;/span&gt;&lt;span style="font-weight:400;"&gt;. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of features available.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that starting from January 23, 2024, Appian is no longer selling Appian-provisioned Google credentials to customers. Customers have to purchase the license directly through Google and add their Google credentials to their Appian Admin console&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point to further AI capabilities using Appian. AI Copilot utilizes generative AI to create functional interfaces by generating an initial interface from the fields in your form through a simple pdf upload. AI Copilot is integrated with Azure OpenAI to enable this functionally in your application. Azure OpenAI leverages generative AI models (e.g. gpt-3, codex, dall-e, chatgpt) to provide writing assistance, content generation, etc. You can use AI Copilot to build interfaces directly from a pdf, resulting in a personalized product that can be further customized according to your specific requirements once the initial interface is generated.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Similarly to other machine learning models, generative AI models undergo training with large amounts of data that aids in identifying inherent patterns. The generative AI model is fine-tuned and enhanced with the introduction of more data over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk2"&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (ex. true and false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclassification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Which type you utilize is dependent on the target attribute you want to predict for and your overall objective in creating the model. Read the sections below to learn more about the purpose of each model type and see examples describing appropriate uses of each one.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can&amp;#39;t be used in cases where binary, categorical, or non-numeric values are required without additional processing to the model&amp;rsquo;s output.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to determine accuracy of a regression model is the root mean square error (RMSE). The RMSE represents the standard deviation between predicted and actual values; thus a good RMSE is&amp;nbsp; relative to the range of values&amp;nbsp; you are trying to predict. A perfect model would have a RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
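&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Python sketch below shows how RMSE is computed. It also shows one simple way to handle out-of-range predictions: clamping them to an acceptable range. The actual values are the sale prices from the used-car example later in this article. The predicted values and the 0 to 50,000 acceptable range are invented for illustration.&lt;/span&gt;&lt;/p&gt;

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean square error: square root of the mean squared difference."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def clamp(value: float, low: float, high: float) -> float:
    """Force a prediction into the range [low, high] the application accepts."""
    return max(low, min(high, value))


actual = [1499, 8100, 4760, 7290]
predicted = [1800, 7900, 5100, 7000]  # invented predictions
print(round(rmse(predicted, actual), 1))  # 287.4
print(clamp(-350.0, 0, 50000))            # 0
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Clamping is only one option. Depending on the application, flagging an out-of-range prediction for human review may be more appropriate.&lt;/span&gt;&lt;/p&gt;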
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict for a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model will return a value (true or false) and a predicted score (a number between 0 to 1). By default if a predicted score is greater than 0.5, then the predicted value will be true. However, machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate performance of a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The AUC is represented as a number between 0 and 1. A number closer 1 indicates a highly accurate model. Values near 0.5 represent the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
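&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Python sketch below makes the score threshold and AUC concrete. It converts scores into true/false predictions at a chosen threshold. It then computes AUC in its pairwise form: the fraction of (positive, negative) example pairs in which the positive example receives the higher score. The scores and labels are invented; in practice, your machine learning tool reports AUC for you.&lt;/span&gt;&lt;/p&gt;

```python
def apply_threshold(scores: list[float], threshold: float = 0.5) -> list[bool]:
    """Turn scores between 0 and 1 into true/false predictions."""
    return [s > threshold for s in scores]


def auc(scores: list[float], labels: list[bool]) -> float:
    """AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count as half).

    This pairwise form equals the area under the ROC curve; it is O(P*N),
    which is fine for illustration but slow on large data sets.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


scores = [0.91, 0.72, 0.64, 0.40, 0.35, 0.08]
labels = [True, True, False, True, False, False]
print(apply_threshold(scores))       # [True, True, True, False, False, False]
print(apply_threshold(scores, 0.7))  # [True, True, False, False, False, False]
print(round(auc(scores, labels), 3))  # 0.889
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Raising the threshold from 0.5 to 0.7 turns the 0.64 score into a false prediction. AUC does not change, because it depends only on how the model ranks positive examples against negative ones.&lt;/span&gt;&lt;/p&gt;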
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Multiclass Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict for a categorical value from a list of three or more discrete, finite possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to determine the accuracy of a multiclass model is called an &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To make predictions from a group of possibilities that is larger than a machine learning tool&amp;#39;s limit, consider using a series of different models. For example, to classify animal species from an image, better results can be achieved by first training the model for a more general classification (e.g. feline, canine, rodent). Additional models can be trained to identify specific species.&amp;nbsp;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
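&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Python sketch below picks the best prediction from the support-tier probabilities in the example above. It then computes a macro-averaged F1 score: F1 for each class, followed by an unweighted mean. The actual and predicted tiers are invented for illustration. Macro averaging is one of several common ways that tools aggregate F1 across classes.&lt;/span&gt;&lt;/p&gt;

```python
def best_prediction(probabilities: dict[str, float]) -> str:
    """Return the class with the highest probability (the model's best guess)."""
    return max(probabilities, key=probabilities.get)


def macro_f1(actual: list[str], predicted: list[str]) -> float:
    """Macro-averaged F1: compute F1 (the harmonic mean of precision and
    recall) for each class, then take the unweighted mean across classes."""
    classes = sorted(set(actual) | set(predicted))
    f1_scores = []
    for c in classes:
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
        fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


case = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
print(best_prediction(case))  # Tier 1

actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3", "Tier 2"]
predicted = ["Tier 1", "Tier 2", "Tier 2", "Tier 3", "Tier 1", "Tier 2"]
print(round(macro_f1(actual, predicted), 3))  # 0.656
```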
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of emails everyday for customer support. Employees manually forward these emails to appropriate departments and locations based on a review of the email description and the customer&amp;#39;s location. This process is time consuming and prone to human error. The client can automate this process using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI Skill that combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot; consisting of a diverse set of emails which includes multiple examples for all desired email routing options.Once the model is trained and tested, the client can publish the model to make it available for use through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service&lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#rmse"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#mae"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts binary values (ex. true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
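&lt;p&gt;&lt;span style="font-weight:400;"&gt;The table also lists mean absolute error (MAE) and log loss. MAE is the average absolute difference between predicted and actual values. Log loss is the average negative logarithm of the probability the model assigned to the correct class, so confident wrong predictions are penalized heavily. The Python sketch below computes both on invented predictions; the genre names are hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
import math


def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def log_loss(prob_rows: list[dict[str, float]], actual: list[str], eps: float = 1e-15) -> float:
    """Multiclass log loss: mean of -ln(probability assigned to the true class).

    Probabilities are clipped to [eps, 1 - eps] so that a probability of 0
    does not produce an infinite loss.
    """
    total = 0.0
    for probs, label in zip(prob_rows, actual):
        p = min(max(probs.get(label, 0.0), eps), 1 - eps)
        total += -math.log(p)
    return total / len(actual)


print(mae([1800, 7900, 5100, 7000], [1499, 8100, 4760, 7290]))  # 282.75
rows = [
    {"Mystery": 0.7, "Romance": 0.2, "Sci-Fi": 0.1},
    {"Mystery": 0.1, "Romance": 0.3, "Sci-Fi": 0.6},
]
print(round(log_loss(rows, ["Mystery", "Sci-Fi"]), 4))
```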
&lt;h2 id="mcetoc_1hthvnjgk3"&gt;&lt;b&gt;Training Data&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data that it will use to learn about associations between different attribute values of input data and the target attribute. The model ultimately applies the associations and patterns it found in the training data to make predictions for novel input data. There is a common adage that a model is &amp;ldquo;only as good as its training data&amp;#39;&amp;#39;. If the training data is not a representative sample of the data against which it will be making predictions, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict for.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight:400;"&gt;The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool&amp;#39;s documentation for specific information about appropriately presenting data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The more training observations (ie. rows of data) that you provide during training, the more accurate the final model will be. This is applicable when a diverse and balanced (e.g. data between class A and B are split equally) set of training data is provided to avoid bias when making predictions. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; regarding bias and machine learning that is helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you modify the weight given to specific columns during training, or specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
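&lt;p&gt;&lt;span style="font-weight:400;"&gt;A quick check of class balance before training can reveal the kind of skew described above. The Python sketch below reports each class&amp;#39;s share of the rows. It flags the data when the largest class outnumbers the smallest by more than a chosen ratio. The loan-decision labels and the 1.5 cutoff are hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
from collections import Counter


def class_balance(labels: list) -> dict:
    """Return each class's share of the training rows, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def is_imbalanced(labels: list, max_ratio: float = 1.5) -> bool:
    """Flag the data if the most common class outnumbers the least common
    one by more than max_ratio (an arbitrary illustrative cutoff)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts) > max_ratio


labels = ["approve"] * 80 + ["reject"] * 20
print(class_balance(labels))  # {'approve': 0.8, 'reject': 0.2}
print(is_imbalanced(labels))  # True
```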
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div id="gtx-trans" style="left:611px;position:absolute;top:1695.23px;"&gt;
&lt;div class="gtx-trans-icon"&gt;&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/13</link><pubDate>Fri, 14 Jun 2024 17:54:55 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 13 posted to Article by joel.larin on 6/14/2024 5:54:55 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. Refer to the &lt;/span&gt;&lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for Amazon machine learning integrations that have been written about in detail in Appian&amp;#39;s documentation.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk0"&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning is a subset of AI that focuses on the development of algorithms and models that enable computers to learn and make decisions based on data. The models can be thought of as black boxes that are created by processing many observations both supervised and semi-supervised. These machine learning models are then able to take in one or many observations without a known outcome and produce possible outcomes based on their probabilities.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/document-ai"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; respectively.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk1"&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models can be custom-built, including being trained and tested using data that will accurately reflect your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data from PDF documents, extracting and saving data into key-value pairs that can be used within the application or saved within a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills offer pre-trained models that use built-in documentation extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&amp;nbsp;Pre-trained models in Appian are designed for general use cases and are used in documents that have similar information and labeled values (e.g. structured or semi-structured documents). Incorporating Google AI functionalities into your Appian application enables the integration of various features, including but not limited to natural language processing, translation services, cloud-based storage, and mor&lt;/span&gt;&lt;span style="font-weight:400;"&gt;e&lt;/span&gt;&lt;span style="font-weight:400;"&gt;. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of features available.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that starting from January 23, 2024, Appian is no longer selling Appian-provisioned Google credentials to customers. Customers have to purchase the license directly through Google and add their Google credentials to their Appian Admin console&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point to further AI capabilities using Appian. AI Copilot utilizes generative AI to create functional interfaces by generating an initial interface from the fields in your form through a simple pdf upload. AI Copilot is integrated with Azure OpenAI to enable this functionally in your application. Azure OpenAI leverages generative AI models (e.g. gpt-3, codex, dall-e, chatgpt) to provide writing assistance, content generation, etc. You can use AI Copilot to build interfaces directly from a pdf, resulting in a personalized product that can be further customized according to your specific requirements once the initial interface is generated.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Similarly to other machine learning models, generative AI models undergo training with large amounts of data that aids in identifying inherent patterns. The generative AI model is fine-tuned and enhanced with the introduction of more data over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk2"&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (ex. true and false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclassification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Which type you utilize is dependent on the target attribute you want to predict for and your overall objective in creating the model. Read the sections below to learn more about the purpose of each model type and see examples describing appropriate uses of each one.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can&amp;#39;t be used in cases where binary, categorical, or non-numeric values are required without additional processing to the model&amp;rsquo;s output.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to determine accuracy of a regression model is the root mean square error (RMSE). The RMSE represents the standard deviation between predicted and actual values; thus a good RMSE is&amp;nbsp; relative to the range of values&amp;nbsp; you are trying to predict. A perfect model would have a RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
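&lt;p&gt;&lt;span style="font-weight:400;"&gt;The RMSE calculation and the range check described above can be sketched in a few lines of standard-library Python. The home prices, the predictions, the clamp helper, and the 0 to 2,000,000 acceptable range are hypothetical values chosen for illustration. Your machine learning tool will normally report RMSE for you, and the right way to handle out-of-range predictions depends on your application.&lt;/span&gt;&lt;/p&gt;

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def clamp(value, low, high):
    """Hypothetical helper: force a prediction into an acceptable range."""
    return max(low, min(value, high))


# Hypothetical home sale prices (actual) and model predictions.
actual = [250000, 310000, 190000, 420000]
predicted = [262000, 298000, 205000, 401000]
print(round(rmse(actual, predicted)))  # roughly 14782, in the same units as the price

# A regression model can return values outside the training range,
# including negative numbers, so clamp them before using them downstream.
print(clamp(-1200.0, 0.0, 2000000.0))  # 0.0
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Clamping is only one option; depending on the use case, you might instead reject the prediction or route it for human review.&lt;/span&gt;&lt;/p&gt;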
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict for a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model will return a value (true or false) and a predicted score (a number between 0 to 1). By default if a predicted score is greater than 0.5, then the predicted value will be true. However, machine learning tools typically allow you to adjust the score threshold to alter the number of true and false values depending on your use case.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate performance of a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The AUC is represented as a number between 0 and 1. A number closer 1 indicates a highly accurate model. Values near 0.5 represent the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns, but is using them to make inverse predictions.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
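&lt;p&gt;&lt;span style="font-weight:400;"&gt;The following Python sketch shows how a score threshold turns predicted scores into true/false values, and how AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels, scores, and the classify and auc helpers are hypothetical; machine learning tools compute AUC for you, usually with a more efficient algorithm than this pairwise loop.&lt;/span&gt;&lt;/p&gt;

```python
def classify(scores, threshold=0.5):
    """Label each score true when it is greater than the threshold."""
    return [score > threshold for score in scores]


def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Counts how often a positive example outscores a negative one
    (ties count as half). This is O(P * N), fine for small examples.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true outcomes and model scores for six observations.
labels = [True, False, True, True, False, False]
scores = [0.92, 0.40, 0.65, 0.48, 0.55, 0.10]

print(classify(scores))        # [True, False, True, False, True, False]
print(classify(scores, 0.45))  # [True, False, True, True, True, False]
print(round(auc(labels, scores), 3))                   # 0.889
print(round(auc(labels, [1 - s for s in scores]), 3))  # 0.111
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Lowering the threshold to 0.45 flips the fourth observation to true, and reversing every score turns an AUC of about 0.89 into about 0.11, which is the inverted-prediction case described above.&lt;/span&gt;&lt;/p&gt;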
&lt;p&gt;&lt;b&gt;Multiclassification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict for a categorical value from a list of three or more discrete, finite possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to determine the accuracy of a multiclass model is called an &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean between precision and recall. The range is 0 to 1. The closer the value is to 1, the better the model.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To make predictions from a group of possibilities that is larger than a machine learning tool&amp;#39;s limit, consider using a series of different models. For example, to classify animal species from an image, better results can be achieved by first training the model for a more general classification (e.g. feline, canine, rodent). Additional models can be trained to identify specific species.&amp;nbsp;&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
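&lt;p&gt;&lt;span style="font-weight:400;"&gt;A minimal Python sketch of the two ideas above: picking the class with the highest probability from the support-tier example, and computing a macro-averaged F1 score (F1 per class, then an unweighted mean). Macro averaging is an assumption here; some tools report a weighted average instead. The actual and predicted tiers are hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
def best_prediction(probabilities):
    """Return the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def macro_f1(actual, predicted):
    """Macro-averaged F1: compute F1 for each class, then take the unweighted mean."""
    classes = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    f1_scores = []
    for c in classes:
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # Harmonic mean of precision and recall (0 when both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Support-tier probabilities from the example above.
probabilities = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
print(best_prediction(probabilities))  # Tier 1

# Hypothetical actual vs. predicted tiers for six support cases.
actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3", "Tier 1"]
predicted = ["Tier 1", "Tier 3", "Tier 2", "Tier 3", "Tier 1", "Tier 1"]
print(round(macro_f1(actual, predicted), 3))  # 0.722
```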
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of emails everyday for customer support. Employees manually forward these emails to appropriate departments and locations based on a review of the email description and the customer&amp;#39;s location. This process is time consuming and prone to human error. The client can automate this process using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI Skill that combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot; consisting of a diverse set of emails which includes multiple examples for all desired email routing options.Once the model is trained and tested, the client can publish the model to make it available for use through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service&lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#rmse"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#mae"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts binary values (ex. true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
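&lt;p&gt;&lt;span style="font-weight:400;"&gt;The table also lists Mean Absolute Error and Log Loss. A minimal Python sketch of both follows, using hypothetical data: MAE averages the absolute differences between actual and predicted values, and log loss averages the negative natural log of the probability the model assigned to the correct class. The eps clipping value is an assumption added so the code never takes the log of 0.&lt;/span&gt;&lt;/p&gt;

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def log_loss(actual_labels, predicted_probs, eps=1e-15):
    """Multiclass log loss (cross-entropy); lower is better, 0 is perfect.

    predicted_probs holds one dict per observation, mapping each class
    to its predicted probability. Probabilities are clipped to
    [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for label, probs in zip(actual_labels, predicted_probs):
        p = min(max(probs.get(label, 0.0), eps), 1 - eps)
        total -= math.log(p)
    return total / len(actual_labels)


# Hypothetical home prices and predictions (regression).
print(mae([250000, 310000, 190000, 420000], [262000, 298000, 205000, 401000]))  # 14500.0

# Hypothetical support-tier predictions (multiclass).
labels = ["Tier 1", "Tier 3"]
probs = [
    {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27},
    {"Tier 1": 0.20, "Tier 2": 0.10, "Tier 3": 0.70},
]
print(round(log_loss(labels, probs), 3))  # 0.434
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Unlike F1, log loss rewards a model for being confidently right and penalizes it heavily for being confidently wrong, so the two metrics are often reported together.&lt;/span&gt;&lt;/p&gt;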
&lt;h2 id="mcetoc_1hthvnjgk3"&gt;&lt;b&gt;Training Data&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data that it will use to learn about associations between different attribute values of input data and the target attribute. The model ultimately applies the associations and patterns it found in the training data to make predictions for novel input data. There is a common adage that a model is &amp;ldquo;only as good as its training data&amp;#39;&amp;#39;. If the training data is not a representative sample of the data against which it will be making predictions, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict for.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight:400;"&gt;The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool&amp;#39;s documentation for specific information about appropriately presenting data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The more training observations (ie. rows of data) that you provide during training, the more accurate the final model will be. This is applicable when a diverse and balanced (e.g. data between class A and B are split equally) set of training data is provided to avoid bias when making predictions. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; regarding bias and machine learning that is helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimal requirements and limits regarding the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you modify the weight given to specific columns during training, or specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
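&lt;p&gt;&lt;span style="font-weight:400;"&gt;One quick way to check the balance recommended above is to count how often each target value appears in the training data before you upload it. The Python sketch below does this with the standard library. The routing labels and the minimum count of 2 are hypothetical, and what counts as adequately balanced depends on your tool and use case.&lt;/span&gt;&lt;/p&gt;

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the training labels, most common first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


def underrepresented(labels, min_count):
    """List classes that have fewer than min_count training examples."""
    return [label for label, count in Counter(labels).items() if min_count > count]


# Hypothetical routing labels for an email-classification training set.
labels = ["Billing"] * 6 + ["Technical"] * 3 + ["Returns"] * 1
print(class_shares(labels))         # {'Billing': 0.6, 'Technical': 0.3, 'Returns': 0.1}
print(underrepresented(labels, 2))  # ['Returns']
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;A class flagged this way usually needs more examples before training; otherwise the model may rarely, if ever, predict it.&lt;/span&gt;&lt;/p&gt;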
&lt;p&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div id="gtx-trans" style="left:611px;position:absolute;top:1695.23px;"&gt;
&lt;div class="gtx-trans-icon"&gt;&lt;/div&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/12</link><pubDate>Fri, 14 Jun 2024 17:54:20 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 12 posted to Article by joel.larin on 6/14/2024 5:54:20 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration agnostic and has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. Refer to the &lt;/span&gt;&lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for Amazon machine learning integrations that have been written about in detail in Appian&amp;#39;s documentation.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;div&gt;
&lt;div class="callout-box callout-info"&gt;This is a test for informational boxes&lt;/div&gt;
&lt;h2 id="mcetoc_1hthvnjgk0"&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning is a subset of AI that focuses on the development of algorithms and models that enable computers to learn and make decisions based on data. The models can be thought of as black boxes that are created by processing many observations both supervised and semi-supervised. These machine learning models are then able to take in one or many observations without a known outcome and produce possible outcomes based on their probabilities.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute that is being predicted for is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/document-ai"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; respectively.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk1"&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models can be custom-built, including being trained and tested using data that will accurately reflect your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data from PDF documents, extracting and saving data into key-value pairs that can be used within the application or saved within a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills offer pre-trained models that use built-in documentation extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;&amp;nbsp;Pre-trained models in Appian are designed for general use cases and are used in documents that have similar information and labeled values (e.g. structured or semi-structured documents). Incorporating Google AI functionalities into your Appian application enables the integration of various features, including but not limited to natural language processing, translation services, cloud-based storage, and mor&lt;/span&gt;&lt;span style="font-weight:400;"&gt;e&lt;/span&gt;&lt;span style="font-weight:400;"&gt;. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of features available.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that starting from January 23, 2024, Appian is no longer selling Appian-provisioned Google credentials to customers. Customers have to purchase the license directly through Google and add their Google credentials to their Appian Admin console&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point to further AI capabilities using Appian. AI Copilot utilizes generative AI to create functional interfaces by generating an initial interface from the fields in your form through a simple pdf upload. AI Copilot is integrated with Azure OpenAI to enable this functionally in your application. Azure OpenAI leverages generative AI models (e.g. gpt-3, codex, dall-e, chatgpt) to provide writing assistance, content generation, etc. You can use AI Copilot to build interfaces directly from a pdf, resulting in a personalized product that can be further customized according to your specific requirements once the initial interface is generated.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Similarly to other machine learning models, generative AI models undergo training with large amounts of data that aids in identifying inherent patterns. The generative AI model is fine-tuned and enhanced with the introduction of more data over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk2"&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (e.g., true or false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclass classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The type you use depends on the target attribute you want to predict and on your overall objective in creating the model. Read the sections below to learn about the purpose of each model type and to see examples of appropriate uses for each one.&lt;/span&gt;&lt;/p&gt;
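&lt;p&gt;&lt;span style="font-weight:400;"&gt;As a rough illustration of how the target attribute drives this choice, the Python sketch below suggests a model type from a list of target values. The function name and its rules are hypothetical assumptions, not part of any machine learning tool: two distinct values suggest binary classification, other numeric values suggest regression, and anything else suggests multiclass classification. In practice your objective also matters; for example, a 1 to 5 rating could reasonably be modeled with either regression or multiclass classification.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def suggest_model_type(target_values):
    """Hypothetical heuristic: suggest a model type from a column of target values."""
    distinct = set(target_values)
    if len(distinct) in (0, 1):
        raise ValueError("the target needs at least two distinct values to learn from")
    if len(distinct) == 2:
        # Exactly two possible outcomes, e.g. true/false or approved/rejected.
        return "binary classification"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in target_values):
        # A numeric target such as a sale price.
        return "regression"
    # A fixed set of three or more categories, e.g. sedan, truck, SUV.
    return "multiclass classification"


print(suggest_model_type([1499, 8100, 4760, 7290]))             # regression
print(suggest_model_type([True, False, True]))                  # binary classification
print(suggest_model_type(["sedan", "truck", "SUV", "sedan"]))   # multiclass classification
```
&lt;/code&gt;&lt;/pre&gt;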
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can&amp;#39;t be used where binary, categorical, or other non-numeric values are required without additional processing of the model&amp;rsquo;s output.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate the accuracy of a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. Because RMSE is expressed in the same units as the target, whether an RMSE is good depends on the range of values you are trying to predict. A perfect model would have an RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
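&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Python sketch below computes RMSE as described above and shows one possible way to handle out-of-range predictions: clamping them to an acceptable range. The actual sale prices come from the used car training data example later in this article; the predicted prices and the 0 to 50,000 range are made-up values for illustration only. Clamping is just one option; you might instead flag such predictions for human review.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared prediction error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def clamp(value, low, high):
    """Force a prediction into the range [low, high] that the application accepts."""
    return max(low, min(value, high))


actual = [1499, 8100, 4760, 7290]     # sale prices from the example table
predicted = [1800, 7600, 5000, 7000]  # made-up model outputs for illustration
print(round(rmse(actual, predicted), 2))  # 347.24

# A regression model can return values outside the training range, even negatives.
print(clamp(-250.0, 0.0, 50000.0))  # 0.0
```
&lt;/code&gt;&lt;/pre&gt;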
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model returns a predicted value (true or false) and a score (a number between 0 and 1). By default, if the score is greater than 0.5, the predicted value is true. However, machine learning tools typically let you adjust this score threshold to shift the balance of true and false predictions to suit your use case.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate performance of a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The AUC is a number between 0 and 1. Values closer to 1 indicate a model that separates the two classes well. Values near 0.5 indicate that the model is no better than random guessing. Values close to 0 indicate that the model has learned useful patterns but is applying them in reverse, so its predictions are systematically inverted.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
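&lt;p&gt;&lt;span style="font-weight:400;"&gt;To make the score threshold and AUC concrete, the Python sketch below converts scores into true/false predictions and computes AUC as the probability that a randomly chosen true example receives a higher score than a randomly chosen false one (ties count as half), which is one standard way to calculate it. The labels and scores are made-up values, and the function names are illustrative; real tools compute AUC for you, typically with more efficient algorithms than this pairwise loop.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def apply_threshold(scores, threshold=0.5):
    """Convert predicted scores (0 to 1) into true/false predictions."""
    return [score > threshold for score in scores]


def auc(labels, scores):
    """Probability that a random positive example outscores a random negative one."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(positives) * len(negatives))


labels = [True, True, False, False]  # actual outcomes (made up)
scores = [0.9, 0.4, 0.6, 0.1]        # predicted scores (made up)

print(apply_threshold(scores))                 # [True, False, True, False]
print(apply_threshold(scores, threshold=0.3))  # [True, True, True, False]
print(auc(labels, scores))                     # 0.75
# Inverting every score inverts the ranking, pushing AUC toward 0.
print(auc(labels, [1 - s for s in scores]))    # 0.25
```
&lt;/code&gt;&lt;/pre&gt;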
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Multiclass Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict a categorical value from a finite list of three or more discrete possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model returns a list of values and their associated probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%. In this case, Tier 1 is the prediction.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Because the target attribute&amp;#39;s possible values are derived from the training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to determine the accuracy of a multiclass model is called an &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean of precision and recall; for multiclass models, it is typically averaged across classes. It ranges from 0 to 1, and the closer the value is to 1, the better the model.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools limit the number of distinct values that a multiclass model can predict. This is because target attributes with hundreds or thousands of potential values can be difficult to train and are more likely to result in failed training or poor model performance.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To make predictions from a set of possibilities larger than a tool&amp;#39;s limit, consider using a series of models. For example, to classify animal species from an image, you can often achieve better results by first training a model for a more general classification (e.g., feline, canine, rodent) and then training additional models to identify specific species within each group.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
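&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Python sketch below picks the best prediction from the support tier probabilities in the example above and computes a macro-averaged F1 score, which is one common way of averaging per-class F1 scores; tools may instead report micro or weighted averages. The function names are illustrative, and the actual and predicted tiers are made-up values.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
def best_prediction(probabilities):
    """Return the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1 scores.

    Per-class F1 is the harmonic mean of precision and recall for that class.
    """
    classes = set(actual) | set(predicted)
    pairs = list(zip(actual, predicted))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for a, p in pairs if a == cls and p == cls)
        fp = sum(1 for a, p in pairs if a != cls and p == cls)
        fn = sum(1 for a, p in pairs if a == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))  # Tier 1

actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3"]     # made-up true tiers
predicted = ["Tier 1", "Tier 2", "Tier 2", "Tier 3"]  # made-up model predictions
print(round(macro_f1(actual, predicted), 3))          # 0.778
```
&lt;/code&gt;&lt;/pre&gt;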
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of customer support emails every day. Employees manually forward these emails to the appropriate departments and locations after reviewing each email&amp;#39;s description and the customer&amp;#39;s location. This process is time-consuming and prone to human error. The client can automate it using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI Skill, which combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot; of diverse emails that includes multiple examples for every desired routing option. Once the model is trained and tested, the client can publish it to make it available through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#rmse"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#mae"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MAE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts binary values (e.g., true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
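&lt;p&gt;&lt;span style="font-weight:400;"&gt;The two secondary metrics in the table can be sketched in Python as follows. MAE averages the absolute size of the errors, so it is less sensitive to a few large errors than RMSE. Log loss averages the negative logarithm of the probability the model assigned to the true class, so confident wrong predictions are penalized heavily. For both metrics, lower is better and 0 is perfect. The function names and input values are illustrative, and the small eps floor that keeps the logarithm finite is an assumption of this sketch.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def log_loss(actual, probabilities, eps=1e-15):
    """Multiclass log loss.

    actual: the true class of each observation.
    probabilities: one dict per observation mapping each class to its predicted probability.
    eps: floor applied to probabilities so that log() never receives 0.
    """
    total = 0.0
    for true_class, probs in zip(actual, probabilities):
        p = min(max(probs.get(true_class, 0.0), eps), 1.0)
        total -= math.log(p)
    return total / len(actual)


print(mae([1499, 8100], [1800, 7600]))  # 400.5
tier_probs = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
print(round(log_loss(["Tier 1"], [tier_probs]), 4))  # 0.5108
```
&lt;/code&gt;&lt;/pre&gt;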
&lt;h2 id="mcetoc_1hthvnjgk3"&gt;&lt;b&gt;Training Data&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data, which it uses to learn associations between the attribute values of the input data and the target attribute. The model then applies the associations and patterns it found in the training data to make predictions for new input data. A common adage holds that a model is &amp;ldquo;only as good as its training data.&amp;rdquo; If the training data is not a representative sample of the data the model will make predictions against, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that might serve as training data for a model designed to predict the sale price of a used car. In this use case, the &amp;quot;Sale Price&amp;quot; column would be designated as the target attribute to predict.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;How data should be ordered, formatted, and uploaded for training varies by machine learning tool, so refer to your tool&amp;#39;s documentation for specific guidance on preparing your data.&lt;/span&gt;&lt;/p&gt;
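&lt;p&gt;&lt;span style="font-weight:400;"&gt;Although formats vary, separating the input attributes from the target is common to most tools. As a tool-neutral illustration, the Python sketch below stores the example used car table as a list of dictionaries and splits each row into features and the &amp;quot;Sale Price&amp;quot; target. The function name is hypothetical, and because the table does not state units for Mileage or Sale Price, those values are kept as plain numbers.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
# Rows copied from the example used car training data table.
training_rows = [
    {"Year": 1997, "Make": "Ford", "Model": "Mustang", "Color": "Silver",
     "Transmission": "Automatic", "Mileage": 201298, "Previous Owners": 3, "Sale Price": 1499},
    {"Year": 2013, "Make": "Mazda", "Model": "3", "Color": "Black",
     "Transmission": "Automatic", "Mileage": 60588, "Previous Owners": 1, "Sale Price": 8100},
    {"Year": 2005, "Make": "Honda", "Model": "Element", "Color": "Red",
     "Transmission": "Automatic", "Mileage": 160378, "Previous Owners": 2, "Sale Price": 4760},
    {"Year": 2009, "Make": "Toyota", "Model": "Camry", "Color": "Blue",
     "Transmission": "Manual", "Mileage": 87380, "Previous Owners": 1, "Sale Price": 7290},
]

TARGET = "Sale Price"  # the attribute the model learns to predict


def split_features_and_target(rows, target):
    """Separate the input attributes (features) from the target attribute."""
    features = [{key: value for key, value in row.items() if key != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


features, targets = split_features_and_target(training_rows, TARGET)
print(targets)      # [1499, 8100, 4760, 7290]
print(features[1])  # the Mazda row without its sale price
```
&lt;/code&gt;&lt;/pre&gt;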
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Providing more training observations (i.e., rows of data) generally produces a more accurate model, provided that the training data is diverse and balanced (e.g., examples of class A and class B are split roughly equally) so that the model does not become biased toward one outcome. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; about bias in machine learning that are helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimum requirements and maximum limits for the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
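&lt;p&gt;&lt;span style="font-weight:400;"&gt;One simple way to check whether classification training data is balanced is to look at the share of observations in each class. The Python sketch below does this. The 0.1 tolerance and the function names are hypothetical choices for illustration, and a perfectly equal split is not always achievable or required, so treat the result as a prompt for review rather than a hard rule.&lt;/span&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```python
from collections import Counter


def class_proportions(labels):
    """Return each class's share of the training observations."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: count / total for cls, count in counts.items()}


def is_roughly_balanced(labels, tolerance=0.1):
    """Hypothetical check: is every class share within tolerance of an equal split?"""
    proportions = class_proportions(labels)
    equal_share = 1 / len(proportions)
    return all(tolerance >= abs(share - equal_share) for share in proportions.values())


labels = ["approved"] * 9 + ["rejected"]
print(class_proportions(labels))    # {'approved': 0.9, 'rejected': 0.1}
print(is_roughly_balanced(labels))  # False
```
&lt;/code&gt;&lt;/pre&gt;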
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/11</link><pubDate>Fri, 31 May 2024 14:37:35 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>Appian Max Team</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 11 posted to Article by Appian Max Team on 5/31/2024 2:37:35 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration agnostic and can connect with any machine learning offering that exposes a web API. This article provides information to build your general understanding of machine learning technology, regardless of the specific tool being used. For Amazon machine learning integrations, which are covered in detail in Appian&amp;#39;s documentation, refer to this &lt;/span&gt;&lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk0"&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and mathematical models that find patterns in historical data and use them to generate probabilistic predictions and decisions. The models can be thought of as black boxes that are created by processing many observations, using supervised or semi-supervised learning. A trained model can then take in one or more observations without a known outcome and produce possible outcomes based on their probabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis and translation, and interpreting the contents of images, which can be done using tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/document-ai"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;, respectively.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk1"&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models are custom-built: you train and test them using data that accurately reflects your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data in PDF documents and extracts it as key-value pairs that can be used within the application or saved to a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills also offer pre-trained models with built-in document extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Pre-trained models in Appian are designed for general use cases and work with documents that contain similar information and labeled values (e.g., structured or semi-structured documents). Incorporating Google AI functionality into your Appian application enables a variety of features, including but not limited to natural language processing, translation services, and cloud-based storage. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of available features.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that as of January 23, 2024, Appian no longer sells Appian-provisioned Google credentials to customers. Customers must purchase a license directly from Google and add their Google credentials to the Appian Admin Console.&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point for exploring further AI capabilities in Appian. AI Copilot uses generative AI to create a functional interface: you upload a PDF form, and it generates an initial interface from the fields in that form. AI Copilot is integrated with Azure OpenAI to provide this functionality in your application. Azure OpenAI offers generative AI models (e.g., GPT-3, Codex, DALL-E, ChatGPT) for writing assistance, content generation, and similar tasks. Once the initial interface is generated, you can customize it further to meet your specific requirements.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Like other machine learning models, generative AI models are trained on large amounts of data, which helps them identify inherent patterns. A generative AI model can be fine-tuned and improved as more data is introduced over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk2"&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (e.g., true or false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclass classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The type you use depends on the target attribute you want to predict and on your overall objective in creating the model. Read the sections below to learn about the purpose of each model type and to see examples of appropriate uses for each one.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but can&amp;#39;t be used where binary, categorical, or other non-numeric values are required without additional processing of the model&amp;rsquo;s output.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate the accuracy of a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. Because RMSE is expressed in the same units as the target, whether an RMSE is good depends on the range of values you are trying to predict. A perfect model would have an RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true. However, machine learning tools typically let you adjust this score threshold to change the balance of true and false predictions for your use case.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. AUC is a number between 0 and 1. A value close to 1 indicates a model that separates the two outcomes well. A value near 0.5 indicates the model is no better than random guessing. A value close to 0 indicates the model has learned useful patterns but applies them in reverse, predicting the opposite outcome.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
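&lt;p&gt;&lt;span style="font-weight:400;"&gt;The sketch below shows how a score threshold turns scores into true/false values, and how AUC can be computed from its pairwise definition. It is a hypothetical, plain-Python illustration: the labels and scores are invented, and the pairwise loop grows quadratically with the number of examples, so libraries typically use faster methods based on sorting.&lt;/span&gt;&lt;/p&gt;

```python
def apply_threshold(scores, threshold=0.5):
    """Turn predicted scores (0 to 1) into true/false predicted values."""
    return [score > threshold for score in scores]


def auc(labels, scores):
    """Area under the ROC curve via its pairwise definition: the share of
    (positive, negative) pairs in which the positive example receives the
    higher score. Ties count as half a win."""
    positives = [s for s, label in zip(scores, labels) if label]
    negatives = [s for s, label in zip(scores, labels) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical fraud-detection results: True means the transaction was fraud.
labels = [True, False, True, False, True]
scores = [0.90, 0.20, 0.65, 0.70, 0.80]

print(apply_threshold(scores))        # [True, False, True, True, True]
print(apply_threshold(scores, 0.75))  # [True, False, False, False, True]
print(round(auc(labels, scores), 3))  # 0.833

# Reversing the scores reverses the ranking, so AUC drops toward 0.
print(round(auc(labels, [1 - s for s in scores]), 3))  # 0.167
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Raising the threshold from 0.5 to 0.75 turns two true predictions into false ones without changing the AUC, because AUC measures how well the scores rank positive examples above negative ones rather than performance at any single threshold.&lt;/span&gt;&lt;/p&gt;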
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Multiclass Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict a categorical value from a fixed set of three or more discrete possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model returns a list of possible values and their associated probabilities. The value with the highest probability is the model&amp;#39;s best prediction. For example, if you are predicting which support tier a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Because the target attribute&amp;#39;s possible values are derived from the training data, the model cannot return a prediction value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate a multiclass model is the &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean of precision and recall and ranges from 0 to 1; the closer the value is to 1, the better the model. For multiclass models, the per-class F1 scores are typically averaged (for example, with macro or weighted averaging).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools limit the number of distinct values (classes) that a multiclass model can predict, because target attributes with hundreds or thousands of possible values are difficult to train and are more likely to fail or perform poorly.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To predict from a set of possibilities larger than a tool&amp;#39;s limit, consider chaining several models. For example, to classify animal species from an image, you may get better results by first training a model on a general classification (e.g. feline, canine, rodent) and then training additional models to identify the specific species within each group.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
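&lt;p&gt;&lt;span style="font-weight:400;"&gt;The sketch below illustrates three ideas from this section: picking the most probable class, computing a macro-averaged F1 score, and chaining a general model with specialist models. It is plain, hypothetical Python: the evaluation data, feature names, and the rule-based functions standing in for trained models are invented for illustration.&lt;/span&gt;&lt;/p&gt;

```python
def best_prediction(probabilities):
    """Return the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def macro_f1(actual, predicted):
    """Compute F1 (harmonic mean of precision and recall) for each class,
    then average the per-class scores with equal weight (macro averaging)."""
    classes = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    per_class = []
    for cls in classes:
        tp = sum(1 for a, p in pairs if a == cls and p == cls)
        fp = sum(1 for a, p in pairs if a != cls and p == cls)
        fn = sum(1 for a, p in pairs if a == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        per_class.append(2 * precision * recall / total if total else 0.0)
    return sum(per_class) / len(per_class)


def classify_in_stages(features, general_model, specialist_models):
    """Two-stage prediction: a general model picks a broad group, then that
    group's specialist model picks the specific class."""
    group = general_model(features)
    return specialist_models[group](features)


# Hypothetical stand-ins for trained models: simple rules on made-up features.
def general_model(features):
    return "feline" if features["retractable_claws"] else "canine"


SPECIALISTS = {
    "feline": lambda f: "lion" if f["weight_kg"] > 50 else "house cat",
    "canine": lambda f: "wolf" if f["weight_kg"] > 30 else "fox",
}

# The support-tier example from the text.
routing = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
print(best_prediction(routing))  # Tier 1

# Hypothetical evaluation data for the same routing model.
actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3"]
predicted = ["Tier 1", "Tier 3", "Tier 2", "Tier 3", "Tier 1"]
print(round(macro_f1(actual, predicted), 3))  # 0.667

animal = {"retractable_claws": True, "weight_kg": 4}
print(classify_in_stages(animal, general_model, SPECIALISTS))  # house cat
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Macro averaging weights every class equally, so a model that does well on a common class but poorly on a rare one still receives a low score.&lt;/span&gt;&lt;/p&gt;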
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of customer support emails every day. Employees manually forward these emails to the appropriate departments and locations after reviewing each email&amp;#39;s description and the customer&amp;#39;s location. This process is time-consuming and prone to human error. The client can automate it using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI Skill, which combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot; of diverse emails that includes multiple examples of every desired routing option. Once the model is trained and tested, the client can publish it and use it through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#rmse"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#mae"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MAE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts one of two values (e.g. true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="mcetoc_1hthvnjgk3"&gt;&lt;b&gt;Training Data&amp;nbsp;&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data, which it uses to learn associations between the attribute values of the input data and the target attribute. The model then applies the associations and patterns it found in the training data to make predictions for new input data. A common adage says that a model is &amp;ldquo;only as good as its training data.&amp;rdquo; If the training data is not a representative sample of the data the model will make predictions against, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that might be used as training data for a model designed to predict the sale price of a used car. In this use case, the &amp;quot;Sale Price&amp;quot; column would be identified to the model as the target attribute.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
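&lt;p&gt;&lt;span style="font-weight:400;"&gt;As a hypothetical illustration, the sketch below loads the used-car rows above from CSV text and separates the input attributes from the &amp;quot;Sale Price&amp;quot; target. The CSV layout, function names, and column handling are assumptions made for this example; your tool may expect a different format.&lt;/span&gt;&lt;/p&gt;

```python
import csv
import io

# The used-car training rows from the table above, written as CSV text.
TRAINING_CSV = """Year,Make,Model,Color,Transmission,Mileage,Previous Owners,Sale Price
1997,Ford,Mustang,Silver,Automatic,"201,298",3,"1,499"
2013,Mazda,3,Black,Automatic,"60,588",1,"8,100"
2005,Honda,Element,Red,Automatic,"160,378",2,"4,760"
2009,Toyota,Camry,Blue,Manual,"87,380",1,"7,290"
"""

TARGET_COLUMN = "Sale Price"
# Columns treated as numbers; "Model" is deliberately left as text.
NUMERIC_COLUMNS = {"Year", "Mileage", "Previous Owners", "Sale Price"}


def to_number(text):
    """Convert a value such as '201,298' to an int by removing separators."""
    return int(text.replace(",", ""))


def load_training_data(csv_text, target_column):
    """Parse rows and split each into input attributes and the target value."""
    features, targets = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        parsed = {
            name: to_number(value) if name in NUMERIC_COLUMNS else value
            for name, value in row.items()
        }
        targets.append(parsed.pop(target_column))
        features.append(parsed)
    return features, targets


features, targets = load_training_data(TRAINING_CSV, TARGET_COLUMN)
print(targets)               # [1499, 8100, 4760, 7290]
print(features[1]["Model"])  # 3 (kept as the string "3")
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;The Mazda&amp;#39;s Model value, 3, is kept as text because it is a category, not a quantity; treating it as a number would imply a meaningless ordering between car models.&lt;/span&gt;&lt;/p&gt;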
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The details of how data should be ordered, formatted, and uploaded to a machine learning tool for training vary by tool, so refer to your tool&amp;#39;s documentation for specific guidance on presenting data.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;In general, the more training observations (i.e. rows of data) you provide, the more accurate the final model will be, provided the training data is diverse and balanced (e.g. rows are split roughly equally between class A and class B) so that the model does not learn biased predictions. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; on bias in machine learning that are helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimum requirements and limits on the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
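&lt;p&gt;&lt;span style="font-weight:400;"&gt;Before training a classification model, it can help to check how evenly the target classes are represented in the training data. The sketch below is a hypothetical helper in plain Python; the labels are invented, and the 0.2 tolerance is an arbitrary illustration, not an industry standard.&lt;/span&gt;&lt;/p&gt;

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the training rows."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def is_imbalanced(labels, tolerance=0.2):
    """Flag data where any class share differs from an even split by more
    than `tolerance` (a hypothetical rule of thumb)."""
    shares = class_shares(labels)
    even_share = 1 / len(shares)
    return any(abs(share - even_share) > tolerance for share in shares.values())


# Hypothetical loan-approval training labels.
skewed = ["approved"] * 80 + ["rejected"] * 20
balanced = ["approved"] * 50 + ["rejected"] * 50

print(class_shares(skewed))     # {'approved': 0.8, 'rejected': 0.2}
print(is_imbalanced(skewed))    # True
print(is_imbalanced(balanced))  # False
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;If the data turns out to be skewed, collecting more examples of the under-represented class is one way to rebalance it before training.&lt;/span&gt;&lt;/p&gt;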
&lt;p&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/10</link><pubDate>Fri, 10 May 2024 19:43:36 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>matt.cosenza</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 10 posted to Article by matt.cosenza on 5/10/2024 7:43:36 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration agnostic and can connect with any machine learning offering that exposes a web API. This article provides a general understanding of machine learning technology, regardless of the specific tool being used. Refer to this &lt;/span&gt;&lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for Amazon machine learning integrations, which are covered in detail in Appian&amp;#39;s documentation.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk0"&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and mathematical models that let computers learn from data and make decisions. These models generate probabilistic predictions by finding patterns in historical data. A model can be thought of as a black box created by processing many observations through supervised or semi-supervised learning. The trained model can then take in one or more observations whose outcome is unknown and predict likely outcomes along with their probabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis and translation and the ability to decipher image contents, done using tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://cloud.google.com/document-ai"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; respectively.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk1"&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models can be custom-built, meaning you train and test them on data that accurately reflects your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data in PDF documents and extracts it into key-value pairs that can be used within the application or saved to a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills also offer pre-trained models with built-in document extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Pre-trained models in Appian are designed for general use cases and work best on documents that contain similar information and labeled values (e.g. structured or semi-structured documents). Incorporating Google AI functionality into your Appian application enables a range of features, including natural language processing, translation services, and cloud-based storage. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of available features.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that as of January 23, 2024, Appian no longer sells Appian-provisioned Google credentials to customers. Customers must purchase a license directly from Google and add their Google credentials in the Appian Admin Console.&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point for further AI capabilities in Appian. AI Copilot uses generative AI to create a functional initial interface from the fields in a PDF form that you upload; once the interface is generated, you can customize it further to meet your specific requirements. AI Copilot is integrated with Azure OpenAI to enable this functionality. Azure OpenAI uses generative AI models (e.g. GPT-3, Codex, DALL-E, ChatGPT) to provide writing assistance, content generation, and similar capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is a fundamental component of generative AI. Like other machine learning models, generative AI models are trained on large amounts of data to identify inherent patterns, and they are fine-tuned and improved as more data is introduced over time. Using AI with Appian lets you automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2 id="mcetoc_1hthvnjgk2"&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (e.g. true and false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclass classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Which type you use depends on the target attribute you want to predict and on your overall objective for the model. The sections below describe the purpose of each model type and give examples of appropriate uses.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numeric values. They have many important use cases (examples below), but they can&amp;#39;t be used where binary, categorical, or other non-numeric values are required unless the model&amp;rsquo;s output is post-processed.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. Because RMSE is expressed in the same units as the target, whether an RMSE is good depends on the range of values you are trying to predict. A perfect model would have an RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that a predicted value may fall outside the range of values in the training data and can be any positive or negative number. Plan how your application will handle values that fall outside its acceptable range.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true. However, machine learning tools typically let you adjust this score threshold to change the balance of true and false predictions for your use case.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. AUC is a number between 0 and 1. A value close to 1 indicates a model that separates the two outcomes well. A value near 0.5 indicates the model is no better than random guessing. A value close to 0 indicates the model has learned useful patterns but applies them in reverse, predicting the opposite outcome.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Multiclass Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict a categorical value from a finite list of three or more discrete possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate the accuracy of a multiclass model is the &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean of precision and recall; for a multiclass model it is computed for each class and then averaged across classes. It ranges from 0 to 1, and the closer the value is to 1, the better the model.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To make predictions from a set of possibilities larger than a machine learning tool&amp;#39;s limit, consider using a series of models. For example, to classify animal species from an image, you can often get better results by first training a model on a more general classification (e.g., feline, canine, rodent) and then training additional models to identify specific species within each group.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
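&lt;p&gt;&lt;span style="font-weight:400;"&gt;The support-tier example and the F1 metric can be sketched as follows. This is a hedged illustration rather than a specific tool&amp;#39;s output format: the probability dictionary, the sample labels, and the function names best_prediction, f1_per_class, and macro_f1 are hypothetical. Macro averaging (an unweighted mean over classes) is only one of several ways tools average per-class F1 scores; weighted averaging is another common choice.&lt;/span&gt;&lt;/p&gt;

```python
# Sketch of two multiclass ideas: choosing the best prediction and
# computing a macro-averaged F1 score. All data is illustrative.


def best_prediction(probabilities):
    """Return the class label with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def f1_per_class(actual, predicted, label):
    """F1 for one class: the harmonic mean of its precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(actual, predicted):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(actual) | set(predicted))
    return sum(f1_per_class(actual, predicted, lbl) for lbl in labels) / len(labels)


# The routing example from the text: Tier 1 has the highest probability.
print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))  # Tier 1

actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3", "Tier 3", "Tier 2"]
predicted = ["Tier 1", "Tier 3", "Tier 2", "Tier 3", "Tier 1", "Tier 2"]
print(round(macro_f1(actual, predicted), 3))  # 0.667
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Here Tier 2 is predicted perfectly (F1 of 1.0) while Tier 1 and Tier 3 each score 0.5, so the macro average is about 0.667.&lt;/span&gt;&lt;/p&gt;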
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of customer support emails every day. Employees manually forward these emails to the appropriate departments and locations after reviewing each email&amp;#39;s description and the customer&amp;#39;s location. This process is time-consuming and prone to human error. The client can automate it using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI skill, which combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot;: a diverse set of emails that includes multiple examples for every desired routing option. Once the model is trained and tested, the client can publish it and make it available through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#rmse"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://agrimetsoft.com/data-tool.aspx#mae"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MAE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts one of two values (e.g., true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Loss_functions_for_classification#Logistic_loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
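&lt;p&gt;&lt;span style="font-weight:400;"&gt;To make the metrics in the table concrete, here is a minimal Python sketch of how RMSE, MAE, and log loss are commonly computed. The function names and sample values are illustrative assumptions; machine learning tools report these metrics for you, and their exact conventions (for example, the clipping constant used in log loss) may differ.&lt;/span&gt;&lt;/p&gt;

```python
import math

# Sketch of the performance metrics named in the table above.
# Sample values are invented; real tools report these metrics for you.


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


def log_loss(actual, probabilities, eps=1e-15):
    """Multiclass log loss (cross-entropy); lower is better.

    actual: the true class label for each observation.
    probabilities: one dict per observation, class label -> predicted probability.
    """
    total = 0.0
    for label, probs in zip(actual, probabilities):
        # Clip so that a probability of exactly 0 or 1 never reaches log(0).
        p = min(max(probs.get(label, 0.0), eps), 1 - eps)
        total -= math.log(p)
    return total / len(actual)


# Regression: predicted vs. actual home sale prices.
prices_actual = [250_000, 310_000, 180_000]
prices_predicted = [260_000, 300_000, 200_000]
print(round(rmse(prices_actual, prices_predicted), 2))  # 14142.14
print(round(mae(prices_actual, prices_predicted), 2))   # 13333.33

# Multiclass: predicted genre probabilities for two books.
genres_actual = ["Mystery", "Fantasy"]
genre_probs = [
    {"Mystery": 0.8, "Fantasy": 0.2},
    {"Mystery": 0.5, "Fantasy": 0.5},
]
print(round(log_loss(genres_actual, genre_probs), 4))  # 0.4581
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;In this sample the RMSE is larger than the MAE because squaring gives the single 20,000 error more weight than the two 10,000 errors. This is why RMSE is more sensitive to occasional large misses.&lt;/span&gt;&lt;/p&gt;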
&lt;h2 id="mcetoc_1hthvnjgk3"&gt;&lt;b&gt;Training Data&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data, which it uses to learn associations between the attribute values of the input data and the target attribute. The model then applies the associations and patterns it found in the training data to make predictions for new input data. As the common adage goes, a model is &amp;ldquo;only as good as its training data.&amp;rdquo; If the training data is not a representative sample of the data the model will make predictions against, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that could serve as training data for a model designed to predict the sale price of a used car. In this use case, the &amp;quot;Sale Price&amp;quot; column would be designated as the target attribute for the model to predict.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
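&lt;p&gt;&lt;span style="font-weight:400;"&gt;As a sketch of how the table above might be prepared, the Python below stores the four rows as records, separates the &amp;quot;Sale Price&amp;quot; target from the input attributes, and serializes the rows to CSV with a header row. CSV is assumed here only because it is a common upload format; the function names are hypothetical, and your tool may expect a different layout.&lt;/span&gt;&lt;/p&gt;

```python
import csv
import io

# Sketch: the used-car training rows above as Python dictionaries.
# Column names mirror the table; "Model" is kept as text because values
# such as "3" (Mazda 3) are categories, not numbers.
training_rows = [
    {"Year": 1997, "Make": "Ford", "Model": "Mustang", "Color": "Silver",
     "Transmission": "Automatic", "Mileage": 201298, "Previous Owners": 3, "Sale Price": 1499},
    {"Year": 2013, "Make": "Mazda", "Model": "3", "Color": "Black",
     "Transmission": "Automatic", "Mileage": 60588, "Previous Owners": 1, "Sale Price": 8100},
    {"Year": 2005, "Make": "Honda", "Model": "Element", "Color": "Red",
     "Transmission": "Automatic", "Mileage": 160378, "Previous Owners": 2, "Sale Price": 4760},
    {"Year": 2009, "Make": "Toyota", "Model": "Camry", "Color": "Blue",
     "Transmission": "Manual", "Mileage": 87380, "Previous Owners": 1, "Sale Price": 7290},
]

TARGET = "Sale Price"  # the attribute the model should learn to predict


def split_features_and_target(rows, target):
    """Separate the input attributes from the target attribute."""
    features = [{key: value for key, value in row.items() if key != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


def to_csv(rows):
    """Serialize rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


features, targets = split_features_and_target(training_rows, TARGET)
print(targets)  # [1499, 8100, 4760, 7290]
print(to_csv(training_rows))
```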
&lt;p&gt;&lt;span style="font-weight:400;"&gt;How data should be ordered, formatted, and uploaded to a machine learning tool for training varies by tool, so refer to your tool&amp;#39;s documentation for specifics on presenting data appropriately.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;In general, the more training observations (i.e., rows of data) you provide, the more accurate the final model tends to be. This holds when the training data is diverse and balanced (e.g., examples of class A and class B appear in roughly equal numbers), which helps the model avoid bias when making predictions. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; on bias in machine learning that are helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimum requirements and limits on the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
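&lt;p&gt;&lt;span style="font-weight:400;"&gt;A simple way to act on the balance advice above is to check class proportions before training. The Python sketch below is a hypothetical helper, not a feature of any specific tool: it counts how often each class label appears and flags the data as imbalanced when any class deviates from an equal split by more than a tolerance. The 0.1 tolerance and the loan-approval labels are arbitrary assumptions chosen for illustration.&lt;/span&gt;&lt;/p&gt;

```python
from collections import Counter

# Sketch of a simple pre-training check for class balance.
# The 0.1 tolerance is an arbitrary, hypothetical choice.


def class_proportions(labels):
    """Return each class label's share of the training rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def is_roughly_balanced(labels, tolerance=0.1):
    """True if every class share is within `tolerance` of an equal split."""
    proportions = class_proportions(labels)
    equal_share = 1 / len(proportions)
    return not any(abs(share - equal_share) > tolerance for share in proportions.values())


skewed = ["approve"] * 90 + ["reject"] * 10
even = ["approve"] * 52 + ["reject"] * 48

print(class_proportions(skewed))    # {'approve': 0.9, 'reject': 0.1}
print(is_roughly_balanced(skewed))  # False
print(is_roughly_balanced(even))    # True
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;If a check like this fails, common remedies include collecting more examples of the under-represented class or using any class-weighting option your tool provides.&lt;/span&gt;&lt;/p&gt;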
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/9</link><pubDate>Fri, 10 May 2024 19:28:57 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>matt.cosenza</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 9 posted to Article by matt.cosenza on 5/10/2024 7:28:57 PM&lt;br /&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian is integration-agnostic and can connect to any machine learning offering that exposes a web API. The purpose of this article is to build your general understanding of machine learning technology, regardless of the specific tool being used. For Amazon machine learning integrations, which are covered in detail in Appian&amp;#39;s documentation, refer to this &lt;/span&gt;&lt;a href="/w/the-appian-playbook/998/integrating-with-amazon-machine-learning"&gt;&lt;span style="font-weight:400;"&gt;article&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;b&gt;What is Machine Learning?&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and models that enable computers to learn from data and make decisions based on it. Machine learning models use mathematics to generate probabilistic predictions by finding patterns in historical data. A model can be thought of as a black box created by processing many observations, through either supervised or semi-supervised learning. Once trained, the model can take in one or more observations whose outcome is unknown and predict possible outcomes along with their probabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are many different use cases and applications for machine learning. This article focuses mainly on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;target&lt;/span&gt;&lt;/i&gt;&lt;span style="font-weight:400;"&gt;. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Other uses for machine learning include natural language analysis, translation, and image content recognition, which are available through tools such as &lt;/span&gt;&lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;&lt;span style="font-weight:400;"&gt;IBM&amp;#39;s Watson&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Google AI&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;b&gt;Appian ML/AI Capabilities&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/ai-skill-object.html"&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; facilitate the integration of machine learning and AI capabilities into your application. This is done using a variety of low-code design objects, functions and smart services. Features available within Appian AI Skills include document and email classification with custom-built models, and document extraction with pre-trained models.&amp;nbsp;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Classification models are custom-built: you train and test them using data that accurately reflects your use case. The &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/23.4/create-skill-doc-extraction.html"&gt;&lt;span style="font-weight:400;"&gt;Document Extraction&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; AI skill identifies data in PDF documents, extracting it as key-value pairs that can be used within the application or saved to a database.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills offer pre-trained models that use built-in document extraction capabilities.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Pre-trained models in Appian are designed for general use cases and work with documents that contain similar information and labeled values (e.g., structured or semi-structured documents). Incorporating Google AI functionality into your Appian application enables a variety of features, including but not limited to natural language processing, translation services, and cloud-based storage. See &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/using-google.html"&gt;&lt;span style="font-weight:400;"&gt;Using Google AI Services&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; for a full list of available features.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Note that as of January 23, 2024, Appian no longer sells Appian-provisioned Google credentials. Customers must purchase a license directly from Google and add their Google credentials in the Appian Admin Console.&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian &lt;/span&gt;&lt;a href="https://docs.appian.com/suite/help/latest/appian-ai-copilot.html"&gt;&lt;span style="font-weight:400;"&gt;AI Copilot&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; is a starting point for further AI capabilities in Appian. AI Copilot uses generative AI to build a functional initial interface from the fields of a form that you upload as a PDF. AI Copilot is integrated with Azure OpenAI to provide this functionality; Azure OpenAI offers generative AI models (e.g., GPT-3, Codex, DALL-E, ChatGPT) for writing assistance, content generation, and more. Once the initial interface is generated, you can further customize it to meet your specific requirements.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Machine learning, particularly deep learning, is one of the fundamental components of generative AI. Like other machine learning models, generative AI models are trained on large amounts of data to identify inherent patterns, and they can be fine-tuned and improved as more data is introduced over time. Leveraging AI with Appian allows you to automate repetitive tasks and simplify processes, streamlining development and increasing efficiency and productivity.&lt;/span&gt;&lt;/p&gt;
&lt;h2&gt;&lt;b&gt;Common Model Types&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/span&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a numeric value.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:&lt;/span&gt;&lt;ol&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Binary classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has only two prediction values to choose from (e.g., true and false).&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;b&gt;Multiclassification&lt;/b&gt;&lt;span style="font-weight:400;"&gt;: the model has more than two prediction values to choose from.&lt;/span&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The type you use depends on the target attribute you want to predict and your overall objective in creating the model. Read the sections below to learn more about the purpose of each model type and to see examples of appropriate uses for each one.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Regression&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Regression models make predictions along a continuous range of numerical values. They have many important use cases (examples below), but they can&amp;#39;t be used where binary, categorical, or non-numeric values are required unless the model&amp;rsquo;s output receives additional processing.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate the accuracy of a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. Because the RMSE is expressed in the same units as the target, whether an RMSE is good depends on the range of values you are trying to predict. A perfect model would have an RMSE of 0.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
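&lt;p&gt;&lt;span style="font-weight:400;"&gt;One possible plan for out-of-range regression predictions is to clamp them to acceptable bounds and flag the ones that needed adjusting. The Python below is a minimal sketch under that assumption: handle_prediction is a hypothetical helper, and the 0 to 2,000,000 bounds for a home sale price are invented business rules, not values from any tool.&lt;/span&gt;&lt;/p&gt;

```python
# Sketch: keeping a regression prediction inside an acceptable range.
# The bounds are hypothetical business rules, not values from any tool.


def handle_prediction(value, lower, upper):
    """Clamp a prediction to the closed range [lower, upper].

    Returns the adjusted value and a flag saying whether clamping occurred,
    so out-of-range predictions can also be routed for human review.
    """
    adjusted = max(lower, min(value, upper))
    return adjusted, adjusted != value


print(handle_prediction(-12_500.0, 0, 2_000_000))  # (0, True): negative price clamped to 0
print(handle_prediction(345_000.0, 0, 2_000_000))  # (345000.0, False): already in range
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Returning a flag alongside the clamped value lets a process decide whether to accept the adjusted prediction automatically or send the case to a person, which is often safer than silently correcting it.&lt;/span&gt;&lt;/p&gt;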
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Binary Classification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Binary classification models predict for a value that has only two possible outcomes.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true. However, machine learning tools typically let you adjust this score threshold to change the balance of true and false predictions for your use case; for example, raising the threshold produces fewer true predictions.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate the performance of a binary classification model is &lt;/span&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The AUC is a number between 0 and 1. Values closer to 1 indicate a model that separates the two classes well. Values near 0.5 indicate that the model is no better than random guessing. Values close to 0 indicate that the model has learned meaningful patterns but is applying them in reverse, making systematically inverted predictions.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether someone will sign up for a service, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;b&gt;Multiclassification&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Multiclass models predict a categorical value from a finite list of three or more discrete possibilities.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;The main metric used to evaluate the accuracy of a multiclass model is the &lt;/span&gt;&lt;a href="https://www.v7labs.com/blog/f1-score-guide"&gt;&lt;span style="font-weight:400;"&gt;F1 score&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt;. The F1 score is the harmonic mean of precision and recall; for a multiclass model it is computed for each class and then averaged across classes. It ranges from 0 to 1, and the closer the value is to 1, the better the model.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values can be difficult to train and have a higher likelihood of failure and poor model performance.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To make predictions from a set of possibilities larger than a machine learning tool&amp;#39;s limit, consider using a series of models. For example, to classify animal species from an image, you can often get better results by first training a model on a more general classification (e.g., feline, canine, rodent) and then training additional models to identify specific species within each group.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass classification models can be used to predict:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Appian AI Skills Use Case: &lt;/span&gt;&lt;i&gt;&lt;span style="font-weight:400;"&gt;Email Classification&lt;/span&gt;&lt;/i&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;The client receives thousands of customer support emails every day. Employees manually forward these emails to the appropriate departments and locations after reviewing the email description and the customer&amp;#39;s location. This process is time-consuming and prone to human error. The client can automate it using the &lt;/span&gt;&lt;b&gt;Email Classification&lt;/b&gt;&lt;span style="font-weight:400;"&gt; AI Skill, which combines machine learning and automation. For the new model to be effective, the client must upload a &amp;quot;training set&amp;quot; of diverse emails that includes multiple examples of every desired routing option. Once the model is trained and tested, the client can publish it and use it through the &lt;/span&gt;&lt;b&gt;Classify Emails&lt;/b&gt;&lt;span style="font-weight:400;"&gt; smart service.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;Model Types Summary&lt;/b&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Prediction Type&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Common Performance Metrics&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Example&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Regression&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts a numeric value&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0"&gt;&lt;span style="font-weight:400;"&gt;Root Mean Square Error (RMSE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0"&gt;&lt;span style="font-weight:400;"&gt;Mean Absolute Error (MAE)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a home&amp;#39;s sale price&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Binary Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts one of two values (e.g. true or false)&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;&lt;span style="font-weight:400;"&gt;Area Under the Curve (AUC)&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting whether a job candidate should be offered employment&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Multiclass Classification&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;&lt;span style="font-weight:400;"&gt;F1 Score&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://wiki.fast.ai/index.php/Log_Loss"&gt;&lt;span style="font-weight:400;"&gt;Log Loss&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Predicting a book&amp;#39;s genre&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
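&lt;p&gt;&lt;span style="font-weight:400;"&gt;To make the metrics in the table concrete, the following Python sketch computes RMSE, mean absolute error (MAE) and multiclass log loss from plain lists. The function names and sample values are hypothetical. Machine learning tools report these metrics for you; the sketch only illustrates what the numbers mean.&lt;/span&gt;&lt;/p&gt;

```python
import math


def rmse(actual, predicted):
    """Root mean square error: squaring makes large misses count more."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(abs_errors) / len(abs_errors)


def log_loss(true_labels, predicted_probs):
    """Multiclass log loss (cross-entropy); 0 is perfect and lower is better.

    Each item in predicted_probs maps every class to its predicted probability.
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip to avoid log(0) when the true class was given zero probability
        total -= math.log(max(probs[label], 1e-15))
    return total / len(true_labels)


# Hypothetical home prices in thousands: one prediction misses by 4
prices = [100, 200, 300, 400]
price_predictions = [100, 200, 300, 404]
print(rmse(prices, price_predictions), mae(prices, price_predictions))  # 2.0 1.0

# Hypothetical genre probabilities for two books
genres = ["mystery", "romance"]
genre_probs = [{"mystery": 0.8, "romance": 0.2}, {"mystery": 0.5, "romance": 0.5}]
print(round(log_loss(genres, genre_probs), 3))  # 0.458
```

&lt;p&gt;&lt;span style="font-weight:400;"&gt;Both regression errors come from the same single miss of 4. RMSE (2.0) is nevertheless larger than MAE (1.0), because squaring gives large errors more weight.&lt;/span&gt;&lt;/p&gt;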
&lt;h2&gt;&lt;b&gt;Training Data&lt;/b&gt;&lt;/h2&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;To create a model, you must supply the machine learning tool with training data. The tool uses this data to learn associations between the input attribute values and the target attribute. The model then applies the associations and patterns it found in the training data to make predictions for new input data. A common adage says that a model is &amp;ldquo;only as good as its training data.&amp;rdquo; If the training data is not a representative sample of the data the model will make predictions on, the model&amp;rsquo;s performance will suffer.&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Below is an example of a data structure that might be used as training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict.&lt;/span&gt;&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Year&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Make&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Model&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Color&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Transmission&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Mileage&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Previous Owners&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1997&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Ford&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mustang&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Silver&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;201,298&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1,499&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2013&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Mazda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;3&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Black&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;60,588&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;8,100&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2005&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Honda&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Element&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Red&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Automatic&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;160,378&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;4,760&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;2009&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Toyota&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Camry&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Blue&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Manual&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;87,380&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;1&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;7,290&lt;/span&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;How data should be ordered, formatted and uploaded for training varies by machine learning tool, so refer to your tool&amp;#39;s documentation for details on preparing your data.&lt;/span&gt;&lt;/p&gt;
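&lt;p&gt;&lt;span style="font-weight:400;"&gt;As a sketch of one common way to present the example table for training, the following Python code writes it as CSV text with a header row, with the target column last. This assumes your tool accepts CSV uploads, which many do. The required column order, header names and file format are tool-specific, so check your tool&amp;#39;s documentation. The function name is hypothetical.&lt;/span&gt;&lt;/p&gt;

```python
import csv
import io

# Rows from the used-car example table; "Sale Price" is the target attribute
COLUMNS = ["Year", "Make", "Model", "Color", "Transmission",
           "Mileage", "Previous Owners", "Sale Price"]
ROWS = [
    [1997, "Ford", "Mustang", "Silver", "Automatic", 201298, 3, 1499],
    [2013, "Mazda", "3", "Black", "Automatic", 60588, 1, 8100],
    [2005, "Honda", "Element", "Red", "Automatic", 160378, 2, 4760],
    [2009, "Toyota", "Camry", "Blue", "Manual", 87380, 1, 7290],
]


def to_training_csv(columns, rows):
    """Serialize rows as CSV text with a header row.

    Numbers are written without thousands separators, so a tool parsing the
    file reads 201298 as a number rather than the text "201,298".
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


print(to_training_csv(COLUMNS, ROWS))
```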
&lt;p&gt;&lt;b&gt;Best Practices and Tips for Training Data&lt;/b&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;More training observations (i.e. rows of data) generally produce a more accurate model, provided the data is diverse and balanced (e.g. classes A and B are represented in roughly equal numbers) so that the model does not learn a bias toward one outcome. Google has &lt;/span&gt;&lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;&lt;span style="font-weight:400;"&gt;documentation&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; and a &lt;/span&gt;&lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;&lt;span style="font-weight:400;"&gt;video&lt;/span&gt;&lt;/a&gt;&lt;span style="font-weight:400;"&gt; about bias in machine learning that are helpful for learning more about this topic.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Machine learning tools typically have both minimum requirements and limits regarding the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;span style="font-weight:400;"&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/span&gt;&lt;/li&gt;
&lt;/ul&gt;
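&lt;p&gt;&lt;span style="font-weight:400;"&gt;You can check the balance and representativeness advice above before training. The following Python sketch, with hypothetical function names and data, reports each class&amp;#39;s share of the training rows. It also holds out a random portion of the data for testing the model on rows it has not seen. Many tools perform their own train/test split, so treat this as an illustration rather than a required step.&lt;/span&gt;&lt;/p&gt;

```python
import random
from collections import Counter


def class_balance(labels):
    """Return the share of training rows that belong to each class."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows, then hold out a fraction for testing on unseen data.

    This is a simple random split. With imbalanced classes, a stratified split
    that keeps each class's share the same in both sets is often preferable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical loan outcomes: heavily skewed toward "approve"
outcomes = ["approve"] * 90 + ["reject"] * 10
print(class_balance(outcomes))  # {'approve': 0.9, 'reject': 0.1}

train_rows, test_rows = train_test_split(range(100))
print(len(train_rows), len(test_rows))  # 80 20
```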
&lt;p&gt;&lt;b&gt;See Also&lt;/b&gt;&lt;/p&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Websites:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;&lt;span style="font-weight:400;"&gt;Best Practices for Creating Training Data&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;span style="font-weight:400;"&gt;Videos:&lt;/span&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;&lt;span style="font-weight:400;"&gt;What is Machine Learning? (2 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li style="font-weight:400;"&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;&lt;span style="font-weight:400;"&gt;What is Artificial Intelligence? (5 mins)&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/8</link><pubDate>Tue, 23 Apr 2024 13:08:37 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>Appian Max Team</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 8 posted to Article by Appian Max Team on 4/23/2024 1:08:37 PM&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;Appian is integration agnostic and&amp;nbsp;has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian&amp;#39;s documentation, refer to the articles below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/w/guide/3252/integrating-with-google-automl-tables"&gt;Integrating with Google AutoML Tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.&lt;/p&gt;
&lt;p&gt;There are many different&amp;nbsp;uses and applications for machine learning, but this article currently focuses on machine learning technology that analyzes structured data&amp;mdash;such as rows of an Excel spreadsheet or an Appian CDT&amp;mdash;and delivers a prediction for a specific field or column in the data. This feature, value or attribute that is being predicted for is often referred to as the&amp;nbsp;&lt;em&gt;target&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, using tools such as &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt;, and recognizing image contents, using tools such as &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="common_model_types"&gt;Common Model Types&lt;/h2&gt;
&lt;p&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Regression&lt;/strong&gt;: predicts a numeric value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Binary classification&lt;/strong&gt;: the model has only two prediction values to choose from (e.g. true and false).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;: the model has more than two prediction values to choose from.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The type you use depends on the target attribute you want to predict and your overall objective in creating the model. Read the sections below to learn more about the purpose of each model type and to see examples of appropriate uses for each one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a regression model when you want to predict a numeric value that is not constrained to a finite or particular list of values.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a regression model is the root mean square error (RMSE). A perfect model would have an RMSE of 0. The RMSE measures the typical size of the difference between predicted and actual values, in the same units as the target, so what counts as a good RMSE depends on the range of values you are trying to predict.&lt;/li&gt;
&lt;li&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/li&gt;
&lt;/ul&gt;
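&lt;p&gt;One possible plan for out-of-range regression predictions is to clamp them to an acceptable range and flag them for review. The following Python sketch assumes hypothetical minimum and maximum home prices and a hypothetical function name. The right bounds, and whether to clamp, reject or escalate, depend on your application.&lt;/p&gt;

```python
def check_prediction(value, lower, upper):
    """Clamp a regression prediction into an acceptable range.

    Returns the usable value and a flag saying whether the raw prediction fell
    outside the range, so the application can route it for human review.
    """
    usable = max(lower, min(value, upper))
    return usable, usable != value


# Hypothetical raw outputs from a home-price model, in dollars
raw_predictions = [-12000.0, 315000.0, 9800000.0]
for raw in raw_predictions:
    print(raw, check_prediction(raw, lower=50000.0, upper=2000000.0))
```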
&lt;p&gt;Regression models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/li&gt;
&lt;li&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/li&gt;
&lt;li&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a binary classification model when you want to predict a value that has only two possible outcomes.&lt;/li&gt;
&lt;li&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true. Machine learning tools typically let you adjust this score threshold to change how many true and false values are returned for your use case.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a binary classification model is Area Under the Curve (AUC), a number between 0 and 1. A value closer to 1 indicates a highly accurate model. Values near 0.5 mean the model is no better than guessing at random. Values close to 0 indicate the model has learned correct patterns but is using them to make inverse predictions.&lt;/li&gt;
&lt;/ul&gt;
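&lt;p&gt;The score threshold and AUC described above can be sketched in Python. The function names and the fraud-detection data are hypothetical. This AUC calculation uses the metric&amp;#39;s probabilistic interpretation rather than drawing the ROC curve: AUC is the chance that a randomly chosen positive row receives a higher score than a randomly chosen negative row. Machine learning tools compute AUC for you; the sketch only shows what it measures.&lt;/p&gt;

```python
def apply_threshold(scores, threshold=0.5):
    """Predict true for every score above the threshold."""
    return [score > threshold for score in scores]


def auc(labels, scores):
    """Chance that a random positive row outscores a random negative row.

    Ties count as half a win. Comparing every pair is O(n^2), which is fine
    for illustration but slow for large data sets.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical fraud-detection results: actual outcomes and model scores
is_fraud = [True, True, False, False, True]
scores = [0.9, 0.4, 0.35, 0.8, 0.6]
print(apply_threshold(scores))       # [True, False, False, True, True]
print(apply_threshold(scores, 0.7))  # [True, False, False, True, False]
print(round(auc(is_fraud, scores), 3))  # 0.667
```

&lt;p&gt;Raising the threshold from 0.5 to 0.7 turns one true prediction into false. The AUC does not change, because it depends only on how the scores are ranked, not on where the threshold is set.&lt;/p&gt;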
&lt;p&gt;Binary classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/li&gt;
&lt;li&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/li&gt;
&lt;li&gt;Whether someone will sign up for a service, given their demographics.&lt;/li&gt;
&lt;li&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a multiclass model when the value you want to predict is a single category from a list of three or more discrete, finite possibilities.&lt;/li&gt;
&lt;li&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/li&gt;
&lt;li&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a multiclass model is the F1 score, the harmonic mean of precision and recall. It ranges from 0 to 1, and the closer the value is to 1, the better the model.&lt;/li&gt;
&lt;li&gt;Some machine learning tools limit the number of distinct values that a multiclass model can predict, because target attributes with hundreds or thousands of potential values are difficult to train and more likely to produce a failed or poorly performing model.&lt;/li&gt;
&lt;li&gt;To make predictions from a group of possibilities that is larger than a tool&amp;#39;s limit, consider using a series of models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model that predicts one of the 225 cars. You could, however, create one model that predicts which type of car the customer is likely to buy (minivan, convertible, or sedan), and then one model for each type that predicts the particular minivan, convertible, or sedan.&lt;/li&gt;
&lt;/ul&gt;
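&lt;p&gt;The following Python sketch shows two things. First, it picks the highest-probability class from a multiclass result, using the support-tier example above. Second, it shows how the car dealership example could chain models. The prediction functions and car names are hypothetical stand-ins that return fixed probabilities. In a real application, each one would call the prediction API of a model trained in your machine learning tool.&lt;/p&gt;

```python
def best_prediction(probabilities):
    """Return the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical stand-ins for trained models. In practice each function would
# call your machine learning tool's prediction endpoint for that model.
def predict_car_type(customer):
    return {"minivan": 0.70, "convertible": 0.10, "sedan": 0.20}


def predict_minivan(customer):
    return {"Minivan 12": 0.55, "Minivan 40": 0.30, "Minivan 71": 0.15}


def predict_convertible(customer):
    return {"Convertible 3": 0.60, "Convertible 9": 0.40}


def predict_sedan(customer):
    return {"Sedan 5": 0.50, "Sedan 88": 0.50}


SPECIFIC_MODELS = {
    "minivan": predict_minivan,
    "convertible": predict_convertible,
    "sedan": predict_sedan,
}


def predict_car(customer):
    """Pick a car type first, then ask that type's model for a specific car."""
    car_type = best_prediction(predict_car_type(customer))
    return car_type, best_prediction(SPECIFIC_MODELS[car_type](customer))


print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))  # Tier 1
print(predict_car({"age": 38, "children": 2}))  # ('minivan', 'Minivan 12')
```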
&lt;p&gt;Multiclass classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/li&gt;
&lt;li&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Types Summary&lt;/strong&gt;&lt;/p&gt;
&lt;table width="917" height="135"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Common Performance Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;td&gt;Predicts a numeric value&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;Root Mean Square Error (RMSE)&lt;/p&gt;
&lt;p&gt;Mean Absolute Error (MAE)&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a home&amp;#39;s sale price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary Classification&lt;/td&gt;
&lt;td&gt;Predicts one of two values (e.g. true or false)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;Area Under the Curve (AUC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Predicting whether a job candidate should be offered employment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiclass Classification&lt;/td&gt;
&lt;td&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;F1 Score&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Log Loss&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a book&amp;#39;s genre&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="training_data&amp;nbsp;"&gt;Training Data&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;To create a model, you must supply the machine learning tool with training data. The tool uses this data to learn associations between the other attribute values and the target attribute. The model relies on the patterns it finds in this training data to make predictions for the data you ask it about. Below is an example of a data structure that might be used as training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict.&lt;/p&gt;
&lt;table width="579" height="128"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transmission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mileage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Previous Owners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1997&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;Mustang&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;201,298&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;Mazda&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Black&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;60,588&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2005&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;Element&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;160,378&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4,760&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;Camry&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;87,380&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,290&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool&amp;#39;s documentation for specific information about appropriately presenting data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Best Practices and Tips for Training Data&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More training observations (i.e. rows of data) generally produce a more accurate final model.&lt;/li&gt;
&lt;li&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/li&gt;
&lt;li&gt;Machine learning tools typically have both minimum requirements and limits regarding the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has &lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;documentation&lt;/a&gt;&amp;nbsp;and a &lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;video&lt;/a&gt; regarding bias and machine learning that is helpful for learning more about this topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;See Also&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;Best Practices for Creating Training Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;What is Machine Learning? (2 mins)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;What is Artificial Intelligence? (5 mins)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/7</link><pubDate>Thu, 22 Feb 2024 18:39:22 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>Appian Max Team</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 7 posted to Article by Appian Max Team on 2/22/2024 6:39:22 PM&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;Appian is integration agnostic and&amp;nbsp;has the ability to connect with any machine learning offering that exposes itself with a web API. The purpose of this article is to provide information that will empower your general understanding of machine learning technology regardless of the specific tool being used. For a list of the machine learning integrations that have been written about in detail in Appian&amp;#39;s documentation, refer to the articles below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/w/guide/3252/integrating-with-google-automl-tables"&gt;Integrating with Google AutoML Tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.&lt;/p&gt;
&lt;p&gt;There are many different&amp;nbsp;uses and applications for machine learning, but this article currently focuses on machine learning technology that analyzes structured data&amp;mdash;such as rows of an Excel spreadsheet or an Appian CDT&amp;mdash;and delivers a prediction for a specific field or column in the data. This feature, value or attribute that is being predicted for is often referred to as the&amp;nbsp;&lt;em&gt;target&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, using tools such as &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt;, and recognizing image contents, using tools such as &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="common_model_types"&gt;Common Model Types&lt;/h2&gt;
&lt;p&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Regression&lt;/strong&gt;: predicts a numeric value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Binary classification&lt;/strong&gt;: the model has only two prediction values to choose from (e.g. true and false).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;: the model has more than two prediction values to choose from.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The type you use depends on the target attribute you want to predict and your overall objective for the model. The sections below describe the purpose of each model type and give examples of appropriate uses for each one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a regression model when you want to predict a numeric value that is not constrained to a finite or fixed list of values.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate the accuracy of a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. A perfect model would have an RMSE of 0. Because RMSE is expressed in the same units as the target, whether a given RMSE is good depends on the range of values you are trying to predict.&lt;/li&gt;
&lt;li&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/li&gt;
&lt;/ul&gt;
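&lt;p&gt;The following Python sketch shows how RMSE is calculated, alongside the mean absolute error (MAE) listed in the summary table, plus one possible way to handle out-of-range predictions by clamping them. The function names rmse, mae, and clamp and the predicted prices are hypothetical illustrations, not part of any particular machine learning tool.&lt;/p&gt;

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: the square root of the mean squared difference."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: the mean of the absolute differences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def clamp(prediction: float, low: float, high: float) -> float:
    """Force a prediction into an application-defined acceptable range [low, high]."""
    return max(low, min(high, prediction))


# Hypothetical actual and predicted sale prices.
actual_prices = [1499, 8100, 4760, 7290]
predicted_prices = [1999, 7600, 4760, 7290]

print(rmse(actual_prices, predicted_prices))  # about 353.55
print(mae(actual_prices, predicted_prices))   # 250.0
print(clamp(-1200.0, 0.0, 50000.0))           # 0.0
```

&lt;p&gt;Because RMSE squares each error before averaging, a few large misses raise it more than they raise MAE, so RMSE is always at least as large as MAE. Clamping is only one option; depending on your application, flagging out-of-range predictions for human review may be more appropriate.&lt;/p&gt;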
&lt;p&gt;Regression models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/li&gt;
&lt;li&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/li&gt;
&lt;li&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a binary classification model when you want to predict a value that has only two possible outcomes.&lt;/li&gt;
&lt;li&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true, but machine learning tools typically allow you to adjust this score threshold to change how many true and false values are predicted for your use case.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a binary classification model is the Area Under the (ROC) Curve (AUC), a number between 0 and 1. Values close to 1 indicate a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate that the model has learned meaningful patterns but is using them to make inverse predictions.&lt;/li&gt;
&lt;/ul&gt;
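&lt;p&gt;As an illustration only, the Python sketch below applies a score threshold and computes AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The helpers to_label and auc and the sample scores are hypothetical; machine learning tools compute AUC for you, and whether a score exactly equal to the threshold counts as true may vary by tool.&lt;/p&gt;

```python
from typing import Sequence


def to_label(score: float, threshold: float = 0.5) -> bool:
    """Turn a predicted score between 0 and 1 into true/false using a threshold."""
    return score > threshold


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC computed as the share of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    This pairwise loop is O(P * N) and is meant for illustration only."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [True, True, False, False]
print(to_label(0.62))                     # True with the default 0.5 threshold
print(to_label(0.62, threshold=0.7))      # False with a stricter threshold
print(auc(labels, [0.9, 0.8, 0.2, 0.1]))  # 1.0: every positive outranks every negative
print(auc(labels, [0.9, 0.4, 0.6, 0.1]))  # 0.75
print(auc(labels, [0.1, 0.2, 0.8, 0.9]))  # 0.0: patterns learned, predictions inverted
```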
&lt;p&gt;Binary classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/li&gt;
&lt;li&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/li&gt;
&lt;li&gt;Whether someone will sign up for a service, given their demographics.&lt;/li&gt;
&lt;li&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a multiclass model when you want to predict a single categorical value from a list of three or more discrete, finite possibilities.&lt;/li&gt;
&lt;li&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/li&gt;
&lt;li&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate the accuracy of a multiclass model is the F1 score, the harmonic mean of precision and recall. For multiclass models, F1 is often calculated for each class and then averaged across classes. The score ranges from 0 to 1, and the closer the value is to 1, the better the model.&lt;/li&gt;
&lt;li&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values&amp;nbsp;can be&amp;nbsp;difficult to train and have a higher likelihood of failure and poor model performance.&lt;/li&gt;
&lt;li&gt;To make predictions from a set of possibilities larger than a tool&amp;#39;s limit, consider using a series of models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model that predicts which of the 225 cars a customer will buy, but you could create one model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan), and then one model for each type to predict the particular minivan, convertible, or sedan.&lt;/li&gt;
&lt;/ul&gt;
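&lt;p&gt;The Python sketch below picks the most probable class from the support-tier example and computes a macro-averaged F1 score (the unweighted mean of per-class F1 scores). The helper names and the sample cases are hypothetical, and tools may instead report a micro or weighted average, so check which one your tool uses.&lt;/p&gt;

```python
from typing import Dict, List, Sequence


def best_prediction(probabilities: Dict[str, float]) -> str:
    """Return the class label with the highest predicted probability."""
    return max(probabilities, key=lambda label: probabilities[label])


def macro_f1(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted (macro) average of per-class F1 scores, where each class F1 is
    the harmonic mean of that class's precision and recall."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    pairs = list(zip(actual, predicted))
    per_class: List[float] = []
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(1 for a, p in pairs if a == c and p == c)  # correctly predicted c
        fp = sum(1 for a, p in pairs if a != c and p == c)  # predicted c, was not c
        fn = sum(1 for a, p in pairs if a == c and p != c)  # was c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            per_class.append(2 * precision * recall / (precision + recall))
        else:
            per_class.append(0.0)
    return sum(per_class) / len(per_class)


# Support-tier example from the text.
print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))  # Tier 1

# Hypothetical actual vs. predicted tiers for four cases.
actual = ["Tier 1", "Tier 1", "Tier 2", "Tier 3"]
predicted = ["Tier 1", "Tier 2", "Tier 2", "Tier 3"]
print(round(macro_f1(actual, predicted), 4))  # 0.7778
```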
&lt;p&gt;Multiclass classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/li&gt;
&lt;li&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Types Summary&lt;/strong&gt;&lt;/p&gt;
&lt;table height="135" width="917"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Common Performance Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;td&gt;Predicts a numeric value&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;Root Mean Square Error (RMSE)&lt;/p&gt;
&lt;p&gt;Mean Absolute Error (MAE)&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a home&amp;#39;s sale price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary Classification&lt;/td&gt;
&lt;td&gt;Predicts binary values (for example, true or false)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;Area Under the Curve (AUC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Predicting whether a job candidate should be offered employment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiclass Classification&lt;/td&gt;
&lt;td&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;F1 Score&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Log Loss&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a book&amp;#39;s genre&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
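&lt;p&gt;Log loss, listed above for multiclass models, measures how much probability a model assigned to the true class: 0 is perfect, lower is better, and confident wrong predictions are penalized heavily. A minimal Python sketch, assuming predictions arrive as a dictionary of class probabilities (a hypothetical format; real tools return their own response shapes):&lt;/p&gt;

```python
import math
from typing import Dict, Sequence


def log_loss(actual: Sequence[str], predicted: Sequence[Dict[str, float]],
             eps: float = 1e-15) -> float:
    """Mean negative natural log of the probability each prediction assigned to
    the true class. Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for label, probs in zip(actual, predicted):
        p = min(max(probs.get(label, 0.0), eps), 1 - eps)
        total += -math.log(p)
    return total / len(actual)


tiers = {"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}
print(round(log_loss(["Tier 1"], [tiers]), 4))  # 0.5108 (true class got 60%)
print(round(log_loss(["Tier 2"], [tiers]), 4))  # 2.0402 (true class got only 13%)
```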
&lt;h2 id="training_data&amp;nbsp;"&gt;Training Data&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;To create a model, you must supply the machine learning tool with training data, which it uses to learn associations between the values of the other attributes and the target attribute. This training data is how the model learns to recognize patterns in the data for which you ask it to make predictions. Below is an example of a data structure that might be used as training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute.&lt;/p&gt;
&lt;table height="128" width="579"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transmission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mileage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Previous Owners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1997&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;Mustang&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;201,298&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;Mazda&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Black&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;60,588&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2005&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;Element&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;160,378&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4,760&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;Camry&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;87,380&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,290&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
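&lt;p&gt;The Python sketch below represents the example table as a list of rows and separates the input attributes from the target attribute, &amp;quot;Sale Price&amp;quot;. The function name split_features_target is hypothetical, and tools typically perform this step themselves once you identify the target column.&lt;/p&gt;

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, str]

# The four rows from the example table. "Model" is kept as a string even for
# the Mazda "3", because it is a category name rather than a quantity.
training_rows: List[Dict[str, Value]] = [
    {"Year": 1997, "Make": "Ford", "Model": "Mustang", "Color": "Silver",
     "Transmission": "Automatic", "Mileage": 201298, "Previous Owners": 3,
     "Sale Price": 1499},
    {"Year": 2013, "Make": "Mazda", "Model": "3", "Color": "Black",
     "Transmission": "Automatic", "Mileage": 60588, "Previous Owners": 1,
     "Sale Price": 8100},
    {"Year": 2005, "Make": "Honda", "Model": "Element", "Color": "Red",
     "Transmission": "Automatic", "Mileage": 160378, "Previous Owners": 2,
     "Sale Price": 4760},
    {"Year": 2009, "Make": "Toyota", "Model": "Camry", "Color": "Blue",
     "Transmission": "Manual", "Mileage": 87380, "Previous Owners": 1,
     "Sale Price": 7290},
]

TARGET = "Sale Price"


def split_features_target(
    rows: List[Dict[str, Value]], target: str
) -> Tuple[List[Dict[str, Value]], List[Value]]:
    """Separate each row into its input attributes and its target value."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


features, targets = split_features_target(training_rows, TARGET)
print(sorted(features[0]))  # the seven input attribute names
print(targets)              # [1499, 8100, 4760, 7290]
```

&lt;p&gt;Keeping the Mazda model name &amp;quot;3&amp;quot; as text matters: if it were stored as a number, a tool could interpret it as a quantity rather than a category.&lt;/p&gt;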
&lt;p&gt;The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool&amp;#39;s documentation for specific information about appropriately presenting data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Best Practices and Tips for Training Data&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In general, the more representative training observations (that is, rows of data) you provide, the more accurate the final model is likely to be.&lt;/li&gt;
&lt;li&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/li&gt;
&lt;li&gt;Machine learning tools typically have both minimum requirements and maximum limits for the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has &lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;documentation&lt;/a&gt;&amp;nbsp;and a &lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;video&lt;/a&gt; regarding bias and machine learning that is helpful for learning more about this topic.&lt;/li&gt;
&lt;/ul&gt;
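&lt;p&gt;One simple way to spot unrepresentative training data is to compare the mix of categories in a column of your training data with the mix seen in production. The Python sketch below uses total variation distance for this; the metric choice, the helper names, and the production sample are assumptions for illustration, not a feature of any particular tool.&lt;/p&gt;

```python
from collections import Counter
from typing import Dict, Sequence


def category_shares(values: Sequence[str]) -> Dict[str, float]:
    """Fraction of rows that fall into each category (values must be non-empty)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Half the summed absolute difference in category shares: 0.0 means the two
    samples have the same mix of categories, 1.0 means they share none."""
    shares_a, shares_b = category_shares(a), category_shares(b)
    categories = set(shares_a) | set(shares_b)
    return 0.5 * sum(abs(shares_a.get(c, 0.0) - shares_b.get(c, 0.0)) for c in categories)


# Transmission values from the example training table, compared with a
# hypothetical sample of production requests.
training = ["Automatic", "Automatic", "Automatic", "Manual"]
production = ["Manual", "Manual", "Automatic", "Manual"]
print(total_variation_distance(training, production))  # 0.5
print(total_variation_distance(training, training))    # 0.0
```

&lt;p&gt;What counts as too large a difference depends on your use case; a large value is a signal to collect more representative training data before retraining the model.&lt;/p&gt;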
&lt;p&gt;&lt;strong&gt;See Also&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;Best Practices for Creating Training Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;What is Machine Learning? (2 mins)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;What is Artificial Intelligence? (5 mins)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/6</link><pubDate>Thu, 22 Feb 2024 18:35:37 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>Appian Max Team</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 6 posted to Article by Appian Max Team on 2/22/2024 6:35:37 PM&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;Appian is integration agnostic and can connect with any machine learning offering that exposes a web API. This article is intended to build your general understanding of machine learning technology, regardless of the specific tool being used. For Amazon machine learning integrations, refer to &lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and models that enable computers to learn from data and make decisions based on it. Machine learning models use mathematics to find patterns in historical data and generate probabilistic predictions. A model can be thought of as a black box that is created by processing many observations, for example through supervised or semi-supervised learning. The trained model can then take in one or more new observations whose outcomes are unknown and produce possible outcomes along with their probabilities.&lt;/p&gt;
&lt;p&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the target. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, and deciphering the contents of images. Tools for these tasks include &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt; and &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;, respectively.&lt;/p&gt;
&lt;h2 id="common_model_types"&gt;Common Model Types&lt;/h2&gt;
&lt;p&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Regression&lt;/strong&gt;: predicts a numeric value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Binary classification&lt;/strong&gt;: the model has only two prediction values to choose from (for example, true and false).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;: the model has more than two prediction values to choose from.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The type you use depends on the target attribute you want to predict and your overall objective for the model. The sections below describe the purpose of each model type and give examples of appropriate uses for each one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a regression model when you want to predict a numeric value that is not constrained to a finite or fixed list of values.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate the accuracy of a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. A perfect model would have an RMSE of 0. Because RMSE is expressed in the same units as the target, whether a given RMSE is good depends on the range of values you are trying to predict.&lt;/li&gt;
&lt;li&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regression models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/li&gt;
&lt;li&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/li&gt;
&lt;li&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a binary classification model when you want to predict a value that has only two possible outcomes.&lt;/li&gt;
&lt;li&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true, but machine learning tools typically allow you to adjust this score threshold to change how many true and false values are predicted for your use case.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a binary classification model is the Area Under the (ROC) Curve (AUC), a number between 0 and 1. Values close to 1 indicate a highly accurate model. Values near 0.5 indicate that the model is no better than guessing at random. Values close to 0 indicate that the model has learned meaningful patterns but is using them to make inverse predictions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Binary classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/li&gt;
&lt;li&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/li&gt;
&lt;li&gt;Whether someone will sign up for a service, given their demographics.&lt;/li&gt;
&lt;li&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a multiclass model when you want to predict a single categorical value from a list of three or more discrete, finite possibilities.&lt;/li&gt;
&lt;li&gt;A multiclass model will return a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/li&gt;
&lt;li&gt;Since the target attribute&amp;#39;s possible values are derived from training data, the model will never deliver a prediction value that did not occur in the training data.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate the accuracy of a multiclass model is the F1 score, the harmonic mean of precision and recall. For multiclass models, F1 is often calculated for each class and then averaged across classes. The score ranges from 0 to 1, and the closer the value is to 1, the better the model.&lt;/li&gt;
&lt;li&gt;Some machine learning tools set a limit on the number of possible predictable values that a multiclass model can have. This is because target attributes with hundreds or thousands of potential values&amp;nbsp;can be&amp;nbsp;difficult to train and have a higher likelihood of failure and poor model performance.&lt;/li&gt;
&lt;li&gt;To make predictions from a set of possibilities larger than a tool&amp;#39;s limit, consider using a series of models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model that predicts which of the 225 cars a customer will buy, but you could create one model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan), and then one model for each type to predict the particular minivan, convertible, or sedan.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Multiclass classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/li&gt;
&lt;li&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Types Summary&lt;/strong&gt;&lt;/p&gt;
&lt;table width="917" height="135"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Common Performance Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;td&gt;Predicts a numeric value&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;Root Mean Square Error (RMSE)&lt;/p&gt;
&lt;p&gt;Mean Absolute Error (MAE)&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a home&amp;#39;s sale price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary Classification&lt;/td&gt;
&lt;td&gt;Predicts binary values (for example, true or false)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;Area Under the Curve (AUC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Predicting whether a job candidate should be offered employment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiclass Classification&lt;/td&gt;
&lt;td&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;F1 Score&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Log Loss&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a book&amp;#39;s genre&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="training_data&amp;nbsp;"&gt;Training Data&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;To create a model, you must supply the machine learning tool with training data, which it uses to learn associations between the values of the other attributes and the target attribute. This training data is how the model learns to recognize patterns in the data for which you ask it to make predictions. Below is an example of a data structure that might be used as training data for a model designed to predict the sale price of a used car. In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute.&lt;/p&gt;
&lt;table width="579" height="128"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transmission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mileage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Previous Owners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1997&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;Mustang&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;201,298&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;Mazda&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Black&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;60,588&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2005&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;Element&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;160,378&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4,760&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;Camry&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;87,380&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,290&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The details about how data should be ordered, formatted and uploaded to a machine learning tool for training vary depending on the specific tool being used, so refer to your tool&amp;#39;s documentation for specific information about appropriately presenting data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Best Practices and Tips for Training Data&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In general, the more representative training observations (that is, rows of data) you provide, the more accurate the final model is likely to be.&lt;/li&gt;
&lt;li&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/li&gt;
&lt;li&gt;Machine learning tools typically have both minimum requirements and maximum limits for the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Models trained with skewed or unrepresentative data can result in unwanted bias when making predictions. Google has &lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;documentation&lt;/a&gt;&amp;nbsp;and a &lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;video&lt;/a&gt; regarding bias and machine learning that is helpful for learning more about this topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;See Also&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;Best Practices for Creating Training Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;What is Machine Learning? (2 mins)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;What is Artificial Intelligence? (5 mins)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/5</link><pubDate>Thu, 22 Feb 2024 18:35:07 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>Appian Max Team</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 5 posted to Article by Appian Max Team on 2/22/2024 6:35:07 PM&lt;br /&gt;
&lt;p&gt;Appian is integration agnostic and can connect with any machine learning offering that exposes a web API. This article is intended to build your general understanding of machine learning technology, regardless of the specific tool being used. For Amazon machine learning integrations, refer to &lt;a href="/success/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a subset of artificial intelligence (AI) that focuses on developing algorithms and models that enable computers to learn from data and make decisions based on it. Machine learning models use mathematics to find patterns in historical data and generate probabilistic predictions. A model can be thought of as a black box that is created by processing many observations, for example through supervised or semi-supervised learning. The trained model can then take in one or more new observations whose outcomes are unknown and produce possible outcomes along with their probabilities.&lt;/p&gt;
&lt;p&gt;There are many different use cases and applications for machine learning. This article mainly focuses on machine learning technology that analyzes structured data, such as rows of an Excel spreadsheet or an Appian CDT, and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the target. Within the context of Appian, we&amp;rsquo;ll dive into the practical implementation of AI features that integrate with applications.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, and deciphering the contents of images. Tools for these tasks include &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt; and &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;, respectively.&lt;/p&gt;
&lt;h2 id="common_model_types"&gt;Common Model Types&lt;/h2&gt;
&lt;p&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Regression&lt;/strong&gt;: predicts a numeric value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: predicts a categorical value from a discrete, fixed number of possible categories. Classification models can be further broken down into two types:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Binary classification&lt;/strong&gt;: the model has only two prediction values to choose from (for example, true and false).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiclassification&lt;/strong&gt;: the model has more than two prediction values to choose from.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The type you use depends on the target attribute you want to predict and your overall objective for the model. The sections below describe the purpose of each model type and give examples of appropriate uses for each one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a regression model when you want to predict a numeric value that is not constrained to a finite or fixed list of values.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate the accuracy of a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. A perfect model would have an RMSE of 0. Because RMSE is expressed in the same units as the target, whether a given RMSE is good depends on the range of values you are trying to predict.&lt;/li&gt;
&lt;li&gt;When using a regression model, be aware that the predicted value may not fall within the range of values provided in training data and might take on any positive or negative number. It is important to have a plan for how to address any values that would fall outside of acceptable ranges for your application.&lt;/li&gt;
&lt;/ul&gt;
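&lt;p&gt;As a rough illustration of the metrics above, the following minimal Python sketch computes RMSE and mean absolute error (MAE) for a list of predictions, plus a hypothetical clamp helper showing one way to keep a regression prediction inside an acceptable range. The function names and the sample predictions are assumptions for illustration only; in practice your machine learning tool reports these metrics itself.&lt;/p&gt;

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def clamp(value, low, high):
    """Hypothetical guard that keeps a prediction within [low, high]."""
    return max(low, min(high, value))


# Sale prices from the used-car table in this article, paired with made-up
# predictions where only the last one is off (by 1,000).
actual = [1499, 8100, 4760, 7290]
predicted = [1499, 8100, 4760, 8290]
print(rmse(actual, predicted))  # 500.0
print(mae(actual, predicted))   # 250.0
print(clamp(-120.5, 0, 50000))  # 0
```

&lt;p&gt;RMSE is larger than MAE here because squaring the residuals penalizes the single large error more heavily than several small ones.&lt;/p&gt;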
&lt;p&gt;Regression models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/li&gt;
&lt;li&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/li&gt;
&lt;li&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a binary classification model when you want to predict a value that has only two possible outcomes.&lt;/li&gt;
&lt;li&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, a predicted score greater than 0.5 produces a predicted value of true, but machine learning tools typically let you adjust this score threshold to change how many predictions come out true or false for your use case.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a binary classification model is the Area Under the (ROC) Curve (AUC), a number between 0 and 1. A value close to 1 indicates a model that separates the two outcomes well. A value near 0.5 indicates that the model is no better than random guessing. A value close to 0 indicates that the model has learned useful patterns but is applying them in reverse, making inverse predictions.&lt;/li&gt;
&lt;/ul&gt;
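&lt;p&gt;The threshold and AUC ideas above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool computes them: classify applies a strict greater-than cutoff (tools may treat a score exactly at the threshold differently), and auc uses the pairwise-ranking definition of ROC AUC, which is the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The labels and scores are made up.&lt;/p&gt;

```python
def classify(scores, threshold=0.5):
    """Map predicted scores (0 to 1) to True/False using a score threshold."""
    # A score strictly greater than the threshold counts as True.
    return [score > threshold for score in scores]


def auc(labels, scores):
    """ROC AUC via pairwise ranking; tied scores count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [True, True, False, False]  # actual outcomes (made up)
scores = [0.9, 0.4, 0.35, 0.8]       # model scores (made up)
print(classify(scores))                 # [True, False, False, True]
print(classify(scores, threshold=0.3))  # [True, True, True, True]
print(auc(labels, scores))              # 0.75
```

&lt;p&gt;Because AUC is computed from the scores alone, it does not change when you adjust the threshold; the threshold only changes which scores are reported as true.&lt;/p&gt;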
&lt;p&gt;Binary classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/li&gt;
&lt;li&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/li&gt;
&lt;li&gt;Whether someone will sign up for a service, given their demographics.&lt;/li&gt;
&lt;li&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Multiclass Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a multiclass model when you want to predict a single categorical value from a list of three or more discrete, finite possibilities.&lt;/li&gt;
&lt;li&gt;A multiclass model returns a list of values and their associated probabilities. The value with the highest probability is the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/li&gt;
&lt;li&gt;Because the target attribute&amp;#39;s possible values are derived from the training data, the model never predicts a value that did not occur in the training data.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a multiclass model is the F1 score, the harmonic mean of precision and recall. For multiclass models, per-class F1 scores are usually averaged across classes. The score ranges from 0 to 1; the closer it is to 1, the better the model.&lt;/li&gt;
&lt;li&gt;Some machine learning tools limit the number of distinct values a multiclass model can predict. Target attributes with hundreds or thousands of possible values can be difficult to train and are more likely to produce training failures or poor model performance.&lt;/li&gt;
&lt;li&gt;To make predictions from a set of possibilities larger than a tool&amp;#39;s limit, consider using a series of models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model that predicts one of the 225 cars, but you could create one model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan), and then one model for each type of car to predict the particular minivan, convertible, or sedan.&lt;/li&gt;
&lt;/ul&gt;
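&lt;p&gt;The following minimal Python sketch illustrates three of the points above: picking the best prediction from a multiclass model&amp;#39;s probabilities, computing a macro-averaged F1 score (one common way of averaging per-class F1; tools may use others, such as a weighted average), and chaining models as in the car dealership example. The model functions passed to predict_two_stage are hypothetical stand-ins for calls to your tool, assumed to return a dictionary mapping each label to its probability.&lt;/p&gt;

```python
def best_prediction(probabilities):
    """Return the (label, probability) pair with the highest probability."""
    return max(probabilities.items(), key=lambda item: item[1])


def f1_for_class(actual, predicted, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(actual, predicted):
    """Unweighted mean of the per-class F1 scores."""
    labels = set(actual) | set(predicted)
    return sum(f1_for_class(actual, predicted, c) for c in labels) / len(labels)


def predict_two_stage(observation, type_model, models_by_type):
    """Hypothetical chain: predict the car type, then the specific car."""
    car_type, _ = best_prediction(type_model(observation))
    car, _ = best_prediction(models_by_type[car_type](observation))
    return car_type, car


print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))
# ('Tier 1', 0.6)

actual = ["sedan", "truck", "SUV", "sedan"]     # made-up outcomes
predicted = ["sedan", "SUV", "SUV", "sedan"]    # made-up predictions
print(round(macro_f1(actual, predicted), 3))    # 0.556
```

&lt;p&gt;In the F1 example, the sedan class scores 1.0, the truck class scores 0.0 (it is never predicted correctly), and the SUV class scores about 0.67, so the macro average is about 0.556.&lt;/p&gt;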
&lt;p&gt;Multiclass classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/li&gt;
&lt;li&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Types Summary&lt;/strong&gt;&lt;/p&gt;
&lt;table width="917" height="135"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Common Performance Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;td&gt;Predicts a numeric value&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;Root Mean Square Error (RMSE)&lt;/p&gt;
&lt;p&gt;Mean Absolute Error (MAE)&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a home&amp;#39;s sale price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary Classification&lt;/td&gt;
&lt;td&gt;Predicts binary values (e.g., true or false)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;Area Under the Curve (AUC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Predicting whether a job candidate should be offered employment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiclass Classification&lt;/td&gt;
&lt;td&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;F1 Score&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Log Loss&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a book&amp;#39;s genre&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="training_data&amp;nbsp;"&gt;Training Data&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;To create a model, you must supply the machine learning tool with training data that it will use to learn about associations between different attribute values and the target attribute. This training data is the means by which the model understands and recognizes patterns about the data for which you ask it to make predictions. Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car.&amp;nbsp;In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict for.&lt;/p&gt;
&lt;table width="579" height="128"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transmission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mileage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Previous Owners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1997&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;Mustang&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;201,298&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;Mazda&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Black&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;60,588&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2005&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;Element&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;160,378&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4,760&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;Camry&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;87,380&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,290&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
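&lt;p&gt;As a rough sketch of how the table above might be prepared before upload, the following Python example stores the rows in memory, separates the input attributes from the target attribute, and serializes the rows as CSV text. The snake_case column names, the helper functions, and the choice of CSV are assumptions for illustration, not a format required by any particular tool.&lt;/p&gt;

```python
import csv
import io

# The used-car training table above, one dictionary per observation.
TRAINING_ROWS = [
    {"year": 1997, "make": "Ford", "model": "Mustang", "color": "Silver",
     "transmission": "Automatic", "mileage": 201298, "previous_owners": 3,
     "sale_price": 1499},
    {"year": 2013, "make": "Mazda", "model": "3", "color": "Black",
     "transmission": "Automatic", "mileage": 60588, "previous_owners": 1,
     "sale_price": 8100},
    {"year": 2005, "make": "Honda", "model": "Element", "color": "Red",
     "transmission": "Automatic", "mileage": 160378, "previous_owners": 2,
     "sale_price": 4760},
    {"year": 2009, "make": "Toyota", "model": "Camry", "color": "Blue",
     "transmission": "Manual", "mileage": 87380, "previous_owners": 1,
     "sale_price": 7290},
]

TARGET = "sale_price"  # the attribute the model should learn to predict


def split_features_and_target(rows, target):
    """Separate each row into its input attributes and its target value."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


def to_csv_text(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


features, targets = split_features_and_target(TRAINING_ROWS, TARGET)
print(targets)  # [1499, 8100, 4760, 7290]
print(to_csv_text(TRAINING_ROWS).splitlines()[0])
# year,make,model,color,transmission,mileage,previous_owners,sale_price
```

&lt;p&gt;Many tools accept the full table and ask you to identify the target column separately, in which case the split step happens inside the tool rather than in your own code.&lt;/p&gt;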
&lt;p&gt;How data should be ordered, formatted, and uploaded for training varies by machine learning tool, so refer to your tool&amp;#39;s documentation for details on how to present your data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Best Practices and Tips for Training Data&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In general, the more training observations (i.e., rows of data) you provide, the more accurate the final model is likely to be, provided the data is relevant and of good quality.&lt;/li&gt;
&lt;li&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/li&gt;
&lt;li&gt;Machine learning tools typically have both minimum requirements and maximum limits for the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Models trained with skewed or unrepresentative data can produce unwanted bias in their predictions. Google has &lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;documentation&lt;/a&gt; and a &lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;video&lt;/a&gt; on bias in machine learning that are helpful for learning more about this topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;See Also&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;Best Practices for Creating Training Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;What is Machine Learning? (2 mins)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;What is Artificial Intelligence? (5 mins)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/4</link><pubDate>Tue, 31 Oct 2023 19:44:53 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 4 posted to Article by joel.larin on 10/31/2023 7:44:53 PM&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;Appian is integration-agnostic and can connect with any machine learning offering that exposes a web API. This article aims to build your general understanding of machine learning technology, regardless of the specific tool being used. For the machine learning integrations covered in detail in Appian&amp;#39;s documentation, refer to the articles below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/w/guide/3252/integrating-with-google-automl-tables"&gt;Integrating with Google AutoML Tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.&lt;/p&gt;
&lt;p&gt;Machine learning has many uses and applications, but this article focuses on machine learning technology that analyzes structured data (such as rows of an Excel spreadsheet or an Appian CDT) and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the &lt;em&gt;target&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, offered by tools such as &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt;, and identifying the contents of images, offered by tools such as &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="common_model_types"&gt;Common Model Types&lt;/h2&gt;
&lt;p&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Regression&lt;/strong&gt;: predicts a numeric value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: predicts a categorical value from a discrete, fixed set of possible categories. Classification models can be further divided into two types:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Binary classification&lt;/strong&gt;: the model has only two possible prediction values (e.g., true and false).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiclass classification&lt;/strong&gt;: the model has more than two possible prediction values.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Which type you use depends on the target attribute you want to predict and your overall objective in creating the model. The sections below describe the purpose of each model type and give examples of appropriate uses for each.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a regression model when you want to predict a numeric value that is not constrained to a finite list of values.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. A perfect model would have an RMSE of 0. Because the RMSE is expressed in the same units as the target, what counts as a good value depends on the range of values you are trying to predict.&lt;/li&gt;
&lt;li&gt;When using a regression model, be aware that a predicted value may fall outside the range of values in the training data and can be any positive or negative number. Plan how your application will handle predictions that fall outside its acceptable range.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regression models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/li&gt;
&lt;li&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/li&gt;
&lt;li&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a binary classification model when you want to predict a value that has only two possible outcomes.&lt;/li&gt;
&lt;li&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, a predicted score greater than 0.5 produces a predicted value of true, but machine learning tools typically let you adjust this score threshold to change how many predictions come out true or false for your use case.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a binary classification model is the Area Under the (ROC) Curve (AUC), a number between 0 and 1. A value close to 1 indicates a model that separates the two outcomes well. A value near 0.5 indicates that the model is no better than random guessing. A value close to 0 indicates that the model has learned useful patterns but is applying them in reverse, making inverse predictions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Binary classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/li&gt;
&lt;li&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/li&gt;
&lt;li&gt;Whether someone will sign up for a service, given their demographics.&lt;/li&gt;
&lt;li&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Multiclass Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a multiclass model when you want to predict a single categorical value from a list of three or more discrete, finite possibilities.&lt;/li&gt;
&lt;li&gt;A multiclass model returns a list of values and their associated probabilities. The value with the highest probability is the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/li&gt;
&lt;li&gt;Because the target attribute&amp;#39;s possible values are derived from the training data, the model never predicts a value that did not occur in the training data.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a multiclass model is the F1 score, the harmonic mean of precision and recall. For multiclass models, per-class F1 scores are usually averaged across classes. The score ranges from 0 to 1; the closer it is to 1, the better the model.&lt;/li&gt;
&lt;li&gt;Some machine learning tools limit the number of distinct values a multiclass model can predict. Target attributes with hundreds or thousands of possible values can be difficult to train and are more likely to produce training failures or poor model performance.&lt;/li&gt;
&lt;li&gt;To make predictions from a set of possibilities larger than a tool&amp;#39;s limit, consider using a series of models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model that predicts one of the 225 cars, but you could create one model to predict which type of car the customer is likely to buy (minivan, convertible, or sedan), and then one model for each type of car to predict the particular minivan, convertible, or sedan.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Multiclass classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which category of car&amp;mdash;sedan, truck or SUV&amp;mdash;someone is likely to purchase, given their demographics.&lt;/li&gt;
&lt;li&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Types Summary&lt;/strong&gt;&lt;/p&gt;
&lt;table height="135" width="917"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Common Performance Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;td&gt;Predicts a numeric value&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;Root Mean Square Error (RMSE)&lt;/p&gt;
&lt;p&gt;Mean Absolute Error (MAE)&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a home&amp;#39;s sale price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary Classification&lt;/td&gt;
&lt;td&gt;Predicts binary values (e.g., true or false)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;Area Under the Curve (AUC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Predicting whether a job candidate should be offered employment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiclass Classification&lt;/td&gt;
&lt;td&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;F1 Score&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Log Loss&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a book&amp;#39;s genre&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="training_data&amp;nbsp;"&gt;Training Data&amp;nbsp;&lt;/h2&gt;
&lt;p&gt;To create a model, you must supply the machine learning tool with training data that it will use to learn about associations between different attribute values and the target attribute. This training data is the means by which the model understands and recognizes patterns about the data for which you ask it to make predictions. Below is an example of a data structure that might be used for training data for a model designed to predict the sale price of a used car.&amp;nbsp;In this use case, the column marked &amp;quot;Sale Price&amp;quot; would be identified to the model as the target attribute to predict for.&lt;/p&gt;
&lt;table height="128" width="579"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transmission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mileage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Previous Owners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1997&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;Mustang&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;201,298&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;Mazda&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Black&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;60,588&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2005&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;Element&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;160,378&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4,760&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;Camry&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;87,380&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,290&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;How data should be ordered, formatted, and uploaded for training varies by machine learning tool, so refer to your tool&amp;#39;s documentation for details on how to present your data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Best Practices and Tips for Training Data&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In general, the more training observations (i.e., rows of data) you provide, the more accurate the final model is likely to be, provided the data is relevant and of good quality.&lt;/li&gt;
&lt;li&gt;To the greatest extent possible, provide training data that resembles the data you expect to see in production.&lt;/li&gt;
&lt;li&gt;Machine learning tools typically have both minimum requirements and maximum limits for the size and complexity of training data. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for more details.&lt;/li&gt;
&lt;li&gt;Models trained with skewed or unrepresentative data can produce unwanted bias in their predictions. Google has &lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;documentation&lt;/a&gt; and a &lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;video&lt;/a&gt; on bias in machine learning that are helpful for learning more about this topic.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;See Also&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;Best Practices for Creating Training Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;What is Machine Learning? (2 mins)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;What is Artificial Intelligence? (5 mins)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/3</link><pubDate>Tue, 31 Oct 2023 19:38:42 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 3 posted to Article by joel.larin on 10/31/2023 7:38:42 PM&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;Appian is integration-agnostic and can connect with any machine learning offering that exposes a web API. This article aims to build your general understanding of machine learning technology, regardless of the specific tool being used. For the machine learning integrations covered in detail in Appian&amp;#39;s documentation, refer to the articles below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/w/guide/3252/integrating-with-google-automl-tables"&gt;Integrating with Google AutoML Tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.&lt;/p&gt;
&lt;p&gt;Machine learning has many uses and applications, but this article focuses on machine learning technology that analyzes structured data (such as rows of an Excel spreadsheet or an Appian CDT) and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the &lt;em&gt;target&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, offered by tools such as &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt;, and identifying the contents of images, offered by tools such as &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="common_model_types"&gt;Common Model Types&lt;/h2&gt;
&lt;p&gt;There are two major categories of model types that are used for making machine learning predictions on structured data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Regression&lt;/strong&gt;: predicts a numeric value.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Classification&lt;/strong&gt;: predicts a categorical value from a discrete, fixed set of possible categories. Classification models can be further divided into two types:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Binary classification&lt;/strong&gt;: the model has only two possible prediction values (e.g., true and false).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multiclass classification&lt;/strong&gt;: the model has more than two possible prediction values.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Which type you use depends on the target attribute you want to predict and your overall objective in creating the model. The sections below describe the purpose of each model type and give examples of appropriate uses for each.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Regression&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a regression model when you want to predict a numeric value that is not constrained to a finite list of values.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a regression model is the root mean square error (RMSE): the square root of the average squared difference between predicted and actual values. A perfect model would have an RMSE of 0. Because the RMSE is expressed in the same units as the target, what counts as a good value depends on the range of values you are trying to predict.&lt;/li&gt;
&lt;li&gt;When using a regression model, be aware that a predicted value may fall outside the range of values in the training data and can be any positive or negative number. Plan how your application will handle predictions that fall outside its acceptable range.&lt;/li&gt;
&lt;/ul&gt;
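&lt;p&gt;The following minimal Python sketch shows how RMSE is calculated and one simple way to handle out-of-range predictions: clamping them to limits your application accepts. The function names and example limits are illustrative assumptions. Your machine learning tool reports RMSE for you, and in some applications flagging an out-of-range prediction for human review is a better plan than clamping it.&lt;/p&gt;

```python
import math


def rmse(actual, predicted):
    """Root mean square error: the square root of the mean squared prediction error.

    The result is in the same units as the target (for example, dollars).
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def clamp(prediction, low, high):
    """Force a prediction into the range [low, high] that the application accepts."""
    return max(low, min(high, prediction))


actual_prices = [200_000, 350_000, 275_000]
predicted_prices = [210_000, 340_000, 290_000]
print(round(rmse(actual_prices, predicted_prices)))  # 11902

# A regression model can predict a negative price; clamp it to a plausible floor.
print(clamp(-1_200, 0, 5_000_000))  # 0
```

&lt;p&gt;Here the RMSE of roughly 11,902 is measured in the same units as the sale prices, which is why a &amp;quot;good&amp;quot; RMSE only makes sense relative to the range of values being predicted.&lt;/p&gt;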
&lt;p&gt;Regression models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The sale price of a home, given information about the home&amp;#39;s size, number of bedrooms, zip code, etc.&lt;/li&gt;
&lt;li&gt;The appropriate salary for a job posting, given information about that job&amp;#39;s difficulty and expected characteristics of qualified candidates.&lt;/li&gt;
&lt;li&gt;The number of viewers who will watch the premiere of a new TV series, given information about the show&amp;#39;s genre and cast.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Binary Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a binary classification model when you want to predict a value that has only two possible outcomes.&lt;/li&gt;
&lt;li&gt;A binary classification model returns a predicted value (true or false) and a predicted score (a number between 0 and 1). By default, if the predicted score is greater than 0.5, the predicted value is true, but machine learning tools typically let you adjust this score threshold to change the balance of true and false predictions for your use case.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a binary classification model is the Area Under the Curve (AUC), meaning the area under the ROC (receiver operating characteristic) curve. The AUC is a number between 0 and 1. A value close to 1 indicates a model that separates the two outcomes well. Values near 0.5 mean the model is no better than random guessing. Values close to 0 indicate that the model has learned useful patterns but applies them in reverse, making inverse predictions.&lt;/li&gt;
&lt;/ul&gt;
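&lt;p&gt;The Python sketch below illustrates both ideas: applying a score threshold and computing AUC. It calculates AUC through its rank interpretation (the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half), which equals the area under the ROC curve. The function names are hypothetical, and because the sketch compares every pair of examples it is meant for illustration rather than large datasets.&lt;/p&gt;

```python
def classify(score, threshold=0.5):
    """Convert a predicted score between 0 and 1 into a true/false prediction."""
    return score > threshold


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    positive example scores higher than a randomly chosen negative one (ties count
    as half). Compares every positive/negative pair, so it suits small examples only.
    """
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [True, True, False, False]
scores = [0.9, 0.4, 0.6, 0.1]
print([classify(s) for s in scores])                 # [True, False, True, False]
print([classify(s, threshold=0.3) for s in scores])  # [True, True, True, False]
print(auc(labels, scores))                           # 0.75
print(auc(labels, [1 - s for s in scores]))          # 0.25
```

&lt;p&gt;Lowering the threshold from 0.5 to 0.3 turns one more prediction to true, and reversing the scores drops the AUC from 0.75 to 0.25, which is the inverse-prediction case described in the list above.&lt;/p&gt;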
&lt;p&gt;Binary classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a job candidate should be given an offer of employment, given information about their qualifications and interview scores.&lt;/li&gt;
&lt;li&gt;Whether a loan application should be approved or rejected, given credit details about the applicant.&lt;/li&gt;
&lt;li&gt;Whether someone will sign up for a service, given their demographics.&lt;/li&gt;
&lt;li&gt;Whether a bank transaction is fraudulent, given information about how much that transaction deviates from the account&amp;#39;s typical usage patterns.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Multiclass Classification&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a multiclass model when you want to predict a single categorical value from a list of three or more discrete, finite possibilities.&lt;/li&gt;
&lt;li&gt;A multiclass model returns a list of values and their related probabilities. The value with the highest probability represents the model&amp;#39;s best prediction. For example, if you are trying to predict which tier of support a customer service case should be routed to, a multiclass model might return: Tier 1 - 60%, Tier 2 - 13%, Tier 3 - 27%.&lt;/li&gt;
&lt;li&gt;Because the target attribute&amp;#39;s possible values are derived from the training data, the model never predicts a value that did not occur in the training data.&lt;/li&gt;
&lt;li&gt;The main metric used to evaluate a multiclass model is the F1 score, the harmonic mean of precision and recall. It ranges from 0 to 1, and the closer the value is to 1, the better the model. For multiclass models, F1 is typically computed for each class and then averaged.&lt;/li&gt;
&lt;li&gt;Some machine learning tools limit the number of distinct values that a multiclass target can have, because targets with hundreds or thousands of possible values can be difficult to train and are more likely to fail or perform poorly.&lt;/li&gt;
&lt;li&gt;To make predictions from a set of possibilities larger than a tool&amp;#39;s limit, consider using a series of models. For example, imagine a car dealership that sells 75 different minivans, 50 different convertibles, and 100 different sedans. You may not be able to create one model that predicts which of the 225 cars a customer will buy, but you could create one model to predict the type of car the customer is likely to buy (minivan, convertible, or sedan) and then one model per type to predict the specific minivan, convertible, or sedan.&lt;/li&gt;
&lt;/ul&gt;
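&lt;p&gt;The Python sketch below shows three of these ideas: picking the most probable value, computing F1, and chaining models as in the car dealership example. It computes macro-averaged F1 (the unweighted mean of the per-class F1 scores); tools may instead report micro- or weighted-averaged F1, so check which one yours uses. The function names are hypothetical, and the two-stage function takes placeholder callables that stand in for deployed models; they are not any real tool&amp;#39;s API.&lt;/p&gt;

```python
def best_prediction(probabilities):
    """Return the value with the highest probability from a {value: probability} dict."""
    return max(probabilities, key=probabilities.get)


def f1_scores(actual, predicted):
    """Per-class F1 scores and their unweighted average (macro F1)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(actual, predicted))
    per_class = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(1 for a, p in pairs if a == c and p == c)
        fp = sum(1 for a, p in pairs if a != c and p == c)
        fn = sum(1 for a, p in pairs if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    return per_class, sum(per_class.values()) / len(per_class)


def predict_car(customer, type_model, models_by_type):
    """Two-stage prediction: first the car type, then the specific car of that type.

    type_model and models_by_type are hypothetical callables standing in for
    deployed models; each returns a {value: probability} dict.
    """
    car_type = best_prediction(type_model(customer))
    return car_type, best_prediction(models_by_type[car_type](customer))


print(best_prediction({"Tier 1": 0.60, "Tier 2": 0.13, "Tier 3": 0.27}))  # Tier 1
```

&lt;p&gt;Splitting one large prediction into a type model plus one model per type keeps each model&amp;#39;s list of possible values small, at the cost of maintaining several models.&lt;/p&gt;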
&lt;p&gt;Multiclass classification models can be used to predict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Which category of car (sedan, truck, or SUV) someone is likely to purchase, given their demographics.&lt;/li&gt;
&lt;li&gt;A book&amp;#39;s genre, given information about the book&amp;#39;s author, length, characters, storyline, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Model Types Summary&lt;/strong&gt;&lt;/p&gt;
&lt;table height="135" width="917"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prediction Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Common Performance Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regression&lt;/td&gt;
&lt;td&gt;Predicts a numeric value&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0"&gt;Root Mean Square Error (RMSE)&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/how-to-select-the-right-evaluation-metric-for-machine-learning-models-part-1-regrression-metrics-3606e25beae0"&gt;Mean Absolute Error (MAE)&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a home&amp;#39;s sale price&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary Classification&lt;/td&gt;
&lt;td&gt;Predicts one of two values (for example, true or false)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"&gt;Area Under the Curve (AUC)&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Predicting whether a job candidate should be offered employment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiclass Classification&lt;/td&gt;
&lt;td&gt;Predicts values that belong to a limited, predefined set of permissible values&lt;/td&gt;
&lt;td&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/F1_score"&gt;F1 Score&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://wiki.fast.ai/index.php/Log_Loss"&gt;Log Loss&lt;/a&gt;&lt;/p&gt;
&lt;/td&gt;
&lt;td&gt;Predicting a book&amp;#39;s genre&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
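&lt;p&gt;The table also lists MAE and log loss. As a minimal Python sketch: MAE is the average absolute difference between predicted and actual values, and multiclass log loss is the average negative natural logarithm of the probability the model assigned to the true class. For both, lower is better and 0 is perfect. For the same predictions, RMSE is never smaller than MAE, because squaring gives large errors more weight. The function names and the small clipping constant eps (which avoids taking the logarithm of 0) are assumptions of this sketch.&lt;/p&gt;

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the same units as the target."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def log_loss(actual, predicted_probabilities, eps=1e-15):
    """Multiclass log loss: the average of -ln(probability assigned to the true class).

    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    if not actual or len(actual) != len(predicted_probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for true_class, probabilities in zip(actual, predicted_probabilities):
        p = min(max(probabilities.get(true_class, 0.0), eps), 1 - eps)
        total -= math.log(p)
    return total / len(actual)


print(round(mae([200_000, 350_000, 275_000], [210_000, 340_000, 290_000])))  # 11667
genres = ["Mystery", "Fantasy"]
predictions = [{"Mystery": 0.8, "Fantasy": 0.2}, {"Mystery": 0.4, "Fantasy": 0.6}]
print(round(log_loss(genres, predictions), 3))  # 0.367
```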
&lt;h2 id="training_data&amp;nbsp;"&gt;Training Data&lt;/h2&gt;
&lt;p&gt;To create a model, you must supply the machine learning tool with training data, which it uses to learn associations between attribute values and the target attribute. The training data is how the model learns to recognize patterns in the kind of data you will later ask it to make predictions about. The table below shows an example of training data for a model designed to predict the sale price of a used car. In this use case, the &amp;quot;Sale Price&amp;quot; column would be identified to the model as the target attribute.&lt;/p&gt;
&lt;table height="128" width="579"&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Make&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Transmission&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Mileage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Previous Owners&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;b&gt;Sale Price&lt;/b&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1997&lt;/td&gt;
&lt;td&gt;Ford&lt;/td&gt;
&lt;td&gt;Mustang&lt;/td&gt;
&lt;td&gt;Silver&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;201,298&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;1,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2013&lt;/td&gt;
&lt;td&gt;Mazda&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Black&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;60,588&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;8,100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2005&lt;/td&gt;
&lt;td&gt;Honda&lt;/td&gt;
&lt;td&gt;Element&lt;/td&gt;
&lt;td&gt;Red&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;td&gt;160,378&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4,760&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2009&lt;/td&gt;
&lt;td&gt;Toyota&lt;/td&gt;
&lt;td&gt;Camry&lt;/td&gt;
&lt;td&gt;Blue&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;87,380&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;7,290&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;How data should be ordered, formatted, and uploaded to a machine learning tool for training varies by tool, so refer to your tool&amp;#39;s documentation for specifics.&lt;/p&gt;
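&lt;p&gt;To make the idea concrete, the Python sketch below stores the four example rows from the table above as records, keeps numeric columns as numbers rather than display strings such as 201,298, and separates the target column from the other attributes. The function names are hypothetical, and CSV with a header row is only one common upload format, assumed here for illustration; check your tool&amp;#39;s documentation for the format it expects.&lt;/p&gt;

```python
import csv
import io

TARGET = "Sale Price"

# The example rows from the table, with numeric columns stored as numbers.
# The Mazda's model name "3" is kept as text because it is a name, not a quantity.
TRAINING_ROWS = [
    {"Year": 1997, "Make": "Ford", "Model": "Mustang", "Color": "Silver",
     "Transmission": "Automatic", "Mileage": 201298, "Previous Owners": 3, "Sale Price": 1499},
    {"Year": 2013, "Make": "Mazda", "Model": "3", "Color": "Black",
     "Transmission": "Automatic", "Mileage": 60588, "Previous Owners": 1, "Sale Price": 8100},
    {"Year": 2005, "Make": "Honda", "Model": "Element", "Color": "Red",
     "Transmission": "Automatic", "Mileage": 160378, "Previous Owners": 2, "Sale Price": 4760},
    {"Year": 2009, "Make": "Toyota", "Model": "Camry", "Color": "Blue",
     "Transmission": "Manual", "Mileage": 87380, "Previous Owners": 1, "Sale Price": 7290},
]


def split_features_and_target(rows, target):
    """Separate the attributes the model learns from (features) and the value it predicts."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    targets = [row[target] for row in rows]
    return features, targets


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


features, targets = split_features_and_target(TRAINING_ROWS, TARGET)
print(targets)  # [1499, 8100, 4760, 7290]
print(to_csv(TRAINING_ROWS).splitlines()[0])
# Year,Make,Model,Color,Transmission,Mileage,Previous Owners,Sale Price
```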
&lt;p&gt;&lt;strong&gt;Best Practices and Tips for Training Data&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In general, providing more training observations (that is, rows of data) tends to produce a more accurate model, provided the additional data is relevant and of good quality.&lt;/li&gt;
&lt;li&gt;As much as possible, provide training data that resembles the data you expect to see in production.&lt;/li&gt;
&lt;li&gt;Machine learning tools typically have both minimum requirements and maximum limits on the size and complexity of training data. Read your tool&amp;#39;s documentation for details.&lt;/li&gt;
&lt;li&gt;Some tools allow you to modify the weight given to specific columns during training, or to specify a &amp;quot;time&amp;quot; column if training data values are influenced by time. Read your tool&amp;#39;s documentation for details.&lt;/li&gt;
&lt;li&gt;Models trained with skewed or unrepresentative data can produce unwanted bias in their predictions. Google provides &lt;a href="https://cloud.google.com/inclusive-ml/#fairness-in-ml-automl"&gt;documentation&lt;/a&gt; and a &lt;a href="https://www.youtube.com/watch?v=59bMh59JQDo"&gt;video&lt;/a&gt; on bias in machine learning that are helpful for learning more about this topic.&lt;/li&gt;
&lt;/ul&gt;
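&lt;p&gt;One simple way to act on the tip about resembling production data is to compare incoming production rows against what the training data covered. The hypothetical Python helpers below record the numeric range and the categorical values of each training column, then list the fields of a new row that fall outside them. This is a sketch under those assumptions, not a substitute for your tool&amp;#39;s own data validation or a full data drift analysis.&lt;/p&gt;

```python
def profile(rows):
    """Record each column's numeric range, or its set of categorical values."""
    numeric, categories = {}, {}
    for row in rows:
        for column, value in row.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                low, high = numeric.get(column, (value, value))
                numeric[column] = (min(low, value), max(high, value))
            else:
                categories.setdefault(column, set()).add(value)
    return numeric, categories


def unfamiliar_fields(row, numeric, categories):
    """List fields of a production row that the training data did not cover."""
    issues = []
    for column, value in row.items():
        if column in numeric and isinstance(value, (int, float)):
            low, high = numeric[column]
            if not (high >= value >= low):
                issues.append(f"{column}={value} outside training range {low}-{high}")
        elif column in categories and value not in categories[column]:
            issues.append(f"{column}={value!r} never seen in training")
    return issues


training = [
    {"Year": 1997, "Make": "Ford", "Mileage": 201298},
    {"Year": 2013, "Make": "Mazda", "Mileage": 60588},
]
numeric, categories = profile(training)
print(unfamiliar_fields({"Year": 2021, "Make": "Tesla", "Mileage": 100000},
                        numeric, categories))
```

&lt;p&gt;A row flagged this way is not necessarily wrong, but predictions for it rest on patterns the model never saw, so it is a good candidate for review or for inclusion in the next round of training data.&lt;/p&gt;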
&lt;p&gt;&lt;strong&gt;See Also&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Websites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/automl-tables/docs/data-best-practices"&gt;Best Practices for Creating Training Data&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=f_uwKZIAeM0"&gt;What is Machine Learning? (2 mins)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=2ePf9rue1Ao"&gt;What is Artificial Intelligence? (5 mins)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: Platform, Machine Learning, Architecture&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/2</link><pubDate>Tue, 31 Oct 2023 19:34:06 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 2 posted to Article by joel.larin on 10/31/2023 7:34:06 PM&lt;br /&gt;
&lt;div&gt;
&lt;p&gt;Appian is integration agnostic and can connect with any machine learning offering that exposes a web API. The purpose of this article is to build your general understanding of machine learning technology, regardless of the specific tool being used. For the machine learning integrations that are covered in detail in Appian&amp;#39;s documentation, refer to the articles below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/w/guide/3407/integrating-with-amazon-machine-learning"&gt;Integrating with Amazon Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="/w/guide/3252/integrating-with-google-automl-tables"&gt;Integrating with Google AutoML Tables&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="what_is_machine_learning?"&gt;What is Machine Learning?&lt;/h2&gt;
&lt;p&gt;Machine learning is a type of artificial intelligence that uses mathematical models to generate probabilistic predictions by finding patterns in historical data. Machine learning models can be thought of as black boxes that are created by processing many observations with known outcomes. These models are then able to take in one or many observations without a known outcome and produce possible outcomes and their probabilities.&lt;/p&gt;
&lt;p&gt;There are many different uses and applications for machine learning, but this article currently focuses on machine learning technology that analyzes structured data (such as rows of an Excel spreadsheet or an Appian CDT) and delivers a prediction for a specific field or column in the data. The feature, value, or attribute being predicted is often referred to as the &lt;em&gt;target&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Other uses for machine learning include natural language analysis and translation, offered by tools such as &lt;a href="https://www.ibm.com/watson/services/natural-language-understanding/?cm_mmc=Search_Google-_-Watson+AI_Watson+Core+-+Platform-_-WW_NA-_-watson%20sentiment%20analysis_e&amp;amp;cm_mmca1=000036IU&amp;amp;cm_mmca2=10010583&amp;amp;cm_mmca7=9007770&amp;amp;cm_mmca8=kwd-309710612366&amp;amp;cm_mmca9=_k_EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE_k_&amp;amp;cm_mmca10=338427417319&amp;amp;cm_mmca11=e&amp;amp;gclid=EAIaIQobChMItZuoy4T-4gIVgq_ICh2pBQdPEAAYASAAEgJUVvD_BwE"&gt;IBM&amp;#39;s Watson&lt;/a&gt;, and image content analysis, offered by tools such as &lt;a href="https://cloud.google.com/vision/"&gt;Google&amp;#39;s AutoML Vision&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;
</description></item><item><title>Machine Learning Overview</title><link>https://community.appian.com/success/w/article/3392/machine-learning-overview/revision/1</link><pubDate>Wed, 18 Oct 2023 13:43:05 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:3439a4de-33fa-4d77-939f-3e5561eecd3a</guid><dc:creator>joel.larin</dc:creator><comments>https://community.appian.com/success/w/article/3392/machine-learning-overview#comments</comments><description>Revision 1 posted to Article by joel.larin on 10/18/2023 1:43:05 PM&lt;br /&gt;
&lt;div style="clear:both;"&gt;&lt;/div&gt;
</description></item></channel></rss>