Note: Amazon ML is no longer available to new Amazon customers.
This article provides information about integrating with Amazon Machine Learning. If you are unfamiliar with machine learning, it is recommended that you read the Machine Learning Overview article for information about the technology, different model types and training data guidance.
Amazon offers many services under its machine learning arm, from translation services (Amazon Translate) to video recognition (Amazon DeepLens). Appian can integrate with all of these services; however, this article focuses solely on the Amazon Machine Learning service as used through the Appian AI Designer. Other vendors, including Google Cloud and Microsoft Azure, also provide machine learning offerings. Appian is integration-agnostic and can connect with all of them.
Amazon Machine Learning (AML) supports three different types of ML models: binary classification, multiclass classification, and regression. The type of model that Amazon builds depends on the type of target attribute that you want to predict.
The following steps outline how to create a model using the Appian AI Designer shared component. It is also possible to create models directly in the AML admin console, and to interact in Appian with models that already exist or were not created using the Appian AI Designer (for more information on making predictions, see the next section).
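The relationship between the target attribute and the model type can be sketched as follows. This is a minimal illustration in Python, not part of Appian or the AML API: the function name infer_model_type and the heuristic itself are assumptions made for this example. In practice the target's data type is declared in the data source schema, and AML chooses the model type from that declaration.

```python
from typing import Sequence


def infer_model_type(target_values: Sequence[object]) -> str:
    """Guess which AML model type fits a target column.

    Hypothetical heuristic for illustration only; AML itself relies on
    the target attribute type declared in the data source schema.
    """
    distinct = {v for v in target_values if v is not None}
    if len(distinct) < 2:
        raise ValueError("target needs at least two distinct values")

    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in distinct
    )

    if len(distinct) == 2:
        return "binary classification"   # e.g. approved / not approved
    if all_numeric:
        return "regression"              # e.g. invoice amount
    return "multiclass classification"   # e.g. one of several categories


print(infer_model_type([True, False, True]))
print(infer_model_type(["low", "medium", "high"]))
print(infer_model_type([12.5, 40.0, 7.25]))
```

A numeric target with only a handful of distinct values (for example a 1 to 5 rating) could reasonably be modeled either as regression or as multiclass classification, which is why the declared schema type, not a heuristic like this one, should be the final word.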
Once a model is created, you can make batch predictions or individual real-time predictions. There are two main ways to make real-time predictions within Appian: use the shared component function AML_getRealtimePrediction, or use the connected system object in Appian versions 18.2 or later. The AML_getRealtimePrediction function takes a model ID and two parallel arrays that hold attribute names and attribute values. If you use this function, it is recommended that you create a mapping rule that takes in a CDT and converts its values into a text array to be passed into AML_getRealtimePrediction. Even before creating a connected system or a rule to call the API, you can test real-time predictions from the AML admin console or from the machine learning model record in the Appian AI Designer site. It is recommended that you test the predictions and evaluate the model (more below) before deciding to move forward with an initial model.
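The mapping rule described above would be written as an Appian expression rule; the Python sketch below only illustrates the shape of the conversion. The function name to_parallel_arrays, the field names in the sample record, the choice to skip missing values, and the "1"/"0" encoding of booleans are all assumptions for this example and should be checked against how the model's data source encodes its attributes.

```python
from typing import Any, List, Mapping, Tuple


def to_parallel_arrays(record: Mapping[str, Any]) -> Tuple[List[str], List[str]]:
    """Convert a record (standing in for a CDT) into two parallel text arrays.

    Index i of the first list holds an attribute name and index i of the
    second list holds that attribute's value as text.
    """
    names: List[str] = []
    values: List[str] = []
    for name, value in record.items():
        if value is None:
            # Assumption: omit missing attributes rather than send "None".
            continue
        names.append(name)
        if isinstance(value, bool):
            # Assumption: binary attributes are encoded as "1" / "0".
            values.append("1" if value else "0")
        else:
            values.append(str(value))
    return names, values


# Hypothetical record; the field names are illustrative only.
sample = {"age": 42, "state": "VA", "isMember": True, "notes": None}
attribute_names, attribute_values = to_parallel_arrays(sample)
print(attribute_names)
print(attribute_values)
```

Keeping the conversion in one place means the two arrays cannot drift out of alignment, which is the main risk of passing names and values separately.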
Evaluating and Adjusting Model Performance
Whenever a new model is created, four objects are created in the AML admin console: one training data source, one evaluation data source, one model, and one evaluation object. As discussed above, Amazon uses different metrics to quantify performance. In addition, Amazon provides a different performance visualization for each model type. To access the performance metrics and visualizations, navigate to the admin console and select the evaluation object. For binary classification models, you can adjust the output using the dual histogram visualization (pictured below) by raising or lowering the score threshold, which defaults to 0.5. For example, if you would like to automate a process by auto-approving likely true values, you may want to raise the score threshold to a value closer to 1 in order to limit false positives (raising the score threshold increases the probability the model needs before it predicts a value as true). Conversely, if you would like to flag values that are likely false for further review, you may want to lower the score threshold in order to limit false negatives.
Another way to evaluate the model is to look at how each feature correlates with the target value. Some features have more of an impact on the predicted outcomes than others, and Amazon quantifies this impact (to view these values, navigate to either of the data sources in the AML admin console). It is generally a best practice to include as many relevant features as possible in your data set, but the noise introduced by including too many variables with little predictive power may negatively impact your model's performance.
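The effect of moving the score threshold can be seen with a small worked example. The sketch below is plain Python with made-up scores and labels; it assumes a record is predicted true when its score is at or above the threshold, and the function name confusion_counts is hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float = 0.5,
) -> Dict[str, int]:
    """Count true/false positives and negatives at a given score threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_true = score >= threshold  # assumption: inclusive cut-off
        if predicted_true and label == 1:
            counts["tp"] += 1
        elif predicted_true and label == 0:
            counts["fp"] += 1
        elif not predicted_true and label == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Made-up model scores and the true outcomes for six records.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.20]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.3, 0.5, 0.9):
    print(threshold, confusion_counts(scores, labels, threshold))
# 0.3 -> tp=3, fp=2, tn=1, fn=0
# 0.5 -> tp=3, fp=1, tn=2, fn=0
# 0.9 -> tp=1, fp=0, tn=3, fn=2
```

Raising the threshold from 0.5 to 0.9 removes the false positive at the cost of two false negatives, which suits auto-approval. Lowering it to 0.3 keeps false negatives at zero but admits a second false positive, which suits a review queue where missing a true case is the costlier error.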
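To get a rough sense of which features carry signal before uploading a data set, a simple correlation check can be run locally. The sketch below uses the Pearson correlation coefficient as a stand-in; it is not necessarily the statistic Amazon reports in the admin console, it applies only to numeric features, and the function names and sample data are assumptions for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(
    features: Dict[str, Sequence[float]],
    target: Sequence[float],
) -> List[Tuple[str, float]]:
    """Rank features by absolute correlation with the target, strongest first."""
    scored = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up data: "income" tracks the target, "constant" never varies.
features = {"income": [1, 2, 3, 4], "constant": [5, 5, 5, 5]}
target = [2, 4, 6, 8]
for name, strength in rank_features(features, target):
    print(name, round(strength, 3))
```

Features that rank near zero are candidates for removal, although a low linear correlation does not rule out a nonlinear relationship, so the ranking is a prompt for investigation and not a final verdict.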
Retraining Models
Feature Transformation
Splitting Data
Shuffling Data
See Also
Websites: