<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.appian.com/cfs-file/__key/system/syndication/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Transferring/Processing Large Data Sets (ETL)</title><link>https://community.appian.com/success/w/guide/3316/transferring-processing-large-data-sets-etl</link><description /><dc:language>en-US</dc:language><generator>Telligent Community 12</generator><item><title>Transferring/Processing Large Data Sets (ETL)</title><link>https://community.appian.com/success/w/guide/3316/transferring-processing-large-data-sets-etl</link><pubDate>Tue, 23 Apr 2024 13:19:15 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:0726ad27-02f3-4f54-81b4-ff92edb08f69</guid><dc:creator>Appian Max Team</dc:creator><comments>https://community.appian.com/success/w/guide/3316/transferring-processing-large-data-sets-etl#comments</comments><description>Current Revision posted to Guide by Appian Max Team on 4/23/2024 1:19:15 PM&lt;br /&gt;
&lt;div style="margin:8px 16% 8px 8%;"&gt;
&lt;p&gt;This page explains how to design solutions for high-volume processing of large data sets or extract-transform-load (ETL) patterns, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Running a nightly sync on up to 2 million records&lt;/li&gt;
&lt;li&gt;Performing a one-time transfer of 1 million rows from one database to another, transforming each row as it is processed&lt;/li&gt;
&lt;li&gt;Allowing users to upload Excel files with up to 10,000 rows, each of which must be processed through a series of validations and database updates&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;&lt;b&gt;Do&lt;/b&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="/w/article/3216/performance-and-load-testing"&gt;Load test&lt;/a&gt; at production volumes to confirm both functional and performance requirements are met&lt;/li&gt;
&lt;li&gt;Throttle or queue incoming requests to reduce peak processing requirements&lt;/li&gt;
&lt;li&gt;Convert real-time processing to background/asynchronous activities whenever possible&lt;/li&gt;
&lt;/ul&gt;
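&lt;p&gt;As a rough illustration of the throttle/queue advice above (not Appian-specific), a bounded queue with a single background worker can smooth peak load by handling requests in batches instead of one at a time; the queue, batch size, and handler below are all made up for the sketch:&lt;/p&gt;

```python
import queue
import threading

def start_batch_worker(work_q, handle_batch, batch_size=100):
    """Drain incoming requests in batches so peak load is smoothed out.

    work_q is a (possibly bounded) queue.Queue; putting None into it shuts
    the worker down after the current batch is handled.
    """
    def run():
        while True:
            item = work_q.get()                 # block until work arrives
            stop = item is None
            batch = [] if stop else [item]
            for _ in range(batch_size - len(batch)):
                try:
                    nxt = work_q.get_nowait()   # grab whatever else is queued
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            if batch:
                handle_batch(batch)             # one call per batch, not per item
            if stop:
                return

    worker = threading.Thread(target=run)
    worker.start()
    return worker
```

&lt;p&gt;Producers simply put requests on the queue; giving the queue a bounded maxsize makes the put block under load, which is the throttle.&lt;/p&gt;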
&lt;h3&gt;&lt;b&gt;Don&amp;#39;t&lt;/b&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Pass or store large amounts of data in process variables&lt;/li&gt;
&lt;li&gt;Manipulate data in long sequences of unattended nodes run in series, or through heavy use of MNI/looping&lt;/li&gt;
&lt;li&gt;Activity chain through long sequences of script tasks and smart services&lt;/li&gt;
&lt;li&gt;Run large transaction volumes during business hours (and make sure nightly batch processes complete before users start logging in the next day)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="appian_records"&gt;Appian Records&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://docs.appian.com/suite/help/latest/Record_Type_Object.html"&gt;Appian Records&lt;/a&gt; makes it easy to unify data across your organization. Using Appian Records, you can converge your system data into a single point of management. Simply choose your data source (whether it be a database, web service, or Salesforce object) and follow the &lt;a href="https://docs.appian.com/suite/help/latest/configure-record-data-source.html"&gt;steps&lt;/a&gt; to configure your record type. From there, you can adjust your record fields, and even extend your data set by creating custom record fields that calculate and transform your data to fit your business needs.&lt;/p&gt;
&lt;p&gt;When you need to ETL data that resides in the Appian Cloud database, setting up the &lt;a href="https://docs.appian.com/suite/help/latest/Enhanced_Data_Pipeline_for_Appian_Cloud.html#using-enhanced-data-pipeline"&gt;Enhanced Data Pipeline&lt;/a&gt; is a viable approach. Data can be collected by querying the database with a read-only user, or replicated by exposing the Appian database&amp;rsquo;s binary logs to your change data capture (CDC) tool of choice.&lt;/p&gt;
&lt;h2 id="smart_services_and_plugins"&gt;Smart Services and Plugins&lt;/h2&gt;
&lt;p&gt;An alternative is to use Appian processes to perform processing with script tasks and out-of-the-box smart services. Typically this approach is quick to implement and doesn&amp;rsquo;t require any specialized knowledge. This could be considered a good fit when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data volumes are low and the use case is relatively simple
&lt;ul&gt;
&lt;li&gt;The exact threshold depends on many factors including infrastructure, peak load, and processing requirements. Perform a &lt;a href="/w/article/3216/performance-and-load-testing"&gt;load test&lt;/a&gt; with expected production user and data volumes to assess the solution prior to go-live.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;It is only needed for background processing (not within an activity chain)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Consider one of the other options if:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;You need to return results in real time&lt;/li&gt;
&lt;li&gt;You find yourself with a process of 10+ unattended nodes in series and/or heavy use of MNI/looping&lt;/li&gt;
&lt;li&gt;Volume or complexity will grow over time (avoid future refactoring)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following Appian smart services and plug-ins are commonly used together to efficiently load and process large data sets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use &lt;a href="https://docs.appian.com/suite/help/latest/Export_To_CSV_Smart_Service.html"&gt;Export to CSV smart service&lt;/a&gt; to move data from a source database directly into a CSV file.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="/b/appmarket/posts/excel-tools"&gt;Excel Tools&lt;/a&gt; plug-in to move data from CSV files directly into the target database.&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://docs.appian.com/suite/help/latest/Execute_Stored_Procedure_Smart_Service.html"&gt;Execute Stored Procedure&lt;/a&gt; smart service to process/transform the data into the desired structure.&lt;/li&gt;
&lt;/ul&gt;
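&lt;p&gt;In plain database terms, the three steps above amount to: dump the source table to a flat file, bulk-load it into a staging table, then transform set-at-a-time inside the database. A minimal sqlite3 sketch with hypothetical table names (orders, staging, totals), not the Appian smart-service API:&lt;/p&gt;

```python
import csv
import sqlite3

def etl_via_csv(source_db, target_db, csv_path):
    # 1. "Export to CSV": dump the source table straight to a flat file.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in source_db.execute("SELECT id, amount FROM orders"):
            writer.writerow(row)

    # 2. Bulk-load the CSV into a staging table in the target database.
    target_db.execute("CREATE TABLE IF NOT EXISTS staging (id INTEGER, amount REAL)")
    with open(csv_path, newline="") as f:
        target_db.executemany("INSERT INTO staging VALUES (?, ?)", csv.reader(f))

    # 3. "Stored procedure" step: transform set-at-a-time inside the database.
    target_db.execute("CREATE TABLE IF NOT EXISTS totals (total REAL)")
    target_db.execute("INSERT INTO totals SELECT SUM(amount) FROM staging")
    target_db.commit()
```

&lt;p&gt;Note that at no point do the rows pass through application variables, the analogue of keeping large data sets out of process variables.&lt;/p&gt;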
&lt;p&gt;Smart service plug-ins can be used within an Appian process to optimize your more complex and/or high-volume processing steps. Plug-ins consolidate long sequences of activities or MNI/looping by delegating the processing to Java, reducing the number of Appian processes and activities.&lt;/p&gt;
&lt;table&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;p&gt;Plug-ins used for high-volume processing should operate on or move data entirely within the plug-in. They should not bring data into the Appian process, since this can lead to higher load and memory use. For this reason, the following Shared Components are &lt;b&gt;not&lt;/b&gt; recommended for high-volume processing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parse CSV File Plug-in&lt;/li&gt;
&lt;li&gt;Read Excel File Smart Service&lt;/li&gt;
&lt;li&gt;Excel Read Cells&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="custom_solutions"&gt;Custom Solutions&lt;/h2&gt;
&lt;p&gt;If you can&amp;#39;t get the functionality or performance you need from existing plug-ins then you can &lt;a href="https://docs.appian.com/suite/help/latest/Custom_Smart_Service_Plug-ins.html"&gt;write your own smart services&lt;/a&gt; to perform custom processing. When designing a custom smart service consider the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building and maintaining one requires a Java developer&lt;/li&gt;
&lt;li&gt;Pass references to the data source(s) of the processing/transformation rather than the data itself. For example, pass a data source name rather than the rows of data, or a document ID instead of the text of the document.&lt;/li&gt;
&lt;li&gt;Perform looping inside the plug-in rather than executing multiple instances from the parent process&lt;/li&gt;
&lt;/ul&gt;
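&lt;p&gt;The pass-a-reference guideline can be sketched as follows: the caller hands over a file path (standing in for an Appian document ID), the per-row loop stays inside the function, and only a small summary comes back. A hypothetical illustration, not the Appian plug-in SDK:&lt;/p&gt;

```python
import csv

def process_document(path, min_amount=0.0):
    """Receive a reference (a file path), never the rows themselves."""
    kept = skipped = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):            # loop inside the "plug-in"
            if float(row["amount"]) >= min_amount:
                kept += 1                        # e.g. write to the target here
            else:
                skipped += 1
    return {"kept": kept, "skipped": skipped}    # only a summary goes back
```

&lt;p&gt;The parent process passes one reference and receives one small result, instead of looping over rows itself.&lt;/p&gt;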
&lt;p&gt;Specialized solutions should be reserved for extreme cases of volume and complexity. In these cases, an external rules engine, document generation tool, or data integration platform might provide more scalability. Examples of such tools include &lt;a href="https://www.mulesoft.com/platform/studio"&gt;MuleSoft&lt;/a&gt; and Informatica.&lt;/p&gt;
&lt;p&gt;When using external systems consider the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Expect to work with the vendor team to configure and manage the third-party system&lt;/li&gt;
&lt;li&gt;Use any of Appian&amp;#39;s &lt;a href="https://docs.appian.com/suite/help/latest/Getting_Started_with_Connecting_Appian.html"&gt;integration&lt;/a&gt; capabilities to connect&lt;/li&gt;
&lt;li&gt;Use an integration method that supports passing data and results in bulk (avoid looping over individual calls within Appian and ideally use Appian Records to facilitate the synchronization)&lt;/li&gt;
&lt;li&gt;Test the performance of the integrated solution before using it in a real-time, user-facing scenario&lt;/li&gt;
&lt;/ul&gt;
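&lt;p&gt;The bulk-integration point boils down to one call per batch rather than one per record. A generic sketch (the batch-sending callback is a placeholder for whatever bulk endpoint the external system offers):&lt;/p&gt;

```python
def chunked(items, size):
    """Yield fixed-size slices so one remote call can carry many records."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def sync_in_bulk(records, send_batch, batch_size=500):
    """Make one send_batch call per chunk instead of one call per record."""
    calls = 0
    for batch in chunked(records, batch_size):
        send_batch(batch)   # e.g. a single POST to a bulk endpoint
        calls += 1
    return calls
```

&lt;p&gt;Syncing 1,050 records at a batch size of 500 costs three integration calls instead of 1,050.&lt;/p&gt;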
&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;

&lt;div style="font-size: 90%;"&gt;Tags: integrations, design patterns, Architecture&lt;/div&gt;
</description></item></channel></rss>