<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.appian.com/cfs-file/__key/system/syndication/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>AI Skill  - Document Extraction</title><link>https://community.appian.com/discussions/f/rules/30738/ai-skill---document-extraction</link><description>Hi Appian Community, hope you are doing well. I have the following use case: the solution should be intelligent enough to capture payslip documents from the end user and extract the data from them. As with Appian 23.4 release the AI</description><dc:language>en-US</dc:language><generator>Telligent Community 12</generator><item><title>RE: AI Skill  - Document Extraction</title><link>https://community.appian.com/thread/122404?ContentTypeID=1</link><pubDate>Tue, 28 Nov 2023 23:03:27 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:d422bc89-8cd0-4d4f-9331-35cb4c92378c</guid><dc:creator>Louis Prensky</dc:creator><description>&lt;p&gt;Hi &amp;nbsp;, in 23.4, you should be able to train a model that can extract data from documents with varying structures. To get good results, it is important that you provide a dataset that is representative of the formats you expect to see in production. However, the model does not necessarily need to have been trained on a document template to extract data from it; if it has seen a wide variety of examples during training, it should be able to extrapolate to new examples. Here is our &lt;a href="https://docs.appian.com/suite/help/23.4/collect-data.html"&gt;documentation&lt;/a&gt; on building a representative dataset. 
I have also reached out to you directly to discuss your use case in more detail.&amp;nbsp;However, &amp;nbsp;is correct that you may get&amp;nbsp;more consistent results by training models on specific templates if there is a subset of templates that accounts for a large portion of your overall volume.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: AI Skill  - Document Extraction</title><link>https://community.appian.com/thread/122339?ContentTypeID=1</link><pubDate>Tue, 28 Nov 2023 09:21:49 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:d4e08875-f85a-42b5-94ee-740d43dfd59d</guid><dc:creator>Stefan Helzle</dc:creator><description>&lt;p&gt;Documents with large deviations in their structure are always a challenge. I assume that not every document is different. Can you group the documents into classes and then use a classification model to identify the variants? Then build an extraction model for each class. That might cover at least a larger share of the documents.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>