<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.appian.com/cfs-file/__key/system/syndication/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Document Classifier failing to classify unstructured docs</title><link>https://community.appian.com/discussions/f/rules/38724/document-classifier-failing-to-classify-unstructured-docs</link><description>Hi Team, 
 I’m using the Document Classifier to distinguish between Medical Certificates and a Statutory Document. I’ve trained the model with 21 Medical Certs and 12 Statutory Docs. The incoming documents are predominantly (80%) Medical Certs, followed</description><dc:language>en-US</dc:language><generator>Telligent Community 12</generator><item><title>RE: Document Classifier failing to classify unstructured docs</title><link>https://community.appian.com/thread/146486?ContentTypeID=1</link><pubDate>Mon, 24 Mar 2025 02:07:39 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:643206d0-576d-47ff-943a-e0ac0747daac</guid><dc:creator>ranjitap410023</dc:creator><description>&lt;p&gt;Thank you! I started to do that and I can see the metrics changing now!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Document Classifier failing to classify unstructured docs</title><link>https://community.appian.com/thread/146407?ContentTypeID=1</link><pubDate>Thu, 20 Mar 2025 14:42:31 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:2771072b-1298-477d-8845-8a6791b6d8b7</guid><dc:creator>Louis Prensky</dc:creator><description>&lt;p&gt;Hi &lt;a href="/members/ranjitap410023"&gt;ranjitap410023&lt;/a&gt;, cross-posting this response from a similar thread:&lt;/p&gt;
&lt;p&gt;Thanks for reaching out about this! I would recommend the following:&lt;br /&gt;1) Train the model with an equivalent number of documents for each document type. For example, I would add more Statutory Docs so that there are ~50 in the training set. This is best practice, even if you expect the distribution of documents in production to be mostly Medical Certs.&lt;br /&gt;2) If you expect a variety of random documents that do not need to be handled in your downstream process, create a &amp;quot;Random/Invalid&amp;quot; document type. When providing training data for this document type, try to include as broad a set of documents as possible so that it is representative of the documents that will be ingested in production.&lt;/p&gt;
&lt;p&gt;Please let me know if you have any questions!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>