RE: Document Classifier failing to classify unstructured docs

ranjitap410023 — Mon, 24 Mar 2025 02:07:39 GMT

Thank you! I started doing that and I can see the metrics changing now!

RE: Document Classifier failing to classify unstructured docs

Louis Prensky — Thu, 20 Mar 2025 14:42:31 GMT

Hi ranjitap410023, cross-posting this response from a similar thread:

Thanks for reaching out about this! I would recommend the following:
1) Train the model with a roughly equal number of documents for each document type. For example, I would add more Statutory Docs so that there are ~50 in the training set. This is best practice, even if you expect the distribution of documents in production to be mostly Medical Certs.
2) If you expect a variety of random documents that do not need to be handled in your downstream process, create a "Random/Invalid" document type. When providing training data for this document type, try to include as broad a set of documents as possible so that it is representative of the documents that will be ingested in production.
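
If it helps, here is a rough way to check how balanced a training set is before retraining. This is only a sketch in plain Python, not part of the classifier itself: the balance_report helper, the label names, and the counts are all made up for illustration, and it assumes you can export one label per training document.

```python
from collections import Counter


def balance_report(labels, target=None):
    """Return how many more documents each type needs to reach `target`.

    `labels` holds one document-type label per training document.
    If `target` is omitted, the size of the largest type is used.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Types already at or above the target need 0 more documents.
    return {doc_type: max(target - n, 0) for doc_type, n in counts.items()}


# Hypothetical training set; these counts are illustrative only.
labels = (
    ["Medical Cert"] * 50
    + ["Statutory Doc"] * 12
    + ["Random/Invalid"] * 30
)

for doc_type, missing in balance_report(labels, target=50).items():
    print(f"{doc_type}: add {missing} more")
```

Any type that reports a large gap is a candidate for more training documents, including the "Random/Invalid" type if you create one.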

Please let me know if you have any questions!