Document Classifier misclassifies unrelated documents with high confidence scores

Certified Associate Developer

Hi Team,

I’m using the Document Classifier to distinguish between Medical Certificates and Statutory Documents. I’ve trained the model with 47 Medical Certificates and 11 Statutory Documents. The incoming documents are predominantly (80%) Medical Certificates, followed by Statutory Documents and other unrelated documents.

The goal is to classify incoming documents and run AI data extraction only when a document is a Medical Certificate. However, I’ve observed that the model classifies random, unrelated documents as Medical Certificates with high confidence (note the ones named "use arrays functions exercise" and "blood report").
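
To make the intended flow concrete, here is a minimal Python sketch of the gating logic described above. It is not the product's actual API: `ClassificationResult`, `should_extract`, the label string, the 0.9 threshold, and the 0.98 score are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical container for one classifier prediction."""
    label: str
    confidence: float


MEDICAL_CERT = "Medical Certificate"  # assumed label name


def should_extract(result: ClassificationResult, threshold: float = 0.9) -> bool:
    """Run extraction only for confident Medical Certificate predictions.

    The threshold value is an illustrative assumption, not a recommended setting.
    """
    return result.label == MEDICAL_CERT and result.confidence >= threshold


# Hypothetical example of the failure mode: an unrelated document that the
# model labels as a Medical Certificate with a high score still passes the gate.
unrelated_doc = ClassificationResult(label=MEDICAL_CERT, confidence=0.98)
print(should_extract(unrelated_doc))
```

The sketch shows why a confidence threshold alone does not solve the problem: when an unrelated document receives a high score for the Medical Certificate label, it passes the same check as a genuine certificate.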

Has anyone encountered a similar issue? Any suggestions on improving classification accuracy and reducing false positives? The metrics are not helping either; they are all reported as 1.000.
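
One possible explanation for the perfect metrics, stated here as an assumption rather than a confirmed cause: if the scores are computed only on held-out documents from the two trained classes, unrelated documents never enter the calculation. The Python sketch below uses made-up labels and counts (`MC`, `SD`, `Other` are hypothetical shorthand) to show how precision can read 1.000 on such a set and drop once unrelated documents are included.

```python
def precision_recall(y_true, y_pred, positive):
    """Compute precision and recall for one class from paired label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical evaluation set with only the two trained classes,
# all predicted correctly.
closed_true = ["MC"] * 9 + ["SD"] * 2
closed_pred = list(closed_true)
closed_scores = precision_recall(closed_true, closed_pred, "MC")

# Same set plus three unrelated documents that the model labels as "MC".
open_true = closed_true + ["Other"] * 3
open_pred = closed_pred + ["MC"] * 3
open_scores = precision_recall(open_true, open_pred, "MC")

print(closed_scores)
print(open_scores)
```

If this assumption holds, the reported metrics describe only how well the model separates the two trained classes, not how it behaves on documents outside them.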
