AI Skill - Document Extraction: Files with pages with too many tokens (Warning)

What does the following warning mean in AI Skill document extraction?

Files with pages with too many tokens

These files contained pages with too many tokens (token limit: 512) for the model to process. Labeled entities on these pages may not be included in training if they fall beyond the token limit.

  • Hi  , this warning appears whenever a document contains pages with dense text. The behavior is as follows:
    If a page contains more than 512 tokens (~500 words), then any field you have labeled that comes after the 512th token will not be included when training the model. If some fields in your model have particularly low recall while others are much higher, then the low-recall fields are likely affected by this warning.
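
To see which labeled fields on a dense page are at risk, here is a rough sketch. It is not the product's actual implementation. It assumes that whitespace-separated words stand in for the model's tokens. Real subword tokenizers usually produce more tokens than words, so this check undercounts. The function names and the (start, end) character-offset format for labels are hypothetical.

```python
import re

TOKEN_LIMIT = 512  # limit quoted in the warning message


def token_spans(text):
    """Return (start, end) character offsets of whitespace-delimited tokens.

    ASSUMPTION: whitespace splitting only approximates the model's real
    tokenizer, which typically yields more tokens than there are words.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def labels_beyond_limit(page_text, labels, limit=TOKEN_LIMIT):
    """Return names of labeled fields that extend past the first `limit` tokens.

    `labels` is a hypothetical mapping of field name -> (start, end)
    character span of the labeled value on the page.
    """
    spans = token_spans(page_text)
    if len(spans) <= limit:
        return []  # page fits within the limit, so every label is kept
    cutoff = spans[limit - 1][1]  # character offset where the last allowed token ends
    # Any label ending after the cutoff falls (at least partly) beyond the limit.
    return [name for name, (start, end) in labels.items() if end > cutoff]


if __name__ == "__main__":
    page = " ".join(f"word{i}" for i in range(600))  # a 600-token "dense" page
    spans = token_spans(page)
    labels = {"invoice_number": spans[10], "total": spans[550]}
    print(labels_beyond_limit(page, labels))  # ['total']
```

In this example, `total` sits at roughly token 550, so it would be dropped from training, while `invoice_number` is kept. A field that is repeatedly flagged across your documents is a likely cause of the low recall described above.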

    We are evaluating options for raising this limit so that customers do not encounter it.