AI Skill - Document Extraction: Files with pages with too many tokens (Warning)

Appian Boy

Certified Associate Developer

over 1 year ago

What does the warning below mean in AI Skills document extraction?

Files with pages with too many tokens

These files contained pages with too many tokens (token limit: 512) for the model to process. Labeled entities on these pages may not be included in training if they fall beyond the token limit.


Louis Prensky
Appian Employee
over 1 year ago

Hi Appian Boy, this warning is shown whenever a document contains pages with dense text. The behavior is as follows:
If a page contains more than 512 tokens (~500 words), then any field you have labeled that comes after the 512th token will not be included when training the model. If some fields in your model have particularly low recall while others are much higher, then the low-recall fields are likely affected by this warning.
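
To make the cutoff concrete, here is a rough Python sketch of the behavior described above. It is not how the AI skill tokenizes internally: it assumes whitespace-separated words approximate tokens (in line with the ~500 words estimate), and the function and field names are made up for illustration.

```python
TOKEN_LIMIT = 512  # limit quoted in the warning text


def rough_token_count(page_text: str) -> int:
    # Assumption: one whitespace-separated word is roughly one token.
    return len(page_text.split())


def labels_at_risk(page_text: str, labeled_values: dict, limit: int = TOKEN_LIMIT) -> list:
    """Return the field names whose labeled value extends past the token limit.

    labeled_values maps a (hypothetical) field name to the text that was
    labeled on the page. Only the first occurrence of each value is checked.
    """
    tokens = page_text.split()
    at_risk = []
    for field, value in labeled_values.items():
        value_tokens = value.split()
        n = len(value_tokens)
        if n == 0:
            continue
        # Find where the labeled value first appears in the page tokens.
        start = next(
            (i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == value_tokens),
            None,
        )
        # start + n is the 1-based position of the value's last token.
        if start is not None and start + n > limit:
            at_risk.append(field)
    return at_risk


# Example: a synthetic 600-word page with one label early and one late.
page = " ".join(f"w{i}" for i in range(600))
labels = {"invoice_number": "w10", "total": "w550"}
print(rough_token_count(page))       # 600
print(labels_at_risk(page, labels))  # ['total']
```

A check like this can help explain why a field near the bottom of a dense page shows low recall, but treat the counts as approximate since the real tokenizer may split text differently.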

We are evaluating options for raising this limit so that customers do not encounter it.