Is there a way to search for a certain paragraph in a list of paragraphs?

felixr

Certified Associate Developer

over 2 years ago

I have a list of paragraphs, and one of them should contain two keywords in separate places; either one could come before the other. For example, the text could be "hi carls your property is ready you can verify in 96 days" or, the other way around, "in 96 days your new property would be ready". In this case I would like to find these two keywords and get the exact paragraph. The keywords that will always be present in this scenario are "property" and "## days". Some paragraphs could contain just one of them, but it only counts as a match when both appear together. I was trying with regexarrayindexfirstmatch but could not make it work; I also tried the like() and find() functions.

Another thing: all these paragraphs come from a PDF file of about 20 pages, and I'm getting them with the getpdftext() function.

Is there any function or something that could help achieve this efficiently? The only way I have found is looping character by character through every single paragraph.

Thanks in advance!


Top Replies

+1 Gabriele Camilli
Appian Employee
over 2 years ago

Hi,

A solution (very dependent on the PDF formatting, though) is to try to split the paragraphs using split(text, char(10)). If this works for your PDF, you could then loop over the paragraphs and check for both keywords in an AND condition to find the matching paragraph.
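As a rough illustration of that split-then-check idea, here is a minimal sketch in Python rather than Appian expression language. The function name `find_paragraphs` and the sample text are made up for the example, and it assumes paragraphs are separated by single newlines, which depends on how the PDF text was extracted.

```python
def find_paragraphs(text, keywords, separator="\n"):
    """Return the paragraphs that contain every keyword (case-insensitive)."""
    # Split on the assumed paragraph separator and drop empty pieces.
    paragraphs = [p.strip() for p in text.split(separator) if p.strip()]
    return [
        p for p in paragraphs
        if all(k.lower() in p.lower() for k in keywords)
    ]


# Hypothetical sample: only the first paragraph contains both keywords.
sample = (
    "hi carls your property is ready you can verify in 96 days\n"
    "the property was sold last year\n"
    "the meeting is in 3 days"
)
matches = find_paragraphs(sample, ["property", "days"])
```

Plain substring checks like this only cover fixed words; a variable number in front of "days" needs a pattern match instead.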
0 felixr
Certified Associate Developer
over 2 years ago in reply to Gabriele Camilli

Yes, I got you, but in my case I'm not looking for a specific number: it could be any number from 0 to 999, for example, followed by the word "days", and the paragraph also needs to contain the other word.
0 Gabriele Camilli
Appian Employee
over 2 years ago in reply to felixr

Then one of the find() calls can be replaced by a regexmatch expression to find number + " fixedstring". It seems that you have access to the plugin, right?
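A sketch of that combination, again in Python for illustration only (the Appian plugin's regex flavour and matching rules may differ): a regular expression finds a 1 to 3 digit number followed by "days", and a plain substring check finds the fixed word. The names `DAYS_PATTERN` and `is_match` are hypothetical.

```python
import re

# 1-3 digits (0 to 999), whitespace, then the word "days".
# The \b boundaries stop "1234 days" from matching on its last three digits.
DAYS_PATTERN = re.compile(r"\b\d{1,3}\s+days\b", re.IGNORECASE)


def is_match(paragraph, word="property"):
    """True when the paragraph has both the fixed word and a '<number> days' phrase."""
    has_word = word in paragraph.lower()
    has_days = DAYS_PATTERN.search(paragraph) is not None
    return has_word and has_days
```

The order of the two keywords does not matter here, because each check scans the whole paragraph independently.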

0 felixr

Certified Associate Developer

over 2 years ago in reply to Gabriele Camilli

a!localVariables(
  /*local!test:*/
  /*"hiugjkindiaChinaFrance (8) days aklsjdf;lajsdlkfja",*/
  /*local!prueba: regexfirstmatch( "^[(]\d{1,3}[)]\s(days)$", local!test  ),*/
  
  
  
  local!a: "sECTION 7.6 The Accountant. The Partners shall agree upon a mutually acceptable
accountant to be the initial accountant and auditor for the Company and each Subsidiary Entity
(the “Accountant”). The fees and expenses of the Accountant shall be a Company expense.
SECTION 7.7 Company Audit. Subject to Section 5.4(c) and Section 7.4(e), Investor

Partner (or its Affiliate) is hereby designated as the partnership representative of the Company,
in accordance with Section 6223 of the Code and any similar provision under any state or local
tax laws, and all decisions and elections of Investor Partner as such are subject to General,

  Partner’s prior written consent (not to be unreasonably withheld, conditioned or delayed). The
partnership representative, subject in all cases to the provisions of Section 6.2(a)(xxi) and
Section 6.2(a)(xxii), shall be authorized to take any actions necessary with respect to any audit,
examination or investigation (including any judicial or administrative proceeding) of the

Company by any U.S. federal, state or local or non-U.S. taxing authority. Each Partner shall
keep the other Partners informed of the progress of any tax audits or examinations. Each Partner
shall give prompt notice to each other Partner of any and all notices it receives from the Internal
Revenue Service concerning the Company or any Subsidiary Entity, including any notice of
audit, any notice of action with respect to a revenue agent’s report, any notice of a thirty (30) days
appeal letter and any notice of a deficiency in tax concerning the Company’s and or any
Subsidiary Entity’s federal income tax return. Each Partner shall, at the Company’s expense,
furnish each other Partner with status

hiugjkindiaChinaFrance (8) days aklsjdf;lajsdlkfja;
a,sdjflkajs (88) days dlk
akjsdl;fja",
  local!b: split(local!a, char(10) & char(10)),
  local!arrayvars: a!forEach(
    items: local!b,
    expression: if(
      and(
        regexmatch("^[(]\d{1,3}[)]\s(days)$", fv!item) = true(),
        find("hiugjkindiaChinaFrance", fv!item, 0) > 0
      ),
      fv!item,
      {}
    )
  ),
  local!arrayvars
)

I made this code for testing purposes, but it is not working with the regex plugin, or else I'm having some trouble using it, because it never matches.

0 felixr
Certified Associate Developer
over 2 years ago in reply to felixr

Sorry, I was looking for it the wrong way. I have now fixed it with regexfirstmatch( ".*[(]\d{1,3}[)]\s(days).*", local!test ). But I have a question: how do you see an implementation like this in a production environment, with documents of hundreds of pages and hundreds of users uploading PDF documents, trying to extract this kind of data?
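The difference between the two patterns can be reproduced outside Appian. The sketch below uses Python's `re` module, which is an assumption for illustration and may not behave exactly like the plugin: `^` and `$` anchor the pattern to the start and end of the text, so the first version only matches when the text is nothing but "(8) days", while the version wrapped in `.*` allows other characters on either side.

```python
import re

# Single-line test string taken from the commented-out local!test above.
text = "hiugjkindiaChinaFrance (8) days aklsjdf;lajsdlkfja"

anchored = r"^[(]\d{1,3}[)]\s(days)$"
wrapped = r".*[(]\d{1,3}[)]\s(days).*"

# No match: the text has other characters before and after "(8) days".
anchored_hit = re.search(anchored, text)

# Matches, and the match covers the whole line because of the ".*" padding.
wrapped_hit = re.search(wrapped, text)

# With a search-style function the padding is unnecessary: the bare
# pattern already finds "(8) days" anywhere in the text.
core_hit = re.search(r"[(]\d{1,3}[)]\sdays", text)
```

In Python, `.` does not match a newline by default, so for multi-line paragraphs an unanchored search for the bare pattern is the more robust form.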
0 Gabriele Camilli
Appian Employee
over 2 years ago in reply to felixr

If the documents aren't too big (around 20 pages, as in the first post), this should be manageable. However, this is not something that scales well. The next step could be to use RPA or IDP; these solutions scale better on repetitive and large tasks.