Is there a way of searching for a certain paragraph in a list of paragraphs?

Certified Associate Developer

I have a list of paragraphs. In one of them there should be two keywords in separate places, and either one could come first. For example, the text could be "Hi Carls, your property is ready, you can verify in 96 days" or, the other way around, "In 96 days your new property would be ready". In this case I would like to find these two keywords and get the exact paragraph. The keywords that will always be present in this scenario are "property" and "## days". Some paragraphs could contain only one of them, but when both appear together it is a match. I was trying with regexarrayindexfirstmatch but could not make it work. I also tried the like() and find() functions.

Another thing: all these paragraphs come from a PDF file of about 20 pages, and I'm getting them with the getpdftext() function.

Is there any function or something else that could help achieve this efficiently? The only way I have found is looping character by character through every single paragraph.

Thanks in advance!

  • Hi,

    A solution (very dependent on the PDF formatting, though) is to try to split the paragraphs using split(text,char(10)). If this works for your PDF, you could then loop over the paragraphs and check for both keywords in an AND statement to find the matching paragraph.
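The split-then-check idea can be sketched outside Appian to make the logic concrete. The snippet below is Python, not Appian expression language, and the helper names `split_paragraphs` and `paragraphs_with_both` are hypothetical. It assumes paragraphs are separated by blank lines, which may not hold for every PDF extraction.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split extracted text into paragraphs on blank lines (an assumption
    about the PDF output; adjust the separator if the text differs)."""
    parts = re.split(r"\n\s*\n", text)
    return [p.strip() for p in parts if p.strip()]


def paragraphs_with_both(text: str, first: str, second: str) -> list[str]:
    """Return the paragraphs containing both keywords, in any order."""
    return [
        p
        for p in split_paragraphs(text)
        if first.lower() in p.lower() and second.lower() in p.lower()
    ]


sample = (
    "Hi Carl, your property is ready.\nYou can verify in 96 days.\n\n"
    "The property tax is due.\n\n"
    "In 96 days the report is due."
)

# Only the first paragraph contains both "property" and "days".
matches = paragraphs_with_both(sample, "property", "days")
```

Because each paragraph is tested with two independent containment checks, the order in which the keywords appear does not matter.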

  • 0
    Certified Associate Developer
    in reply to Gabriele Camilli

    Yes, I got you, but in my case I'm not looking for a specific number. It could be any number from 0 to 999, followed by the word "days", and the paragraph also needs to contain the other word.

  • Then one of the find() calls can be replaced by a regexmatch expression to find number + " fixedstring". It seems that you have access to the plugin, right?
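As a rough illustration of combining a "number + fixed string" regex with a plain keyword check, here is a Python sketch (not Appian code). The pattern names and the `is_match` helper are hypothetical. The optional parentheses around the number are an assumption based on the "(30) days" style seen in the sample contract text.

```python
import re

# One to three digits (0 to 999), optionally wrapped in parentheses,
# followed by whitespace and the word "days".
DAYS_PATTERN = re.compile(r"\(?\b\d{1,3}\)?\s+days\b", re.IGNORECASE)

# The second keyword, matched as a whole word.
PROPERTY_PATTERN = re.compile(r"\bproperty\b", re.IGNORECASE)


def is_match(paragraph: str) -> bool:
    """True only when the paragraph contains both '<number> days' and 'property'."""
    has_days = DAYS_PATTERN.search(paragraph) is not None
    has_property = PROPERTY_PATTERN.search(paragraph) is not None
    return has_days and has_property


examples = [
    "hi carls your property is ready you can verify in 96 days",
    "in 96  days  your new property would be ready",
    "your property is ready",
    "the report is due in 30 days",
]
results = [is_match(e) for e in examples]
```

The two checks are independent, so the keywords can appear in either order. A four-digit number such as 1996 is rejected because the pattern allows at most three digits between word boundaries.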

  • 0
    Certified Associate Developer
    in reply to Gabriele Camilli

    a!localVariables(
      /*local!test:*/
      /*"hiugjkindiaChinaFrance (8) days aklsjdf;lajsdlkfja",*/
      /*local!prueba: regexfirstmatch( "^[(]\d{1,3}[)]\s(days)$", local!test  ),*/
      
      
      
      local!a: "sECTION 7.6 The Accountant. The Partners shall agree upon a mutually acceptable
    accountant to be the initial accountant and auditor for the Company and each Subsidiary Entity
    (the “Accountant”). The fees and expenses of the Accountant shall be a Company expense.
    SECTION 7.7 Company Audit. Subject to Section 5.4(c) and Section 7.4(e), Investor
    
    Partner (or its Affiliate) is hereby designated as the partnership representative of the Company,
    in accordance with Section 6223 of the Code and any similar provision under any state or local
    tax laws, and all decisions and elections of Investor Partner as such are subject to General,
    
      Partner’s prior written consent (not to be unreasonably withheld, conditioned or delayed). The
    partnership representative, subject in all cases to the provisions of Section 6.2(a)(xxi) and
    Section 6.2(a)(xxii), shall be authorized to take any actions necessary with respect to any audit,
    examination or investigation (including any judicial or administrative proceeding) of the
    
    Company by any U.S. federal, state or local or non-U.S. taxing authority. Each Partner shall
    keep the other Partners informed of the progress of any tax audits or examinations. Each Partner
    shall give prompt notice to each other Partner of any and all notices it receives from the Internal
    Revenue Service concerning the Company or any Subsidiary Entity, including any notice of
    audit, any notice of action with respect to a revenue agent’s report, any notice of a thirty (30) days
    appeal letter and any notice of a deficiency in tax concerning the Company’s and or any
    Subsidiary Entity’s federal income tax return. Each Partner shall, at the Company’s expense,
    furnish each other Partner with status
    
    hiugjkindiaChinaFrance (8) days aklsjdf;lajsdlkfja;
    a,sdjflkajs (88) days dlk
    akjsdl;fja",
     local!b: split(local!a, char(10) & char(10)),
     local!arrayvars: a!forEach(
       items: local!b,
       expression: if(
         and(
           regexmatch("^[(]\d{1,3}[)]\s(days)$", fv!item) = true(),
           find("hiugjkindiaChinaFrance", fv!item, 0) > 0
         ),
         fv!item,
         {}
       )
     ),
     local!arrayvars
    )

    I wrote this code for testing purposes, but it is not working. Either the regex plugin is not working or I am using it incorrectly, because it never matches.

  • 0
    Certified Associate Developer
    in reply to felixr

    Sorry, I was going about it the wrong way. I have now fixed it with this: regexfirstmatch( ".*[(]\d{1,3}[)]\s(days).*", local!test  ). But I have a question: how do you see an implementation like this in a production environment, with documents of hundreds of pages and hundreds of users uploading PDF documents, trying to extract this kind of data?
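The difference between the original pattern and the fixed one comes down to the `^` and `$` anchors. The Python sketch below (not Appian code) shows the effect on the test string from the thread. It assumes the plugin follows the usual regex anchor semantics, which is not confirmed here.

```python
import re

text = "hiugjkindiaChinaFrance (8) days aklsjdf;lajsdlkfja"

# Original pattern: ^ and $ require the whole string to be exactly "(N) days".
anchored = re.compile(r"^[(]\d{1,3}[)]\s(days)$")

# Without anchors, the pattern can match anywhere inside the string.
unanchored = re.compile(r"[(]\d{1,3}[)]\s(days)")

# The fix from the thread: .* on both sides absorbs the surrounding text.
wrapped = re.compile(r".*[(]\d{1,3}[)]\s(days).*")

anchored_hit = anchored.search(text)      # no match: extra text on both sides
unanchored_hit = unanchored.search(text)  # matches just the "(8) days" part
wrapped_hit = wrapped.search(text)        # matches the entire line
```

The anchored pattern only succeeds when the input is nothing but the "(N) days" fragment, which is why it never matched a full paragraph.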

  • If the documents aren't very big (20 pages, as in the first post), this should be manageable. However, this is not something that scales well. The next step could be to use RPA or IDP; these solutions scale better on repetitive and large tasks.