Extracting data from string (HTML tags)

I have a string that contains an HTML table. The header row starts with Activity and ends with Comments. After it, the corresponding values for those headers are enclosed in <tr> tags, one row per <tr>. Is there any way I can get the corresponding values (e.g. Activity="GRACE PERIOD ACCEPTED", Trip ID= ..., Start/Arrival="2020-08-31 05:30", ... etc.) and create a dictionary out of them?



<table><tr style="background-color:#e2e2e2">
<th>Activity</th>
<th>Trip ID</th>
<th>Load ID</th>
<th>Stop #</th>
<th>Stop Name</th>
<th>Start/Arrival</th>
<th>End/Depart</th>
<th>Duration</th>
<th>Trailer#</th>
<th>Status</th>
<th>Comments</th>
</tr>
<tr style="background-color:#cad8ff">
<td>GRACE PERIOD REQUESTED</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>2020-08-31 05:25</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>request</td>
</tr>
<tr style="background-color:#cad8ff">
<td>GRACE PERIOD ACCEPTED</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>2020-08-31 05:30</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr style="background-color:#cad8ff">
<td >CHECKED IN*</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td >2020-08-31 05:31</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>

<tr style="background-color:#ffffbc">
<td style="padding: 3px">ASSIGNED</td>
<td>31081857</td>
<td></td>
<td></td>
<td></td>
<td>2020-08-31 05:31</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>

<tr style="background-color:#ffffbc">
<td>ACCEPTED</td>
<td>31081857</td>
<td></td>
<td></td>
<td></td>
<td>2020-08-31 05:32</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>


<tr style="background-color:#ffffff">
<td>PICK UP*</td>
<td>31081857</td>
<td>310818571</td>
<td>D028</td>
<td>DC Aspect Logistics</td>
<td></td>
<td>2020-08-31 05:32</td>
<td></td>
<td>1551749</td>
<td>EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY*</td>
<td>31081857</td>
<td>310818571</td>
<td>0000001394</td>
<td>Baldree's NF Gananoque</td>
<td>2020-08-31 05:33</td>
<td>2020-08-31 05:33</td>
<td>00:00</td>
<td>1551749</td>
<td>EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818571</td>
<td>0000004080</td>
<td>DS Andrew Emily's NF Picton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818571</td>
<td>0000003451</td>
<td>Andrew Emily's NF Picton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818571</td>
<td>0000004106</td>
<td>DS Smylie's YIG Trenton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818571</td>
<td>0000002608</td>
<td>Smylie's YIG Trenton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>PICK UP</td>
<td>31081857</td>
<td>310818572</td>
<td>0000002608</td>
<td>Smylie's YIG Trenton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818572</td>
<td>D028</td>
<td>DC Aspect Logistics</td>
<td>2020-08-31 05:35</td>
<td>2020-08-31 05:36</td>
<td>00:01</td>
<td>1551749</td>
<td>EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>PICK UP*</td>
<td>31081857</td>
<td>310818573</td>
<td>D028</td>
<td>DC Aspect Logistics</td>
<td>2020-08-31 05:35</td>
<td>2020-08-31 05:36</td>
<td>00:01</td>
<td>1551749</td>
<td>EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY*</td>
<td>31081857</td>
<td>310818573</td>
<td>0000001394</td>
<td>Baldree's NF Gananoque</td>
<td>2020-08-31 05:36</td>
<td>2020-08-31 05:37</td>
<td>00:00</td>
<td>1551749</td>
<td>EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818573</td>
<td>0000004080</td>
<td>DS Andrew Emily's NF Picton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818573</td>
<td>0000003451</td>
<td>Akaash's NF Kingston</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818573</td>
<td>0000004106</td>
<td>DS Smylie's YIG Trenton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DELIVERY</td>
<td>31081857</td>
<td>310818573</td>
<td>0000002608</td>
<td>Smylie's YIG Trenton</td>
<td></td>
<td></td>
<td></td>
<td>1551749</td>
<td>STOP NOT EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#ffffff">
<td>DOMICILE</td>
<td>31081857</td>
<td></td>
<td>D060</td>
<td>DC Ajax Bayly</td>
<td>2020-08-31 05:38</td>
<td></td>
<td></td>
<td></td>
<td>EXECUTED</td>
<td></td>
</tr>

<tr style="background-color:#cad8ff">
<td >CHECKED OUT</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td >2020-08-31 07:55</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</table>

  • +1
    Certified Lead Developer

    As long as the syntax is fairly consistent throughout the HTML, you should be able to write an expression that loops over the table headers, then over each table row (and then over that row's columns).  It'd take some work and trial and error, but I've seen more complicated things accomplished with out-of-the-box Appian functions.

    Edit: OK, curiosity got the better of me.  This code digests your original HTML into a list of dictionaries, one per data row, in which each cell value is paired with the header of its column.  You might need to do some further work to convert this into a proper dictionary with hardcoded property names, though.

    a!localVariables(
      
      local!text: (/* put your HTML here */),
      
      /* Each item is the text between "<tr" and "</tr>", i.e. one table row */
      local!rows: extract(
        local!text,
        "<tr", "</tr>"
      ),
      
      /* The first row holds the column headers */
      local!headers: extract(
        local!rows[1],
        "<th>", "</th>"
      ),
      
      /* Everything after the header row is a data row */
      local!dataRows: remove( local!rows, 1 ),
      
      a!forEach(
        local!dataRows,
        
        a!localVariables(
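          /* Split on "<td" rather than "<td>" so that cells carrying
             attributes, such as <td style="...">, are picked up too */
          local!cols: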
          a!forEach(
            ldrop(split(fv!item, "<td"), 1),
            extract(fv!item, ">", "</td>")[1]
          ),
          {
            row: fv!item,
            /* Pair each header with the cell at the same position */
            cols: a!forEach(
              local!headers,
              {
                header: fv!item,
                value: index(local!cols, fv!index, "invalid")
              }
            )
          }
        )
      )
    )
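
    For the "hardcoded property names" step, one possible sketch (not tested against your data) is to read each position out of the cell list with index().  The literal local!cols below is only a stand-in for one row's cell values as built by the expression above, and the property names (activity, tripId, and so on) are hypothetical; rename them to match whatever structure you need.

    ```
    a!localVariables(

      /* Assumption: stand-in for one row's cell values, in header order
         (this is the ASSIGNED row from the original HTML) */
      local!cols: {
        "ASSIGNED", "31081857", "", "", "",
        "2020-08-31 05:31", "", "", "", "", ""
      },

      /* Hypothetical property names, one per column position.
         index() falls back to null if a row has fewer cells than expected */
      {
        activity: index(local!cols, 1, null),
        tripId: index(local!cols, 2, null),
        loadId: index(local!cols, 3, null),
        stopNumber: index(local!cols, 4, null),
        stopName: index(local!cols, 5, null),
        startArrival: index(local!cols, 6, null),
        endDepart: index(local!cols, 7, null),
        duration: index(local!cols, 8, null),
        trailerNumber: index(local!cols, 9, null),
        status: index(local!cols, 10, null),
        comments: index(local!cols, 11, null)
      }
    )
    ```

    If that shape suits you, the same dictionary could take the place of the row/cols result inside the a!forEach, so that each data row comes back as one flat dictionary.  This relies on the column order staying fixed, so it would need adjusting if the table layout ever changes.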