Data clustering - Classification Problem

Certified Senior Developer

I have a CDT list with some kinds of information.

All items in this CDT list have a start date and an end date. I had to find the period overlaps, but I have already solved that problem. My requirement now is a different one: after finding all the periods that overlap, e.g. indexes 2, 3, 4, 5, 6 of an array of 7 elements ordered by date.

Looking at the data, I know that the intervals that overlap are 2 with 3 and 4, and 5 with 6.

My problem is to find those subgroups:

2-3-4 and 5-6.

  • Certified Senior Developer

    [ start_date:  01/01/2020   -   end_date: 31/01/2020 ]

    [ start_date:  01/03/2020   -   end_date: 31/03/2020 ]

    [ start_date:  29/03/2020   -   end_date: 02/04/2020 ]

    [ start_date:  01/04/2020   -   end_date: 30/04/2020 ]

    [ start_date:  01/05/2020   -   end_date: 31/05/2020 ]

    [ start_date:  18/05/2020   -   end_date: 31/05/2020 ]

    [ start_date:  01/06/2020   -   end_date: 30/06/2020 ]

    Now I have found that the array indexes that have overlaps are 2, 3, 4, 5, 6,

    but I'm not able to find that 2 overlaps with 3 and 4, and that 5 overlaps with 6.
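
    (To be precise, by "overlaps" I mean the usual condition that each period starts on or before the other one ends. A minimal sketch of that check for two rows, with local!a and local!b as placeholder variables:)

    and(
      local!a.start_date <= local!b.end_date,
      local!b.start_date <= local!a.end_date
    )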

  • This may get you closer: this expression returns a list of maps containing the overlaps. The current version will list both directions, for instance that 2 overlaps with 3 and that 3 overlaps with 2:

    a!localVariables(
      local!data: {
        a!map(start_date: todate("01/01/2020"), end_date: todate("01/31/2020")),
        a!map(start_date: todate("03/01/2020"), end_date: todate("03/31/2020")),
        a!map(start_date: todate("03/29/2020"), end_date: todate("04/02/2020")),
        a!map(start_date: todate("04/01/2020"), end_date: todate("04/30/2020")),
        a!map(start_date: todate("05/01/2020"), end_date: todate("05/31/2020")),
        a!map(start_date: todate("05/18/2020"), end_date: todate("05/31/2020")),
        a!map(start_date: todate("06/01/2020"), end_date: todate("06/33/2020"))
      },
      
      a!forEach(
        items: local!data,
        expression: {
          a!localVariables(
            local!index: fv!index,
            local!row: local!data[local!index],
            a!flatten(
              fn!reject(
                fn!isnull,
                a!forEach(
                  items: local!data,
                  expression: {
                    if(
                      fv!index=local!index,
                      null, /* do not review row against itself */
                      if(
                        or(
                          and( /* start date is within start/end for another row */
                            local!row.start_date>=fv!item.start_date,
                            local!row.start_date<=fv!item.end_date
                          ),
                          and( /* end date is within start/end for another row */
                            local!row.end_date>=fv!item.start_date,
                            local!row.end_date<=fv!item.end_date
                          ),
                          and( /* row starts before and ends after evaluating row */
                            local!row.start_date<fv!item.start_date,
                            local!row.end_date>fv!item.end_date
                          )
                        ),
                        a!map( /* return an overlap */
                          row: local!index,
                          overlapsWith: fv!index
                        ),
                        null /* no overlap */
                      )
                    )
                  }
                )
              )
            )
          )
        }
      )
    )

    We could refine this further if you can define exactly what you would like to see as an output. E.g., how are the "sub groups" to be found? Any chain of overlaps, essentially?
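
    For instance, if you only want each pair reported once, one small tweak should do it: skip earlier rows as well as the row itself, so each pair is only recorded from its lower index (a sketch of just the changed condition, not tested):

    if(
      fv!index <= local!index,
      null, /* skip the row itself and rows already compared */
      ... /* same overlap check as above */
    )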

  • Certified Senior Developer
    in reply to Chris

    Thank you so much for your answer. In the meantime I have managed to find all the "who overlaps with whom" pairs; the problem is that, with that information, I have to find all the data subgroups. So, for example, if index 3 in the map overlaps with 2 and with 4, I have to find a structure that tells me that 2, 3, 4 overlap together as one group, and 5, 6 together as another group.

  • Certified Lead Developer
    in reply to carminem0002

    Took me a while, but here you go. The trick is to use the reduce function, which allows you to keep a "shared memory" and modify it during the loop iterations. Each iteration checks one item for overlaps with the other items. If there is an overlap, I calculate a unique identifier and check whether it already exists in the list of overlaps. If not, I add it to the list; otherwise I just increase the size.

    There is a related use case for calculating the IBAN checksum that you can find in the forum as well.

    It requires two expressions:

    a!localVariables(
      local!data: {
        a!map(start_date: todate("01/01/2020"), end_date: todate("01/31/2020")),
        a!map(start_date: todate("03/01/2020"), end_date: todate("03/31/2020")),
        a!map(start_date: todate("03/29/2020"), end_date: todate("04/02/2020")),
        a!map(start_date: todate("04/01/2020"), end_date: todate("04/30/2020")),
        a!map(start_date: todate("05/01/2020"), end_date: todate("05/31/2020")),
        a!map(start_date: todate("05/18/2020"), end_date: todate("05/31/2020")),
        a!map(start_date: todate("06/01/2020"), end_date: todate("06/33/2020"))
      },
      reduce(
        rule!SSH_GroupOverlapsHelper(
          shared:_,
          item:_,
          data:_
        ),
        {},
        local!data,
        local!data
      )
    )

    Rule inputs: shared(any), item(any), data(any):

    a!localVariables(
      /* one boolean per row: true when ri!item overlaps that row (a row always overlaps itself) */
      local!overlaps: a!forEach(
        items: ri!data,
        expression: and(
          ri!item.start_date <= fv!item.end_date,
          ri!item.end_date >= fv!item.start_date
        )
      ),
      /* unique identifier built from the overlapping row indexes, e.g. "2-3-4" */
      local!groupId: joinarray(where(local!overlaps), "-"),
      if(
        sum(local!overlaps) > 1,
        /* We have > 1 overlap */
        if(
          contains(touniformstring(index(ri!shared, "id", {})), local!groupId),
          /* Existing group */
          a!localVariables(
            local!index: lookup(ri!shared.id, local!groupId, 0),
            a!update(
              ri!shared,
              local!index,
              a!update(
                ri!shared[local!index],
                "size",
                ri!shared[local!index].size + 1
              )
            )
          ),
          /* New group */
          append(
            ri!shared,
            a!map(
              id: local!groupId,
              ovs: ri!data[where(local!overlaps)],
              size: 1
            )
          )
        ),
        /* An item always overlaps with itself, so ignore that case */
        ri!shared
      )
    )
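
    If I trace the sample data through this by hand, the output should look roughly like the following (hand-evaluated, so treat it as an illustration rather than a guaranteed result):

    {
      a!map(id: "2-3",   ovs: { /* rows 2 and 3 */ },    size: 1),
      a!map(id: "2-3-4", ovs: { /* rows 2, 3 and 4 */ }, size: 1),
      a!map(id: "3-4",   ovs: { /* rows 3 and 4 */ },    size: 1),
      a!map(id: "5-6",   ovs: { /* rows 5 and 6 */ },    size: 2)
    }

    Chained overlaps therefore still show up as several partial groups, while 5-6 is simply counted twice.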

  • Certified Senior Developer
    in reply to Stefan Helzle

    Wow, this is a precious contribution, and it pushes me to study the reduce function in more depth.

    But, in a more convoluted way, I had already done this. The problem is when we have, for example:

    if

    1-2-3 overlap each other

    and then

    5-6

    with my solution, and in yours, we get 5 with 6 two times, but removing the duplication is not a problem (we can just use a union).

    The real problem is that I want to get 1-2-3 only once, and not

    1-2, 2-3 and then 1-2-3

    So if, because of 2, 1 and 3 all overlap each other, I want to find 1-2-3 only once, and not 1-2, 2-3 and then 1-2-3.

    Obviously I also want to get 5-6 and so on, with all the possible combinations.

    So, a single entry for the maximal overlap group of every single subgroup.

    I'm scared I might end up having to apply mathematical group theory :\

  • Certified Lead Developer
    in reply to carminem0002

    I figured this solution wasn't quite what you were looking for. But you can now modify it to exactly match your use case. Adapt the logic in the helper expression directly.
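
    For example, one way to get only the maximal groups would be a second reduce pass over the per-item groups that merges any groups sharing an index. A rough sketch, not tested; the rule name SSH_GroupMergeHelper and the a!map shape with a "members" field are just placeholders:

    a!localVariables(
      /* the index groups found in the first pass, e.g. */
      local!overlapGroups: {
        a!map(members: {2, 3}),
        a!map(members: {2, 3, 4}),
        a!map(members: {3, 4}),
        a!map(members: {5, 6})
      },
      reduce(
        rule!SSH_GroupMergeHelper(
          shared: _,
          item: _
        ),
        {},
        local!overlapGroups
      )
    )

    And the helper, with rule inputs shared (any) and item (any):

    a!localVariables(
      /* positions of the groups collected so far that share at least one index with the new group */
      local!touching: where(
        a!forEach(
          items: ri!shared,
          expression: length(intersection(fv!item.members, ri!item.members)) > 0
        )
      ),
      if(
        length(local!touching) = 0,
        /* no common index: keep it as a new group */
        append(ri!shared, ri!item),
        /* otherwise: replace all touched groups with their union plus the new members */
        append(
          index(ri!shared, difference(enumerate(length(ri!shared)) + 1, local!touching), {}),
          a!map(
            members: union(
              a!flatten(
                a!forEach(
                  items: index(ri!shared, local!touching, {}),
                  expression: fv!item.members
                )
              ),
              ri!item.members
            )
          )
        )
      )
    )

    For the sample data this should end up with two maps, members {2, 3, 4} and {5, 6}, each listed only once.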

  • Certified Senior Developer
    in reply to Stefan Helzle

    Yes, obviously it's a very good start, and it's a smarter solution than mine.

    So thank you so much for sharing this solution with me.