KPI on more than 5000 records

Certified Senior Developer

For a dashboard with a series of KPIs, we have to process more than 5,000 entries, because we need to calculate the median, mode, min, and max over a time range on an actions log.

How can we achieve that using only records, without resorting to a stored procedure?


    Certified Lead Developer

    Median can be determined if you have a totalCount already. You can set the startIndex of the query to the exact middle element of your data using a!pagingInfo(startIndex: totalCount/2, batchSize: 1). This requires a separate query before your KPIs load, but it works just as well for a date/time field as for any other field.
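    A minimal sketch of that two-query approach, assuming a hypothetical Actions Log record type (recordType!ActionLog) with a createdOn field — all names here are placeholders, not your actual record type:

    ```
    a!localVariables(
      /* Query 1: fetch only the total row count (no data rows) */
      local!totalCount: a!queryRecordType(
        recordType: recordType!ActionLog,
        pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 0),
        fetchTotalCount: true
      ).totalCount,
      /* Query 2: sort by the KPI field and fetch the single middle row */
      local!medianRow: a!queryRecordType(
        recordType: recordType!ActionLog,
        fields: {recordType!ActionLog.fields.createdOn},
        pagingInfo: a!pagingInfo(
          startIndex: ceiling(local!totalCount / 2),
          batchSize: 1,
          sort: a!sortInfo(
            field: recordType!ActionLog.fields.createdOn,
            ascending: true
          )
        )
      ).data,
      local!medianRow
    )
    ```

    Note this returns the middle element by position; for an even totalCount a statistician's median would average the two middle values, which would take one more query with batchSize: 2.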

    Mode is a query that groups on the column you need the mode from, then selects the data element with the highest count. This might be doable directly within the KPI element, but it's probably easier to do as a separate query. If you reverse-sort by the alias that holds the COUNT, you only need a paging size of 1. However, it is very likely that a date/time aggregation (one that includes seconds) with COUNT will be prohibitively expensive, even with the interval parameter of the grouping function. So it's worth testing query performance on your actual dataset or a representative sample of it. If you find that the query causes slowness in the data fabric (even for other apps), it may be worthwhile to add a sync-time custom record field that rounds the timestamp to the nearest day, hour, or minute, depending on your requirements. Your KPI would then determine the mode from the custom record field rather than the original.
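    The mode query could look roughly like this — again a sketch with placeholder record type and field names, and an assumed "DATE" interval to coarsen the date/time grouping:

    ```
    /* Group by the target column, count rows per group,
       sort by the count descending, and take the top row. */
    a!queryRecordType(
      recordType: recordType!ActionLog,
      fields: a!aggregationFields(
        groupings: a!grouping(
          field: recordType!ActionLog.fields.createdOn,
          alias: "actionDate",
          interval: "DATE"  /* coarsen to the day to keep the aggregation cheap */
        ),
        measures: a!measure(
          function: "COUNT",
          field: recordType!ActionLog.fields.id,
          alias: "rowCount"
        )
      ),
      pagingInfo: a!pagingInfo(
        startIndex: 1,
        batchSize: 1,
        sort: a!sortInfo(field: "rowCount", ascending: false)
      )
    ).data
    ```

    The single returned row's actionDate is the mode; rowCount tells you how often it occurred.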

    I know you didn't list this one, but it often comes up alongside the other statistics. If you need standard deviation, it is somewhat tricky to do with the Data Fabric, and I wouldn't recommend it for a date/time field. I have accomplished this in the distant past with a database view and MySQL's built-in functions. The main issue with this approach is that the standard deviation is calculated *before* Appian filters are applied - i.e. it is computed over all rows within the view on every query to the view, and only then are the filters for specific rows applied. There are some SQL-specific workarounds that include wrapping the view in a stored procedure and other shenanigans. Hopefully your requirements don't send you down that rabbit hole.
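    For reference, the view approach is roughly this - a sketch assuming a hypothetical action_log table with a numeric duration_mins column (table and column names are illustrative):

    ```
    -- STDDEV_POP runs over the rows in each group of the view itself,
    -- before any Appian record filters are applied to the view's output.
    CREATE VIEW action_log_stats AS
    SELECT
      action_type,
      STDDEV_POP(duration_mins) AS duration_stddev
    FROM action_log
    GROUP BY action_type;
    ```

    You would then sync the view as a record type and read duration_stddev like any other field, accepting that row-level Appian filters cannot narrow the population the deviation was computed over.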

