Record vs CDT. Performance comparison

Hi,

I've have a use case where the customer load some data through uploading to Appian an excel sheet. The requirement is implement the load process with two main steps:

1) Load excel data to temporary tables, validate data and inform user about the data read, number records to be updated/inserted, errors found if any.

2) User gets these information on a form where he can confirm or cancel data load.

In order to implement step 1, I've written a rule expression that loop on the data saved on temporary tables and make some validations. One of this validatiions is check for the existence of every loaded Id in the current data, in order to know if we have a create a new record or we have to update an existing one. Lets call this rule checkRule, full code follows,:

a!localVariables(
  local!cargaUFGs: rule!TGSS_GetDatosCargaByDSE(dse: cons!TGSS_ENT_CARGA_UFG),
  local!idsUFGsActuales: tointeger(index(rule!TGSS_GetUFGS(pagingInfo: null),"idCliente",{})),
  cast(
    'type!{urn:com:appian:types:TGSS}TGSS_Mensaje?list',
    a!forEach(
      local!cargaUFGs,
      {
        if(
          or(isnull(fv!item.descripcion),isnull(fv!item.funcion)),
          'type!{urn:com:appian:types:TGSS}TGSS_Mensaje'(
            tipo: "error",
            entidad: "ufg",
            accion: "",
            texto: "El registro " & fv!item.identificador & " contiene datos nulos"
          ),
          {}
        ),
        if(
          contains(local!idsUFGsActuales,tointeger(fv!item.identificador)),
          'type!{urn:com:appian:types:TGSS}TGSS_Mensaje'(
            tipo: "info",
            entidad: "ufg",
            accion: "update",
            texto: fv!item.identificador 
          ),
          'type!{urn:com:appian:types:TGSS}TGSS_Mensaje'(
            tipo: "info",
            entidad: "ufg",
            accion: "insert",
            texto: fv!item.identificador 
          )  
        )
      }
    )
  )
)

My doubt is about the rule used in the third line above: rule!TGSS_GetUFGs.

This rule gets a list of current records of UFG data entity, and I made two versions: one that use queryRecordType to gets the data from UFG record type (the record is based on database table), one that use queryEntity for gets the data from the corresponding UFG data type (the one that is UFG record based on).

So, I compare performance of checkRule using queryRecordType and using queryEntity, and what I get is not what I expect.

The response time using queryEntity es five times shorter than using queryRecordType.

Is that a normal outcome? I was expecting that record performance were faster than CDT, because they are based on a memory database, which is supposed to be faster when your are querying.

I appreciate some explanation of that outcome, it is likely that I'm getting something wrong hear, but defintely I do not understand this difference of performance.

Regards

  Discussion posts and replies are publicly visible

Parents
  • Can you share the expression for each of the queries you tried? Also can you show what times you're seeing for each of the queries?

  • 0
    Certified Lead Developer
    in reply to Peter Lewis

    Hi Peter,

    Thank for your response.

    This is the record based expression, named TGSS_GetUFGs

    a!localVariables(
      local!dataSS: a!queryRecordType(
        recordType: 'recordType!{49732679-857b-4f64-bbb0-f196c1241569}TGSS Ufg',
        pagingInfo: if(
          isnull(ri!pagingInfo),
          a!pagingInfo(
            startIndex: 1,
            batchSize: 5000,
            sort:a!sortInfo(field: 'recordType!{49732679-857b-4f64-bbb0-f196c1241569}TGSS Ufg.fields.{1dfaedb3-1144-41d4-879e-7c66c42217a2}idUfg', ascending: true)
          ),
          ri!pagingInfo
        ),
        fetchTotalCount: if(
          isnull(ri!pagingInfo),
          false,
          true
        )
      ),
      
      if(
        isnull(ri!pagingInfo),
        cast(
          'type!{urn:com:appian:types:TGSS}TGSS_Ufg?list',
          local!dataSS.data
        ),
        local!dataSS
      )
    )

    And this is the CDT based expresion, named TGSS_GetUFGScdt

    a!localVariables(
      local!dataSS: a!queryEntity(
        entity: cons!TGSS_ENT_UFG,
        query: a!query(
          pagingInfo: if(
            isnull(ri!pagingInfo),
            a!pagingInfo(
              startIndex: 1,
              batchSize: -1,
              sort:a!sortInfo(field: "idUfg", ascending: true)
            ),
            ri!pagingInfo
          )
        ),
        fetchTotalCount: if(
          isnull(ri!pagingInfo),
          false,
          true
        )
      ),
    
      if(
        isnull(ri!pagingInfo),
        cast(
          'type!{urn:com:appian:types:TGSS}TGSS_Ufg?list',
          local!dataSS.data
        ),
        local!dataSS
      )
    )

    About execution times, I get the following ones in ten executions in a row (times in ms):

    - Record based. TGSS_GetUFGs: 78, 113, 59, 106, 61, 177, 70, 111, 79, 78

    - CDT based. TGSS_GetUFGScdt: 9, 10, 8, 9, 59, 19, 22, 8, 42, 8

    As you can see the difference is remarkable.

    Just in case it helps, I want you to know that the database table is very small right now, it has just 5 records. I'm wondering if it is possible that execution times be longer in record based with few records, but it stay similar when we reach big number of records in the base table. It's just an idea.

    Regards

  • Thanks for the context! As Chris mentioned below in some cases with only a few rows you may see higher times for a!queryRecordType() because it involves some additional overhead for the query. However, as you return more rows or perform more complex queries, you should see the gap eliminated.

    Can you also clarify - are you using synced records for the a!queryRecordType() case? I would generally expect that a!queryRecordType() against a synced source would be faster than a!queryRecordType() against a non-synced source.

    In addition, different queries will have different effects. For example, your current queries are returning all of the columns; however, in your comment above you mentioned that you were using this expression rule to identify the IDs that have changed. If you reduce your query to only return the IDs, you will likely see faster response times as well.

Reply
  • Thanks for the context! As Chris mentioned below in some cases with only a few rows you may see higher times for a!queryRecordType() because it involves some additional overhead for the query. However, as you return more rows or perform more complex queries, you should see the gap eliminated.

    Can you also clarify - are you using synced records for the a!queryRecordType() case? I would generally expect that a!queryRecordType() against a synced source would be faster than a!queryRecordType() against a non-synced source.

    In addition, different queries will have different effects. For example, your current queries are returning all of the columns; however, in your comment above you mentioned that you were using this expression rule to identify the IDs that have changed. If you reduce your query to only return the IDs, you will likely see faster response times as well.

Children
  • 0
    Certified Lead Developer
    in reply to Peter Lewis

    are you using synced records for the a!queryRecordType() case?  Yes, my record is synced.

    If you reduce your query to only return the IDs, you will likely see faster response times   I agree with that Peter, but it will be faster for both queries, queryRecordType and queryEntity. So, it will keep the balance on queryEntity side, I think.

    I'll do some testing with a higher number of records on base table, to check if queryRecordType performance is better in that case than the one of queryEntity. If it is not, it will be a key point to make the decision about use or not record approach for a pure database source.

    Thank you

  • You're correct that it will be faster for both queries, but the impact will likely be larger for a!queryRecordType() and I would suggest as a general practice to limit your queries to only return the data you need. I'd also caution you that there are a lot of things that go into performance - for instance is this test being done on a Dev site or a Test / Prod site? Often the usage on Dev sites is much different than on other sites so the performance may not be completely accurate.

    Another thing to keep in mind is the actual impact of these queries - how is this query being used in your application? Is this a part of many queries on an interface or is this running in a process? For instance suppose you are using this query on an interface that returns data from many different records and related records. In that case it is likely to be more beneficial to use synced record types and related records because you will see better relative performance with more complex queries in a!queryRecordType().