Query Entity fails when using operator "in" against a very large array. It succeeds if I leave out the operator and return every result.

Certified Senior Developer

I'm using a query against a table that contains a column with Appian usernames, and I am passing it a very large list of usernames to search for (3,900) using the "in" operator. When I call this query entity, I am getting the following error:

Expression evaluation error at function a!queryEntity: An error occurred while retrieving the data.

An example query entity that demonstrates the issue:

a!queryEntity(
  entity: cons!ENTITY_T_USER,
  query: a!query(
    filter: a!queryFilter(
      field: "UserName",
      operator: "in",
      value: touniformstring(enumerate(3000))
    ),
    pagingInfo: a!pagingInfo(
      startIndex: 1,
      batchSize: -1
    )
  )
).data


If you play with the size of the enumerate, the query works when the list is smaller (e.g. 2,000 still works for me) but fails when it gets larger. Setting a batchSize doesn't help, since that affects the number of results returned, while this is failing because the filter is too long.

I've been able to work around this issue by removing the filter and using looping functions to filter the results after returning everything. This doesn't seem ideal, since it's now a multi-step process (get everything from the database, then filter it) rather than using the query itself to filter.


  • There are a couple of other options you could explore. Could you provide some information on your use case?
  • Certified Senior Developer, in reply to Colton Beck:
    Specifically, I am using this as part of synchronizing users between LDAP, Appian, and our datasource. A nightly LDAP Synch process pulls users into Appian. After that runs, I need to determine which users to add or update in the database, so I need to query on all the usernames to get the CDTs for users that already have rows in the User table. (Users that aren't found are then created, and users that need to be updated are modified.)

    Another approach I had looked into (and am using in some other places) was breaking down the list of usernames into smaller batches, querying multiple times, and appending together all the results (a rough sketch is below), but this seems even less desirable, since it creates multiple database reads where there used to be just one.
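
    A minimal sketch of that batching (assuming a recent Appian version with a!forEach and a!localVariables, a hypothetical batch size of 1,000, and the same entity and field names as the example above):

    a!localVariables(
      local!usernames: touniformstring(enumerate(3000)),
      local!batchSize: 1000,
      /* run one query per batch of usernames; SAIL flattens the
         per-batch lists of CDTs back into a single list */
      a!forEach(
        items: enumerate(ceiling(count(local!usernames) / local!batchSize)),
        expression: a!localVariables(
          local!start: fv!item * local!batchSize + 1,
          local!end: min((fv!item + 1) * local!batchSize, count(local!usernames)),
          a!queryEntity(
            entity: cons!ENTITY_T_USER,
            query: a!query(
              filter: a!queryFilter(
                field: "UserName",
                operator: "in",
                /* slice out just this batch of usernames */
                value: index(local!usernames, local!start + enumerate(local!end - local!start + 1), {})
              ),
              pagingInfo: a!pagingInfo(startIndex: 1, batchSize: -1)
            )
          ).data
        )
      )
    )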
  • Given that this is a nightly process without user interaction, I don't think multiple trips to the database are a big concern, so yes, you should try breaking your list into smaller batches.
  • You could definitely use the batching approach; this number of usernames wouldn't create too much performance overhead. To improve your query entity's performance, you could also add a selection parameter so you don't retrieve all fields of the relevant table, only the one(s) you need to make this determination (see the sketch at the end of this reply).

    Another option you could explore is the Query Database smart service. It performs significantly better, but adds maintenance and data-manipulation considerations.
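
    A minimal sketch of adding a selection, reusing the entity and field names from the example above ("Id" is a hypothetical primary-key column; adjust to your table):

    a!queryEntity(
      entity: cons!ENTITY_T_USER,
      query: a!query(
        selection: a!querySelection(
          columns: {
            /* retrieve only the columns needed for the add-vs-update decision */
            a!queryColumn(field: "Id"),
            a!queryColumn(field: "UserName")
          }
        ),
        filter: a!queryFilter(
          field: "UserName",
          operator: "in",
          value: touniformstring(enumerate(2000))
        ),
        pagingInfo: a!pagingInfo(startIndex: 1, batchSize: -1)
      )
    ).data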
  • A Score Level 2, in reply to Jason Ruvinsky:
    Out of curiosity, do you not have any other unique identifiers after the LDAP Synch process that would determine which users need to be created/updated? I figured you might have already considered this, but wanted to ask anyway.

    Also, given the number of users you have to sort through, you may consider having a stored procedure do the heavy lifting (via the Execute Stored Procedure node in a process model) instead of a query entity in Appian, with a split pathway: one path takes the subset of users that need to be updated, and the other creates new users (essentially 2 separate sub-processes).

    Regardless, since what you're doing is a nightly process, I wouldn't worry too much about multiple DB reads, especially if it avoids timeouts or excessive use of system resources.
  • Certified Senior Developer, in reply to Reggie:
    The unique identifier is the username, since that is what Appian uses as the internal primary key for a user.

    Thanks all for the tips! I agree that since this is not running on a chain and is scheduled to run after hours, performance isn't the biggest concern. Primarily, I wanted to bring Appian's attention to the limitations of query entity filters. It would also be useful to have a better error message indicating that the query timed out, or even why (that the filter is too large).
  • It looks like a timeout issue. Since you are passing a large array of usernames, the query is taking a long time.

    The maximum amount of time in seconds that a query waits for a response from the database before timing out is configured using the following property:

    conf.data.query.timeout=10

    The default value is 10 (ten seconds).
  • Another approach is to fetch all the users from the User table and use Appian functions to retrieve the matching users based on UserName:

    =load(
      local!AllUsers: rule!GetAllUsers(),
      /* keep only the rows whose UserName appears in the target list */
      index(
        local!AllUsers,
        wherecontains(
          touniformstring(enumerate(3000)),
          touniformstring(local!AllUsers.UserName)
        ),
        {}
      )
    )
  • Certified Senior Developer, in reply to Vinay Kumar Rai:
    My workaround is similar to this approach (I'm using filter() and a rule that checks contains(), but it's the same idea); there's a sketch at the end of this reply.

    My main purpose in this thread is to bring to Appian's attention that the filter in a query entity has these limitations, and that there are use cases for improving it. I shouldn't have to apply the filtering after querying for everything; it seems inefficient to pull back all results when I know I only want some of them, yet somehow that performs better than sending a large filter (which errors out due to timing out).
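
    A minimal sketch of that workaround (rule!APP_isTargetUser is a hypothetical helper rule that wraps contains(usernames, user.UserName)):

    load(
      local!targetUsernames: touniformstring(enumerate(3000)),
      local!allUsers: rule!GetAllUsers(),
      /* keep only the rows whose UserName appears in the target list */
      filter(
        rule!APP_isTargetUser(_, local!targetUsernames),
        local!allUsers
      )
    )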
  • Certified Senior Developer, in reply to kondetiv:
    We've already increased our timeout to 30 seconds. For now, I'm having to use the workaround of doing the filtering in the Appian expression instead of the query.