Web API Limitations

judym598

Certified Senior Developer

over 8 years ago

What's the best way to 'GET' a large amount of data via Web API? I initially created Web APIs that used the a!queryEntity function to pull incremental changes from several tables, but have been asked to revise them to pull all records from these tables. There are several tables that have ~ 1000 records, which is not a problem (yet), but others are quite large. In the testing I've done so far on one table, I am able to retrieve 15,000 records but any more returns an error. Is there a way to include a looping function in the API to retrieve multiple extracts if the number of records exceeds a certain count?

Discussion posts and replies are publicly visible

0 chetany
A Score Level 1
over 8 years ago

I think for 15,000 records the queryEntity returns data more than 1 MB and takes more than 10sec, and hence it timesout, which maybe causing an issue.
One solution would be - let the caller of the web api pass a startIndex and batchSize. In your web api you can extract those from the query params. And then you can send only requested amount of data rather than all of it.
The caller can then make another web api call with startIndex = oldStartIndex+batchSize. This will fetch the next batch of records. You will have to decide what batch Size is suitable.
So, in a way you will have multiple webapi calls to get all the data.

If on the other hand you want to get all the data in one-go, you will need to see if you can optimize at database level, maybe use indexes on proper columns, that may help with queryEntity timeout issue.
Another idea could be to return data in aggregate form - maybe create a DB View for the table which has lot of data. The View can aggregate data of multiple rows based on some column, this will reduce the number of rows returned. Then your web api will return less than 15000 rows. But, now the caller of the web api will need to apply some transformation on the data returned to get the exact data (since at DB side we grouped and aggregated the data in the View).
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 judym598
Certified Senior Developer
over 8 years ago

Thank you for the suggestion @chetany! I don't think the view option will work but maybe the other one will. I'm not sure how to let the caller of the web api pass a startIndex and batchSize but I'm guessing it would be similar to allowing them to provide a date range. The one problem with this is that we're going to try and set this up as a system call - not a user call (i.e., automatic retrieval of data).
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 reginaldm377 over 8 years ago

@judym598 I could be wrong but I think chetany meant your process would automatically do the multiple web service calls once the process is initiated (via looping) and not literally have the caller provide startIndex/batchSize for each loop.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 judym598
Certified Senior Developer
over 8 years ago

That would be great - but I don't know how I would incorporate the 'looping' part. Do you have an example?
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 reginaldm377 over 8 years ago

@judym598 you'll have to play around with this tactic for a bit to make sure it works but i would start with creating the following variables in your process:

1. counter
2. startIndex
3. batchSize

For the 1st loop, set the counter to 1 and startIndex/batchSize accordingly (i.e. startIndex = 1, and batchSize = 3000). After the webservice call is done, append the data the webservice retrieved to another variable and do the following in a script task:

1.increment count by 1 (this'll be done each time a loop is done)
2.clear out pv!startIndex and append batchSize * pv!counter + 1 to startIndex (this is so each loop after the 1st loop knows to look at the next 3000 rows)

in your query entity.....
3.add logic that determines if pv!batchSize should be 3000 or 3000-1 if counter is equal to or greater than 2.
4.add logic that the startIndex will be 1 UNLESS counter is equal to or greater than 2.

exit this loop when counter = 5 (or greater...depends)
OR
when startindex = 15,001

-here's an example of what rows should be retrieved each loop
1st loop - 3000 (1-3000)
2nd loop - 3000 (3001-6000)
3rd loop - 3000 (6001-9000)
4th loop - 3000 (9001-12,000)
5th loop - 3000 (12,0001-15,000)

hope this helps. this solution is open to critique if there's a more efficient method.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 judym598
Certified Senior Developer
over 8 years ago

Thank you @reginaldm377 for your help. However, this Web API is not part of an Appian process. It would be invoked from outside of Appian. If it was in an Appian process model, I would definitely use the approach you suggested. I was wondering if there's a way to incorporate the 'looping' part in the Web API.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 chetany
A Score Level 1
over 8 years ago

@judym598, I think I have a solution based on idea of @reginaldm377.. Have one webapi which will just start a process(using a!startProcess) - this process instance should get the data in batches. Lets call this webAPI1. webAPI1 should return the processId to the caller.

The process started should execute and get the data in batches (by looping) using idea of @reginaldm377. And it should be stored in some CDT (list of CDT).

Then, have another webAPI - lets call this webAPI2. This webAPI2 will take a processInstance as input - the caller will pass the process instance which it got from response of webAPI1.

webAPI2 will just query that process instance (which will be passed as query param by the caller). It will check if the status of the process is complete, if it is complete it will get the data for the value of the list of CDT which it contains. If the status is complete, it means that the process has completed all the loops and extracted all the data in that multiple CDT.
You can use the a!queryProcessAnalytics function to check the status of the process.
To get the value of the multiple CDT PV, there is some plugin function which can allow you to get the value of PV when you pass the processInstanceId and the name of the PV.
The webAPI2 can then return this data. webAPI2 should return the data only if the status is "Complete" for the passed process instance - otherwise it will return null.
If the caller of webAPI2 gets a null, then its obvious that the looping has not yet completed, and the caller needs to call webAPI2 again after some time.

This sounds little complicated, but it should solve the looping problem in webAPI
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 judym598
Certified Senior Developer
over 8 years ago

I like it @chetany! That may work. I'll give it a shot. Thanks so much (all) for the follow-up!
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 chetany
A Score Level 1
over 8 years ago

Welcome
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel
0 reginaldm377 over 8 years ago

No problem at all
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel