I’ve recently been doing some work building RESTful APIs backed by a Mark Logic XML content store, utilising XQuery for document retrieval. This post details the steps involved in tuning what was deemed to be the simplest of queries for optimum performance, using some useful Mark Logic extensions for profiling.
The query in question uses XPath to look up documents based on the value of a given attribute in the XML. (What could be simpler!)
By default Mark Logic indexes the document structure, and attributes are indexed as part of the universal index.
1. Adding xdmp:query-meters
By adding xdmp:query-meters() to the query we get some immediate feedback about how the query performs, including elapsed time and the number of fragments and documents that were selected. Doing so for this query gives us some interesting metrics: the query is taking nearly 2 seconds.
Immediately something looks a bit suspicious: all the documents in the database are being selected, which would indicate that the query is not making effective use of Mark Logic’s indexes.
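As a rough sketch of this step, the snippet below appends xdmp:query-meters() to a lookup. The XPath, the attribute name `id` and the value `12345` are placeholders assumed for illustration, not the actual query from this post; the assumed shape simply repeats /* inside the predicate.

```xquery
xquery version "1.0-ml";

(: Hypothetical attribute value, for illustration only. :)
let $id := "12345"
return (
  (: Assumed shape of the lookup under test: /* appears again in the predicate. :)
  /*[/*/@id = $id],
  (: Appended after the lookup so the meters are reported with the results. :)
  xdmp:query-meters()
)
```

The meters come back as an XML element after the query results, so the elapsed time and the fragment and document counts can be read straight from the output.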
2. Verifying with xdmp:estimate
This can be verified with xdmp:estimate(), focusing purely on the XPath part of the query.
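A minimal sketch of such a check might look like the following. The path is again a hypothetical stand-in (attribute `id`, value `12345`) rather than the query used in this post.

```xquery
xquery version "1.0-ml";

(: Hypothetical path. xdmp:estimate answers from the indexes only
   and does not fetch or filter any fragments. :)
xdmp:estimate(/*[/*/@id = "12345"])
```

The result is a single integer. If it is close to the total number of documents in the database, the indexes are not narrowing the search. Note that xdmp:estimate() only accepts expressions that can be resolved through the indexes, so some paths will raise an error instead of returning a number.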
The evaluator sees the XPath expression and uses index lookups to match some sequence of fragments in the database. xdmp:estimate() returns the number of fragments those lookups select, which serves as an estimate of the number of documents in the sequence, and is directed at the index-lookup phase, i.e. “search”.
Next, the evaluator will fetch those matching fragment(s), if any, from the database. Now we are back in the evaluation phase. It will check to make sure the nodes really match: this is known as “filtering”. Then it will evaluate the entire XPath.
So what we are saying for the XQuery above is that the number of matching fragments equals the number of documents in the database, all of which then get filtered. We are not making use of any of the available Mark Logic indexing, which makes the query very inefficient.
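One way to see the gap between the two phases is to put the index-only estimate next to the filtered count. This is only a sketch: the path is a hypothetical stand-in, and the `comparison`, `candidates` and `matches` element names are invented for the output.

```xquery
xquery version "1.0-ml";

(: Index lookup only ("search"): fragments selected as candidates. :)
let $candidates := xdmp:estimate(/*[/*/@id = "12345"])
(: Full evaluation: candidates are fetched and filtered before counting. :)
let $matches := fn:count(/*[/*/@id = "12345"])
return
  <comparison>
    <candidates>{ $candidates }</candidates>
    <matches>{ $matches }</matches>
  </comparison>
```

The two numbers are not strictly like for like (one counts fragments, the other nodes), but a candidate count far above the match count suggests that most of the work is happening in the filtering phase.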
3. Looking at the query plan with xdmp:plan
The plan output further verifies that all the documents in the database are being selected and that we are not fully leveraging the indexes.
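Requesting the plan is a one-liner. As before, the path below is a hypothetical placeholder and not the query from this post.

```xquery
xquery version "1.0-ml";

(: Returns an XML description of how the expression will be
   resolved against the indexes, without running the search itself. :)
xdmp:plan(/*[/*/@id = "12345"])
```

The returned XML describes which parts of the path the indexes can resolve, which makes it easier to spot the step that forces a scan of every document.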
Looking at the XPath in more detail
/* accesses the entire database and returns every root element in the database, but we do it a second time in the predicate, which is very expensive.
Changing the XPath to avoid this repeated lookup and re-running the above steps gives a much more positive result, and look how quick the query is!
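For illustration, one possible rewrite tests the attribute directly on the root element instead of starting again from /* inside the predicate. This is an assumed example (attribute `id`, value `12345`), not necessarily the rewrite used in this post.

```xquery
xquery version "1.0-ml";

(: Hypothetical attribute value, for illustration only. :)
let $id := "12345"
return (
  (: Index-only estimate for the rewritten path. :)
  xdmp:estimate(/*[@id = "12345"]),
  (: The rewritten lookup: the predicate no longer restarts from /*. :)
  /*[@id = $id],
  (: Meters for comparison with the earlier run. :)
  xdmp:query-meters()
)
```

If the rewrite is working, the estimate should drop from the whole database to roughly the number of real matches, and the elapsed time in the meters should fall with it.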
Plugging in xinc:node-expand
So far we have done our query evaluation ignoring the final piece, which is to plug in the Mark Logic xinc:node-expand function, which resolves any XIncludes in the results.
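A sketch of that final piece, assuming the standard XInclude library module that ships with the server and the same hypothetical lookup on an `id` attribute:

```xquery
xquery version "1.0-ml";

import module namespace xinc = "http://marklogic.com/xinclude"
  at "/MarkLogic/xinclude/xinclude.xqy";

(: Hypothetical attribute value, for illustration only. :)
let $id := "12345"
return (
  (: Expand the XInclude references in each matching root element. :)
  (for $doc in /*[@id = $id]
   return xinc:node-expand($doc)),
  (: Meters now include the cost of the expansion as well. :)
  xdmp:query-meters()
)
```

Running this with and without the xinc:node-expand call gives a rough idea of how much of the total time goes on the expansion rather than the lookup.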
Running the original query using xinc:node-expand
Not cached – 6 seconds!!
Cached – 2 seconds
With our new optimised query the time is much reduced. xinc:node-expand is a Mark Logic extension, so we can’t really do much about its performance. However, it is interesting to see how much additional time it adds to the processing, even for a fully optimised query.
From the above it is easy to see that the majority of the query time is spent in the xinc:node-expand function, but we have still increased the overall performance dramatically.
Even what is deemed to be the simplest of XQuery/XPath expressions might be inefficient. Mark Logic won’t tell you how to fix your XQuery/XPath, but it will provide insight into whether your query is utilising indexes and how it is actually running.