Skip to main content

Sparse Tables Range Query

Range query problems are generally solved by either Binary Indexed Tree or Segment Tree. But sometimes when total queries are very large then you need some another data structure such that query time is almost constant and prefetching or pre-processing time is \(N log (N)\). The data structure is Sparse Tables.





Assume a range query problem where \(10^7\) queries will be asked.
Problem Statement : You are given an array of size \(10^5\) and \(10^7\) queries to find the minimum of all the numbers in range \([L..R]\) then segment tree will be too slow, so we need something very fast.
In sparse table we use the concept of binary numbers (Binary Lifting to be specific).
The concept is to break down the complete linear array in chunks of powers of 2 and then utilize them to solve the queries.

For example - let array indexes be \(0,1,2,3,4\) then break the array in chunks of powers of two as below -

  • \(Row_0 : [0..0], [0..1], [0..3]\)
  • \(Row_1 : [1..1], [1..2], [1..4]\)
  • \(Row_2 : [2..2], [2..3]\)
  • \(Row_3 : [3..3], [3..4]\)
  • \(Row_4 : [4..4]\)
We need to pre process results for all these ranges above and if any query comes, we can utilize the pre-processed results to solve them fast. The main thing is how to break into these ranges and calculate the data efficiently for this and store.

To store values we will use a 2-D array where \(A[i][j]\) means the answer for range
\(i\) to \(i + 2^j- 1\).

Simulate the above formula and you will get the values which are given above. Also, we already know the value for \(A[i][0]\) because \(i + 2^0 – 1 = i\) itself ,which means that the answer is \(a[i]\).

Now we will be calculating answer in column major which means \(A[0][0]\) then \(A[1][0]\) then \(A[2][0]\) and so on. Then after that \(A[0][1],  A[1][1],  A[2][1]\) etc. and similarly for all other values.

So when you are going to calculate \(A[i][j]\) you can calculate it using the following formula -
$$A[i][j] = f(A[i][j - 1]  , A[i + 2^{j-1}][j - 1])$$
In this case \( f(x,y) \) is the minimum function.

This is correct because \(A[i][j - 1]\) stores answer for the range \([i.. i + 2^{j-1} - 1]\) and \(A[i + 2^{j - 1}][j - 1]\) stores answer for the range \([i + 2^{j-1}..i+2^{j-1} + 2^{j-1} - 1]\) or simply \([i + 2^{j-1} .. i + 2^j - 1]\). And if you combine both, you get the answer for the range \([i .. 2^{j} - 1]\) which is basically \(A[i][j]\).

I would suggest working on few examples before proceeding further. Also if you want an intution behind it, you can think it in this way - Suppose you want to jump \(2^x\) steps from any position i. Then you can do it by jumping \(2^{i-1}\) steps from position \(x\), and you will reach \(x + 2^{i-1}-1\). Now from that position again jump \(2^{i-1}\) steps, so you jump a total of \(2^{i-1} + 2^{i-1}\) steps.

This was the pre-processing step. Now we need to find a way how to answer the query in constant time.
To find the range minimum of the range \([L..R]\), we need two values \(x\)  and \(y\)  such that we can use the answer \(A[L][x]\) and answer of \(A[y][x]\) to solve the query \(L..R\) in constant time.

Note the following conditions while calculating 
\(x\) and \(y\) -
  • \(A[L][x]\) should cover a range which does not exceed \([L..R]\)
  • \(A[y][x]\) should cover a range which does not exceed \([L..R]\)
  • combination of \(A[L][x]\) and \(A[y][x]\) should cover the full range \([L..R]\)
So keeping the conditions above, \(x = log_2(R - L + 1)\) and \( y = R - 2^x + 1\).


Fit these values in the formula above and you will get a range that covers the complete range \([L..R]\). Note that some part of the range is repeated twice while covering, but since it is a range minimum function, it won't affect our answer.
Can we use sparse table to answer range sum queries in constant time ? Lets have a discussion in comments about it.


Code for Sparse table -
https://github.com/Apptica/Programming/blob/master/sparse_table.cpp

Practice -
https://www.codechef.com/problems/FRMQ

Comments