An interesting data structure with search time O(sqrt n)
Andrei Alexandrescu via Digitalmars-d
digitalmars-d at puremagic.com
Mon Nov 30 12:13:11 PST 2015
Okasaki's book is a continued inspiration of data structures and algorithms.
I was thinking along the following lines: typical collections are
searchable in linear time. Then we have highly structured collections
that feature logarithmic search time. But there seems to be nothing in
between. So I was thinking, what would be a data structure that allows
O(sqrt n) search times?
After a little tinkering, here's what I came up with.
Start with a simple array of data. Then mentally decompose that array
into a concatenation of smaller arrays: first has size 1, second has
size 4, third has size 9, then 16, 25, 36, ... Generally the size of
these imaginary subarrays grows quadratically. And they're adjacent to
each other. The last array may be incomplete.
Example: we decompose an array of size 35 into: an array of size 1
followed by arrays of sizes 4, 9, 16, 5 (last being an incomplete
fragment of an array of size 25).
Now each of these arrays we organize as a max heap. Moreover, we arrange
data such that the maximums of these heaps are in INCREASING order. That
means the smallest element of the entire (initial) array is at the first
position, then followed by the next 4 smallest organized in a max heap,
followed by the next 9 smallest organized in a max heap, ... etc. There
are a total of O(sqrt n) heaps, and each has O(sqrt n) elements.
That's the layout! Now, to search we look at one heap at a time.
Whenever the maximum element (first element in the subarray) is smaller
than the searched value, we skip over that entire heap and go to the
next one. In the worst case we'll skip O(sqrt n) heaps. When the max
value in a heap is less than the searched element, we found the heap and
we run a linear search among O(sqrt n) elements.
The structure is nice for sorting and in particular parallel sorting
because each subarray can be sorted independently - there's no migration
into or out of each subarray. Inside each subarray, of course heapsort
would be a good choice because it takes advantage of the existing max
heap.
I haven't worked out the math for insertion and deletion yet, but they
seem to also be around O(sqrt n) or O(log(n) * sqrt(n)). Just wanted to
share with you and discuss what seems to be an interesting data structure.
Please share any thoughts!
Andrei
More information about the Digitalmars-d
mailing list