Hi!

This project explores an interesting question: how much faster can SQLite run on an
ARM-based device if it is accelerated with OpenCL or Renderscript? (Yes, non-ARM devices
count too.)

Goals:
0) Stay compatible with the SQLite APIs.
1) Accelerate SQLite for the general case.
  a) SQL -> machine-generated OpenCL
  b) SQL -> HSAIL / SPIR
  c) SQL -> Renderscript
2) Support running on Android, Linux and OSX.
3) Support use of Renderscript on Android.

Much of this approach isn't tied to SQLite and could readily be applied to other databases
or object data stores.

You can find me on IRC at irc.freenode.net (look for tgall, tgall_foo or Dr_Who). I'm
usually in #linaro and #linaro-gfx, as well as other channels.

I'm also tom_gall on Twitter and G+. My blog is at http://fullshovel.wordpress.com,
where I post performance findings from time to time.

-----------------------------------------------------------------------------------------

v.01
Initial git commit (and push)

It is in every way an early prototype. It has bugs. It is incomplete. It does not
solve general-purpose problems.

The 13 test SQL statements run and yield positive results.
I'm in the midst of converting the kernels to make more use of vector types, which is
yielding even more performance.

Known bugs:
	- The vectorized kernels sql1rc.cl and sql3rc.cl each miss two rows that they
	  should match.

-----------------------------------------------------------------------------------------

Build:

Run ./build.sh to build. Note that the build system sucks, and when I say sucks I really
mean it. It is basically hard-wired to the Mali device drivers on a Chromebook, but it
should be simple to adjust for your system.

This will build sq-cl and sq-cl.dbg.

sq-cl is nothing more than a driver for the 13 test SQL statements: depending on the
parameters passed, it runs an OpenCL implementation of one of those test statements.

Run:

This is REALLY rough right now and will change substantially as things become more
general purpose. I needed a way to drive a sort of API design; it's ugly now because
I'm using it to feel out what that API might look like on the way to addressing the
general case.

Ex: sq-cl sql1.cl 1 1 1 0 0 0 0 s

The first param is the name of the currently hand-coded OpenCL kernel to use.

The next 7 numbers form a bitmask of which columns the query will use from the
database (which, by the way, is currently hard-coded): 1 means use that column, 0 means
don't.

The next value is either s or f, for slow or fast: it uses 64 or 128 work units respectively.
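To make the argument layout concrete, here is a minimal C sketch of how a driver could
turn "sq-cl sql1.cl 1 1 1 0 0 0 0 s" into a column bitmask and a work-unit count. The
struct sqcl_opts and the function parse_sqcl_args are hypothetical names, not sq-cl's
actual code; the sketch assumes s maps to 64 work units and f to 128, and it ignores the
extra a/b/c and d parameters described below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_COLUMNS 7  /* the test database currently has 7 hard-coded columns */

/* Hypothetical container for sq-cl's command-line options. */
struct sqcl_opts {
    const char *kernel_file;  /* e.g. "sql1.cl" */
    unsigned column_mask;     /* bit i set => column i is used by the query */
    size_t work_units;        /* assumed: 64 for 's' (slow), 128 for 'f' (fast) */
};

/* Parse: <kernel.cl> c0 c1 c2 c3 c4 c5 c6 <s|f>
 * Returns true on success. Trailing parameters (a/b/c, d) are ignored here. */
static bool parse_sqcl_args(int argc, char **argv, struct sqcl_opts *out)
{
    assert(argv != NULL && out != NULL);
    /* program name + kernel file + 7 column flags + speed flag */
    if (argc < 1 + 1 + NUM_COLUMNS + 1)
        return false;

    out->kernel_file = argv[1];
    out->column_mask = 0;
    for (int i = 0; i < NUM_COLUMNS; i++) {
        const char *flag = argv[2 + i];
        if (strcmp(flag, "1") == 0)
            out->column_mask |= 1u << i;   /* use this column */
        else if (strcmp(flag, "0") != 0)
            return false;                  /* only 0 or 1 are accepted */
    }

    const char *speed = argv[2 + NUM_COLUMNS];
    if (strcmp(speed, "s") == 0)
        out->work_units = 64;
    else if (strcmp(speed, "f") == 0)
        out->work_units = 128;
    else
        return false;
    return true;
}
```

Packing the seven flags into one unsigned int keeps the column selection in a single
value that is cheap to pass around, including as a scalar kernel argument.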

For sql11 through sql13 there is an additional parameter (a, b or c) due to the
different type of query being used.

For sql1rc.cl, sql3rc.cl and other OpenCL kernels that use the faster vector approach,
use the parameter d.
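The idea behind the vector approach can be sketched in plain C. An OpenCL kernel using a
vector type such as int4 compares four rows per step instead of one; the sketch below
emulates that with a hypothetical four-lane struct and a simple "col > threshold" filter.
It is an illustration under those assumptions, not the code in sql1rc.cl or sql3rc.cl.
Note the scalar tail loop: when the row count isn't a multiple of the vector width, the
leftover rows have to be handled separately or they are silently skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-wide lane type, standing in for OpenCL's int4. */
typedef struct { int s[4]; } int4_lanes;

/* Scalar version: count rows whose value exceeds the threshold,
 * roughly "SELECT count(*) ... WHERE col > threshold". */
static size_t count_gt_scalar(const int *col, size_t n, int threshold)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += col[i] > threshold;
    return hits;
}

/* Vector-style version: compare four rows per step, then finish the
 * leftover 0-3 rows with a scalar tail so no row is missed. */
static size_t count_gt_vec4(const int *col, size_t n, int threshold)
{
    assert(col != NULL || n == 0);
    size_t hits = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int4_lanes v = {{col[i], col[i + 1], col[i + 2], col[i + 3]}};
        for (int lane = 0; lane < 4; lane++)
            hits += v.s[lane] > threshold;
    }
    for (; i < n; i++)  /* tail: n not a multiple of 4 */
        hits += col[i] > threshold;
    return hits;
}
```

Both functions should always agree; comparing a vectorized kernel's result against a
scalar reference like count_gt_scalar is a cheap way to catch missing rows.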

-----------------------------------------------------------------------------------------

Where from here?

- Convert the rest of the 13 test SQL statements to vectorized versions.
- Convert to using column shards for everything.
- Start to hook into SQLite's SQL engine.
	- Generate skeleton OpenCL kernels.
- Determine where the break-even point is for shipping operations to the GPU.
- Clean up, including resource cleanup.
- Deal with 64-bit types properly on 64-bit systems.
- Use autotools or CMake for the build.
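Column shards store each column in its own contiguous array rather than row by row, so
only the columns a query actually touches need to be copied to the GPU. The C sketch
below assumes seven integer columns and reuses the column-bitmask idea from the sq-cl
parameters; struct row, struct column_shards, build_shards and free_shards are
hypothetical names for illustration, not the project's code.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_COLUMNS 7

/* Row-oriented layout: what a naive copy of the table looks like. */
struct row {
    int col[NUM_COLUMNS];
};

/* Column shards: one contiguous array per column; NULL if the column
 * isn't needed by the query and so never has to be shipped to the GPU. */
struct column_shards {
    size_t nrows;
    int *col[NUM_COLUMNS];
};

/* Build shards only for columns whose bit is set in column_mask.
 * Returns 0 on success, -1 on allocation failure (the caller should
 * still call free_shards to release any shards already built). */
static int build_shards(const struct row *rows, size_t nrows,
                        unsigned column_mask, struct column_shards *out)
{
    assert(rows != NULL || nrows == 0);
    out->nrows = nrows;
    for (int c = 0; c < NUM_COLUMNS; c++)
        out->col[c] = NULL;

    for (int c = 0; c < NUM_COLUMNS; c++) {
        if (!(column_mask & (1u << c)))
            continue;  /* column unused by the query: skip it */
        out->col[c] = malloc(nrows * sizeof *out->col[c]);
        if (out->col[c] == NULL && nrows > 0)
            return -1;
        for (size_t r = 0; r < nrows; r++)
            out->col[c][r] = rows[r].col[c];  /* transpose row -> column */
    }
    return 0;
}

static void free_shards(struct column_shards *s)
{
    for (int c = 0; c < NUM_COLUMNS; c++) {
        free(s->col[c]);  /* free(NULL) is a no-op */
        s->col[c] = NULL;
    }
}
```

Because each non-NULL shard is a single contiguous buffer, it could be handed to
clCreateBuffer with CL_MEM_COPY_HOST_PTR in one call, and a vector kernel can then read
consecutive rows of one column with a single vector load.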

Oh, I take patches. Please. Seriously.

-----------------------------------------------------------------------------------------

Thank you.

Thanks to Peter Bakkum and Kevin Skadron for their CUDA-based accelerated SQLite. Your
paper http://www.cs.virginia.edu/~skadron/Papers/bakkum_sqlite_tr.pdf inspired me to
attempt this.

Thanks to David Rusling, Mark Orvek and the Linaro TSC for supporting this project.

Thanks to Gil Pitney from TI and Show Liu from Fujitsu, who make up the other
members of the GPGPU subteam at Linaro.

Cheers!
Tom
Graphics Working Group Tech Lead, Linaro