Microscope technology has evolved significantly since Antonie van Leeuwenhoek. Microscopes can be divided into optical microscopes (also called light microscopes), electron microscopes (e.g., the transmission electron microscope, TEM) and scanning probe microscopes (SPMs). SPMs, for example, produce surface images of a specimen by scanning it line by line with a physical probe and recording the detected events.
Our frame of reference consists of the images produced by Time-of-Flight Secondary Ion Mass Spectrometry (TOF-SIMS), which belongs to the class of SPMs. TOF-SIMS fires a pulsed primary ion beam at the sample surface to desorb and ionise species from it. Some of the resulting secondary ions are accelerated into a mass spectrometer, and their flight times from the sample surface to the detector are measured. TOF-SIMS divides a specimen into blocks of fixed size and scans one block at a time. However, neither the number of times a block is scanned nor the order in which the blocks are scanned is determined in advance. The data model for TOF-SIMS data is a regular grid in which each grid element holds an arbitrary number of measured events.
Currently, a scan takes up to several weeks and produces images that are gigabytes in size, yet their resolution is still relatively low by microbiological standards. Future high-resolution scans are expected to take months, and the size of the resulting images is not yet known. Post-processing of the current images is already severely hindered by the size and complexity of the event data.
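Because blocks may be scanned in any order and any number of times, loading the data essentially means appending each detected event to the list of the grid element in which it was detected. The resulting per-cell contents do not depend on the scan order; only the order of events within a cell may differ. A minimal Python sketch, assuming each scan pass delivers (x, y, tof) triples; the function names, coordinates and TOF values are hypothetical:

```python
from collections import defaultdict

# One scan pass over one block, given as hypothetical (x, y, tof) events.
Event = tuple[int, int, int]


def accumulate(passes: list[list[Event]]) -> dict[tuple[int, int], list[int]]:
    """Append the events of every pass to the cell they were detected in.
    Only cells that received at least one event get an entry (sparse grid)."""
    grid: dict[tuple[int, int], list[int]] = defaultdict(list)
    for events in passes:
        for x, y, tof in events:
            grid[(x, y)].append(tof)
    return dict(grid)


def same_content(a, b) -> bool:
    """True if both grids hold the same multiset of events in every cell."""
    return a.keys() == b.keys() and all(sorted(a[k]) == sorted(b[k]) for k in a)


# Hypothetical passes: block A is scanned twice, block B once.
block_a_pass1 = [(0, 0, 812), (1, 0, 455)]
block_b_pass1 = [(4, 0, 1203)]
block_a_pass2 = [(0, 0, 790), (0, 0, 3310)]

g1 = accumulate([block_a_pass1, block_b_pass1, block_a_pass2])
g2 = accumulate([block_a_pass2, block_a_pass1, block_b_pass1])
print(len(g1[(0, 0)]), same_content(g1, g2))  # 3 True
```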
Data model
For images at the current resolution, the data model is a two-dimensional array of up to 2^15 × 2^15 ≈ 1 billion cells (higher-resolution images will keep the same number of dimensions). Each cell may contain a list of an arbitrary number of events, each event being a measured TOF value of a detected secondary ion. On average, 20% of the cells in an image hold one or more events. The data distribution is strongly skewed: a typical cell has 2 or 3 events, but some cells have several tens of thousands of events.
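The figures above can be turned into a rough sense of scale. The arithmetic below is only a back-of-the-envelope estimate: it uses the stated 2^15 × 2^15 grid, 20% occupancy and 2 to 3 events per occupied cell, and deliberately ignores the heavy tail of cells with tens of thousands of events, so the real event count will be higher.

```python
SIDE = 2 ** 15           # cells per axis at the current resolution
OCCUPANCY = 0.20         # fraction of cells holding at least one event
TYPICAL_EVENTS = (2, 3)  # events in a typical occupied cell

cells = SIDE * SIDE
occupied = round(cells * OCCUPANCY)
# Rough estimate: the tail of cells with tens of thousands of events is ignored.
low, high = (occupied * k for k in TYPICAL_EVENTS)

print(f"cells:          {cells:,}")     # 1,073,741,824
print(f"occupied cells: {occupied:,}")  # 214,748,365
print(f"events (typical cells only): {low:,} .. {high:,}")  # 429,496,730 .. 644,245,095
```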
The database challenges:
- Storage: find the best way(s) to store the microscopy images, in particular how to deal with the highly irregular sizes of individual cells.
- Operator set: the current approach to visualising the microscopy data is hampered by the data size and the need for spatial aggregation. Unfortunately, the 20 representative queries needed for a Jim Gray-style 20-query test suite are not known yet.
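For the storage challenge, one common way to cope with highly irregular cell sizes is a compressed sparse layout: store only the non-empty cells, sorted by a linear cell id, together with an offsets array into a single flat array of all events. A cell with 2 events and one with 20,000 events then differ only in the length of their run. The Python sketch below illustrates this idea under stated assumptions (row-major cell ids, in-memory lists); it is not the SciQL storage scheme, and the names `CompressedGrid` and `cell_id` are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass

SIDE = 2 ** 15  # cells per axis at the current resolution


def cell_id(x: int, y: int) -> int:
    """Linear row-major cell id (the linearisation is an assumption)."""
    return y * SIDE + x


@dataclass
class CompressedGrid:
    cells: list[int]    # sorted ids of the non-empty cells only
    offsets: list[int]  # events of cells[i] are values[offsets[i]:offsets[i + 1]]
    values: list[int]   # all TOF values, stored contiguously cell by cell

    @classmethod
    def build(cls, grid: dict[tuple[int, int], list[int]]) -> "CompressedGrid":
        """Pack a sparse {(x, y): [tof, ...]} grid into the three arrays."""
        cells, offsets, values = [], [0], []
        for x, y in sorted(grid, key=lambda c: cell_id(*c)):
            cells.append(cell_id(x, y))
            values.extend(grid[(x, y)])
            offsets.append(len(values))
        return cls(cells, offsets, values)

    def events(self, x: int, y: int) -> list[int]:
        """Events of cell (x, y) via binary search; absent cells are empty."""
        cid = cell_id(x, y)
        i = bisect_left(self.cells, cid)
        if i == len(self.cells) or self.cells[i] != cid:
            return []
        return self.values[self.offsets[i]:self.offsets[i + 1]]


# Hypothetical skewed example: one cell holds 20,000 events, the others 1 or 2.
g = CompressedGrid.build({(3, 0): [7, 9], (1, 2): [5], (0, 0): list(range(20_000))})
print(g.events(3, 0), len(g.events(0, 0)), g.events(2, 2))  # [7, 9] 20000 []
```

Empty cells cost nothing here, and a very dense cell is just a longer run in `values`. A dense offsets array indexed directly by cell id would avoid the binary search, but it needs one entry per cell, about 2^30 entries at the current resolution, even though only about 20% of the cells are non-empty.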
Microscopy demo
The microscopy demo serves as a driver for the SciQL implementation. It stresses the handling of large numbers of events in a grid, and it is not yet clear how well a column store can cope with this.
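To make the column-store question concrete, a column store would most naturally hold the bottom-level events as aligned columns x, y and TOF value. The Python sketch below uses the standard `array` module as a stand-in for columns and assumes they are kept sorted by (y, x). Under that assumption, a band of image rows is one contiguous range located by binary search, and counting events per cell is a single group-by pass. The layout, the sort order, the function names and the sample values are assumptions for illustration, not a description of how SciQL stores such data.

```python
from array import array
from bisect import bisect_left, bisect_right
from itertools import groupby

# Hypothetical event table stored as three aligned columns, sorted by (y, x).
# The same (x, y) pair may repeat: (x, y) is not a key.
xs = array("H", [0, 0, 5, 2, 2, 2, 7])                    # unsigned short, at least 16 bits
ys = array("H", [0, 0, 0, 1, 1, 1, 3])
tofs = array("I", [812, 790, 455, 1203, 1210, 3310, 96])  # hypothetical TOF values


def row_range(y_lo: int, y_hi: int) -> range:
    """Positions of all events with y_lo <= y <= y_hi, using the sort on y."""
    return range(bisect_left(ys, y_lo), bisect_right(ys, y_hi))


def window(x_lo: int, x_hi: int, y_lo: int, y_hi: int) -> list[int]:
    """TOF values of all events inside an axis-aligned window (inclusive bounds)."""
    return [tofs[i] for i in row_range(y_lo, y_hi) if x_lo <= xs[i] <= x_hi]


def events_per_cell() -> dict[tuple[int, int], int]:
    """Group-by on (x, y): equal pairs are adjacent because of the sort order."""
    return {cell: sum(1 for _ in run) for cell, run in groupby(zip(xs, ys))}


print(window(1, 3, 0, 3))  # [1203, 1210, 3310]
print(events_per_cell())   # {(0, 0): 2, (5, 0): 1, (2, 1): 3, (7, 3): 1}
```

A window query still scans every event in the selected row band before filtering on x, so wide bands or very dense cells make it expensive. Precomputed grid-level aggregates are one way to avoid touching every event.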
Visualisation of the microscopy data largely determines the query requirements. Histograms over either regularly shaped or free-hand selected areas form an important class of queries. Point queries, which often come with a context, are another important class. A multidimensional grid overlaying the image provides a convenient way to access the data, and data is aggregated over this grid. At the bottom level, the data consists of events characterised by (x, y, value), where (x, y) does not form a key, since one cell can hold many events.
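The query classes above can be sketched over a sparse grid that maps (x, y) to the TOF values of that cell. The Python sketch below is illustrative only, and its function names are hypothetical. Several of its choices are assumptions rather than part of the SciQL design:

- histograms use fixed-width TOF bins;
- a free-hand area is approximated by a polygon, and cells are selected by testing their centres with even-odd ray casting;
- grid aggregation sums event counts over square tiles;
- the "context" of a point query is taken to be its square neighbourhood.

```python
from collections import Counter

Grid = dict[tuple[int, int], list[int]]  # sparse grid: (x, y) -> TOF values


def inside_polygon(px: float, py: float, poly: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray from (px, py) towards +x."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def histogram(grid: Grid, selected, bin_width: int) -> Counter:
    """Histogram of TOF values (bin index = tof // bin_width) over selected cells."""
    hist = Counter()
    for (x, y), values in grid.items():
        if selected(x, y):
            for tof in values:
                hist[tof // bin_width] += 1
    return hist


def tile_counts(grid: Grid, tile: int) -> Counter:
    """Aggregate event counts onto a coarser grid of tile x tile cells."""
    counts = Counter()
    for (x, y), values in grid.items():
        counts[(x // tile, y // tile)] += len(values)
    return counts


def point_with_context(grid: Grid, x: int, y: int, radius: int = 1) -> Grid:
    """Non-empty cells in the (2 * radius + 1)^2 neighbourhood of (x, y)."""
    return {(cx, cy): grid[(cx, cy)]
            for cy in range(y - radius, y + radius + 1)
            for cx in range(x - radius, x + radius + 1)
            if (cx, cy) in grid}


grid: Grid = {(0, 0): [812, 790], (5, 0): [455], (2, 1): [1203, 1210, 3310], (7, 3): [96]}
rect = histogram(grid, lambda x, y: 0 <= x <= 5 and 0 <= y <= 1, bin_width=1000)
triangle = [(0.0, 0.0), (8.0, 0.0), (8.0, 4.0)]  # hypothetical free-hand selection
free = histogram(grid, lambda x, y: inside_polygon(x + 0.5, y + 0.5, triangle), bin_width=1000)
print(rect, free, tile_counts(grid, 4))
```

Testing cell centres assigns each boundary cell wholly to one side of a free-hand selection, which seems reasonable when the cell is the finest spatial unit of the data model.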