Scientific applications are still poorly served by contemporary relational database systems. At best, the system provides a bridge towards an external library using user-defined functions, explicit import/export facilities or linked-in Java/C# interpreters. Time has come to rectify this with SciQL, an SQL-based query language for science applications with arrays as first class citizens. A peek preview of the direction taken is given below. Detailed description of the language and its use cases (in, e.g., remote sensing, seismology and astronomy) can be found in the SciQL publications:

 

Key to success is a query language that achieves a true symbiosis of TABLE semantics with ARRAY semantics in the context of external software libraries. It led to the design of SciQL, where arrays are made first class citizens by enhancing the SQL framework along the innovative lines:

  • Seamless integration of array-, set-, and sequence- semantics.
  • Named dimensions with constraints as a declarative means for indexed access to array cells.
  • Structural grouping to access selective groups of cells based on positional relationships.
  • Adaptive storage, where the physical array storage is handled by an adaptive runtime system.

Arrays in SciQL are identified by explicitly named index attributes using DIMENSION constraints. Unlike a TABLE, every index combination denotes an array cell whose non-index value is either explicitly stored or derived from the attribute(s) DEFAULT clause. The array size is fixed if the DIMENSION clause limits it explicitly. The size of unbounded arrays are derived from the actual low/high index values in their representation. The index type can be any of the basic scalar types. The index attribute value NULL denotes the absence of a cell. At the logical level this flavor is indistinguishable from null valued attributes, but their underlying implementation may differ greatly.

Arrays may appear wherever a table expression is allowed in a SQL expression, producing an array if the target list contains index attributes. The SQL iterator semantics associated with TABLEs carry over to ARRAYs, but an iteration is confined to cells whose attributes are not null.

An important operation is to carve out an array slab for further processing. In SciQL we take it a step beyond the windowing features of SQL:2003 by identifying cell groups based on their index relationships. Such cell groups form a pattern, called a tile, which can be subsequently used in a GROUP BY clause to derive all possible tile ocv for statistical aggregation.

CREATE ARRAY stations {
    sid       integer DIMENSION[0:1024],
    name      string,
    latitude  integer,
    longitue  integer);

CREATE ARRAY seismic (
    event    timestamp DIMENSION['1988':*],
    station  integer DIMENSION REFERENCES stations.sid,
    value    integer
);


-- Array indexing and slicing

SELECT value FROM seismic['1989':'1989'] WHERE station='BEH';

-- Interpolation producing a new array

SELECT [(A.event + next(A.event))/2], (A[event].value + A[next(event)].value) / 2
FROM seismic AS A;

-- 2 x 2 tile based aggregation producing an array

SELECT [x], [y], avg(v) FROM matrix GROUP BY DISTINCT matrix[x:x+2][y:y+2];