Difference between revisions of "AVA SQL"

Latest revision as of 21:43, 15 May 2013

AVA Database follows SQL '92 standard. In addition, AVA's unique SQL adds new features that facilitate high speed analytics and data mining.

Some AVA SQL conventions

Keyword cases

Keywords are case-insensitive. Both the low case or upper case, or mixed case will point to the same keyword. However, the column names and table names are case-sensitive.

Column assignment

A column can be created from an SQL update. There are two ways to assign a resulting column to a named column:

SQL convention

The "AS" clause to assign the results to a column.

ava> update low * 123.1 as new_low from a
0.024 seconds elapsed.

Non-SQL convention

A shortcut to create a column is by the syntax: "COLUMN_NAME: ...":

ava> update new_low: low * 123.1 from a
0.029 seconds elapsed.

Both will generate the same result:

ava> select * from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |       open |       high |        low |      close |     volume |  adj_close |    new_low |
---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      20.40 |      20.58 |      20.18 |      20.58 |     797800 |      16.87 |   2,484.16 |
2000/10/16 |   ss600018 |      20.16 |      20.50 |      20.15 |      20.20 |     346400 |      17.60 |   2,480.46 |
2000/10/16 |   ss600130 |      28.08 |      28.29 |      27.16 |      27.23 |     438300 |      24.89 |   3,343.40 |
0.001 seconds elapsed.

Analytics Functions

Mathematical Functions

Modulus ( % )

Take the modulus of the two numbers:

ava> select date, sec_id, ol: open % low from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |         ol |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |       0.00 |
2000/10/16 |   ss600018 |       0.00 |
2000/10/16 |   ss600130 |       1.00 |
0.036 seconds elapsed.

Power ( ^ )

Take the power of ( a real number ):

ava> select date, sec_id, o: open ^ 2 from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |          o |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |     416.16 |
2000/10/16 |   ss600018 |     406.43 |
2000/10/16 |   ss600130 |     788.49 |
0.081 seconds elapsed.

Factorial ( factorial )

Take factorial of an integer:

ava> select date, sec_id, f: factorial(5) * open  from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |          f |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |   2,448.00 |
2000/10/16 |   ss600018 |   2,419.20 |
2000/10/16 |   ss600130 |   3,369.60 |
0.02 seconds elapsed.

Other mathematical functions

ABS(): Absolute Value
EXP(): Exponential, e ( 2.72 ) to a power ( real number )
SIGN(): Sign of a number
LOG(): Log of e (2.72)
LOG10(): LOG of 10
SIN(): SIN
COS(): COS
TAN(): Tangent
ASIN(): Arc Sin
ACOS(): Arc Cos
ATAN(): Arc Tangent

Statistical Functions

SUM

Aggregate function. Sum over a group ( option )

ava> select s: sum(volume) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |          s |
---------- | ---------- |
  ss000001 |  407718300 |
  ss600000 |  424768664 |
  ss600001 |  481888440 |
0.29 seconds elapsed.

AVG

Aggregate function. Average over a group ( option )

ava> select s: avg(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |          s |
---------- | ---------- |
  ss000001 |   2,214.42 |
  ss600000 |      17.70 |
  ss600001 |       5.31 |
0.288 seconds elapsed.

STD

Aggregate function. Standard Deviation over a group ( option )

ava> select s: std(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |          s |
---------- | ---------- |
  ss000001 |   1,034.01 |
  ss600000 |      11.83 |
  ss600001 |       1.53 |
0.291 seconds elapsed.

MIN

Aggregate function. Minimum value over a group ( option )

ava> select s: min(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |          s |
---------- | ---------- |
  ss000001 |   1,011.50 |
  ss600000 |       6.48 |
  ss600001 |       2.72 |
0.292 seconds elapsed.

MAX

Aggregate function. Maximum value over a group ( option )

ava> select s: max(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |          s |
---------- | ---------- |
  ss000001 |   6,092.06 |
  ss600000 |      61.59 |
  ss600001 |      10.92 |
0.293 seconds elapsed.

Time Series Functions

CUMSUM

Cumulative sum. Sums over a pre-ordered data column.

ava> update cs: cumsum(open) from a group by sec_id
0.339 seconds elapsed.
ava> select date, sec_id, cs from a where sec_id = "ss600001"
TABLE NAME: a | NUMBER OF RECORDS: 1,709
---------------------------------------------------
      date |     sec_id |         cs |
---------- | ---------- | ---------- |
2003/01/01 |   ss600001 |       4.99 |
2003/01/02 |   ss600001 |       9.96 |
2003/01/03 |   ss600001 |      14.91 |
2003/01/06 |   ss600001 |      19.91 |
2003/01/07 |   ss600001 |      24.94 |
0.076 seconds elapsed.

To sort the table to a particular order:

ava> .sort a order by date desc
0.052 seconds elapsed.
ava> select * from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |       open |       high |        low |      close |     volume |  adj_close |
---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
2011/04/04 |   ss900946 |       0.47 |       0.47 |       0.47 |       0.47 |          0 |       0.47 |
2011/04/04 |   ss900940 |       0.23 |       0.23 |       0.23 |       0.23 |          0 |       0.23 |
2011/04/04 |   sz000863 |       3.50 |       3.50 |       3.50 |       3.50 |          0 |       3.50 |
0 seconds elapsed.

CUMPROD

Cumulative product. Product over a pre-ordered data column.

ava> update cp: cumprod(open) from a group by sec_id
0.588 seconds elapsed.

RANK

Rank the value from lowest to highest. It could be grouped by a combination of columns (optional).

ava> update rk: rank(volume) from a group by sec_id
0.701 seconds elapsed.
ava> select date, sec_id, volume, rk from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |     volume |         rk |
---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |     797800 |        590 |
2000/10/16 |   ss600018 |     346400 |        265 |
2000/10/16 |   ss600130 |     438300 |        864 |
0 seconds elapsed.

WRANK

SYNTAX: wrank(COLUMN_NAME,BINS)

Rank a column and assign the ranks to bins.

ava> update rk: wrank(volume,10) from a group by sec_id
0.714 seconds elapsed.
ava> select date, sec_id, volume, rk from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |     volume |         rk |
---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |     797800 |          3 |
2000/10/16 |   ss600018 |     346400 |          2 |
2000/10/16 |   ss600130 |     438300 |          4 |
0 seconds elapsed.

Window Functions

FIRST

Find the first value of a group (optional) by a predefined order.

ava> select date: first(date), open: first(open)  from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |       date |       open |
---------- | ---------- | ---------- |
  ss000001 | 2000/10/16 |   1,917.45 |
  ss600000 | 2003/01/01 |       9.86 |
  ss600001 | 2003/01/01 |       4.99 |
0.025 seconds elapsed.

LAST

Find the last value of a group (optional) by a predefined order.

ava> select date: first(date), open: last(open)  from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
    sec_id |       date |       open |
---------- | ---------- | ---------- |
  ss000001 | 2000/10/16 |   2,932.48 |
  ss600000 | 2003/01/01 |      13.60 |
  ss600001 | 2003/01/01 |       5.29 |
0.025 seconds elapsed.

LAG

SYNTAX: lag(COLUMN_NAME, NUM_ROWS)

Lag the data by a number of rows within a group (optional).

ava> update lag_open: lag(open,2) from a group by sec_id
0.347 seconds elapsed.

LEAD

SYNTAX: lead(COLUMN_NAME, NUM_ROWS)

Lead the data by a number of rows within a group (optional).

ava> update lead_open: lead(open,2) from a group by sec_id
0.336 seconds elapsed.

ava> select date, sec_id, open, lag_open, lead_open from a where sec_id = 'ss000001'
TABLE NAME: a | NUMBER OF RECORDS: 2,678
---------------------------------------------------
      date |     sec_id |       open |   lag_open |  lead_open |
---------- | ---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss000001 |   1,917.45 |        nan |   1,911.71 |
2000/10/17 |   ss000001 |   1,902.07 |        nan |   1,925.21 |
2000/10/18 |   ss000001 |   1,911.71 |   1,917.45 |   1,913.16 |
0.063 seconds elapsed.

MAVG

SYNTAX: mavg(COLUMN_NAME, WINDOW)

Compute the simple moving average within a moving window.

ava> update mavg_open: mavg(open,2)  from a group by sec_id
0.364 seconds elapsed.

MSUM

SYNTAX: msum(COLUMN_NAME, WINDOW)

Compute the moving sum within a moving window.

ava> update msum_open: msum(open,2)  from a group by sec_id
0.361 seconds elapsed.

MSTD

SYNTAX: mstd(COLUMN_NAME, WINDOW)

Compute the moving standard deviation within a moving window.

ava> update mstd_open: mstd(open,2)  from a group by sec_id
0.377 seconds elapsed.

ava> select date, sec_id, open, mavg_open, mstd_open, msum_open from a where sec_id = 'ss000001'
TABLE NAME: a | NUMBER OF RECORDS: 2,678
---------------------------------------------------
      date |     sec_id |       open |  mavg_open |  mstd_open |  msum_open |
---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss000001 |   1,917.45 |   1,917.45 |        nan |   1,917.45 |
2000/10/17 |   ss000001 |   1,902.07 |   1,909.76 |      10.88 |   3,819.52 |
2000/10/18 |   ss000001 |   1,911.71 |   1,906.89 |       6.82 |   3,813.78 |

EMA

Exponential moving average. SYNTAX: ema(COLUMN_NAME, HALF_LIFE)

ava> update ema_open: ema(open, 0.1) from a group by sec_id
0.034 seconds elapsed.

Row-wise Aggregation Functions

The row-wise aggregate functions are shortcuts to the aggregate functions. The functions first perform aggregation for each group, and apply the aggregated value back to each row of the same group.

WSUM

row-wise filling value from SUM function.

ava> update mopen: wsum(open) from a group by sec_id
0.406 seconds elapsed.
ava> select date, sec_id, open, mopen from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |       open |      mopen |
---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      20.40 |  26,741.47 |
2000/10/16 |   ss600018 |      20.16 |  31,974.72 |
2000/10/16 |   ss600130 |      28.08 |  30,661.59 |
0.001 seconds elapsed.

ava> update open_demean: open - wsum(open) from a group by sec_id
0.428 seconds elapsed.

WAVG

row-wise filling value from AVG function.

ava> select date, sec_id, mopen: wavg(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |      mopen |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      10.19 |
2000/10/16 |   ss600018 |      12.18 |
2000/10/16 |   ss600130 |      11.65 |
0.377 seconds elapsed.

WSTD

row-wise filling value from STD function.

ava> select date, sec_id, mopen: wstd(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |      mopen |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |       4.72 |
2000/10/16 |   ss600018 |       5.91 |
2000/10/16 |   ss600130 |       8.86 |
0.379 seconds elapsed.

WMIN

row-wise filling value from MIN function.

ava> select date, sec_id, mopen: wmin(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |      mopen |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |       3.70 |
2000/10/16 |   ss600018 |       3.19 |
2000/10/16 |   ss600130 |       1.70 |
0.38 seconds elapsed.

WMAX

row-wise filling value from MAX function.

ava> select date, sec_id, mopen: wmax(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |      mopen |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      23.58 |
2000/10/16 |   ss600018 |      25.30 |
2000/10/16 |   ss600130 |      33.35 |
0.378 seconds elapsed.

WCOUNT

row-wise filling value from COUNT function.

ava> select date, sec_id, mopen: wcount(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |      mopen |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |       2624 |
2000/10/16 |   ss600018 |       2625 |
2000/10/16 |   ss600130 |       2633 |
0.286 seconds elapsed.

Table Join Functions

The current version only supports LEFT OUTER JOIN type of table joins. No index or key is necessary to perform the join since the table join algorithm is already optimized. This feature greatly enhances the performance of the database since the index creation of a traditional database is not only time-consuming, but memory and disk storage intensive as well.

The syntax for the table join is non-SQL standard, but should be straightforward.

SYNTAX: TABLE[COLUMN1, COLUMN2].TARGET_COLUMN

For example:

ava> create table b as ( select date, sec_id, open from a )
0.078 seconds elapsed.
ava> select * from b
TABLE NAME: b | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |       open |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      20.40 |
2000/10/16 |   ss600018 |      20.16 |
2000/10/16 |   ss600130 |      28.08 |
0.001 seconds elapsed.

ava> update open2: b[date,sec_id].open from a
1.27 seconds elapsed.
ava> select date, sec_id, open, open2 from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |       open |      open2 |
---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      20.40 |      20.40 |
2000/10/16 |   ss600018 |      20.16 |      20.16 |
2000/10/16 |   ss600130 |      28.08 |      28.08 |
0.001 seconds elapsed.

The matching columns (keys) of the two tables ( in this example, "date,sec_id" ) have to be present on both tables and bears the same names and the same data types. If the keys are not unique on the joining table ( in this example, table "b"), then only the first value of the joining table for each key is taken ( a warning message will be issued).

The efficiency of the table join operation also depends on the natural order the keys. Random keys perform the worst and ordered keys the best.

Both hash-join and merge-sort joins are supported. Please refer to the AVA Database Management Reference for more details.

Data Conversion Functions

to_date

From a numerical value to date or from a string ( in the format of "YYYY-MM-DD" ) to date.

ava> select date, sec_id from a where date = to_date("2000-10-16")
TABLE NAME: a | NUMBER OF RECORDS: 100
---------------------------------------------------
      date |     sec_id |
---------- | ---------- |
2000/10/16 |   ss600008 |
2000/10/16 |   ss600018 |
2000/10/16 |   ss600130 |
0.019 seconds elapsed.

to_string

SYNTAX: to_string(COLUMN, NUM_CHARS).

From all value to VARCHAR type. The number of characters must be specified. Data will be truncated at the maximum number of characters.

ava> select date, sec_id, sec_id2: to_string(sec_id, 5) from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |    sec_id2 |
---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      ss600 |
2000/10/16 |   ss600018 |      ss600 |
2000/10/16 |   ss600130 |      ss600 |
0.072 seconds elapsed.

to_time

From a numerical value to time or from a string ( in the format of "HH:MM:SS.000" ) to time.

ava> update time: to_time("10:10:10.999") from a
0.032 seconds elapsed.
ava> select date, time, sec_id from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |       time   |     sec_id |
---------- | ----------   | ---------- |
2000/10/16 | 10:10:10.999 |   ss600008 |
2000/10/16 | 10:10:10.999 |   ss600018 |
2000/10/16 | 10:10:10.999 |   ss600130 |
0.001 seconds elapsed.

to_long

From a numerical value to time or a string to LONG type.

ava> select date, sec_id, open, lopen: to_long(open) from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |       open |      lopen |
---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |      20.40 |         20 |
2000/10/16 |   ss600018 |      20.16 |         20 |
2000/10/16 |   ss600130 |      28.08 |         28 |
0.012 seconds elapsed.

to_float

From a numerical value to time or a string to FLOAT type.

ava> select date, sec_id, volume, fvolume: to_float(volume) from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
      date |     sec_id |     volume |    fvolume |
---------- | ---------- | ---------- | ---------- |
2000/10/16 |   ss600008 |     797800 | 797,800.00 |
2000/10/16 |   ss600018 |     346400 | 346,400.00 |
2000/10/16 |   ss600130 |     438300 | 438,300.00 |
0.012 seconds elapsed.

to_integer

From a numerical value to time or a string to INTEGER type.

ava> select date, sec_id, open, lopen: to_integer(open) from a

to_unsigned_integer

From a numerical value to time or a string to UNSIGNED INTEGER type.

ava> select date, sec_id, open, lopen: to_unsigned_integer(open) from a