Difference between revisions of "AVA SQL"
m |
|||
| Line 429: | Line 429: | ||
The matching columns (keys) of the two tables ( in this example, "date,sec_id" ) have to be present on both tables and bears the same names and the same data types. If the keys are not unique on the joining table ( in this example, table "b"), then only the first value of the joining table for each key is taken ( a warning message will be issued). | The matching columns (keys) of the two tables ( in this example, "date,sec_id" ) have to be present on both tables and bears the same names and the same data types. If the keys are not unique on the joining table ( in this example, table "b"), then only the first value of the joining table for each key is taken ( a warning message will be issued). | ||
| − | The efficiency of the table join operation also depends on the | + | The efficiency of the table join operation also depends on the natural order the keys. Random keys perform the worst and ordered keys the best. |
Both hash-join and merge-sort joins are supported. Please refer to the [[AVA Database Management Reference | AVA Database Management Reference ]] for more details. | Both hash-join and merge-sort joins are supported. Please refer to the [[AVA Database Management Reference | AVA Database Management Reference ]] for more details. | ||
Latest revision as of 21:43, 15 May 2013
AVA Database follows SQL '92 standard. In addition, AVA's unique SQL adds new features that facilitate high speed analytics and data mining.
Contents |
Some AVA SQL conventions
Keyword cases
Keywords are case-insensitive. Both the low case or upper case, or mixed case will point to the same keyword. However, the column names and table names are case-sensitive.
Column assignment
A column can be created from an SQL update. There are two ways to assign a resulting column to a named column:
SQL convention
The "AS" clause to assign the results to a column.
ava> update low * 123.1 as new_low from a 0.024 seconds elapsed.
Non-SQL convention
A shortcut to create a column is by the syntax: "COLUMN_NAME: ...":
ava> update new_low: low * 123.1 from a 0.029 seconds elapsed.
Both will generate the same result:
ava> select * from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | open | high | low | close | volume | adj_close | new_low |
---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 20.40 | 20.58 | 20.18 | 20.58 | 797800 | 16.87 | 2,484.16 |
2000/10/16 | ss600018 | 20.16 | 20.50 | 20.15 | 20.20 | 346400 | 17.60 | 2,480.46 |
2000/10/16 | ss600130 | 28.08 | 28.29 | 27.16 | 27.23 | 438300 | 24.89 | 3,343.40 |
0.001 seconds elapsed.
Analytics Functions
Mathematical Functions
Modulus ( % )
Take the modulus of the two numbers:
ava> select date, sec_id, ol: open % low from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | ol |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 0.00 |
2000/10/16 | ss600018 | 0.00 |
2000/10/16 | ss600130 | 1.00 |
0.036 seconds elapsed.
Power ( ^ )
Take the power of ( a real number ):
ava> select date, sec_id, o: open ^ 2 from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | o |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 416.16 |
2000/10/16 | ss600018 | 406.43 |
2000/10/16 | ss600130 | 788.49 |
0.081 seconds elapsed.
Factorial ( factorial )
Take factorial of an integer:
ava> select date, sec_id, f: factorial(5) * open from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | f |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 2,448.00 |
2000/10/16 | ss600018 | 2,419.20 |
2000/10/16 | ss600130 | 3,369.60 |
0.02 seconds elapsed.
Other mathematical functions
- ABS(): Absolute Value
- EXP(): Exponential, e ( 2.72 ) to a power ( real number )
- SIGN(): Sign of a number
- LOG(): Log of e (2.72)
- LOG10(): LOG of 10
- SIN(): SIN
- COS(): COS
- TAN(): Tangent
- ASIN(): Arc Sin
- ACOS(): Arc Cos
- ATAN(): Arc Tangent
Statistical Functions
SUM
Aggregate function. Sum over a group ( option )
ava> select s: sum(volume) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | s |
---------- | ---------- |
ss000001 | 407718300 |
ss600000 | 424768664 |
ss600001 | 481888440 |
0.29 seconds elapsed.
AVG
Aggregate function. Average over a group ( option )
ava> select s: avg(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | s |
---------- | ---------- |
ss000001 | 2,214.42 |
ss600000 | 17.70 |
ss600001 | 5.31 |
0.288 seconds elapsed.
STD
Aggregate function. Standard Deviation over a group ( option )
ava> select s: std(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | s |
---------- | ---------- |
ss000001 | 1,034.01 |
ss600000 | 11.83 |
ss600001 | 1.53 |
0.291 seconds elapsed.
MIN
Aggregate function. Minimum value over a group ( option )
ava> select s: min(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | s |
---------- | ---------- |
ss000001 | 1,011.50 |
ss600000 | 6.48 |
ss600001 | 2.72 |
0.292 seconds elapsed.
MAX
Aggregate function. Maximum value over a group ( option )
ava> select s: max(close) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | s |
---------- | ---------- |
ss000001 | 6,092.06 |
ss600000 | 61.59 |
ss600001 | 10.92 |
0.293 seconds elapsed.
Time Series Functions
CUMSUM
Cumulative sum. Sums over a pre-ordered data column.
ava> update cs: cumsum(open) from a group by sec_id
0.339 seconds elapsed.
ava> select date, sec_id, cs from a where sec_id = "ss600001"
TABLE NAME: a | NUMBER OF RECORDS: 1,709
---------------------------------------------------
date | sec_id | cs |
---------- | ---------- | ---------- |
2003/01/01 | ss600001 | 4.99 |
2003/01/02 | ss600001 | 9.96 |
2003/01/03 | ss600001 | 14.91 |
2003/01/06 | ss600001 | 19.91 |
2003/01/07 | ss600001 | 24.94 |
0.076 seconds elapsed.
To sort the table to a particular order:
ava> .sort a order by date desc
0.052 seconds elapsed.
ava> select * from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | open | high | low | close | volume | adj_close |
---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
2011/04/04 | ss900946 | 0.47 | 0.47 | 0.47 | 0.47 | 0 | 0.47 |
2011/04/04 | ss900940 | 0.23 | 0.23 | 0.23 | 0.23 | 0 | 0.23 |
2011/04/04 | sz000863 | 3.50 | 3.50 | 3.50 | 3.50 | 0 | 3.50 |
0 seconds elapsed.
CUMPROD
Cumulative product. Product over a pre-ordered data column.
ava> update cp: cumprod(open) from a group by sec_id 0.588 seconds elapsed.
RANK
Rank the value from lowest to highest. It could be grouped by a combination of columns (optional).
ava> update rk: rank(volume) from a group by sec_id
0.701 seconds elapsed.
ava> select date, sec_id, volume, rk from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | volume | rk |
---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 797800 | 590 |
2000/10/16 | ss600018 | 346400 | 265 |
2000/10/16 | ss600130 | 438300 | 864 |
0 seconds elapsed.
WRANK
SYNTAX: wrank(COLUMN_NAME,BINS)
Rank a column and assign the ranks to bins.
ava> update rk: wrank(volume,10) from a group by sec_id
0.714 seconds elapsed.
ava> select date, sec_id, volume, rk from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | volume | rk |
---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 797800 | 3 |
2000/10/16 | ss600018 | 346400 | 2 |
2000/10/16 | ss600130 | 438300 | 4 |
0 seconds elapsed.
Window Functions
FIRST
Find the first value of a group (optional) by a predefined order.
ava> select date: first(date), open: first(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | date | open |
---------- | ---------- | ---------- |
ss000001 | 2000/10/16 | 1,917.45 |
ss600000 | 2003/01/01 | 9.86 |
ss600001 | 2003/01/01 | 4.99 |
0.025 seconds elapsed.
LAST
Find the last value of a group (optional) by a predefined order.
ava> select date: first(date), open: last(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 1,718
---------------------------------------------------
sec_id | date | open |
---------- | ---------- | ---------- |
ss000001 | 2000/10/16 | 2,932.48 |
ss600000 | 2003/01/01 | 13.60 |
ss600001 | 2003/01/01 | 5.29 |
0.025 seconds elapsed.
LAG
SYNTAX: lag(COLUMN_NAME, NUM_ROWS)
Lag the data by a number of rows within a group (optional).
ava> update lag_open: lag(open,2) from a group by sec_id 0.347 seconds elapsed.
LEAD
SYNTAX: lead(COLUMN_NAME, NUM_ROWS)
Lead the data by a number of rows within a group (optional).
ava> update lead_open: lead(open,2) from a group by sec_id 0.336 seconds elapsed.
ava> select date, sec_id, open, lag_open, lead_open from a where sec_id = 'ss000001'
TABLE NAME: a | NUMBER OF RECORDS: 2,678
---------------------------------------------------
date | sec_id | open | lag_open | lead_open |
---------- | ---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss000001 | 1,917.45 | nan | 1,911.71 |
2000/10/17 | ss000001 | 1,902.07 | nan | 1,925.21 |
2000/10/18 | ss000001 | 1,911.71 | 1,917.45 | 1,913.16 |
0.063 seconds elapsed.
MAVG
SYNTAX: mavg(COLUMN_NAME, WINDOW)
Compute the simple moving average within a moving window.
ava> update mavg_open: mavg(open,2) from a group by sec_id 0.364 seconds elapsed.
MSUM
SYNTAX: msum(COLUMN_NAME, WINDOW)
Compute the moving sum within a moving window.
ava> update msum_open: msum(open,2) from a group by sec_id 0.361 seconds elapsed.
MSTD
SYNTAX: mstd(COLUMN_NAME, WINDOW)
Compute the moving standard deviation within a moving window.
ava> update mstd_open: mstd(open,2) from a group by sec_id 0.377 seconds elapsed.
ava> select date, sec_id, open, mavg_open, mstd_open, msum_open from a where sec_id = 'ss000001'
TABLE NAME: a | NUMBER OF RECORDS: 2,678
---------------------------------------------------
date | sec_id | open | mavg_open | mstd_open | msum_open |
---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss000001 | 1,917.45 | 1,917.45 | nan | 1,917.45 |
2000/10/17 | ss000001 | 1,902.07 | 1,909.76 | 10.88 | 3,819.52 |
2000/10/18 | ss000001 | 1,911.71 | 1,906.89 | 6.82 | 3,813.78 |
EMA
Exponential moving average. SYNTAX: ema(COLUMN_NAME, HALF_LIFE)
ava> update ema_open: ema(open, 0.1) from a group by sec_id 0.034 seconds elapsed.
Row-wise Aggregation Functions
The row-wise aggregate functions are shortcuts to the aggregate functions. The functions first perform aggregation for each group, and apply the aggregated value back to each row of the same group.
WSUM
row-wise filling value from SUM function.
ava> update mopen: wsum(open) from a group by sec_id
0.406 seconds elapsed.
ava> select date, sec_id, open, mopen from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | open | mopen |
---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 20.40 | 26,741.47 |
2000/10/16 | ss600018 | 20.16 | 31,974.72 |
2000/10/16 | ss600130 | 28.08 | 30,661.59 |
0.001 seconds elapsed.
ava> update open_demean: open - wsum(open) from a group by sec_id 0.428 seconds elapsed.
WAVG
row-wise filling value from AVG function.
ava> select date, sec_id, mopen: wavg(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | mopen |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 10.19 |
2000/10/16 | ss600018 | 12.18 |
2000/10/16 | ss600130 | 11.65 |
0.377 seconds elapsed.
WSTD
row-wise filling value from STD function.
ava> select date, sec_id, mopen: wstd(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | mopen |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 4.72 |
2000/10/16 | ss600018 | 5.91 |
2000/10/16 | ss600130 | 8.86 |
0.379 seconds elapsed.
WMIN
row-wise filling value from MIN function.
ava> select date, sec_id, mopen: wmin(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | mopen |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 3.70 |
2000/10/16 | ss600018 | 3.19 |
2000/10/16 | ss600130 | 1.70 |
0.38 seconds elapsed.
WMAX
row-wise filling value from MAX function.
ava> select date, sec_id, mopen: wmax(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | mopen |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 23.58 |
2000/10/16 | ss600018 | 25.30 |
2000/10/16 | ss600130 | 33.35 |
0.378 seconds elapsed.
WCOUNT
row-wise filling value from COUNT function.
ava> select date, sec_id, mopen: wcount(open) from a group by sec_id
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | mopen |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 2624 |
2000/10/16 | ss600018 | 2625 |
2000/10/16 | ss600130 | 2633 |
0.286 seconds elapsed.
Table Join Functions
The current version only supports LEFT OUTER JOIN type of table joins. No index or key is necessary to perform the join since the table join algorithm is already optimized. This feature greatly enhances the performance of the database since the index creation of a traditional database is not only time-consuming, but memory and disk storage intensive as well.
The syntax for the table join is non-SQL standard, but should be straightforward.
SYNTAX: TABLE[COLUMN1, COLUMN2].TARGET_COLUMN
For example:
ava> create table b as ( select date, sec_id, open from a )
0.078 seconds elapsed.
ava> select * from b
TABLE NAME: b | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | open |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 20.40 |
2000/10/16 | ss600018 | 20.16 |
2000/10/16 | ss600130 | 28.08 |
0.001 seconds elapsed.
ava> update open2: b[date,sec_id].open from a
1.27 seconds elapsed.
ava> select date, sec_id, open, open2 from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | open | open2 |
---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 20.40 | 20.40 |
2000/10/16 | ss600018 | 20.16 | 20.16 |
2000/10/16 | ss600130 | 28.08 | 28.08 |
0.001 seconds elapsed.
The matching columns (keys) of the two tables ( in this example, "date,sec_id" ) have to be present on both tables and bears the same names and the same data types. If the keys are not unique on the joining table ( in this example, table "b"), then only the first value of the joining table for each key is taken ( a warning message will be issued).
The efficiency of the table join operation also depends on the natural order the keys. Random keys perform the worst and ordered keys the best.
Both hash-join and merge-sort joins are supported. Please refer to the AVA Database Management Reference for more details.
Data Conversion Functions
to_date
From a numerical value to date or from a string ( in the format of "YYYY-MM-DD" ) to date.
ava> select date, sec_id from a where date = to_date("2000-10-16")
TABLE NAME: a | NUMBER OF RECORDS: 100
---------------------------------------------------
date | sec_id |
---------- | ---------- |
2000/10/16 | ss600008 |
2000/10/16 | ss600018 |
2000/10/16 | ss600130 |
0.019 seconds elapsed.
to_string
SYNTAX: to_string(COLUMN, NUM_CHARS).
From all value to VARCHAR type. The number of characters must be specified. Data will be truncated at the maximum number of characters.
ava> select date, sec_id, sec_id2: to_string(sec_id, 5) from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | sec_id2 |
---------- | ---------- | ---------- |
2000/10/16 | ss600008 | ss600 |
2000/10/16 | ss600018 | ss600 |
2000/10/16 | ss600130 | ss600 |
0.072 seconds elapsed.
to_time
From a numerical value to time or from a string ( in the format of "HH:MM:SS.000" ) to time.
ava> update time: to_time("10:10:10.999") from a
0.032 seconds elapsed.
ava> select date, time, sec_id from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | time | sec_id |
---------- | ---------- | ---------- |
2000/10/16 | 10:10:10.999 | ss600008 |
2000/10/16 | 10:10:10.999 | ss600018 |
2000/10/16 | 10:10:10.999 | ss600130 |
0.001 seconds elapsed.
to_long
From a numerical value to time or a string to LONG type.
ava> select date, sec_id, open, lopen: to_long(open) from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | open | lopen |
---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 20.40 | 20 |
2000/10/16 | ss600018 | 20.16 | 20 |
2000/10/16 | ss600130 | 28.08 | 28 |
0.012 seconds elapsed.
to_float
From a numerical value to time or a string to FLOAT type.
ava> select date, sec_id, volume, fvolume: to_float(volume) from a
TABLE NAME: a | NUMBER OF RECORDS: 3,252,871
---------------------------------------------------
date | sec_id | volume | fvolume |
---------- | ---------- | ---------- | ---------- |
2000/10/16 | ss600008 | 797800 | 797,800.00 |
2000/10/16 | ss600018 | 346400 | 346,400.00 |
2000/10/16 | ss600130 | 438300 | 438,300.00 |
0.012 seconds elapsed.
to_integer
From a numerical value to time or a string to INTEGER type.
ava> select date, sec_id, open, lopen: to_integer(open) from a
to_unsigned_integer
From a numerical value to time or a string to UNSIGNED INTEGER type.
ava> select date, sec_id, open, lopen: to_unsigned_integer(open) from a