Using the CORR() function in a query with a table join

Asked: 2014-08-13 22:25:10

Tags: google-bigquery

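When I use the CORR() function in a query with a table join, it returns null. In a query without a join, CORR() returns a value. The other fields in the joined query do come back with values. I have tried with and without field aliases, but I cannot get a correlation value in Query 2.

Thanks in advance.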


Query 1 returns a correlation value. The query and a link to the result JSON follow.

select DATE(Time ) as date, ROUND(AVG(Price),2) as price, ROUND(SUM(amount),2) as volume, CORR(price, amount) as correlation

from

ds_5.tb_4981, ds_5.tb_4978, ds_5.tb_4967

where YEAR(Time) = 2014

group by date

order by date ASC

Query 1 result JSON: https://json.datadives.com/64cbd7a4a5aba3a864b17a719148620f.json


Query 2 returns null for the correlation. The query and a link to the result JSON follow.

select bitcoin.date as date, bitcoin.btcprice, blockchain.trans_vol,  CORR(bitcoin.btcprice,blockchain.trans_vol) as correlation

from

(select DATE(time) as date, AVG(price) as btcprice
from

ds_5.tb_4981, ds_5.tb_4978, ds_5.tb_4967

where YEAR(Time) = 2014

group by date) as bitcoin

JOIN
(select
DATE(blocktime) as date, SUM(vout.value) as trans_vol
from ds_14.tb_7917, ds_14.tb_7918, ds_14.tb_7919, ds_14.tb_7920, ds_14.tb_7921, ds_14.tb_7922, ds_14.tb_7923, ds_14.tb_7924, ds_14.tb_7925, ds_14.tb_7926, ds_14.tb_7927, ds_14.tb_7928, ds_14.tb_7934, ds_14.tb_7972, ds_14.tb_8016, ds_14.tb_8086, ds_14.tb_9743, ds_14.tb_9888, ds_14.tb_10084, ds_14.tb_10136, ds_14.tb_10500, ds_14.tb_10601
where YEAR(blocktime) = 2014
group by Date) as blockchain

on bitcoin.date = blockchain.date

group each by date, bitcoin.btcprice, blockchain.trans_vol

order by date ASC

Query 2 result JSON: https://json.datadives.com/9427dc9f51ba36add5f008403def7b6d.json

1 answer:

Answer 0 (score: 1)

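I took the CSV you linked and loaded it here: https://bigquery.cloud.google.com/table/fh-bigquery:public_dump.datadivescsv

(I'm not sure why you prefer to share the CSV as a file rather than creating a public dataset in BigQuery and sharing a link to it.)

This works: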

SELECT CORR(btc_price, trans_vol)
FROM [fh-bigquery:public_dump.datadivescsv] 

-0.004957046970769512   

But this doesn't:

SELECT CORR(btc_price, trans_vol)
FROM [fh-bigquery:public_dump.datadivescsv]
GROUP BY date

null
null
...
null

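That's expected!

Reason: to compute a correlation we need a set of at least 2 pairs of numbers. Grouping by date in the second query leaves us with n groups of 1 element each, so the correlation cannot be computed for any of them and every group returns null.

(Side note: the correlation between just 2 elements is always 1 or -1. We really need at least 3 elements, and the result becomes more meaningful as the number of elements grows.)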

SELECT CORR(x, y)
FROM (SELECT 1 x, 2 y)
null

SELECT CORR(x, y)
FROM (SELECT 1 x, 2 y), (SELECT 3 x, 8 y)
1.0

SELECT CORR(x, y)
FROM (SELECT 1 x, 2 y), (SELECT 3 x, 8 y), (SELECT 7 x, 1 y)
-0.3170147297373293 

...and so on.
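As a rough cross-check outside BigQuery, here is a minimal Python sketch of the same effect. It assumes CORR() is the Pearson correlation coefficient, which is consistent with the numbers above. The `pearson` helper and the dates in `rows` are made up for illustration; the value pairs are the ones used in the small examples above.

```python
from collections import defaultdict
from math import sqrt


def pearson(pairs):
    """Pearson correlation of (x, y) pairs, or None when it is undefined."""
    n = len(pairs)
    if n < 2:
        # A single pair has no variance, so there is nothing to correlate.
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        # A constant column also makes the ratio 0/0.
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical daily rows: (date, btc_price, trans_vol), one row per date,
# which is the shape the joined subqueries produce.
rows = [("2014-01-01", 1, 2), ("2014-01-02", 3, 8), ("2014-01-03", 7, 1)]

# No grouping: one correlation over all days.
overall = pearson([(price, vol) for _, price, vol in rows])

# Grouping by date: every group holds exactly one pair.
by_date = defaultdict(list)
for date, price, vol in rows:
    by_date[date].append((price, vol))
per_date = {date: pearson(pairs) for date, pairs in by_date.items()}

print(overall)   # a single number for the whole set
print(per_date)  # None for every date
```

The takeaway for Query 2 is the same: compute CORR() over the joined rows without grouping by date (or group by something coarser, such as month) so that each group contains several days.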