How do I join two tables on a common column in BigQuery?

Time: 2018-01-09 19:42:04

Tags: sql join google-bigquery legacy-sql

To join the tables, I am using the following query.

SELECT *
FROM(select user as uservalue1 FROM [projectname.FullData_Edited]) as FullData_Edited 
JOIN (select user as uservalue2 FROM [projectname.InstallDate]) as InstallDate 
ON FullData_Edited.uservalue1=InstallDate.uservalue2;

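The query works, but the joined table has only two columns, uservalue1 and uservalue2. I want to keep all the columns from both tables. Any idea how to achieve this?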

2 answers:

Answer 0: (score: 2)

   
#legacySQL
SELECT <list of fields to output>
FROM [projectname:datasetname.FullData_Edited] AS FullData_Edited
JOIN [projectname:datasetname.InstallDate] AS InstallDate
ON FullData_Edited.user = InstallDate.user

or (preferably)

#standardSQL
SELECT <list of fields to output>
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
ON FullData_Edited.user = InstallDate.user

Note that using SELECT * in such cases can lead to an Ambiguous column name error, so it is better to put an explicit list of the columns/fields you need in the output.
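When both tables have many columns, writing the full list by hand gets tedious. One option in standard SQL is to take every column from one table and every column except the join key from the other. The following is only a sketch: the inline sample rows and the short aliases f and i are made up for illustration and stand in for the real tables.

```sql
#standardSQL
-- Hypothetical sample data standing in for the two real tables.
WITH FullData_Edited AS (
  SELECT 1 AS user, 'a' AS field1
),
InstallDate AS (
  SELECT 1 AS user, 'b' AS field2
)
SELECT
  f.*,                 -- all columns of FullData_Edited, including user
  i.* EXCEPT (user)    -- all columns of InstallDate except the duplicate key
FROM FullData_Edited AS f
JOIN InstallDate AS i
  ON f.user = i.user
```

This only removes the duplicate user column. If the two tables share other column names as well, those would have to be added to the EXCEPT list or aliased explicitly.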

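A workaround for the SELECT * error is to use the USING() syntax, as in the example below. Assuming user is the only ambiguous field, it does the trick.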

#standardSQL
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)

For example:

#standardSQL
WITH `projectname.datasetname.FullData_Edited` AS (
  SELECT 1 user, 'a' field1
),
`projectname.datasetname.InstallDate` AS (
  SELECT 1 user, 'b' field2
)
SELECT *
FROM `projectname.datasetname.FullData_Edited` AS FullData_Edited
JOIN `projectname.datasetname.InstallDate` AS InstallDate
USING (user)

returns

user    field1  field2   
1       a       b    

whereas using ON FullData_Edited.user = InstallDate.user with SELECT * gives the following error

Error: Duplicate column names in the result are not supported. Found duplicate(s): user

Answer 1: (score: 1)

If you want all the columns, don't use subqueries:

SELECT *
FROM [projectname.FullData_Edited] as FullData_Edited JOIN
     [projectname.InstallDate] as InstallDate 
     ON FullData_Edited.user = InstallDate.user;

You might have to list specific columns to avoid duplicate column names.

While you're at it, you should also switch to standard SQL.
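Putting both suggestions together, a standard SQL version with an explicit column list might look roughly like the sketch below. The sample rows, the aliases f and i, and the output name install_user are assumptions for illustration only; the real tables will have their own columns.

```sql
#standardSQL
-- Hypothetical sample data standing in for the two real tables.
WITH FullData_Edited AS (
  SELECT 1 AS user, 'a' AS field1
),
InstallDate AS (
  SELECT 1 AS user, 'b' AS field2
)
SELECT
  f.user,                   -- join key taken from FullData_Edited
  f.field1,
  i.user AS install_user,   -- renamed so the output has no duplicate name
  i.field2
FROM FullData_Edited AS f
JOIN InstallDate AS i
  ON f.user = i.user
```

Giving the second user column its own alias keeps both copies of the key in the result without two output columns sharing one name.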