How to reduce the run time of a BigQuery query

Time: 2018-11-18 04:44:50

Tags: sql google-bigquery

I have two tables:

1. PEOPLE (PK, Name, Address, Zip, <<some random other columns>>)
2. EMAIL  (PK, Name, Address, Zip, Email)

This is a one-to-many relationship: each PEOPLE row can match several EMAIL rows, linked by name, address, and zip code.

What I need is:

PEOPLE (PK, Name, Address, Zip, <<some random other columns>>, FK_Email1, Email1, FK_Email2, Email2, FK_Email3, Email3)

So far, this is what I have:

#standardSQL
SELECT a.PK, a.FK, Source, FirstName, LastName, MiddleName, SuffixName, Gender, Age, DOB, Address, Address2, City, State, Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
  FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] as FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] as FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM (
  SELECT
    P.PK, P.FK, P.Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, P.City, P.State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION
    , ARRAY_AGG(E.Email) Emails, ARRAY_AGG(E.PK) FK_Email
  FROM `db.ds.table1` P
  left JOIN `db.ds.table2`  E
  ON P.FirstName = E.FirstName
  AND P.LastName = E.LastName
  AND P.Address = E.Address
  AND P.Zip = E.Zip
Group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87
) a

My problem is that this runs past the six-hour timeout limit. Is there any way to make it run faster?

Thanks!

1 Answer:

Answer 0: (score: 2)

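I think the query below does the same thing, but in a more optimized way. It aggregates the emails per FirstName, LastName, Address, and Zip first, so the GROUP BY covers four columns of the email table instead of 87 columns of the joined result, and it keeps at most three emails per group: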

#standardSQL
SELECT 
  PK, FK, Source, P.FirstName, P.LastName, MiddleName, SuffixName, Gender, Age, DOB, P.Address, Address2, City, State, P.Zip, Zip4, Cleaned_HouseNumber, Cleaned_Street, Cleaned_City, Cleaned_County, Cleaned_State, Cleaned_Zip, TimeZone, Income, HomeValue, Networth, MaritalStatus, IsRenter, HasChildren, CreditRating, Investor, LinesOfCredit, InvestorRealEstate, Traveler, Pets, MailResponder, Charitable, PolicalDonations, PoliticalParty, ATTOM_ID, GEOID, SCORE, Latitude, Longitude, SpouseFirstName, SpouseLastName, HomeAvailableHomeEquity, HomeTotalLoans, HomeLoan1Amount, HomeLoan2Amount, HomeValueRangeCode, HomeValueRangeText, HomeMarketValue, HomeAssessedValue, HomeLoanToValue, HomeSQFT, HomeLotSQFT, HomeYearBuilt, HomePurchaseDate, HomeLoan1Date, HomeLoan2Date, HomeParcelNumber, HomePropertyType, DNC, HomeCompanyOwned, HomeTrustOwned, HomeOwnerOccupied, HomeType, HomePool, HomeGarage, HomeHeating, HomeCooling, HomeBedrooms, HomeBathrooms, HomeNumberOfUnits, MailingAddress, MailingCity, MailingState, MailingZip, MailingZip4, Married, Divorce, Education, Occupation, Ethnicity, LANGUAGE, RELIGION,
  FK_Email[SAFE_ORDINAL(1)] FK_Email1, Emails[SAFE_ORDINAL(1)] Email1, FK_Email[SAFE_ORDINAL(2)] AS FK_Email2, Emails[SAFE_ORDINAL(2)] Email2, FK_Email[SAFE_ORDINAL(3)] AS FK_Email3, Emails[SAFE_ORDINAL(3)] Email3
FROM `db.ds.table1` P
LEFT JOIN (
  SELECT FirstName, LastName, Address, Zip, 
    ARRAY_AGG(Email LIMIT 3) Emails, ARRAY_AGG(PK LIMIT 3) FK_Email
  FROM `db.ds.table2`
  GROUP BY FirstName, LastName, Address, Zip
) E
ON P.FirstName = E.FirstName
AND P.LastName = E.LastName
AND P.Address = E.Address
AND P.Zip = E.Zip
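
One thing to watch with two separate ARRAY_AGG(... LIMIT 3) calls: without an ORDER BY, BigQuery does not guarantee the order of the array elements, so it is safer not to rely on FK_Email and Emails lining up position by position. A possible variation, sketched here on made-up inline data rather than the real tables, is to aggregate a single array of (PK, Email) structs with an explicit ORDER BY. The CTE names people, email, and email_agg, the Pairs alias, and the sample rows are all assumptions for illustration, and ordering by PK is an arbitrary choice.

```sql
#standardSQL
-- Toy stand-ins for the two tables; names and rows are hypothetical.
WITH people AS (
  SELECT 1 AS PK, 'Ann' AS FirstName, 'Lee' AS LastName,
         '1 Main St' AS Address, '10001' AS Zip
  UNION ALL
  SELECT 2, 'Bob', 'Ray', '2 Oak Ave', '20002'
),
email AS (
  SELECT 10 AS PK, 'Ann' AS FirstName, 'Lee' AS LastName,
         '1 Main St' AS Address, '10001' AS Zip, 'ann@example.com' AS Email
  UNION ALL
  SELECT 11, 'Ann', 'Lee', '1 Main St', '10001', 'ann.lee@example.com'
),
-- Collapse the email table to one row per join key before joining.
email_agg AS (
  SELECT
    FirstName, LastName, Address, Zip,
    -- A single array of (PK, Email) pairs keeps each key next to its email;
    -- ORDER BY makes the choice of the first three deterministic.
    ARRAY_AGG(STRUCT(PK, Email) ORDER BY PK LIMIT 3) AS Pairs
  FROM email
  GROUP BY FirstName, LastName, Address, Zip
)
SELECT
  P.PK, P.FirstName, P.LastName, P.Address, P.Zip,
  -- SAFE_ORDINAL returns NULL when the array has fewer elements
  -- (or when the LEFT JOIN found no match), so missing emails become NULL.
  E.Pairs[SAFE_ORDINAL(1)].PK    AS FK_Email1,
  E.Pairs[SAFE_ORDINAL(1)].Email AS Email1,
  E.Pairs[SAFE_ORDINAL(2)].PK    AS FK_Email2,
  E.Pairs[SAFE_ORDINAL(2)].Email AS Email2,
  E.Pairs[SAFE_ORDINAL(3)].PK    AS FK_Email3,
  E.Pairs[SAFE_ORDINAL(3)].Email AS Email3
FROM people P
LEFT JOIN email_agg E
  ON P.FirstName = E.FirstName
 AND P.LastName = E.LastName
 AND P.Address = E.Address
 AND P.Zip = E.Zip
```

To apply this to the real data, drop the two toy CTEs and point email_agg and the final SELECT at `db.ds.table2` and `db.ds.table1`. Whether the extra ORDER BY costs noticeable time on tables this size is something to measure rather than assume.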