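Count of each day of the week between May 1, 2020 and April 30, 2021 in SQL (BigQuery)

Time: 2021-06-17 06:23:54

Tags: sql google-bigquery

I need to calculate, in SQL (BigQuery), the total number of occurrences of each day of the week (number of Sundays, number of Mondays, and so on) between May 1, 2020 and April 30, 2021.

I have the following relevant fields to work with: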

Field            Type
started_at_cst   DATETIME
ended_at_cst     DATETIME
Day_of_Week      STRING

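The start of the period is the earliest date in started_at_cst, and the end of the period is the latest date in ended_at_cst.

4 answers:

Answer 0: (score: 0)

First, you should create one record per day for the period you want to report on. To do that, you can create a table to serve as a "calendar table". (The statements below use SQL Server T-SQL syntax, not BigQuery syntax.)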

CREATE TABLE dbo.CalendarTable
    (
    Date datetime NOT NULL,
    DayWeekNumber nvarchar(50) NULL
    )  ON [PRIMARY]
GO
ALTER TABLE dbo.CalendarTable ADD CONSTRAINT
    PK_CalendarTable PRIMARY KEY CLUSTERED 
    (
    Date
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Go

After that, populate the calendar table like this:


Truncate Table CalendarTable
Declare @FromDate as datetime , @ToDate as datetime, @counter as int , @DateRecord as int,
@Date as Datetime

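-- Period from the question; both end dates are included
Set @FromDate = '2020-05-01'
Set @ToDate = '2021-04-30'
-- Start at 0 so that @FromDate itself is inserted
Set @Counter = 0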

Set @DateRecord = dateDiff(D,@FromDate, @ToDate)

While (@Counter <=@DateRecord)
BEGIN
Insert Into CalendarTable (Date,DayWeekNumber)
    Select DateAdd(d,@Counter,@FromDate),DATEPART(dw,DateAdd(d,@Counter,@FromDate))
Set @Counter += 1

END

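Finally, join it with your occurrences table:

-- An occurrence covers a calendar day if it starts before the next midnight
-- and ends at or after that day's midnight
Select Date, (Select Count(*) From occurrencesTable
              Where started_at_cst < DateAdd(d, 1, CalendarTable.Date)
                and ended_at_cst >= CalendarTable.Date) as NumberOfOccurrences
from CalendarTable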
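
For readers who want to check the calendar-table logic outside the database, here is a minimal Python sketch of the same idea: build one entry per day, then count the occurrences whose start and end dates bracket that day. The function names and the sample rows are hypothetical and only illustrate the join condition; they are not part of the answer.

```python
from datetime import date, timedelta


def build_calendar(from_date, to_date):
    """Return every date from from_date to to_date, both inclusive."""
    days = (to_date - from_date).days
    return [from_date + timedelta(days=i) for i in range(days + 1)]


def occurrences_per_day(calendar, occurrences):
    """For each calendar date, count occurrences whose [start, end] covers it."""
    return {
        day: sum(1 for start, end in occurrences if start <= day <= end)
        for day in calendar
    }


# Hypothetical sample rows: (started_at_cst, ended_at_cst) reduced to dates.
sample_occurrences = [
    (date(2020, 5, 1), date(2020, 5, 3)),
    (date(2020, 5, 2), date(2020, 5, 2)),
]

calendar = build_calendar(date(2020, 5, 1), date(2021, 4, 30))
per_day = occurrences_per_day(calendar, sample_occurrences)

print(len(calendar))              # number of calendar rows
print(per_day[date(2020, 5, 2)])  # occurrences covering 2020-05-02
```

Note that the calendar includes both the first and the last day of the period, which is why the range uses `days + 1`.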

Answer 1: (score: 0)

You can get what you want with the GENERATE_DATE_ARRAY function:

WITH dow_generated AS (
    # Step 2. Extract the day of the week for each generated date
    SELECT
        FORMAT_DATE("%A", generated_date) AS day_of_week
    FROM (
            # Step 1. Generate an array of dates from the minimum of started_at_cst to the maximum of ended_at_cst
            # (the columns are DATETIME, so convert them to DATE first)
            SELECT 
                GENERATE_DATE_ARRAY(DATE(MIN(started_at_cst)), DATE(MAX(ended_at_cst))) AS dates
            FROM `your_table`) dates_generated, UNNEST(dates_generated.dates) AS generated_date)

# Step 3. Count the different days of the week
SELECT day_of_week, COUNT(day_of_week) AS day_count
FROM dow_generated
GROUP BY day_of_week

You should get a table with one row per day of the week (day_of_week) and its count (day_count).

Answer 2: (score: 0)

Presumably, you want the results in the format you specified. If so, this should do what you need:

WITH t AS (
       select datetime('2020-05-01') as started_at_cst, datetime('2021-04-30') as ended_at_cst
)
SELECT FORMAT_DATE('%A', dte) AS day_of_week, COUNT(*) AS day_count
FROM (SELECT MIN(started_at_cst) as min_started_at_cst, MAX(ended_at_cst) as max_ended_at_cst
      FROM t
     ) t CROSS JOIN
     UNNEST(GENERATE_DATE_ARRAY(DATE(min_started_at_cst), DATE(max_ended_at_cst), INTERVAL 1 day)) dte
GROUP BY day_of_week
ORDER BY MIN(dte)
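
To sanity-check the output of a query like this for the period in the question, the expected counts can be computed independently. The following is a minimal Python sketch, not part of the answer; `weekday_counts` and `WEEKDAY_NAMES` are hypothetical names introduced only for this check.

```python
from collections import Counter
from datetime import date, timedelta

# Names indexed by date.weekday(), where 0 is Monday and 6 is Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def weekday_counts(start, end):
    """Count how often each weekday occurs from start to end, both inclusive."""
    total_days = (end - start).days + 1
    return Counter(
        WEEKDAY_NAMES[(start + timedelta(days=i)).weekday()]
        for i in range(total_days)
    )


counts = weekday_counts(date(2020, 5, 1), date(2021, 4, 30))
for name in WEEKDAY_NAMES:
    print(name, counts[name])
```

The period contains 365 days and starts on a Friday, so Friday occurs 53 times and every other weekday occurs 52 times.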

Answer 3: (score: 0)

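Consider the following:

select day_of_the_week, count(1) days_count
from (
  select format_date('%A', day) day_of_the_week
  from `project.dataset.table`,
  unnest(generate_date_array(date(started_at_cst), date(ended_at_cst))) day
  where day between '2020-05-01' and '2021-04-30'
)
group by day_of_the_week

If you want the output sorted by weekday, add the following order by clause below it:

order by case day_of_the_week when 'Monday' then 1 when 'Tuesday' then 2 when 'Wednesday' then 3 when 'Thursday' then 4 when 'Friday' then 5 when 'Saturday' then 6 when 'Sunday' then 7 end

Finally, if the order by above looks too verbose to you, use the version below. Here pos is the weekday number returned by format_date('%u', day), with 1 for Monday through 7 for Sunday:

select day_of_the_week, count(1) days_count, pos
from (
  select format_date('%A', day) day_of_the_week, format_date('%u', day) pos
  from `project.dataset.table`,
  unnest(generate_date_array(date(started_at_cst), date(ended_at_cst))) day
  where day between '2020-05-01' and '2021-04-30'
)
group by day_of_the_week, pos
order by pos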
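
Unlike the answers that build a single date range from the overall minimum and maximum, this query expands each row into its own range of dates, so a day covered by several rows is counted once per row. A minimal Python sketch of that per-row behaviour follows; the function name and the sample rows are hypothetical and only meant to illustrate the difference.

```python
from collections import Counter
from datetime import date, timedelta

# Names indexed by date.weekday(), where 0 is Monday and 6 is Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def weekday_counts_per_row(rows, period_start, period_end):
    """Expand every (start, end) row into its dates, keep only the dates
    inside the reporting period, and count them by weekday name."""
    result = Counter()
    for start, end in rows:
        first = max(start, period_start)
        last = min(end, period_end)
        # The range is empty when the row lies entirely outside the period.
        for i in range((last - first).days + 1):
            result[WEEKDAY_NAMES[(first + timedelta(days=i)).weekday()]] += 1
    return result


# Hypothetical rows: the first one starts before the period begins.
rows = [
    (date(2020, 4, 29), date(2020, 5, 2)),
    (date(2020, 5, 1), date(2020, 5, 1)),
]
per_row_counts = weekday_counts_per_row(rows, date(2020, 5, 1), date(2021, 4, 30))
print(dict(per_row_counts))
```

With these two sample rows, Friday 2020-05-01 is counted twice because both rows cover it, and the days before 2020-05-01 are dropped by the period filter.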