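I need to compute, in SQL (BigQuery), the total number of occurrences of each day of the week (number of Sundays, number of Mondays, and so on) between May 1, 2020 and April 30, 2021.
I have the following relevant fields to work with: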
Field | Type |
---|---|
started_at_cst | DATETIME |
ended_at_cst | DATETIME |
Day_of_Week | STRING |
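The period starts at the earliest date in started_at_cst and ends at the latest date in ended_at_cst.
Answer 0: (score: 0)
First, create one record per day for the period you want to report on. To do that, you can create a table to serve as a "calendar table".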
CREATE TABLE dbo.CalendarTable
(
Date datetime NOT NULL,
DayWeekNumber nvarchar(50) NULL
) ON [PRIMARY]
GO
ALTER TABLE dbo.CalendarTable ADD CONSTRAINT
PK_CalendarTable PRIMARY KEY CLUSTERED
(
Date
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Go
After that, populate the calendar table like this:
Truncate Table CalendarTable
Declare @FromDate as datetime , @ToDate as datetime, @counter as int , @DateRecord as int,
@Date as Datetime
Set @FromDate = '2020-03-21'
Set @ToDate = '2021-03-21'
Set @Counter = 0 -- start at 0 so that @FromDate itself is inserted
Set @DateRecord = dateDiff(D,@FromDate, @ToDate)
While (@Counter <=@DateRecord)
BEGIN
Insert Into CalendarTable (Date,DayWeekNumber)
Select DateAdd(d,@Counter,@FromDate),DATEPART(dw,DateAdd(d,@Counter,@FromDate))
Set @Counter += 1
END
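Finally, join it with your occurrences table:
-- Count the occurrences whose [started_at_cst, ended_at_cst] interval covers each calendar date
Select Date,(Select Count(*) From occurrencesTable Where started_at_cst<= CalendarTable.Date
and ended_at_cst>= CalendarTable.Date ) as NumberOfOccurence from CalendarTable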
Answer 1: (score: 0)
You can get what you want with the GENERATE_DATE_ARRAY function:
WITH dow_generated AS (
# Step 2. Extract the day of the week for each generated date
SELECT
FORMAT_DATE("%A",dates) AS day_of_week
FROM (
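# Step 1. Generate an array of dates from the minimum of started_at_cst to the maximum of ended_at_cst
# (GENERATE_DATE_ARRAY takes DATE arguments, so the DATETIME values are cast with DATE())
SELECT
GENERATE_DATE_ARRAY(DATE(MIN(started_at_cst)), DATE(MAX(ended_at_cst))) as dates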
FROM `your_table`) dates_generated, unnest(dates_generated.dates) dates)
# Step 3. Count the different days of the week
SELECT day_of_week, COUNT(day_of_week) AS day_count
FROM dow_generated
GROUP BY day_of_week
You should get a table with one row per day of the week (day_of_week) and its count (day_count).
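As a quick sanity check outside BigQuery, the expected counts can be reproduced with a few lines of Python. This is only a sketch: the function name `weekday_counts` and the hard-coded range (the one from the question) are assumptions for illustration, not part of any answer above.

```python
from collections import Counter
from datetime import date, timedelta

# English day names indexed by date.weekday() (0 = Monday), to avoid locale-dependent formatting.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def weekday_counts(start: date, end: date) -> Counter:
    """Count how often each weekday occurs between start and end, both inclusive."""
    n_days = (end - start).days + 1
    return Counter(DAY_NAMES[(start + timedelta(days=i)).weekday()] for i in range(n_days))

# Range taken from the question; replace with MIN(started_at_cst) / MAX(ended_at_cst) from your data.
counts = weekday_counts(date(2020, 5, 1), date(2021, 4, 30))
for name in DAY_NAMES:
    print(name, counts[name])
```

The range covers 365 days, so one weekday (the one the range starts on) appears 53 times and the other six appear 52 times. If the query returns something else, the date bounds in the table probably differ from the range in the question.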
Answer 2: (score: 0)
Presumably, you want the results in the format you specified. If so, this should do what you need:
WITH t AS (
select datetime('2020-05-01') as started_at_cst, datetime('2021-04-30') as ended_at_cst
)
SELECT FORMAT_DATE('%A', dte) AS day_of_week, COUNT(*) AS day_count
FROM (SELECT MIN(started_at_cst) as min_started_at_cst, MAX(ended_at_cst) as max_ended_at_cst
FROM t
) t CROSS JOIN
UNNEST(GENERATE_DATE_ARRAY(DATE(min_started_at_cst), DATE(max_ended_at_cst), INTERVAL 1 day)) dte
GROUP BY day_of_week
ORDER BY MIN(dte)
Answer 3: (score: 0)
Consider the approach below. If you want the output ordered by weekday, add the following ORDER BY to a query grouped by day_of_the_week:
order by case day_of_the_week
when 'Monday' then 1
when 'Tuesday' then 2
when 'Wednesday' then 3
when 'Thursday' then 4
when 'Friday' then 5
when 'Saturday' then 6
when 'Sunday' then 7
end
Finally, if the ORDER BY above looks too verbose for you, use the version below, which carries the weekday number from format_date('%u', ...) (1 = Monday through 7 = Sunday) as pos and orders by it:
select day_of_the_week, count(1) days_count, pos
from (
select format_date('%A', day) day_of_the_week, format_date('%u', day) pos
from `project.dataset.table`,
unnest(generate_date_array(date(started_at_cst), date(ended_at_cst))) day
where day between '2020-05-01' and '2021-04-30'
)
group by day_of_the_week, pos
order by pos
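To illustrate why ordering by a weekday number gives Monday-first output, here is a minimal Python sketch. It pairs each day name with its ISO weekday number (1 = Monday through 7 = Sunday), which is the same numbering that '%u' uses in the query. The names `DAY_NAMES`, `pos_by_name`, and `ordered` are made up for this illustration.

```python
from datetime import date, timedelta

# English day names indexed by date.weekday() (0 = Monday).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Walk one full week starting at the question's first date and record
# name -> ISO weekday number, much like pairing '%A' with '%u' in SQL.
start = date(2020, 5, 1)
pos_by_name = {}
for i in range(7):
    d = start + timedelta(days=i)
    pos_by_name[DAY_NAMES[d.weekday()]] = d.isoweekday()

# Sorting by the numeric position yields Monday-first order regardless of
# which weekday the range happens to start on.
ordered = sorted(pos_by_name, key=pos_by_name.get)
print(ordered)
```

The CASE expression and the pos column both rely on this idea: sort on a number that encodes the weekday, not on the day name, which would sort alphabetically.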