Given the table "files" with columns
repo_name, file_name, size, downloads, date_stamp
and values:
('repo1', 'file1', 100, 7, '2019-08-15')
('repo1', 'file1', 100, 5, '2019-08-08')
('repo1', 'file2', 100, 10, '2019-08-15')
('repo1', 'file3', 100, 10, '2019-08-08')
('repo2', 'file1', 100, 10, '2019-08-15')
('repo2', 'file2', 100, 10, '2019-08-15')
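To make the rows above easy to experiment with, here is a minimal sketch that loads them into an in-memory SQLite database. The column types, the `make_db` helper, and the values assigned to `today` and `daysback_7` are assumptions for illustration; the question does not show the real schema.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("repo1", "file1", 100, 7, "2019-08-15"),
    ("repo1", "file1", 100, 5, "2019-08-08"),
    ("repo1", "file2", 100, 10, "2019-08-15"),
    ("repo1", "file3", 100, 10, "2019-08-08"),
    ("repo2", "file1", 100, 10, "2019-08-15"),
    ("repo2", "file2", 100, 10, "2019-08-15"),
]


def make_db():
    """Return an in-memory database holding the sample files table."""
    con = sqlite3.connect(":memory:")
    # Column types are assumed; the question only lists column names.
    con.execute(
        "CREATE TABLE files (repo_name TEXT, file_name TEXT, size INTEGER, "
        "downloads INTEGER, date_stamp TEXT)"
    )
    con.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", ROWS)
    con.commit()
    return con


con = make_db()
cur = con.cursor()
# Assumed values for the two query parameters used in the snippets.
today, daysback_7 = "2019-08-15", "2019-08-08"
row_count = cur.execute("SELECT count(*) FROM files").fetchone()[0]
print(row_count)
```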
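I want to select all repo_name / file_name combinations that: 1) are new on 08-15, in other words did not exist on 08-08, and sum their downloads values; 2) exist on 08-08 but not on 08-15; 3) exist on both 08-08 and 08-15, and sum the difference in downloads between those dates.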
For 3), this seems to work:
for row in cur.execute('select a.repo_name, a.file_name, a.downloads - b.downloads from files a inner join files b on a.repo_name = b.repo_name and a.file_name = b.file_name where a.date_stamp = ? and b.date_stamp = ?', (today, daysback_7)):
    print(row)
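This does not sum the values, but it returns ('repo1', 'file1', 2): the only repo_name / file_name combination that exists on both dates, along with the difference between its downloads values. I need to see whether I can sum the values within a single query, because I only want the total. Worst case, I can loop over the rows and sum them.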
For 2), I just want the number of repo_name / file_name combinations that are present on 08-08 but not on 08-15, in other words files that were deleted:
for row in cur.execute('select repo_name, file_name from files where date_stamp = ? except select repo_name, file_name from files where date_stamp = ?', (daysback_7, today)):
    print(row)
For 1), I can get the repo_name / file_name combinations, but I also want the downloads values. This is what I have:
for row in cur.execute('select repo_name, file_name from files where date_stamp = ? except select repo_name, file_name from files where date_stamp = ?', (today, daysback_7)):
    print(row)
It returns:
('repo1', 'file2')
('repo2', 'file1')
('repo2', 'file2')
But I can't include the downloads column, otherwise the result also contains the row for repo1 / file1, which is not new.
Answer 0 (score: 1)
Note: because they use window functions, some of these require a modern version of SQLite (3.25 or later):
-- 1 - Sum of downloads of files that only exist on 2019-08-15
SELECT sum(downloads)
FROM (SELECT downloads
           , first_value(date_stamp) OVER (PARTITION BY repo_name, file_name
                                           ORDER BY date_stamp) AS first_date
      FROM files)
WHERE first_date = '2019-08-15';
sum(downloads)
--------------
30
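If window functions are not available, or the downloads value of each new file is wanted alongside the total, a NOT EXISTS subquery is one possible alternative. The following is only a sketch under assumptions: it compares exactly two dates (the window query above instead looks at the earliest date stored for each file), and the untyped schema and variable names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Untyped columns are an assumption; only the column names come from the question.
con.execute("CREATE TABLE files (repo_name, file_name, size, downloads, date_stamp)")
con.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("repo1", "file1", 100, 7, "2019-08-15"),
        ("repo1", "file1", 100, 5, "2019-08-08"),
        ("repo1", "file2", 100, 10, "2019-08-15"),
        ("repo1", "file3", 100, 10, "2019-08-08"),
        ("repo2", "file1", 100, 10, "2019-08-15"),
        ("repo2", "file2", 100, 10, "2019-08-15"),
    ],
)

today, daysback_7 = "2019-08-15", "2019-08-08"

# Keep today's rows that have no row for the same repo/file on the earlier date.
new_rows = con.execute(
    """
    SELECT a.repo_name, a.file_name, a.downloads
    FROM files AS a
    WHERE a.date_stamp = ?
      AND NOT EXISTS (SELECT 1
                      FROM files AS b
                      WHERE b.repo_name = a.repo_name
                        AND b.file_name = a.file_name
                        AND b.date_stamp = ?)
    ORDER BY a.repo_name, a.file_name
    """,
    (today, daysback_7),
).fetchall()

total_new = sum(downloads for _, _, downloads in new_rows)
print(new_rows)
print(total_new)
```

On the sample data this lists the same three combinations the question's EXCEPT query found, now with their downloads, and the total agrees with the 30 shown above.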
-- 2 - Files that exist on 2019-08-08 but not 2019-08-15
SELECT repo_name, file_name FROM files WHERE date_stamp = '2019-08-08'
EXCEPT
SELECT repo_name, file_name FROM files WHERE date_stamp = '2019-08-15';
repo_name   file_name
----------  ----------
repo1       file3
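Since the question asks for the number of deleted files rather than the list, the EXCEPT query can be wrapped in a count(*). This is a sketch from Python with the same two parameters as the question's snippets; the untyped schema is an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Untyped columns are an assumption; only the column names come from the question.
con.execute("CREATE TABLE files (repo_name, file_name, size, downloads, date_stamp)")
con.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("repo1", "file1", 100, 7, "2019-08-15"),
        ("repo1", "file1", 100, 5, "2019-08-08"),
        ("repo1", "file2", 100, 10, "2019-08-15"),
        ("repo1", "file3", 100, 10, "2019-08-08"),
        ("repo2", "file1", 100, 10, "2019-08-15"),
        ("repo2", "file2", 100, 10, "2019-08-15"),
    ],
)

today, daysback_7 = "2019-08-15", "2019-08-08"

# Count the repo/file pairs present on the earlier date but missing today.
deleted_count = con.execute(
    """
    SELECT count(*)
    FROM (SELECT repo_name, file_name FROM files WHERE date_stamp = ?
          EXCEPT
          SELECT repo_name, file_name FROM files WHERE date_stamp = ?)
    """,
    (daysback_7, today),
).fetchone()[0]
print(deleted_count)
```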
-- 3 - Sum of difference in downloads for files present on both dates
SELECT sum(diff)
FROM (SELECT downloads - lag(downloads, 1) OVER (PARTITION BY repo_name, file_name
                                                 ORDER BY date_stamp) AS diff
      FROM files
      WHERE date_stamp IN ('2019-08-08', '2019-08-15'));
sum(diff)
----------
2
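The self-join from the question can also produce the total directly by wrapping the difference in sum(), which avoids the window function. A minimal sketch, assuming an untyped schema and the same two date parameters as in the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Untyped columns are an assumption; only the column names come from the question.
con.execute("CREATE TABLE files (repo_name, file_name, size, downloads, date_stamp)")
con.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("repo1", "file1", 100, 7, "2019-08-15"),
        ("repo1", "file1", 100, 5, "2019-08-08"),
        ("repo1", "file2", 100, 10, "2019-08-15"),
        ("repo1", "file3", 100, 10, "2019-08-08"),
        ("repo2", "file1", 100, 10, "2019-08-15"),
        ("repo2", "file2", 100, 10, "2019-08-15"),
    ],
)

today, daysback_7 = "2019-08-15", "2019-08-08"

# The inner join keeps only repo/file pairs that have a row on both dates.
total_diff = con.execute(
    """
    SELECT sum(a.downloads - b.downloads)
    FROM files AS a
    JOIN files AS b
      ON a.repo_name = b.repo_name AND a.file_name = b.file_name
    WHERE a.date_stamp = ? AND b.date_stamp = ?
    """,
    (today, daysback_7),
).fetchone()[0]
print(total_diff)
```

Note that sum() returns NULL (None in Python) when no file exists on both dates, so callers may want to treat None as 0.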
All three benefit from an index on files(repo_name, file_name, date_stamp).
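Creating such an index could look like the sketch below. The index name idx_files_repo_file_date is hypothetical, and the untyped schema is an assumption; whether the query planner actually uses the index is best checked with EXPLAIN QUERY PLAN against the real data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Untyped columns are an assumption; only the column names come from the question.
con.execute("CREATE TABLE files (repo_name, file_name, size, downloads, date_stamp)")

# Hypothetical index name; the column order follows the answer's suggestion.
con.execute(
    "CREATE INDEX IF NOT EXISTS idx_files_repo_file_date "
    "ON files(repo_name, file_name, date_stamp)"
)

# PRAGMA index_list returns one row per index; the name is the second field.
index_names = [row[1] for row in con.execute("PRAGMA index_list('files')")]
print(index_names)
```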