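I am trying to use a window function to create a column that sums revenue for one specific year. For example, I need a column that holds each user's 2020 revenue.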
╔══════╦════════╦═════════╦═════════╗
║ year ║ userid ║ orderid ║ revenue ║
╠══════╬════════╬═════════╬═════════╣
║ 2019 ║ 1 ║ a1 ║ 50 ║
║ 2020 ║ 1 ║ a2 ║ 100 ║
║ 2020 ║ 1 ║ a3 ║ 50 ║
║ 2019 ║ 2 ║ b1 ║ 100 ║
║ 2020 ║ 2 ║ b2 ║ 100 ║
╚══════╩════════╩═════════╩═════════╝
I can get the same result with a subquery, but I would like to know whether it can be done with a window function. This is what I tried:
select *, sum(revenue) over (partition by year, userid) as 2020_user_revenue
from table
Current result:
╔══════╦════════╦═════════╦═════════╦═══════════════════╗
║ year ║ userid ║ orderid ║ revenue ║ 2020_user_revenue ║
╠══════╬════════╬═════════╬═════════╬═══════════════════╣
║ 2019 ║ 1 ║ a1 ║ 50 ║ 50 ║
║ 2020 ║ 1 ║ a2 ║ 100 ║ 150 ║
║ 2020 ║ 1 ║ a3 ║ 50 ║ 150 ║
║ 2019 ║ 2 ║ b1 ║ 100 ║ 100 ║
║ 2020 ║ 2 ║ b2 ║ 100 ║ 100 ║
╚══════╩════════╩═════════╩═════════╩═══════════════════╝
Expected:
╔══════╦════════╦═════════╦═════════╦═══════════════════╗
║ year ║ userid ║ orderid ║ revenue ║ 2020_user_revenue ║
╠══════╬════════╬═════════╬═════════╬═══════════════════╣
║ 2019 ║ 1 ║ a1 ║ 50 ║ 150 ║
║ 2020 ║ 1 ║ a2 ║ 100 ║ 150 ║
║ 2020 ║ 1 ║ a3 ║ 50 ║ 150 ║
║ 2019 ║ 2 ║ b1 ║ 100 ║ 100 ║
║ 2020 ║ 2 ║ b2 ║ 100 ║ 100 ║
╚══════╩════════╩═════════╩═════════╩═══════════════════╝
Answer 0 (score: 0)
Could you try the script below?
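import pandas as pd

df = pd.DataFrame([
    [1, 0, 2, 2],
    [1, 1, 0, 0],
    [0, 2, 3, 2],
    [2, 2, 1, 1]],
    columns=['col1', 'col2', 'col3', 'col4'])

# Take a copy of every column except the last, so adding a column
# does not trigger a SettingWithCopyWarning.
df1 = df.iloc[:, :-1].copy()
df1['threshold'] = 1
# Row-wise comparison: True where a value exceeds that row's threshold.
df2 = df1.drop(columns='threshold').gt(df1['threshold'], axis=0)
# Join the names of the columns that exceeded the threshold.
df2 = df2.apply(lambda x: ', '.join(x.index[x]), axis=1)
df['d'] = df2
print(df)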
The output is:
   col1  col2  col3  col4           d
0     1     0     2     2        col3
1     1     1     0     0
2     0     2     3     2  col2, col3
3     2     2     1     1  col1, col2
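Answer 1 (score: 0)
Just use a conditional window function. Partition by userid only (not by year), and let the case expression count revenue only from rows where year = 2020, so every row for a user carries that user's 2020 total: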
select t.*,
sum(case when year = 2020 then revenue else 0
end) over (partition by userid) as revenue_2020
from t;
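To check the query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard sqlite3 module. The table name t and the alias revenue_2020 come from the answer; the column types, the order by clause (added only to make the row order deterministic), and the helper name revenue_2020_per_row are assumptions made for illustration. Window functions need SQLite 3.25 or newer, so this assumes the SQLite bundled with your Python is recent enough.

```python
import sqlite3

# Sample rows copied from the question: (year, userid, orderid, revenue).
ROWS = [
    (2019, 1, "a1", 50),
    (2020, 1, "a2", 100),
    (2020, 1, "a3", 50),
    (2019, 2, "b1", 100),
    (2020, 2, "b2", 100),
]

# The conditional window function from the answer. The "order by" is an
# addition for this sketch so that the result order is deterministic.
QUERY = """
select t.*,
       sum(case when year = 2020 then revenue else 0 end)
           over (partition by userid) as revenue_2020
from t
order by userid, orderid
"""


def revenue_2020_per_row(rows):
    """Hypothetical helper: load rows into table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; the question does not state them.
        conn.execute(
            "create table t "
            "(year integer, userid integer, orderid text, revenue integer)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = revenue_2020_per_row(ROWS)
for row in result:
    print(row)
```

The last value in each printed row is the user's 2020 total, including on that user's 2019 rows, which is what the "Expected" table asks for. Partitioning by year as well, as in the original attempt, would split each user's rows into separate per-year sums.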