I have a DataFrame containing dates and public holidays:
Date WeekNum Public_Holiday
1/1/2015 1 1
2/1/2015 1 0
3/1/2015 1 0
4/1/2015 1 0
5/1/2015 1 0
6/1/2015 1 0
7/1/2015 1 0
8/1/2015 2 0
9/1/2015 2 0
10/1/2015 2 0
11/1/2015 2 0
12/1/2015 2 0
13/1/2015 2 0
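I need to create a conditional column named Public_Holiday_Week that returns 1 for every row of a week that contains a public holiday.
I would like to see output like this: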
Date WeekNum Public_Holiday Public_Holiday_Week
1/1/2015 1 1 1
2/1/2015 1 0 1
3/1/2015 1 0 1
4/1/2015 1 0 1
5/1/2015 1 0 1
6/1/2015 1 0 1
7/1/2015 1 0 1
8/1/2015 2 0 0
9/1/2015 2 0 0
10/1/2015 2 0 0
11/1/2015 2 0 0
12/1/2015 2 0 0
13/1/2015 2 0 0
I tried using np.where:
df['Public_Holiday_Week'] = np.where(df['Public_Holiday']==1,1,0)
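However, this sets 0 on the other days of the week, because those days are not public holidays themselves.
Do I need to apply rolling here? Thanks for your help.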
Answer 0 (score: 8)
For better performance, avoid groupby. Instead, collect every WeekNum that contains at least one Public_Holiday value of 1, select the matching rows with isin, and finally cast the boolean mask to integers.
As @Mohamed Thasin points out, you can also group by week if necessary, but the output will then differ because the week numbers differ from WeekNum.
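A minimal sketch of the isin approach described in this answer, not necessarily the answerer's exact code. The sample frame and the names `holiday_weeks` and `iso_week` are assumptions for illustration, and the second part uses ISO calendar weeks as one possible reading of "group by week".

```python
import pandas as pd

# Sample data mirroring the question (13 consecutive days from 1 Jan 2015).
df = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=13, freq="D"),
    "WeekNum": [1] * 7 + [2] * 6,
    "Public_Holiday": [1] + [0] * 12,
})

# Week numbers that contain at least one public holiday.
holiday_weeks = df.loc[df["Public_Holiday"].eq(1), "WeekNum"]

# Flag every row whose week is in that set, then cast True/False to 1/0.
df["Public_Holiday_Week"] = df["WeekNum"].isin(holiday_weeks).astype(int)

# Variant (assumption): derive ISO calendar weeks from Date instead of WeekNum.
# 1-4 Jan 2015 fall in ISO week 1, 5-11 Jan in week 2, 12-13 Jan in week 3,
# so these flags differ from the WeekNum-based result.
iso_week = df["Date"].dt.isocalendar().week
df["Public_Holiday_ISO_Week"] = (
    iso_week.isin(iso_week[df["Public_Holiday"].eq(1)]).astype(int)
)

print(df)
```

With the question's WeekNum column the first seven rows are flagged; with ISO weeks only 1-4 January are, which illustrates why the two groupings give different output.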
Answer 1 (score: 7)
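Use resample and skip the WeekNum column entirely:
# Date must be datetime64, e.g. df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
df.assign(
    Public_Holiday_Week=
    df.resample('W-Wed', on='Date').Public_Holiday.transform('max')
)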
Date WeekNum Public_Holiday Public_Holiday_Week
0 2015-01-01 1 1 1
1 2015-01-02 1 0 1
2 2015-01-03 1 0 1
3 2015-01-04 1 0 1
4 2015-01-05 1 0 1
5 2015-01-06 1 0 1
6 2015-01-07 1 0 1
7 2015-01-08 2 0 0
8 2015-01-09 2 0 0
9 2015-01-10 2 0 0
10 2015-01-11 2 0 0
11 2015-01-12 2 0 0
12 2015-01-13 2 0 0
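The 'W-Wed' anchor matters here: 1 January 2015 is a Thursday, so weeks ending on Wednesday line up with the question's WeekNum. The sketch below shows the same idea with an explicit weekly period as the group key. It is an alternative spelling rather than the answer's code, and the variable name `week` is an assumption.

```python
import pandas as pd

# Rebuild the question's data; dates are day/month/year strings.
df = pd.DataFrame({
    "Date": [f"{day}/1/2015" for day in range(1, 14)],
    "Public_Holiday": [1] + [0] * 12,
})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Weekly periods that end on Wednesday: 1-7 Jan share one period,
# 8-13 Jan fall in the next one.
week = df["Date"].dt.to_period("W-WED")

# Broadcast each period's maximum back onto its rows.
df["Public_Holiday_Week"] = df.groupby(week)["Public_Holiday"].transform("max")

print(df)
```

A different anchor, such as 'W-SUN', would split the same dates into different weeks and change which rows are flagged.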
Answer 2 (score: 4)
Use groupby and max, together with map:
df['Public_Holiday_Week'] = df.WeekNum.map(df.groupby('WeekNum').Public_Holiday.max())
print(df)
Date WeekNum Public_Holiday Public_Holiday_Week
0 1/1/2015 1 1 1
1 2/1/2015 1 0 1
2 3/1/2015 1 0 1
3 4/1/2015 1 0 1
4 5/1/2015 1 0 1
5 6/1/2015 1 0 1
6 7/1/2015 1 0 1
7 8/1/2015 2 0 0
8 9/1/2015 2 0 0
9 10/1/2015 2 0 0
10 11/1/2015 2 0 0
11 12/1/2015 2 0 0
12 13/1/2015 2 0 0
Alternatively, use groupby and transform with max:
df['Public_Holiday_Week'] = df.groupby('WeekNum').Public_Holiday.transform('max')
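Conveniently, this generalizes well when you also need to group by month-year:
df['Public_Holiday_Week'] = (
    # n must be passed by keyword in pandas 2.x; keeps "month/year" from "day/month/year"
    df.groupby(['WeekNum', df.Date.str.split('/', n=1).str[1]])
    .Public_Holiday.transform('max')
)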
print(df)
Date WeekNum Public_Holiday Public_Holiday_Week
0 1/1/2015 1 1 1
1 2/1/2015 1 0 1
2 3/1/2015 1 0 1
3 4/1/2015 1 0 1
4 5/1/2015 1 0 1
5 6/1/2015 1 0 1
6 7/1/2015 1 0 1
7 8/1/2015 2 0 0
8 9/1/2015 2 0 0
9 10/1/2015 2 0 0
10 11/1/2015 2 0 0
11 12/1/2015 2 0 0
12 13/1/2015 2 0 0
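To see why the extra month-year key can matter, here is a small sketch on hypothetical data in which WeekNum restarts every month. Both the data and the names `by_week`, `month_year`, and `by_week_and_month` are assumptions for illustration. Unlike the string split above, it assumes Date has been parsed to datetime.

```python
import pandas as pd

# Hypothetical data: WeekNum is 1 in both months, holiday only on 1 Jan.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2015-01-01", "2015-01-02", "2015-02-01", "2015-02-02"]
    ),
    "WeekNum": [1, 1, 1, 1],
    "Public_Holiday": [1, 0, 0, 0],
})

# Grouping by WeekNum alone leaks January's holiday into February.
by_week = df.groupby("WeekNum")["Public_Holiday"].transform("max")

# Adding a month-year key keeps the two "week 1" groups apart.
month_year = df["Date"].dt.to_period("M")
by_week_and_month = (
    df.groupby(["WeekNum", month_year])["Public_Holiday"].transform("max")
)

print(by_week.tolist())
print(by_week_and_month.tolist())
```

If WeekNum is already unique across the whole date range, the extra key changes nothing and the simpler single-column grouping is enough.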