I have several columns of data in a pandas DataFrame. The data looks like this:
cus_id timestamp values second_val
0 10173 2010-06-12 39.0 1
1 95062 2010-09-11 35.0 2
2 171081 2010-07-05 39.0 1
3 122867 2010-08-18 39.0 1
4 107186 2010-11-23 0.0 3
5 171085 2010-09-02 0.0 2
6 169767 2010-07-03 28.0 2
7 80170 2010-03-23 39.0 2
8 154178 2010-10-02 37.0 2
9 3494 2010-11-01 0.0 1
.
.
.
.
5054054 1716139 2012-01-12 0.0 2
5054055 1716347 2012-01-18 28.0 1
5054056 1807501 2012-01-21 0.0 1
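The values column contains zeros, and they occur on different dates. I am trying to sum all the second_val values for each month, taking only the rows where the values column equals zero, so that I can plot them. Currently I do it like this:
Jan10 = df.second_val[df['timestamp'].str.contains('2010-01')][df['values']==0].sum()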
Feb10 = df.second_val[df['timestamp'].str.contains('2010-02')][df['values']==0].sum()
Mar10 = df.second_val[df['timestamp'].str.contains('2010-03')][df['values']==0].sum()
.
.
.
.
Jan12 = df.second_val[df['timestamp'].str.contains('2012-01')][df['values']==0].sum()
Feb12 = df.second_val[df['timestamp'].str.contains('2012-02')][df['values']==0].sum()
Months = ['2010-01', '2010-02', '2010-03', '2010-04' . . . . ., '2012-01', '2012-02']
Months_Orders = [Jan10, Feb10, Mar10, Apr10, . . . . .. ., Jan12, Feb12]
plt.figure(figsize=(15,8))
plt.scatter(x = Months, y = Months_Orders)
If 0 appears on 10 days in January 2010 and the sum of the corresponding second_val values is 20, then it should give me 20 for January. For example:
cus_id timestamp values second_val
0 10173 2010-01-10 0.0 1
.
.
13 95062 2010-01-11 0.0 2
34 171081 2010-01-23 0.0 1
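Is there a way to improve this, either by writing a function or by using a built-in pandas method? I tried the solution from my previous question, but that case was different and it did not work properly for me, so I resorted to this hard-coding, which seems inefficient. Thanks.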
Answer 0 (score: 0)
IIUC
# Parse the timestamp strings into datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'])
# Filter before the groupby: keep only rows where values is 0
df = df[df['values'] == 0]
# Group by a 'YYYY-MM' key and sum second_val for each month
df.groupby(df['timestamp'].dt.strftime('%Y-%m'))['second_val'].sum()
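
For reference, here is a minimal self-contained sketch of the same filter-then-groupby idea, wrapped in a function. The sample rows and the function name `monthly_sum_where_zero` are made up for illustration and are not from the question; only the column names match the frame shown there.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame (values are invented).
df = pd.DataFrame({
    "cus_id": [10173, 95062, 171081, 122867, 107186],
    "timestamp": ["2010-01-10", "2010-01-11", "2010-01-23",
                  "2010-02-05", "2010-02-18"],
    "values": [0.0, 0.0, 0.0, 39.0, 0.0],
    "second_val": [1, 2, 1, 1, 3],
})


def monthly_sum_where_zero(frame):
    """Sum second_val per 'YYYY-MM' month over rows where values == 0."""
    # Keep only the rows whose values column is zero.
    zero_rows = frame.loc[frame["values"] == 0]
    # Build a 'YYYY-MM' string key from the timestamps of those rows.
    month = pd.to_datetime(zero_rows["timestamp"]).dt.strftime("%Y-%m")
    # Group by the month key and add up second_val within each month.
    return zero_rows.groupby(month)["second_val"].sum()


monthly = monthly_sum_where_zero(df)
print(monthly)
```

The result is a Series indexed by month, so it should be possible to pass `monthly.index` and `monthly.values` straight to `plt.scatter` in place of the hand-built `Months` and `Months_Orders` lists. Note that months with no zero rows do not appear in the result at all.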