I have a DataFrame with a datetime column, different ids, and different values:
import numpy as np
import pandas as pd
data = [
{ 'datetime_start' : "2017-03-15 14:31:20.507", "id" : "usr_21", "value": "-1.286452"},
{ 'datetime_start' : "2017-03-16 15:17:45.550", "id" : "usr_15", "value": "-2.349203"},
{ 'datetime_start' : "2017-03-17 14:20:47.437", "id" : "usr_13", "value": "-2.397038"},
{ 'datetime_start' : "2017-03-19 09:43:47.262", "id" : "usr_12", "value": "-1.250512"},
{ 'datetime_start' : "2017-03-19 15:18:47.941", "id" : "usr_21", "value": "-0.681998"},
{ 'datetime_start' : "2017-03-19 20:03:52.905", "id" : "usr_15", "value": "-1.018452"},
{ 'datetime_start' : "2017-03-22 13:40:48.178", "id" : "usr_21", "value": "-1.531373"},
{ 'datetime_start' : "2017-03-22 19:54:48.320", "id" : "usr_18", "value": "-3.789466"},
{ 'datetime_start' : "2017-03-23 13:53:48.789", "id" : "usr_21", "value": "-1.288360"},
{ 'datetime_start' : "2017-03-24 15:54:48.649", "id" : "usr_21", "value": "0.213171"},
{ 'datetime_start' : "2017-03-25 17:53:48.422", "id" : "usr_13", "value": "-2.020710"},
{ 'datetime_start' : "2017-03-26 06:10:48.197", "id" : "usr_12", "value": "-1.484709"},
{ 'datetime_start' : "2017-03-15 14:31:20.507", "id" : "usr_21", "value": "-1.286452"},
{ 'datetime_start' : "2017-03-16 15:18:45.550", "id" : "usr_18", "value": "-2.349203"},
{ 'datetime_start' : "2017-03-17 14:18:47.437", "id" : "usr_11", "value": "-2.397038"},
{ 'datetime_start' : "2017-03-19 09:48:47.262", "id" : "usr_15", "value": "-1.250512"},
{ 'datetime_start' : "2017-03-19 15:18:47.941", "id" : "usr_21", "value": "-0.681998"},
{ 'datetime_start' : "2017-03-19 20:03:52.905", "id" : "usr_13", "value": "-1.018452"},
{ 'datetime_start' : "2017-03-22 13:53:48.178", "id" : "usr_21", "value": "-1.531373"},
{ 'datetime_start' : "2017-03-22 19:53:48.320", "id" : "usr_18", "value": "-3.789466"},
{ 'datetime_start' : "2017-03-23 13:53:48.789", "id" : "usr_21", "value": "-1.288360"},
{ 'datetime_start' : "2017-03-24 15:53:48.649", "id" : "usr_11", "value": "0.213171"},
{ 'datetime_start' : "2017-03-25 16:53:48.422", "id" : "usr_13", "value": "-2.020710"},
{ 'datetime_start' : "2017-03-26 06:08:48.197", "id" : "usr_15", "value": "-1.484709"}
]
df = pd.DataFrame(data)
df['datetime_start'] = pd.to_datetime(df['datetime_start'])
I would like to represent this data as a pivot table:
table = pd.pivot_table(df, values='value', index=['id'],
                       columns=['index'], aggfunc=np.sum)
So for each (id, datetime) pair we have a value; where there is no value, the cell is None.
Is there an elegant way to replace the None values using this rule:
if value(id_i, datetime_i) == None:
    if value(id_i, datetime_i-1) != 0:
        value(id_i, datetime_i) = value(id_i, datetime_i-1)
    else:
        value(id_i, datetime_i) = 0
That is, a kind of propagation of the previous value.
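Read literally, the rule walks each row from left to right and carries the last known value forward. A minimal sketch of that reading for a single row follows; `fill_row` is a hypothetical helper name, and treating a missing cell with no earlier value as 0 is an assumption, since the rule does not say what happens in the first column.

```python
import math


def fill_row(values):
    """Apply the propagation rule left to right to one row of the pivot table.

    Assumption: a missing cell with no earlier value becomes 0.
    """
    filled = []
    previous = 0  # nothing seen yet, so the "previous" value counts as 0
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if missing:
            # previous is either the last filled value or 0,
            # which covers both branches of the rule
            v = previous
        filled.append(v)
        previous = v
    return filled


print(fill_row([None, -2.349203, None, None]))
```

Both branches of the rule end up copying the previous value (a previous 0 is copied as 0), so the rule amounts to a forward fill with 0 as the starting default.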
Answer 0 (score: 1)
I think you first need to change columns=['index'] to columns='datetime_start', and then use ffill (the same as fillna with method='ffill'):
table = (pd.pivot_table(df,
                        values='value',
                        index='id',
                        columns='datetime_start',
                        aggfunc=np.sum)
           .ffill(axis=1, limit=1))
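To see what `axis=1` and `limit` do in isolation, here is a small sketch on a made-up frame; `toy` and its column names are hypothetical and not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one row per id, one column per time step.
toy = pd.DataFrame(
    [[1.0, np.nan, np.nan, 4.0],
     [np.nan, 2.0, np.nan, np.nan]],
    index=["a", "b"],
    columns=["t0", "t1", "t2", "t3"],
)

# axis=1 fills along each row (left to right) instead of down each column.
one_step = toy.ffill(axis=1, limit=1)  # fill at most one gap after each value
all_steps = toy.ffill(axis=1)          # fill every gap that has an earlier value

print(one_step)
print(all_steps)
```

In both cases a gap with no earlier value in its row (row `b`, column `t0`) stays missing, because a forward fill has nothing to copy from.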
Edit, using a smaller sample of the data:
data = [
{ 'datetime_start' : "2017-03-15 14:31:20.507", "id" : "usr_21", "value": "-1.286452"},
{ 'datetime_start' : "2017-03-16 15:17:45.550", "id" : "usr_15", "value": "-2.349203"},
{ 'datetime_start' : "2017-03-17 14:20:47.437", "id" : "usr_13", "value": "-2.397038"},
{ 'datetime_start' : "2017-03-19 09:43:47.262", "id" : "usr_12", "value": "-1.250512"},
]
df = pd.DataFrame(data)
df['datetime_start'] = pd.to_datetime(df['datetime_start'])
table = (pd.pivot_table(df,
                        values='value',
                        index='id',
                        columns='datetime_start',
                        aggfunc=np.sum)
         )
print(table)
datetime_start 2017-03-15 14:31:20.507 2017-03-16 15:17:45.550  \
id
usr_12                            None                    None
usr_13                            None                    None
usr_15                            None               -2.349203
usr_21                       -1.286452                    None

datetime_start 2017-03-17 14:20:47.437 2017-03-19 09:43:47.262
id
usr_12                            None               -1.250512
usr_13                       -2.397038                    None
usr_15                            None                    None
usr_21                            None                    None
If you need to replace only the first None that follows a non-None value, add the limit parameter:
table1 = (pd.pivot_table(df,
                         values='value',
                         index='id',
                         columns='datetime_start',
                         aggfunc=np.sum)
            .ffill(axis=1, limit=1)
          )
print(table1)
datetime_start 2017-03-15 14:31:20.507 2017-03-16 15:17:45.550  \
id
usr_12                            None                    None
usr_13                            None                    None
usr_15                            None               -2.349203
usr_21                       -1.286452               -1.286452

datetime_start 2017-03-17 14:20:47.437 2017-03-19 09:43:47.262
id
usr_12                            None               -1.250512
usr_13                       -2.397038               -2.397038
usr_15                       -2.349203                    None
usr_21                            None                    None
To replace every NaN with the previous non-NaN value and then replace any remaining NaN with 0, remove limit and add fillna(0):
table2 = (pd.pivot_table(df,
                         values='value',
                         index='id',
                         columns='datetime_start',
                         aggfunc=np.sum)
            .ffill(axis=1)
            .fillna(0)
          )
print(table2)
datetime_start 2017-03-15 14:31:20.507 2017-03-16 15:17:45.550  \
id
usr_12                               0                       0
usr_13                               0                       0
usr_15                               0               -2.349203
usr_21                       -1.286452               -1.286452

datetime_start 2017-03-17 14:20:47.437 2017-03-19 09:43:47.262
id
usr_12                               0               -1.250512
usr_13                       -2.397038               -2.397038
usr_15                       -2.349203               -2.349203
usr_21                       -1.286452               -1.286452
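One thing worth noting: the sample stores `value` as strings, which is probably why the tables show None and a bare 0 rather than NaN and 0.0. A sketch of the same pipeline with the column converted to floats first could look like this; the `pd.to_numeric` step and the name `table3` are additions for illustration and were not part of the original code.

```python
import pandas as pd

data = [
    {"datetime_start": "2017-03-15 14:31:20.507", "id": "usr_21", "value": "-1.286452"},
    {"datetime_start": "2017-03-16 15:17:45.550", "id": "usr_15", "value": "-2.349203"},
    {"datetime_start": "2017-03-17 14:20:47.437", "id": "usr_13", "value": "-2.397038"},
    {"datetime_start": "2017-03-19 09:43:47.262", "id": "usr_12", "value": "-1.250512"},
]
df = pd.DataFrame(data)
df["datetime_start"] = pd.to_datetime(df["datetime_start"])
df["value"] = pd.to_numeric(df["value"])  # assumed extra step: strings -> floats

table3 = (
    df.pivot_table(values="value", index="id",
                   columns="datetime_start", aggfunc="sum")
      .ffill(axis=1)  # carry the last known value forward along each row
      .fillna(0)      # cells with no earlier value become 0
)
print(table3)
```

With numeric values the sum is a real numeric sum, and the result keeps a float dtype in every column.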