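I have a DataFrame indexed by date. For each accountid I'm trying to assign a score in the column for its category whenever that category value exists on the index date. The DataFrame would then look like this: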
accountid category Smooth Hard Sharp Narrow
timestamp
2018-03-29 101 Smooth 1 NaN NaN NaN
2018-03-29 102 Hard NaN 1 NaN NaN
2018-03-30 103 Narrow NaN NaN NaN 1
2018-04-30 104 Sharp NaN NaN 1 NaN
2018-04-21 105 Narrow NaN NaN NaN 1
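What is the best way to loop through the DataFrame per accountid and assign a score to each unstacked category column?
Here is the script that creates the DataFrame: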
import numpy as np
import pandas as pd

# Date range from the original script (not used below)
idx = pd.date_range('02-28-2018', '04-29-2018')

# The category columns start empty; use np.nan rather than the string 'NaN'
df = pd.DataFrame(
    [['101', '2018-03-29', 'Smooth', np.nan, np.nan, np.nan, np.nan],
     ['102', '2018-03-29', 'Hard', np.nan, np.nan, np.nan, np.nan],
     ['103', '2018-03-30', 'Narrow', np.nan, np.nan, np.nan, np.nan],
     ['104', '2018-04-30', 'Sharp', np.nan, np.nan, np.nan, np.nan],
     ['105', '2018-04-21', 'Narrow', np.nan, np.nan, np.nan, np.nan]],
    columns=['accountid', 'timestamp', 'category', 'Smooth', 'Hard', 'Sharp', 'Narrow'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.set_index('timestamp')
print(df)
Answer (score: 0)
You can use the str accessor together with get_dummies:
df[['accountid','category']].assign(**df['category'].str.get_dummies())
Output:
accountid category Hard Narrow Sharp Smooth
timestamp
2018-03-29 101 Smooth 0 0 0 1
2018-03-29 102 Hard 1 0 0 0
2018-03-30 103 Narrow 0 1 0 0
2018-04-30 104 Sharp 0 0 1 0
2018-04-21 105 Narrow 0 1 0 0
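Since the question asks about looping, here is a minimal sketch for comparison showing that str.get_dummies builds the same 0/1 table an explicit row-by-row loop would. It rebuilds only the accountid and category columns of the sample data; the names dummies and looped are illustrative, not from the original answer.

```python
import pandas as pd

# Rebuild the sample data (accountid, category) with the date index
df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    },
    index=pd.to_datetime(
        ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"]
    ).rename("timestamp"),
)

# Vectorized: one 0/1 indicator column per distinct category (sorted names)
dummies = df["category"].str.get_dummies()

# Explicit loop equivalent: start with all zeros, then set a 1 in each
# row's own category column (positional access, because dates repeat)
categories = sorted(df["category"].unique())
looped = pd.DataFrame(0, index=df.index, columns=categories)
for pos, cat in enumerate(df["category"]):
    looped.iloc[pos, looped.columns.get_loc(cat)] = 1

print(dummies.equals(looped))  # True
```

The loop uses positional indexing because the date index contains duplicates (two rows on 2018-03-29), so label-based assignment would hit both rows. The vectorized call avoids that pitfall and any Python-level iteration.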
To replace the 0s with NaN:
import numpy as np

df[['accountid','category']].assign(**df['category'].str.get_dummies())\
    .replace(0, np.nan)
Output:
accountid category Hard Narrow Sharp Smooth
timestamp
2018-03-29 101 Smooth NaN NaN NaN 1.0
2018-03-29 102 Hard 1.0 NaN NaN NaN
2018-03-30 103 Narrow NaN 1.0 NaN NaN
2018-04-30 104 Sharp NaN NaN 1.0 NaN
2018-04-21 105 Narrow NaN 1.0 NaN NaN
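str.get_dummies sorts the new columns alphabetically, whereas the question's expected output uses the order Smooth, Hard, Sharp, Narrow. One possible way to match that layout is sketched below. It assumes the four category names are known up front (the list order is a hypothetical name for illustration). It uses reindex so that a category absent from the data still gets a column, and where instead of replace to turn the 0s into NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "accountid": ["101", "102", "103", "104", "105"],
        "category": ["Smooth", "Hard", "Narrow", "Sharp", "Narrow"],
    },
    index=pd.to_datetime(
        ["2018-03-29", "2018-03-29", "2018-03-30", "2018-04-30", "2018-04-21"]
    ).rename("timestamp"),
)

# Assumed: the full set of categories, in the question's column order
order = ["Smooth", "Hard", "Sharp", "Narrow"]

# Indicators in the desired order; missing categories become all-zero columns
scores = df["category"].str.get_dummies().reindex(columns=order, fill_value=0)

# Keep the 1s and turn the 0s into NaN (the result is float64)
scores = scores.where(scores == 1)

# assign() keeps the row order even though the date index has duplicates
result = df[["accountid", "category"]].assign(**scores)
print(result)
```

assign is used here rather than join because joining on a date index with duplicate dates would pair every 2018-03-29 row with every other one and multiply the rows.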