Using pandas, I have the following DataFrame:
In [1]: import pandas as pd
In [2]: pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
   ...:               'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
   ...:               'value': [10, 40, 20, 30, 10, 40, 50]})
Out[2]:
   month type  value
0      1   T1     10
1      1   T1     40
2      1   T4     20
3      2   T2     30
4      2   T3     10
5      3   T1     40
6      3   T3     50
How can it be processed to produce the result below?
Out[3]:
   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
Answer 0 (score: 4)
pandas
A clever use of pd.get_dummies:
pd.get_dummies(df.type).mul(df.value, 0).join(df.month)
   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
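The one-liner above packs three steps together. As a rough sketch of what each step does (the intermediate names dummies, scaled and result are mine, not part of the answer, and dtype=int is an assumption added so that newer pandas versions return 0/1 integers rather than booleans):

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Step 1: one indicator column per distinct type (1 where the row has that type).
dummies = pd.get_dummies(df['type'], dtype=int)

# Step 2: multiply each row by that row's value; axis=0 aligns on the row index.
scaled = dummies.mul(df['value'], axis=0)

# Step 3: attach the month column, matched on the row index.
result = scaled.join(df['month'])
print(result)
```

Because every row has exactly one indicator set, multiplying by the value leaves the value in the matching type column and zeros everywhere else.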
numpy
Or the same idea, but supercharged with numpy:
import numpy as np
u, inv = np.unique(df.type.values, return_inverse=True)
eye = np.eye(u.size, dtype=int)
v = df.value.values
m = df.month.values
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]),
    df.index, np.append(u, 'month')
)
   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
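The numpy version relies on indexing an identity matrix with the inverse codes from np.unique. A small sketch of just that part, using plain arrays in place of the DataFrame columns (the names types, values, one_hot and scaled are illustrative only):

```python
import numpy as np

types = np.array(["T1", "T1", "T4", "T2", "T3", "T1", "T3"])
values = np.array([10, 40, 20, 30, 10, 40, 50])

# u holds the sorted unique labels; inv gives, for each original entry,
# the position of its label within u.
u, inv = np.unique(types, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for label u[i],
# so indexing with inv yields one one-hot row per original entry.
one_hot = np.eye(u.size, dtype=int)[inv]

# values[:, None] has shape (7, 1) and broadcasts across the 4 label columns.
scaled = one_hot * values[:, None]
print(scaled)
```

This does the same work as get_dummies followed by mul, but without building intermediate pandas objects, which is a plausible reason for the speed difference reported in the timings.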
Timing
%timeit pd.get_dummies(df.type).mul(df.value, 0).join(df.month)
1000 loops, best of 3: 1.1 ms per loop
%%timeit
u, inv = np.unique(df.type.values, return_inverse=True)
eye = np.eye(u.size, dtype=int)
v = df.value.values
m = df.month.values
pd.DataFrame(
np.column_stack([eye[inv] * v[:, None], m]),
df.index, np.append(u, 'month')
)
10000 loops, best of 3: 189 µs per loop
%%timeit
(df.set_index(['type'],append=True)['value']
.unstack(fill_value=0)).join(df[['month']])
100 loops, best of 3: 1.92 ms per loop
%%timeit
d1 = df.set_index(['month','type'], append=True)['value'] \
       .unstack(fill_value=0) \
       .reset_index(level=1)
cols = d1.columns[1:].tolist() + d1.columns[:1].tolist()
d1 = d1.reindex_axis(cols, axis=1)
d1
100 loops, best of 3: 2.48 ms per loop
Answer 1 (score: 3)
You can use a combination of set_index and unstack to get the T1 to T4 columns, and then join the month column, like this:
(df.set_index(['type'],append=True)['value']
.unstack(fill_value=0)).join(df[['month']])
# T1 T2 T3 T4 month
# 0 10 0 0 0 1
# 1 40 0 0 0 1
# 2 0 0 0 20 1
# 3 0 30 0 0 2
# 4 0 0 10 0 2
# 5 40 0 0 0 3
# 6 0 0 50 0 3
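To see why this works, it can help to split the chain into its intermediate objects. The following is only a sketch; the names s, wide and result are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Append 'type' to the existing row index, giving a (row, type) MultiIndex,
# then keep only the 'value' column as a Series.
s = df.set_index('type', append=True)['value']

# Pivot the 'type' level into columns; (row, type) pairs that do not exist
# are filled with 0 instead of NaN.
wide = s.unstack(fill_value=0)

# The row index is unchanged, so month can be joined back on it.
result = wide.join(df[['month']])
print(result)
```

Keeping the original row index in the MultiIndex (append=True) is what makes each (row, type) pair unique, so unstack does not fail on the repeated T1 entries in month 1.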
Answer 2 (score: 2)
You can use set_index, unstack and reset_index. Finally, add reindex_axis to change the order of the columns:
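d1 = df.set_index(['month','type'], append=True)['value'] \
       .unstack(fill_value=0) \
       .reset_index(level=1)
cols = d1.columns[1:].tolist() + d1.columns[:1].tolist()
d1 = d1.reindex_axis(cols, axis=1)
d1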
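Note that reindex_axis was deprecated and, as far as I know, is no longer available from pandas 1.0 onward. A minimal sketch of the same set_index / unstack / reset_index approach on a current pandas, assuming reindex(columns=...) as the replacement for the column reordering step:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Build a (row, month, type) index, pivot 'type' into columns,
# then move 'month' back out of the index into a regular column.
d1 = (df.set_index(['month', 'type'], append=True)['value']
        .unstack(fill_value=0)
        .reset_index(level=1))

# reset_index puts 'month' first; move it to the end of the column list.
cols = d1.columns[1:].tolist() + d1.columns[:1].tolist()
d1 = d1.reindex(columns=cols)
print(d1)
```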