python, pandas - Convert a key/value column pair into multiple columns

Time: 2017-05-05 04:20:19

Tags: python pandas numpy

Using pandas, I have the following DataFrame:
In [1]: import pandas as pd
In [2]: pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                      'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                      'value': [10, 40, 20, 30, 10, 40, 50]})
Out[2]: 
   month type  value
0      1   T1     10
1      1   T1     40
2      1   T4     20
3      2   T2     30
4      2   T3     10
5      3   T1     40
6      3   T3     50

How can it be processed to produce the result below?

Out[3]: 
   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3

3 answers:

Answer 0 (score: 4)

pandas
A clever use of pd.get_dummies

pd.get_dummies(df.type).mul(df.value, 0).join(df.month)

   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
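
To make the one-liner above easier to follow, here is a minimal sketch that splits it into three steps. The intermediate names (dummies, scaled, result_dummies) are chosen here for illustration, and dtype=int is an assumption added so the indicator columns are integers on newer pandas versions, where get_dummies returns booleans by default.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Step 1: one 0/1 indicator column per distinct type.
# dtype=int is an assumption, not part of the original one-liner.
dummies = pd.get_dummies(df['type'], dtype=int)

# Step 2: scale each indicator row by that row's value.
# axis=0 aligns the Series with the row index instead of the column labels.
scaled = dummies.mul(df['value'], axis=0)

# Step 3: attach the month column, aligned on the shared row index.
result_dummies = scaled.join(df['month'])
print(result_dummies)
```

Each row keeps exactly one non-zero entry, because an indicator row has a single 1 in the column of its own type.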

numpy
Or the same idea, but supercharged

u, inv = np.unique(df.type.values, return_inverse=True)
eye = np.eye(u.size, dtype=int)
v = df.value.values
m = df.month.values
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]),
    df.index, np.append(u, 'month')
)

   T1  T2  T3  T4  month
0  10   0   0   0      1
1  40   0   0   0      1
2   0   0   0  20      1
3   0  30   0   0      2
4   0   0  10   0      2
5  40   0   0   0      3
6   0   0  50   0      3
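
The numpy version above packs several ideas into a few lines, so here is an annotated sketch of the same approach. The names one_hot, scaled and result_np are introduced here for illustration, and to_numpy() is used in place of .values as a stylistic assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# u holds the sorted unique types; inv gives, for every row,
# the position of that row's type inside u.
u, inv = np.unique(df['type'].to_numpy(), return_inverse=True)

# Row i of the identity matrix is the one-hot vector for u[i].
# Indexing it with inv yields one one-hot row per DataFrame row.
one_hot = np.eye(u.size, dtype=int)[inv]

# The values reshaped to (n, 1) broadcast across the columns,
# scaling each one-hot row by its own value.
scaled = one_hot * df['value'].to_numpy()[:, None]

# Append month as the last column and rebuild a DataFrame.
result_np = pd.DataFrame(
    np.column_stack([scaled, df['month'].to_numpy()]),
    index=df.index,
    columns=list(u) + ['month'],
)
print(result_np)
```

Working on raw arrays skips index alignment, which is a plausible reason this variant is the fastest in the timings.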

Timing

%timeit pd.get_dummies(df.type).mul(df.value, 0).join(df.month)
1000 loops, best of 3: 1.1 ms per loop

%%timeit
u, inv = np.unique(df.type.values, return_inverse=True)
eye = np.eye(u.size, dtype=int)
v = df.value.values
m = df.month.values
pd.DataFrame(
    np.column_stack([eye[inv] * v[:, None], m]),
    df.index, np.append(u, 'month')
)
10000 loops, best of 3: 189 µs per loop

%%timeit
(df.set_index(['type'],append=True)['value']
   .unstack(fill_value=0)).join(df[['month']])
100 loops, best of 3: 1.92 ms per loop

%%timeit
d1 = df.set_index(['month','type'], append=True)['value'] \
       .unstack(fill_value=0) \
       .reset_index(level=1)

cols = d1.columns[1:].tolist() + d1.columns[:1].tolist() 
d1 = d1.reindex_axis(cols, axis=1)
d1
100 loops, best of 3: 2.48 ms per loop
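
The %timeit and %%timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The function names and the number and repeat values below are arbitrary choices made here, dtype=int is an added assumption, and absolute timings will vary with the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})


def with_get_dummies():
    # pandas approach: indicator columns scaled by value, then joined with month.
    return pd.get_dummies(df['type'], dtype=int).mul(df['value'], axis=0).join(df['month'])


def with_numpy():
    # numpy approach: one-hot rows built from an identity matrix.
    u, inv = np.unique(df['type'].to_numpy(), return_inverse=True)
    eye = np.eye(u.size, dtype=int)
    v = df['value'].to_numpy()
    m = df['month'].to_numpy()
    return pd.DataFrame(np.column_stack([eye[inv] * v[:, None], m]),
                        index=df.index, columns=list(u) + ['month'])


# Take the best of 3 runs of 100 calls each, reported per call.
N_CALLS = 100
timings = {
    fn.__name__: min(timeit.repeat(fn, number=N_CALLS, repeat=3)) / N_CALLS
    for fn in (with_get_dummies, with_numpy)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.0f} microseconds per call")
```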

Answer 1 (score: 3)

You can use a combination of set_index and unstack to get the T1 - T4 columns, and then join the month column, like so:

(df.set_index(['type'],append=True)['value']
   .unstack(fill_value=0)).join(df[['month']])
#    T1  T2  T3  T4  month
# 0  10   0   0   0      1
# 1  40   0   0   0      1
# 2   0   0   0  20      1
# 3   0  30   0   0      2
# 4   0   0  10   0      2
# 5  40   0   0   0      3
# 6   0   0  50   0      3 
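
The chained expression above can be read as three separate steps. The sketch below spells them out; the intermediate names (indexed, wide, result_unstack) are hypothetical and exist only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Step 1: append 'type' to the existing row index, giving a
# MultiIndex of (row number, type), and keep only the values.
indexed = df.set_index('type', append=True)['value']

# Step 2: pivot the 'type' level into columns. Combinations that
# do not occur are filled with 0 instead of NaN.
wide = indexed.unstack(fill_value=0)

# Step 3: join the month column back on the original row index.
result_unstack = wide.join(df[['month']])
print(result_unstack)
```

Keeping the original row number in the index (append=True) matters here: rows 0 and 1 share the same month and type, and the row number is what keeps each index entry unique so that unstack can pivot without hitting duplicates.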

Answer 2 (score: 2)

You can use set_index, unstack, and reset_index. Finally, add reindex_axis to change the order of the columns.
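
The steps named in this answer can be sketched as follows. This is a reconstruction based on the last snippet timed in answer 0, not the answerer's verbatim code. Because reindex_axis was removed in pandas 1.0, the sketch substitutes reindex(columns=...), and result_reindexed is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 1, 1, 2, 2, 3, 3],
                   'type': ["T1", "T1", "T4", "T2", "T3", "T1", "T3"],
                   'value': [10, 40, 20, 30, 10, 40, 50]})

# Append month and type to the row index, pivot type into columns,
# then move the month level (level 1) back into an ordinary column.
d1 = (df.set_index(['month', 'type'], append=True)['value']
        .unstack(fill_value=0)
        .reset_index(level=1))

# reset_index puts 'month' first, so rotate it to the end.
cols = d1.columns[1:].tolist() + d1.columns[:1].tolist()

# Assumption: reindex(columns=...) stands in for the removed reindex_axis.
result_reindexed = d1.reindex(columns=cols)
print(result_reindexed)
```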
