Normalizing long-format data in pandas

Time: 2015-11-20 00:21:58

Tags: python pandas pivot dataframe pivot-table

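I can't figure out how to normalize data that is in long format in pandas. In R, I would cast the data to wide format, normalize it, and then melt it back. But I can't work out how to "reverse" pivot_table. Here is an example: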

import pandas as pd

# Declare the dataframe
df = pd.DataFrame({'Time': [1, 1, 1, 2, 2, 2],
                   'Machine': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Sensor1': [2.0, 3.0, 3.0, 4.0, 5.0, 6.0],
                   'Sensor2': [1, 1, 3, 1, 1, 3]})

df looks like this:

    Machine  Sensor1  Sensor2  Time
0       A        2        1     1
1       B        3        1     1
2       C        3        3     1
3       A        4        1     2
4       B        5        1     2
5       C        6        3     2    

Pivoting and normalizing:

# Pivot
dfWide = pd.pivot_table(df,index = 'Time',values=['Sensor1','Sensor2'],columns='Machine')

# Normalize
machines = ['C','B','A'] # Backwards to normalize A last
for m in machines:
    dfWide.loc[:,('Sensor1',m)] = dfWide.loc[:,('Sensor1',m)] / dfWide.loc[:,('Sensor1','A')]
# Revert to original (long) form

This is what dfWide looks like:

print(dfWide)
              Sensor1        Sensor2
Machine     A    B    C     A   B   C
Time                        
1           1   1.50  1.5   1   1   3
2           1   1.25  1.5   1   1   3

Does anyone know how to do the last step, going back to the original long form?
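
For reference, the normalization itself does not strictly require the wide format. The sketch below is one possible way to divide each Sensor1 reading by machine A's reading at the same Time while staying in long form. It assumes exactly one row for machine A per Time value; `ref` and `normalized` are illustrative names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'Time': [1, 1, 1, 2, 2, 2],
                   'Machine': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Sensor1': [2.0, 3.0, 3.0, 4.0, 5.0, 6.0],
                   'Sensor2': [1, 1, 3, 1, 1, 3]})

# Reference series (illustrative name): machine A's Sensor1, indexed by Time.
# Assumption: there is exactly one row for machine A per Time value.
ref = df.loc[df['Machine'] == 'A'].set_index('Time')['Sensor1']

# Look up the reference value for each row's Time and divide by it.
normalized = df.copy()
normalized['Sensor1'] = df['Sensor1'] / df['Time'].map(ref)

print(normalized)
```

If machine A had several rows for the same Time, the lookup through `map` would no longer be unambiguous, so the reference rows would need to be aggregated first.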

1 answer:

Answer 0 (score: 2)

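Take a look at pd.melt(). Melt the wide frame, pivot the sensors back into columns, then flatten the column names: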

In [134]:
df_melted = pd.melt(dfWide.reset_index() , id_vars=['Time'] , var_name=['Sensors' , 'Machine'])
df_melted
Out[134]:
   Time Sensors Machine value
0   1   Sensor1 A       1.00
1   2   Sensor1 A       1.00
2   1   Sensor1 B       1.50
3   2   Sensor1 B       1.25
4   1   Sensor1 C       1.50
5   2   Sensor1 C       1.50
6   1   Sensor2 A       1.00
7   2   Sensor2 A       1.00
8   1   Sensor2 B       1.00
9   2   Sensor2 B       1.00
10  1   Sensor2 C       3.00
11  2   Sensor2 C       3.00

In [148]:
res = pd.pivot_table(df_melted ,index=['Time' , 'Machine'] , columns=['Sensors']).reset_index()
res
Out[148]:
            Time    Machine      value
Sensors                     Sensor1 Sensor2
0           1           A   1.00    1
1           1           B   1.50    1
2           1           C   1.50    3
3           2           A   1.00    1
4           2           B   1.25    1
5           2           C   1.50    3

In [150]:
res.columns = ['Time' , 'Machine' , 'Sensor1' , 'Sensor2']
res
Out[150]:
  Time  Machine Sensor1 Sensor2
0   1   A       1.00    1
1   1   B       1.50    1
2   1   C       1.50    3
3   2   A       1.00    1
4   2   B       1.25    1
5   2   C       1.50    3
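
As a possible shortcut, the melt and pivot_table round trip above can likely be replaced by a single `stack` call on the `Machine` column level. The sketch below rebuilds `dfWide` as in the question so that it runs on its own; the name `base` and the final sort are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Time': [1, 1, 1, 2, 2, 2],
                   'Machine': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Sensor1': [2.0, 3.0, 3.0, 4.0, 5.0, 6.0],
                   'Sensor2': [1, 1, 3, 1, 1, 3]})

# Wide form, as in the question: one column per (sensor, machine) pair.
dfWide = pd.pivot_table(df, index='Time',
                        values=['Sensor1', 'Sensor2'], columns='Machine')

# Normalize Sensor1 by machine A. The denominator is copied first
# (illustrative name `base`) so the loop order does not matter.
base = dfWide[('Sensor1', 'A')].copy()
for m in ['A', 'B', 'C']:
    dfWide[('Sensor1', m)] = dfWide[('Sensor1', m)] / base

# Move the 'Machine' column level into the row index, then turn the
# (Time, Machine) index back into ordinary columns.
res = (dfWide.stack('Machine')
             .reset_index()
             .sort_values(['Time', 'Machine'])
             .reset_index(drop=True))

print(res)
```

Because `stack` leaves Sensor1 and Sensor2 as ordinary columns, no manual assignment to `res.columns` should be needed in this variant.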