How do I convert this pandas.read_csv result into the following format?

Asked: 2016-02-23 23:03:48

Tags: python pandas formatting

The following code

import pandas as pd

df = pd.read_csv('trace.data')

# show the first two rows (labels 0 and 1, inclusive)
print(df.loc[0:1, :])

produces the following DataFrame

   frame#  X-1  Y-1  Angle-1  Error-1  X-5  Y-5  Angle-5  Error-5  X-12  \
0       1  NaN  NaN      NaN      NaN  NaN  NaN      NaN      NaN   NaN   
1       2  NaN  NaN      NaN      NaN  NaN  NaN      NaN      NaN   NaN   

      ...      Angle-1355  Error-1355  X-1384  Y-1384  Angle-1384  Error-1384  \
0     ...             NaN         NaN     NaN     NaN         NaN         NaN   
1     ...             NaN         NaN     NaN     NaN         NaN         NaN   

   X-1408  Y-1408  Angle-1408  Error-1408  
0     853    2340  283.262859           0  
1     NaN     NaN         NaN         NaN  

[2 rows x 801 columns]

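Each row holds all the measurements taken for a single picture frame.

The first column is the frame number.

Starting from the second column, every 4 consecutive columns are the X position, Y position, angle, and error of one measurement.

The number i in X-i, Y-i, Angle-i, Error-i is the ID of the point.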

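I would like to turn the DataFrame into a DataFrame of this form:

  • frame #
  • point ID (the i in X-i, Y-i, etc.)
  • dimension name (e.g. X, Y, etc.)
  • measurement (the actual measured value, float64)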

How would a respectable panda do this?

1 Answer:

Answer 0: (score: 2)

import pandas as pd

df = pd.DataFrame({'frame': [1, 2],
                   'Angle-1': [1.6288175485083471, -0.16980795008048055],
                   'Angle-1355': [-0.23364001238956567, 0.10508954185705043],
                   'Angle-1384': [-0.1055306764132989, 1.5766485876766343],
                   'Angle-5': [1.0530749477672805, -0.58051944875155881],
                   'Error-1': [-0.22597615373237354, -0.067869089031437124],
                   'Error-1355': [-1.1205136108736824, 1.5398343350154859],
                   'Error-1384': [0.2072177497820725, 1.5802856128691691],
                   'Error-5': [-0.054906215727689098, -0.115633635459458],
                   'X-1': [1.2374207482997275, -0.74052859017582551],
                   'X-12': [-0.10554748111840574, 0.51297919944988468],
                   'X-1384': [2.2710928129358541, 2.2873598143523743],
                   'X-5': [-0.68576722189220918, 1.480319768103725],
                   'Y-1': [-0.72686786051739416, 1.662550986420245],
                   'Y-1384': [-1.384276797510166, 0.89414830326943084],
                   'Y-5': [-0.12183746322452065, 1.0471295991115857]})

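Given the sample DataFrame above, you can pop the frame column and use a list comprehension to repeat it into a flat list, one copy of the frame numbers per remaining column. Split the column names on the hyphen and reassign them, which creates a MultiIndex. Then concatenate new_frames horizontally with the melted DataFrame.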

Voilà!

frames = df.pop('frame')
new_frames = [i for j in range(df.shape[1]) for i in frames]

df.columns = df.columns.str.split('-', expand=True)

>>> (pd.concat([pd.DataFrame(new_frames), pd.melt(df)], axis=1, ignore_index=True)
     .rename(columns={0: 'frame', 1: 'dimension', 2: 'point', 3: 'measurement'}))
    frame dimension point  measurement
0       1     Angle     1     1.628818
1       2     Angle     1    -0.169808
2       1     Angle  1355    -0.233640
3       2     Angle  1355     0.105090
4       1     Angle  1384    -0.105531
5       2     Angle  1384     1.576649
6       1     Angle     5     1.053075
7       2     Angle     5    -0.580519
8       1     Error     1    -0.225976
9       2     Error     1    -0.067869
10      1     Error  1355    -1.120514
11      2     Error  1355     1.539834
12      1     Error  1384     0.207218
13      2     Error  1384     1.580286
14      1     Error     5    -0.054906
15      2     Error     5    -0.115634
16      1         X     1     1.237421
17      2         X     1    -0.740529
18      1         X    12    -0.105547
19      2         X    12     0.512979
20      1         X  1384     2.271093
21      2         X  1384     2.287360
22      1         X     5    -0.685767
23      2         X     5     1.480320
24      1         Y     1    -0.726868
25      2         Y     1     1.662551
26      1         Y  1384    -1.384277
27      2         Y  1384     0.894148
28      1         Y     5    -0.121837
29      2         Y     5     1.047130
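
For comparison, here is a minimal sketch of another way to reach the same long layout. It lets melt carry the frame column along through id_vars instead of rebuilding it by hand. The tiny wide frame, its values, and the names wide and tidy are made up for illustration and are not taken from the question's trace.data. Dropping the NaN rows is also an assumption about what is wanted.

```python
import pandas as pd

# Hypothetical stand-in for the wide frame; the values are invented.
wide = pd.DataFrame({
    'frame': [1, 2],
    'X-1': [0.5, 1.5],
    'Y-1': [2.0, 3.0],
    'X-5': [4.0, None],
})

# Keep 'frame' as an identifier and stack every other column into rows.
tidy = wide.melt(id_vars=['frame'], var_name='column', value_name='measurement')

# Split names such as 'X-1' into the dimension name and the point ID.
parts = tidy.pop('column').str.split('-', expand=True)
tidy.insert(1, 'point', parts[1].astype(int))
tidy.insert(2, 'dimension', parts[0])

# Assumption: points that were not measured in a frame (NaN) are not wanted.
tidy = tidy.dropna(subset=['measurement']).reset_index(drop=True)
print(tidy)
```

Because frame is passed as an identifier column, each measurement stays paired with its frame number even if the column order changes. Converting point to int also makes it sort numerically rather than as text.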