重新排列熊猫数据框

时间:2020-09-28 13:09:09

标签: python pandas dataframe

我有这个输入工作表:

raw = pd.read_excel("raw.xlsx", header=None)

>Out[4]: 
        0     1     2     3     4
0      48  59.0  28.0   6.0  36.0
1      41  36.0  52.0   3.0  22.0
2      32  86.0  66.0  68.0   9.0
3      71  23.0   6.0  98.0  19.0
4      18  92.0  66.0   6.0  54.0
5    Andy   NaN   NaN   NaN   NaN
6      56  89.0   6.0  32.0  50.0
7       3  68.0  49.0  93.0  15.0
8      27  65.0  94.0  96.0  66.0
9      40  96.0  71.0  22.0  83.0
10     96  23.0   5.0  49.0  14.0
11    Bob   NaN   NaN   NaN   NaN
12     43  34.0  42.0  11.0  73.0
13     42  41.0  17.0  91.0  35.0
14     81  74.0  24.0  95.0  95.0
15     89  57.0  35.0  66.0  56.0
16     54  76.0  55.0  72.0  63.0
17  David   NaN   NaN   NaN   NaN
18     58   8.0  62.0  63.0   8.0
19     15  93.0  97.0  38.0   5.0
20     13  96.0  42.0  51.0  48.0
21     23  88.0  20.0  91.0  39.0
22      9  67.0  45.0  58.0  92.0
23   Bill   NaN   NaN   NaN   NaN
24      2   3.0  80.0  28.0  38.0
25    100  68.0  83.0  26.0  45.0
26     79  57.0  40.0  76.0  83.0
27     12  98.0  76.0  63.0  53.0
28     60  88.0  70.0  13.0  50.0
29   Luke   NaN   NaN   NaN   NaN

我必须在一行上重新排列数据的“块”,后跟名称。 格式是固定的,输出应该像这样

>Out[6]: 
   0   1   2   3   4    5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24     25
0  48  59  28   6  36   41  36  52   3  22  32  86  66  68   9  71  23   6  98  19  18  92  66   6  54   Andy
1  56  89   6  32  50    3  68  49  93  15  27  65  94  96  66  40  96  71  22  83  96  23   5  49  14    Bob
2  43  34  42  11  73   42  41  17  91  35  81  74  24  95  95  89  57  35  66  56  54  76  55  72  63  David
3  58   8  62  63   8   15  93  97  38   5  13  96  42  51  48  23  88  20  91  39   9  67  45  58  92   Bill
4   2   3  80  28  38  100  68  83  26  45  79  57  40  76  83  12  98  76  63  53  60  88  70  13  50   Luke

哪个是最Python化的方式?

谢谢

1 个答案:

答案 0 :(得分:2)

代码的第一部分是许多行,这些行从字符串行中读取数据,例如,您需要执行pd.read_excel()

代码的真正部分是转换,在接下来的几行中:

a = df.values
a = a.reshape([a.size // 30, 30])
a = a[:, :-4]
df = pd.DataFrame(a)

以上内容甚至可以简化为单行:

df = pd.DataFrame(df.values.reshape((-1, 30))[:, :-4])

完整代码如下:

Try it online!

import pandas as pd, numpy as np

# Next is just reading-in my data in a fancy way,
# you do pd.read_excel(file) instead like you did before
df = pd.DataFrame([line.split() for line in """
   48  59.0  28.0   6.0  36.0
   41  36.0  52.0   3.0  22.0
   32  86.0  66.0  68.0   9.0
   71  23.0   6.0  98.0  19.0
   18  92.0  66.0   6.0  54.0
 Andy   NaN   NaN   NaN   NaN
   56  89.0   6.0  32.0  50.0
    3  68.0  49.0  93.0  15.0
   27  65.0  94.0  96.0  66.0
   40  96.0  71.0  22.0  83.0
   96  23.0   5.0  49.0  14.0
  Bob   NaN   NaN   NaN   NaN
   43  34.0  42.0  11.0  73.0
   42  41.0  17.0  91.0  35.0
   81  74.0  24.0  95.0  95.0
   89  57.0  35.0  66.0  56.0
   54  76.0  55.0  72.0  63.0
David   NaN   NaN   NaN   NaN
   58   8.0  62.0  63.0   8.0
   15  93.0  97.0  38.0   5.0
   13  96.0  42.0  51.0  48.0
   23  88.0  20.0  91.0  39.0
    9  67.0  45.0  58.0  92.0
 Bill   NaN   NaN   NaN   NaN
    2   3.0  80.0  28.0  38.0
  100  68.0  83.0  26.0  45.0
   79  57.0  40.0  76.0  83.0
   12  98.0  76.0  63.0  53.0
   60  88.0  70.0  13.0  50.0
 Luke   NaN   NaN   NaN   NaN
""".splitlines() if line.strip()])

a = df.values
a = a.reshape([a.size // 30, 30])
a = a[:, :-4]
df = pd.DataFrame(a)

print(df)

输出:

    0     1     2     3     4    5     6     7     8     9  10    11    12    13    14  15    16    17    18    19  20    21    22    23    24     25
0  48  59.0  28.0   6.0  36.0   41  36.0  52.0   3.0  22.0  32  86.0  66.0  68.0   9.0  71  23.0   6.0  98.0  19.0  18  92.0  66.0   6.0  54.0   Andy
1  56  89.0   6.0  32.0  50.0    3  68.0  49.0  93.0  15.0  27  65.0  94.0  96.0  66.0  40  96.0  71.0  22.0  83.0  96  23.0   5.0  49.0  14.0    Bob
2  43  34.0  42.0  11.0  73.0   42  41.0  17.0  91.0  35.0  81  74.0  24.0  95.0  95.0  89  57.0  35.0  66.0  56.0  54  76.0  55.0  72.0  63.0  David
3  58   8.0  62.0  63.0   8.0   15  93.0  97.0  38.0   5.0  13  96.0  42.0  51.0  48.0  23  88.0  20.0  91.0  39.0   9  67.0  45.0  58.0  92.0   Bill
4   2   3.0  80.0  28.0  38.0  100  68.0  83.0  26.0  45.0  79  57.0  40.0  76.0  83.0  12  98.0  76.0  63.0  53.0  60  88.0  70.0  13.0  50.0   Luke