我有这个输入工作表:
raw = pd.read_excel("raw.xlsx", header=None)
>Out[4]:
0 1 2 3 4
0 48 59.0 28.0 6.0 36.0
1 41 36.0 52.0 3.0 22.0
2 32 86.0 66.0 68.0 9.0
3 71 23.0 6.0 98.0 19.0
4 18 92.0 66.0 6.0 54.0
5 Andy NaN NaN NaN NaN
6 56 89.0 6.0 32.0 50.0
7 3 68.0 49.0 93.0 15.0
8 27 65.0 94.0 96.0 66.0
9 40 96.0 71.0 22.0 83.0
10 96 23.0 5.0 49.0 14.0
11 Bob NaN NaN NaN NaN
12 43 34.0 42.0 11.0 73.0
13 42 41.0 17.0 91.0 35.0
14 81 74.0 24.0 95.0 95.0
15 89 57.0 35.0 66.0 56.0
16 54 76.0 55.0 72.0 63.0
17 David NaN NaN NaN NaN
18 58 8.0 62.0 63.0 8.0
19 15 93.0 97.0 38.0 5.0
20 13 96.0 42.0 51.0 48.0
21 23 88.0 20.0 91.0 39.0
22 9 67.0 45.0 58.0 92.0
23 Bill NaN NaN NaN NaN
24 2 3.0 80.0 28.0 38.0
25 100 68.0 83.0 26.0 45.0
26 79 57.0 40.0 76.0 83.0
27 12 98.0 76.0 63.0 53.0
28 60 88.0 70.0 13.0 50.0
29 Luke NaN NaN NaN NaN
我必须在一行上重新排列数据的“块”,后跟名称。 格式是固定的,输出应该像这样
>Out[6]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0 48 59 28 6 36 41 36 52 3 22 32 86 66 68 9 71 23 6 98 19 18 92 66 6 54 Andy
1 56 89 6 32 50 3 68 49 93 15 27 65 94 96 66 40 96 71 22 83 96 23 5 49 14 Bob
2 43 34 42 11 73 42 41 17 91 35 81 74 24 95 95 89 57 35 66 56 54 76 55 72 63 David
3 58 8 62 63 8 15 93 97 38 5 13 96 42 51 48 23 88 20 91 39 9 67 45 58 92 Bill
4 2 3 80 28 38 100 68 83 26 45 79 57 40 76 83 12 98 76 63 53 60 88 70 13 50 Luke
哪个是最Python化的方式?
谢谢
答案 0 :(得分:2)
代码的第一部分是许多行,这些行从字符串行中读取数据,例如,您需要执行pd.read_excel()
。
代码的真正部分是转换,在接下来的几行中:
a = df.values
a = a.reshape([a.size // 30, 30])
a = a[:, :-4]
df = pd.DataFrame(a)
以上内容甚至可以简化为单行:
df = pd.DataFrame(df.values.reshape((-1, 30))[:, :-4])
完整代码如下:
import pandas as pd, numpy as np
# Next is just reading-in my data in a fancy way,
# you do pd.read_excel(file) instead like you did before
df = pd.DataFrame([line.split() for line in """
48 59.0 28.0 6.0 36.0
41 36.0 52.0 3.0 22.0
32 86.0 66.0 68.0 9.0
71 23.0 6.0 98.0 19.0
18 92.0 66.0 6.0 54.0
Andy NaN NaN NaN NaN
56 89.0 6.0 32.0 50.0
3 68.0 49.0 93.0 15.0
27 65.0 94.0 96.0 66.0
40 96.0 71.0 22.0 83.0
96 23.0 5.0 49.0 14.0
Bob NaN NaN NaN NaN
43 34.0 42.0 11.0 73.0
42 41.0 17.0 91.0 35.0
81 74.0 24.0 95.0 95.0
89 57.0 35.0 66.0 56.0
54 76.0 55.0 72.0 63.0
David NaN NaN NaN NaN
58 8.0 62.0 63.0 8.0
15 93.0 97.0 38.0 5.0
13 96.0 42.0 51.0 48.0
23 88.0 20.0 91.0 39.0
9 67.0 45.0 58.0 92.0
Bill NaN NaN NaN NaN
2 3.0 80.0 28.0 38.0
100 68.0 83.0 26.0 45.0
79 57.0 40.0 76.0 83.0
12 98.0 76.0 63.0 53.0
60 88.0 70.0 13.0 50.0
Luke NaN NaN NaN NaN
""".splitlines() if line.strip()])
a = df.values
a = a.reshape([a.size // 30, 30])
a = a[:, :-4]
df = pd.DataFrame(a)
print(df)
输出:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0 48 59.0 28.0 6.0 36.0 41 36.0 52.0 3.0 22.0 32 86.0 66.0 68.0 9.0 71 23.0 6.0 98.0 19.0 18 92.0 66.0 6.0 54.0 Andy
1 56 89.0 6.0 32.0 50.0 3 68.0 49.0 93.0 15.0 27 65.0 94.0 96.0 66.0 40 96.0 71.0 22.0 83.0 96 23.0 5.0 49.0 14.0 Bob
2 43 34.0 42.0 11.0 73.0 42 41.0 17.0 91.0 35.0 81 74.0 24.0 95.0 95.0 89 57.0 35.0 66.0 56.0 54 76.0 55.0 72.0 63.0 David
3 58 8.0 62.0 63.0 8.0 15 93.0 97.0 38.0 5.0 13 96.0 42.0 51.0 48.0 23 88.0 20.0 91.0 39.0 9 67.0 45.0 58.0 92.0 Bill
4 2 3.0 80.0 28.0 38.0 100 68.0 83.0 26.0 45.0 79 57.0 40.0 76.0 83.0 12 98.0 76.0 63.0 53.0 60 88.0 70.0 13.0 50.0 Luke