Question

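I scraped a table from a web page. Its items arrive as one contiguous one-dimensional list in which the headers repeat. I want to rebuild the table as a DataFrame.

I have an algorithm that does this, but I would like to know whether there is a more Pythonic or efficient way to do it. Note that I do not necessarily know how many columns the table has. For example: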

import pandas as pd

input = ['A',1,'B',5,'C',9,
         'A',2,'B',6,'C',10,
         'A',3,'B',7,'C',11,
         'A',4,'B',8,'C',12]

output = {}

it = iter(input)
val = next(it)

# compare against None so a falsy header (0, '') does not end the loop early
while val is not None:
    if val in output:
        output[val].append(next(it))
    else:
        output[val] = [next(it)]

    val = next(it,None)

df = pd.DataFrame(output)

print(df)

Result:

   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12

Answer 1

If your data is always "well behaved", meaning it strictly alternates header and value, something like this should be enough:

import pandas as pd

data = ['A',1,'B',5,'C',9,
         'A',2,'B',6,'C',10,
         'A',3,'B',7,'C',11,
         'A',4,'B',8,'C',12]

result = {}

for k,v in zip(data[::2], data[1::2]):
    result.setdefault(k, []).append(v)

df = pd.DataFrame(result)
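
The snippet above relies on every header appearing the same number of times; with columns of unequal length, passing the dict of lists to pd.DataFrame raises a ValueError. As a rough sketch only, assuming the missing cells are at the end of the list (a gap in the middle would shift later rows), each list can be wrapped in a pd.Series so the shorter columns are padded with NaN. The truncated `data` list here is a made-up example, not from the question.

```python
import pandas as pd

# Hypothetical input: the last row is missing its 'C' entry.
data = ['A', 1, 'B', 5, 'C', 9,
        'A', 2, 'B', 6]

result = {}
for k, v in zip(data[::2], data[1::2]):
    result.setdefault(k, []).append(v)

# Wrapping each list in a Series lets pandas align on the index
# and pad the shorter columns with NaN instead of raising.
df = pd.DataFrame({k: pd.Series(v) for k, v in result.items()})

print(df)
```

Note that a padded column becomes floating point, because NaN cannot be stored in an integer column.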

Answer 2

You can also use numpy reshape:

import numpy as np
import pandas as pd

# l is the flat input list from the question
cols = sorted(set(l[::2]))
df = pd.DataFrame(np.reshape(l, (int(len(l)/len(cols)/2), len(cols)*2)).T[1::2].T, columns=cols)

   A  B   C
0  1  5   9
1  2  6  10
2  3  7  11
3  4  8  12

Explanation:

# get columns
cols = sorted(set(l[::2]))

# reshape list into list of lists
shape = (int(len(l)/len(cols)/2), len(cols)*2)
np.reshape(l, shape)

# get only the values of the data
.T[1::2].T
# this transposes the data and slices every second step
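
One thing to watch with the reshape approach: NumPy converts a list that mixes strings and integers into an array of strings, so the resulting DataFrame holds '1', '5', ... rather than numbers, and sorted(set(...)) orders the columns alphabetically rather than by first appearance. The variant below is a sketch of one way around both, assuming the list is complete (every row has every header). It uses dtype=object, dict.fromkeys, and DataFrame.infer_objects, none of which appear in the original answer.

```python
import numpy as np
import pandas as pd

l = ['A', 1, 'B', 5, 'C', 9,
     'A', 2, 'B', 6, 'C', 10,
     'A', 3, 'B', 7, 'C', 11,
     'A', 4, 'B', 8, 'C', 12]

# dict.fromkeys keeps first-seen order, unlike sorted(set(...)).
cols = list(dict.fromkeys(l[::2]))

# dtype=object stops NumPy from converting the integers to strings.
arr = np.array(l, dtype=object).reshape(-1, 2 * len(cols))

# Every second position in each row (1, 3, 5, ...) holds a value.
# infer_objects turns the object columns back into integer columns.
df = pd.DataFrame(arr[:, 1::2], columns=cols).infer_objects()

print(df)
```

Slicing with [:, 1::2] selects the same cells as .T[1::2].T without the two transposes.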

Convert a one-dimensional list into a two-dimensional DataFrame
