Question

I have a pandas DataFrame of the form

0   x     y    z
1   .5   .1    4
2   .6   .2    5

I want to convert it into a list of dicts built from the first two columns, i.e. [{'x': 0.5, 'y': 0.1}, {'x': 0.6, 'y': 0.2}, ...]

I could write a loop and do it the dumb way, but is there a better, faster way?

Answer 1

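You can use iterrows. This lets you iterate over the rows as Series, which are not dicts but behave very similarly (e.g. iteritems(), __getitem__, etc.).

If you must have dicts, you can easily convert each Series to a dict with the Series method to_dict().

For example: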

list_of_dicts = list( row.to_dict() for key, row in df.iterrows() )
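
The one-liner above converts every column. A minimal, self-contained sketch that restricts the rows to x and y first, as the question asks, might look like this. The sample frame is rebuilt here from the question's data, and the default integer index is an assumption.

```python
import pandas as pd

# Sample data taken from the question; the default 0..n-1 index is an assumption.
df = pd.DataFrame({"x": [0.5, 0.6], "y": [0.1, 0.2], "z": [4, 5]})

# Select the two columns first so that each row Series only carries x and y,
# then convert every row Series to a plain dict.
list_of_dicts = [row.to_dict() for _, row in df[["x", "y"]].iterrows()]
```

Selecting the columns before iterating also keeps the integer column z out of the row Series, so the remaining values stay floats.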

Answer 2

You can use the to_dict() method. Let yourdata.csv be your data in .csv format:

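import pandas as pd

df = pd.read_csv('yourdata.csv')

# In Python 3, .values() returns a view, so wrap it in list() to get a list of dicts
d = list(df[['x','y']].to_dict('index').values())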

This should work. It returns:

[{'y': 0.1, 'x': 0.5}, {'y': 0.2, 'x': 0.6}]
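
To make the intermediate step visible, here is a small sketch of what to_dict('index') produces before the values are taken. It builds the frame in memory from the question's data rather than reading yourdata.csv, and the names by_index and rows are only illustrative.

```python
import pandas as pd

# In-memory stand-in for yourdata.csv, using the question's sample data.
df = pd.DataFrame({"x": [0.5, 0.6], "y": [0.1, 0.2], "z": [4, 5]})

# orient='index' yields a dict keyed by index label, one inner dict per row.
by_index = df[["x", "y"]].to_dict("index")

# The per-row dicts are the values of that outer dict.
rows = list(by_index.values())
```

Because the outer dict is keyed by index label, this approach relies on the index labels being unique, whereas a list of records has no such constraint.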

Answer 3

Use to_dict() with orient='records'; it is faster.

In [2]: df[['x', 'y']].to_dict(orient='records')
Out[2]:
[{'x': 0.5, 'y': 0.1}, {'x': 0.6, 'y': 0.2}]

Timings

In [8]: df.shape
Out[8]: (10000, 4)

In [9]: %timeit df[['x', 'y']].to_dict(orient='records')
10 loops, best of 3: 68.4 ms per loop

In [10]: %timeit df[['x','y']].to_dict('index').values()
1 loop, best of 3: 570 ms per loop 

In [11]: %timeit list(row.to_dict() for key, row in df[['x', 'y']].iterrows())
1 loop, best of 3: 575 ms per loop
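
For readers without IPython, the comparison above can be approximated with the standard timeit module. This is only a sketch: the random data, the column names z and w, and the candidates dict are assumptions chosen to match the (10000, 4) shape, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data matching the (10000, 4) shape; column names z and w are assumed.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10_000, 4)), columns=["x", "y", "z", "w"])
sub = df[["x", "y"]]

# The three approaches from the answers, each returning a list of dicts.
candidates = {
    "records": lambda: sub.to_dict(orient="records"),
    "index": lambda: list(sub.to_dict("index").values()),
    "iterrows": lambda: [row.to_dict() for _, row in sub.iterrows()],
}

# Report the best of three single runs for each approach.
for name, fn in candidates.items():
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name:>8}: {seconds * 1000:.1f} ms")
```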

Create a list of dicts from a Pandas df
