Question

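I am trying to build one DataFrame based on another. To build the second one, I need to iterate over the first DataFrame, make some changes to the data, and insert the result into the second. I am using a namedtuple in the for loop.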

This loop takes a very long time to process 2 million rows. Is there a faster way to do this?

Answer 1

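pandas DataFrames are built around columns, so iterating over rows is not their natural access pattern. However, this is how I process each row of a pandas DataFrame: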

# `table` is an existing pandas DataFrame
# Zip the columns together so that each item is one row as a plain tuple
rows = zip(*(table.loc[:, each] for each in table))
for rowNum, record in enumerate(rows):
    # To process a record, put the processing code here;
    # otherwise, just print each row
    print("Row", rowNum, "records: ", record)

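That said, I would still recommend looking for pandas methods that can do the processing of the first DataFrame for you. They are usually faster and more efficient than writing your own loop. Hope this helps.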
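As a minimal sketch of what such a pandas-based approach could look like, assuming the per-row changes can be expressed as operations on whole columns. The column names `price` and `qty` and the two transformations are hypothetical and not taken from the question:

```python
import pandas as pd

# Hypothetical source DataFrame standing in for the first table
first = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Build the second DataFrame from whole-column operations instead of a row loop
second = pd.DataFrame({
    "total": first["price"] * first["qty"],   # element-wise product per row
    "expensive": first["price"] > 15.0,       # boolean flag per row
})

print(second)
```

Whether this applies depends on the actual changes being made; logic that cannot be written as column operations may still need a loop.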

Answer 2

I suggest using pandas' built-in iterrows function.

import pandas as pd

data = {'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]}
db = pd.DataFrame(data)
print(f"Dataframe:\n{db}\n")
# iterrows yields (index label, row as a Series) pairs
for row, col in db.iterrows():
    print(f"Row Index:{row}")
    print(f"Column:\n{col}\n")

Output of the above:

Dataframe:
     Name  Age
0    John   20
1    Paul   21
2  George   19

Row Index:0
Column:
Name    John
Age       20
Name: 0, dtype: object

Row Index:1
Column:
Name    Paul
Age       21
Name: 1, dtype: object

Row Index:2
Column:
Name    George
Age         19
Name: 2, dtype: object
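
Since the question mentions a namedtuple, one alternative worth considering is `itertuples`, which yields each row as a namedtuple and is generally faster than `iterrows` because it does not build a Series for every row. A minimal sketch, using the same sample data; the transformation (upper-casing the name, adding one to the age) and the `AgeNextYear` column are made up for illustration:

```python
import pandas as pd

db = pd.DataFrame({'Name': ['John', 'Paul', 'George'], 'Age': [20, 21, 19]})

# Collect the transformed rows in a plain list and build the new DataFrame
# once at the end, rather than growing a DataFrame inside the loop.
records = []
for row in db.itertuples(index=False):
    # Each row is a namedtuple; fields are accessed by column name
    records.append({'Name': row.Name.upper(), 'AgeNextYear': row.Age + 1})

result = pd.DataFrame(records)
print(result)
```

For 2 million rows this is likely still slower than column-wise operations, so it is best kept for logic that cannot be vectorized.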

What is the best way to iterate over a DataFrame in Python?
