Question

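I am trying to iterate over the rows of a DataFrame in reverse order.

The order should be based on row position, not on index label.

This code looks like it should work, but it does not.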

for i, row in enumerate(df[::-1].iterrows()):
    print(i)

When I run it, it produces

instead of

Answer 1

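The comments on the question about how to use iterrows() already answer how to loop over a DataFrame's rows in reverse. They also suggest a list comprehension for simplicity.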
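As a rough sketch of what those comments describe: enumerate() always counts up from 0 no matter what order the rows arrive in, so the position has to be generated separately and paired with the reversed rows. The three-row frame and the names positions and reversed_rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index labels are not positions.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["x", "y", "z"])

# Descending row positions: n-1, ..., 1, 0.
positions = range(len(df) - 1, -1, -1)

# Pair each reversed row with its original position instead of using enumerate().
for pos, (label, row) in zip(positions, df[::-1].iterrows()):
    print(pos, label, row["value"])

# The same pairing written as a list comprehension.
reversed_rows = [
    (pos, label) for pos, (label, _) in zip(positions, df[::-1].iterrows())
]
print(reversed_rows)
```

This is fine for small frames; the caveat about larger ones still applies.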

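Iterating with iterrows(), however, runs into performance and memory problems as datasets grow larger. There is a more efficient way to access the data in a DataFrame in reverse.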

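The following is offered as guidance for new pandas users. The gist is to move the DataFrame's index labels into a column, which creates a new, ordered index that preserves row position and can therefore be reversed.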

import pandas as pd
import numpy as np
import timeit
print(pd.__version__)

# random dataframe, provides ordered rangeindex
df = pd.DataFrame(np.random.randint(0,1000,size=(1000, 4)), columns=list('ABCD'))
# toss the ordered rangeindex and make the random 'A' the index
df.set_index(['A'], inplace=True)
# df is now a dataframe with an unordered index

def iterate(df):
    for i,r in df[::-1].iterrows():
        # process
        pass

def sort_and_apply(df):
    # Give the index an order by resetting it into a column.
    # The new RangeIndex records each row's original position.
    # (reset_index() also copies the dataframe, which slows this function
    # down, yet it is still much faster than iterate().)
    new_df = df.reset_index()

    # Sort on the new RangeIndex in descending order and process.
    # Note: apply() with the default axis=0 calls the function once per column.
    new_df.sort_index(ascending=False).apply(lambda x: x)

if __name__ == '__main__':
    print("iterate ", timeit.timeit("iterate(df)", setup="from __main__ import iterate, df", number=50))
    print("sort_and_apply ",timeit.timeit("sort_and_apply(df)", setup="from __main__ import sort_and_apply, df", number=50))

This produces:

0.24.2
iterate  2.893160949
sort_and_apply  0.12744747599999995
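To see the reset_index() idea on a frame small enough to check by eye, here is a minimal sketch. The three-row frame and its values are invented for illustration, and itertuples() is my own stand-in for the placeholder apply() call; it is not part of the timing code above.

```python
import pandas as pd

# Hypothetical frame with unordered index labels named "A".
df = pd.DataFrame({"B": [5, 6, 7]}, index=pd.Index([42, 7, 19], name="A"))

# reset_index() moves the labels into column "A" and installs a RangeIndex,
# so the new index value of each row equals its original position.
by_position = df.reset_index().sort_index(ascending=False)

# Walk the rows from the last position to the first.
visited = [(t.Index, t.A, t.B) for t in by_position.itertuples()]
print(visited)
```

Each tuple holds the original row position, the old index label, and the value of column B, with the last row first.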

Answer 2

I agree with resetting the index. You can also do it this way:

for i, row in df.reset_index().sort_index(ascending=False).iterrows():
    print(i)
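If copying the frame is a concern, another option (not taken from either answer, just a sketch under the same assumptions) is to count positions down yourself and look each row up with iloc, which ignores the index labels entirely. The frame and the seen list are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index labels are not positions.
df = pd.DataFrame({"value": [10, 20, 30]}, index=["x", "y", "z"])

seen = []
for pos in reversed(range(len(df))):
    row = df.iloc[pos]  # positional lookup, independent of the index labels
    seen.append((pos, row.name, int(row["value"])))
print(seen)
```

Per-row iloc lookups are still a Python-level loop, so this is about clarity rather than speed.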
