Question

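I am trying to iterate over an entire pandas DataFrame in Python, but the loop does not seem to cover the whole DataFrame. It works for shorter DataFrames, but not for one of this length. I am working in a Jupyter Notebook.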

I added some print statements to try to debug it.

def dropNotIn(df):

    print(df.shape)

    removedlist = []
    droplist = []

    for i, x in df.iterrows():
        rownum = i

    print(rownum)
    print(len(df))

Result of dropNotIn(df):

(59610, 9)
3449 --> Expected to be 59610
59610

Here is my df.head():

    date        attendance  venue_city       venue_state  venue_name                             away_team             home_team        away_points  home_points
9   2015-12-13  1740.0      Chicago          IL           McGrath-Phillips Arena                 Arkansas-Little Rock  DePaul           66           44
13  2015-11-22  0.0         St. Thomas       NaN          Virgin Islands Sport & Fitness Center  Tulsa                 Indiana State    67           59
14  2014-12-04  3469.0      St. Bonaventure  NY           Reilly Center                          Buffalo               St. Bonaventure  63           72
21  2015-11-20  1522.0      St. Thomas       NaN          Virgin Islands Sport & Fitness Center  Hofstra               Florida State    82           77
24  2014-11-23  NaN         St. Thomas       NaN          Virgin Islands Sport & Fitness Center  Gardner-Webb          Seton Hall       67           85

Answer 1

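In pandas, DataFrame.iterrows() yields the index label and the row. The index is whatever you have set it to be, and judging from your sample data, yours is not a contiguous run of integers but something else. The last label the loop sees therefore tells you nothing about how many rows were visited.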
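To see this on a small scale, here is a minimal sketch. The toy DataFrame and its labels 9, 13, 14 are made up to mimic the gaps in the question's index; they are not the asker's real data.

```python
import pandas as pd

# Hypothetical toy frame: the labels 9, 13, 14 mimic the gaps in the question's index.
df = pd.DataFrame({"away_points": [66, 67, 63]}, index=[9, 13, 14])

# iterrows() yields (index label, row) pairs; collect only the labels.
labels = [label for label, row in df.iterrows()]

print(labels)       # [9, 13, 14] -- index labels, not 0, 1, 2
print(len(labels))  # 3 -- every row was visited
print(len(df))      # 3
```

The loop visits every row; only the value of the label differs from the row count.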

Try the following code:

def dropNotIn(df):

    print(df.shape)

    removedlist = []
    droplist = []

    num_rows = 0
    for i, x in df.iterrows():
        num_rows += 1

    print(num_rows)
    print(len(df))

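This counts the rows explicitly instead of relying on the index. If you really want a running row count while looping, I suggest using the built-in function enumerate: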

for num, (index, row) in enumerate(df.iterrows()):
   pass

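However, I suspect you may not want to do this at all, because when working with DataFrames you usually want vectorized operations rather than row-by-row loops.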
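As a rough illustration of what a vectorized version might look like: the question does not say what dropNotIn is meant to filter on, so the allow-list `allowed_teams` and the choice of the `home_team` column below are purely assumptions for the sake of the sketch.

```python
import pandas as pd

# Small made-up frame reusing a few values from the question's df.head().
df = pd.DataFrame(
    {
        "home_team": ["DePaul", "Indiana State", "Seton Hall"],
        "home_points": [44, 59, 85],
    },
    index=[9, 13, 24],
)

# Hypothetical allow-list; the real criterion is not stated in the question.
allowed_teams = {"DePaul", "Seton Hall"}

# Boolean mask computed for all rows at once, with no Python-level loop.
mask = df["home_team"].isin(allowed_teams)

kept = df[mask]       # rows whose home_team is in the allow-list
removed = df[~mask]   # rows that would be dropped

print(len(df), len(kept), len(removed))
```

Because the mask is aligned on the index, this works regardless of whether the index labels are contiguous.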

Answer 2

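iterrows() iterates over index labels, which are not the same as row numbers. You may have index values that appear on more than one row.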

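Try unpacking x, y = df.shape (shape is an attribute, so it takes no parentheses) and iterating over range(x), reading each row by position with df.iloc[i].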
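A minimal sketch of that suggestion, assuming a made-up DataFrame in which the label 9 is deliberately repeated. The `reset_index(drop=True)` line at the end is an alternative not mentioned in the answer.

```python
import pandas as pd

# Hypothetical frame with a repeated index label (9 appears twice).
df = pd.DataFrame({"home_points": [44, 59, 72]}, index=[9, 9, 14])

# Quick check for duplicated labels.
print(df.index.is_unique)  # False

# shape is an attribute that returns (rows, columns).
n_rows, n_cols = df.shape

positions = []
for pos in range(n_rows):
    row = df.iloc[pos]      # positional lookup, ignores index labels
    positions.append(pos)

print(positions)  # [0, 1, 2]

# Alternative: renumber the index 0..n-1 so labels and positions coincide.
renumbered = df.reset_index(drop=True)
```

After `reset_index(drop=True)`, the label yielded by iterrows() matches the row position again, at the cost of discarding the original labels.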

Python for loop does not iterate over the entire DataFrame
