Question

I have a Pandas DataFrame that looks like this:

       top         heading  page_no
0   000000           Intro        0
1   100164         Summary        1
2   100451      Experience        1
3   200131          Awards        2
4   200287          Skills        2
5   300147       Education        3
6   300273          Awards        3
7   300329       Interests        3
8   300434  Certifications        3
9   401135             End        4

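I have a filter that uses this DataFrame to pull content out of another DataFrame. It needs to select everything that lies between consecutive `top` values, i.e. from 000000 to 100164, and so on, up to 300434 to 401135.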

for index,row in df_heads.iterrows():
    begin = int(row['top'])
    end = ???
    filter_result = result['data'][(result.top < end) & (result.top > begin)]
    print(row['heading'])
    print(filter_result)
    sections[row['heading']] = filter_result
    end = begin

What should `end` be initialized to so that the filter picks up the correct content?

Answer 1

I think you can create a new column with `shift` and then, if needed, replace the resulting NaN with 0 using `fillna`:

df_heads['shifted_top'] = df_heads['top'].shift(-1).fillna(0)
print (df_heads)
      top         heading  page_no  shifted_top
0       0           Intro        0     100164.0
1  100164         Summary        1     100451.0
2  100451      Experience        1     200131.0
3  200131          Awards        2     200287.0
4  200287          Skills        2     300147.0
5  300147       Education        3     300273.0
6  300273          Awards        3     300329.0
7  300329       Interests        3     300434.0
8  300434  Certifications        3     401135.0
9  401135             End        4          0.0

for index,row in df_heads.iterrows():
    begin = int(row['top'])
    end =  int(row['shifted_top'])
    print (begin, end)

0 100164
100164 100451
100451 200131
200131 200287
200287 300147
300147 300273
300273 300329
300329 300434
300434 401135
401135 0
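
To connect this back to the filter in the question, here is a minimal sketch of the complete loop. The two small frames are made-up stand-ins for `df_heads` and `result` (it assumes `result` has a numeric `top` column and a `data` column, as the question's filter suggests). Instead of filling the last NaN with 0, which would give the final row an empty range (401135 to 0), the sketch simply skips the last heading because nothing follows it.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames (not the real data).
df_heads = pd.DataFrame({
    "top": [0, 100164, 100451, 200131],
    "heading": ["Intro", "Summary", "Experience", "End"],
})
result = pd.DataFrame({
    "top": [50, 100200, 100300, 100500, 200500],
    "data": ["a", "b", "c", "d", "e"],
})

# The upper bound of each section is the next row's `top`; the last row gets NaN.
df_heads["shifted_top"] = df_heads["top"].shift(-1)

sections = {}
# Skip the final row: it has no following heading to close its range.
for index, row in df_heads.dropna(subset=["shifted_top"]).iterrows():
    begin = int(row["top"])
    end = int(row["shifted_top"])
    mask = (result["top"] > begin) & (result["top"] < end)
    sections[row["heading"]] = result.loc[mask, "data"]

for heading, content in sections.items():
    print(heading, content.tolist())
```

Both comparisons are strict, as in the question, so a row of `result` whose `top` equals a heading's `top` exactly is left out of every section.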

Answer 2

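The `row` yielded by `for index, row in df_heads.iterrows()` only carries that row's own data, so it gives you no direct access to other rows. As in the example above, you need to build an additional variable outside the loop that holds the data from the other row.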

df_heads['shifted_top'] = df_heads['top'].shift(-1).fillna(0)
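
If you would rather not add a column, the same "prepare it outside the loop" idea can be sketched by pairing each `top` with the one after it using plain Python `zip`. This is an alternative to `shift`, not what the answer above uses, and the small frame here is made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened version of df_heads.
df_heads = pd.DataFrame({
    "top": [0, 100164, 100451],
    "heading": ["Intro", "Summary", "End"],
})

tops = df_heads["top"].tolist()
headings = df_heads["heading"].tolist()

# zip stops at the shortest input, so the last heading (which has no
# following `top`) is dropped automatically.
bounds = list(zip(headings, tops, tops[1:]))

for heading, begin, end in bounds:
    print(heading, begin, end)
```

A list of tuples is used rather than a dict keyed by heading because the question's data contains the heading `Awards` twice, and a dict would keep only the later entry.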
