Question

I have a DataFrame with the following column:

import pandas as pd

df = pd.DataFrame({'num': [1, 2, 2, 3, 4, 5, 6]})

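I want to create a column using df['num'].shift() that compares the current cell's value with the value in the cell below it. The new value should be True if they match and False otherwise.

Expected output: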

   num  matches?
0    1     False
1    2      True
2    2     False
3    3     False
4    4     False
5    5     False
6    6     False

With the following code, I can't work out the best way to iterate over each cell and test the condition:


df['matches?'] = ''

for i in range(len(df)):
    if df['num'] == df['num'].shift(1):
        df['matches?'] = True
    else:
        df['matches?'] = False

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the simplest way to achieve this?

Answer 1

When you use shift you don't need a loop; the operation is vectorized and does the work for you:

df['matches?'] = df['num'].shift(-1)==df['num']

Output:

    num    matches?
0   1      False
1   2      True
2   2      False
3   3      False
4   4      False
5   5      False
6   6      False
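
To see why this works, it can help to look at the shifted column next to the original one. The following is only a minimal sketch; the helper column name `next_num` is introduced here for illustration and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 2, 3, 4, 5, 6]})

# shift(-1) moves every value up by one row, so row i holds the value of row i+1.
# The last row has nothing below it and becomes NaN.
df['next_num'] = df['num'].shift(-1)

# The comparison is element-wise. NaN is never equal to a number,
# so the last row evaluates to False.
df['matches?'] = df['num'] == df['next_num']

print(df)
```

Note that shift(1), as used in the question, would instead compare each row with the row above it.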

Edit

If you want to keep the loop logic:

for ix, row in df.iterrows():
    if ix < len(df)-1:
        if df.loc[ix, 'num'] == df.loc[ix+1, 'num']:
            df.loc[ix, 'matches?'] = True
        else:
            df.loc[ix, 'matches?'] = False
    else: #last observation
        df.loc[ix, 'matches?'] = False

Output:

    num    matches?
0   1      False
1   2      True
2   2      False
3   3      False
4   4      False
5   5      False
6   6      False

Answer 2

When working with numeric values, you can use diff to compute the difference between two rows; see the code below:

df['matches?'] = df['num'].diff(-1).eq(0)  #eq means equal to
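
As a rough illustration of what diff(-1) produces, the sketch below compares it with the shift-based approach on the same data. The variable names `step`, `via_diff` and `via_shift` are chosen here for the example only.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 2, 3, 4, 5, 6]})

# diff(-1) computes row i minus row i+1.
# The last row has no successor, so its difference is NaN.
step = df['num'].diff(-1)

# A difference of exactly 0 means the two neighbouring values are equal.
# NaN is not equal to 0, so the last row is False.
via_diff = step.eq(0)

# The shift-based comparison, for reference.
via_shift = df['num'].shift(-1) == df['num']

print(step)
print(via_diff.tolist() == via_shift.tolist())
```

Because diff relies on subtraction, this variant suits numeric columns, whereas the shift comparison only needs equality.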

Answer 3

Keeping the for loop, you can try:

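df['matches?'] = "F"  # default; the last row has nothing below it, so it stays "F"
for i in range(len(df) - 1):
    # use .loc instead of df['matches?'][i] to avoid chained assignment
    if df.loc[i, 'num'] == df.loc[i + 1, 'num']:
        df.loc[i, 'matches?'] = "T"
    else:
        df.loc[i, 'matches?'] = "F"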

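You should iterate only up to len(df)-1, because when the loop reaches the last row there is nothing after it to compare against; looking up row i+1 there raises an error (a KeyError with the default integer index).

Output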

+-----+----------+
| num | matches? |
+-----+----------+
|   1 | F        |
|   2 | T        |
|   2 | F        |
|   3 | F        |
|   4 | F        |
|   5 | F        |
|   6 | F        |
+-----+----------+
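
For comparison, the same "T"/"F" labels can also be produced without a loop. This is a sketch that swaps in numpy.where on top of the shift comparison; it is not part of the answer above, and the name `same_as_next` is made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 2, 3, 4, 5, 6]})

# Boolean Series: True where a value equals the value in the row below it.
same_as_next = df['num'].shift(-1) == df['num']

# Map True/False to the "T"/"F" labels in one vectorized step.
df['matches?'] = np.where(same_as_next, 'T', 'F')

print(df)
```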

Answer 4

You can also add an elif and split the logic into three branches, as shown below:

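df['matches?'] = False  # create the column as boolean up front

for i in range(len(df)):
    if i + 1 == len(df):
        # last row: there is no row below to compare with
        df.loc[i, 'matches?'] = False
    elif df.loc[i, 'num'] == df.loc[i + 1, 'num']:
        # .loc avoids the chained assignment df['matches?'][i] = ...
        df.loc[i, 'matches?'] = True
    else:
        df.loc[i, 'matches?'] = False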

Pandas: compare a cell value with the value of the cell below it in the same column?

4 answers: