Using pandas isin() to drop rows whose value appears in a list

Question

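Good day! I'm having some trouble updating a CSV file (I'm using pandas). If a row's element matches one of the items in a list I defined, I want the code to delete that row from the CSV file.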

For example, suppose my CSV file contains the following rows:

and 2
hi  3
or  4
is  5
hello 6

The list a is defined as follows, and this is my loop:

a = ['and', 'or', 'is']

d = {}

for k, v in reader.values:
    if a == k:
        break
    else:
        d[k] = int(v)

reader is the name of the variable holding the CSV file I opened with pandas.
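For reference, here is a minimal sketch of how reader might be created. It assumes the file is whitespace-separated with no header row (the question does not say), and an in-memory string stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the CSV file's contents (assumption: whitespace-separated, no header row)
csv_text = """and 2
hi  3
or  4
is  5
hello 6
"""

# header=None gives integer column labels 0 and 1; sep=r"\s+" splits on any run of spaces
reader = pd.read_csv(io.StringIO(csv_text), sep=r"\s+", header=None)

# Each row of reader.values is a [word, number] pair, which the loop unpacks into k, v
for k, v in reader.values:
    print(k, v)
```

With a real file you would pass its path to pd.read_csv instead of the StringIO object.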

I expect a dictionary in which the words listed in a are not stored in d. This is the output I'm expecting:

{'hi':3, 'hello': 6}

When I check the output, the words listed in a are still included in the dictionary. I hope you can help me, thanks!

Answer 1

Using pandas isin()

Assume your DataFrame, which I'll call df, looks like the one below, with columns "word" and "number".

    word    number
0   and     2
1   hi      3
2   or      4
3   is      5
4   hello   6
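To follow along without the original file, the example DataFrame can be rebuilt directly. This is only a stand-in; if you load the CSV yourself, passing names=["word", "number"] to pd.read_csv gives the same column labels.

```python
import pandas as pd

# Rebuild the example DataFrame with columns "word" and "number"
df = pd.DataFrame(
    {
        "word": ["and", "hi", "or", "is", "hello"],
        "number": [2, 3, 4, 5, 6],
    }
)
print(df)
```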

I would use the pandas isin method.

In [1]: a = ['and', 'or', 'is']
        df[~df['word'].isin(a)]
Out[1]:     word  number
        1     hi       3
        4  hello       6
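To make the one-liner above less opaque, this sketch splits it into its two steps: isin builds a boolean mask, and ~ inverts it so that boolean indexing keeps only the rows whose word is not in a. The example df is rebuilt inline so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({"word": ["and", "hi", "or", "is", "hello"], "number": [2, 3, 4, 5, 6]})
a = ["and", "or", "is"]

# isin returns a boolean Series: True where the word is in a
mask = df["word"].isin(a)
print(mask.tolist())  # [True, False, True, True, False]

# ~ flips the mask, so boolean indexing keeps only the rows NOT in a
kept = df[~mask]
print(kept["word"].tolist())  # ['hi', 'hello']
```

Note that the surviving rows keep their original index labels (1 and 4), as in the Out[1] display above.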

Then, if you want a dictionary, just zip the two columns you need together.

In [2]: a = ['and', 'or', 'is']
        df2 = df[~df['word'].isin(a)]
        dict(zip(df2['word'], df2['number']))
Out[2]: {'hello': 6, 'hi': 3}
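Since the question is ultimately about updating the CSV file, the filtered DataFrame can also be written back out with to_csv. This is a sketch under the assumption that the file should keep its original space-separated, header-less layout; a StringIO buffer stands in for the file path you would normally pass.

```python
import io

import pandas as pd

df = pd.DataFrame({"word": ["and", "hi", "or", "is", "hello"], "number": [2, 3, 4, 5, 6]})
a = ["and", "or", "is"]
df2 = df[~df["word"].isin(a)]

# index=False drops the row labels; header=False keeps the header-less layout (assumption)
out = io.StringIO()
df2.to_csv(out, sep=" ", index=False, header=False)
print(out.getvalue())
```

To overwrite the actual file, replace out with its path.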

Using your original code

If you want your original code to work, change the if test and replace break with continue.

d = {}
for k, v in df.values:
    # skip any word that appears in the exclusion list a
    if k in a:
        continue
    d[k] = int(v)

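Note that a is a list and k is a string, so a == k always evaluates to False and you never skip any value. Instead, you need to check whether k in a. Also, break isn't really what you want, because it stops the for loop as soon as you hit a value from a. What you need is continue, which simply moves on to the next value in the DataFrame.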
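The two points above can be checked in isolation. In this sketch, plain (word, number) tuples stand in for the rows of df.values, so no pandas is needed.

```python
a = ["and", "or", "is"]
k = "and"

# Comparing a list with a string is always False, so the original test never fires
print(a == k)  # False

# Membership testing is what the loop actually needs
print(k in a)  # True

# Stand-in for df.values: one (word, number) pair per row
rows = [("and", 2), ("hi", 3), ("or", 4), ("is", 5), ("hello", 6)]

# continue skips only the current word and keeps looping
with_continue = {}
for word, num in rows:
    if word in a:
        continue
    with_continue[word] = num

# break abandons the whole loop at the first match ("and" is the very first row)
with_break = {}
for word, num in rows:
    if word in a:
        break
    with_break[word] = num

print(with_continue)  # {'hi': 3, 'hello': 6}
print(with_break)     # {}
```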

Answer 2

Use df.replace() to replace the values in list a with NaN, then drop those rows with dropna() and build a dict():

import numpy as np
# column 0 is the word column (CSV read with header=None); use its name if columns are named
d = dict(df.replace(a, np.nan).dropna(subset=[0]).values)

{'hi': 3, 'hello': 6}
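Here is the same chain broken into steps, assuming the named-column df from Answer 1 (so subset uses "word" rather than 0).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"word": ["and", "hi", "or", "is", "hello"], "number": [2, 3, 4, 5, 6]})
a = ["and", "or", "is"]

# Step 1: every cell equal to a value in a becomes NaN
replaced = df.replace(a, np.nan)
print(replaced["word"].tolist())  # [nan, 'hi', nan, nan, 'hello']

# Step 2: drop rows whose "word" cell is NaN (subset names the column to check)
cleaned = replaced.dropna(subset=["word"])

# Step 3: each remaining row is a [word, number] pair, which dict() consumes
d = dict(cleaned.values)
print(d)  # {'hi': 3, 'hello': 6}
```

One difference from the isin approach: replace scans every column, while ~df["word"].isin(a) only checks the word column.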
