Question

Suppose we have this DataFrame:

import pandas as pd

d = {'one': pd.Series(["word", "other-word", "banana", "hello"]),
     'two': pd.Series(["I like that word", "Have you seen other-word", "do you like bananas", "hello-kitty doll"])}

df = pd.DataFrame(d)

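How do I drop the rows where the word in column one does not appear in the string in column two? For example, in the third row banana does not match bananas: drop the row. In the fourth row hello does not match hello-kitty: drop. That last case is important: a compound built with a hyphen (-) must be treated as a single word, so a match inside it does not count.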

Expected output:

          one                       two
0        word          I like that word
1  other-word  Have you seen other-word

Answer 1

Edit:

Another approach is to compute the index labels of the rows to drop, store them in a list, and then call DataFrame.drop() once at the end. Example/Demo -

In [45]: dropseries = []

In [46]: for i, row in df.iterrows():
   ....:     if row['one'] not in row['two'].split():
   ....:         dropseries.append(i)
   ....:

In [47]: df.drop(dropseries)
Out[47]:
          one                       two
0        word          I like that word
1  other-word  Have you seen other-word
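
For readers who want to run the drop-based approach outside an IPython session, here is a minimal self-contained sketch. It assumes the frame from the question; the variable names to_drop and result are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'one': ["word", "other-word", "banana", "hello"],
    'two': ["I like that word", "Have you seen other-word",
            "do you like bananas", "hello-kitty doll"],
})

# Collect the index labels of rows whose 'one' value is not one of the
# whitespace-separated words of 'two'.
to_drop = [i for i, row in df.iterrows()
           if row['one'] not in row['two'].split()]

# drop() returns a new frame; df itself is unchanged unless reassigned.
result = df.drop(to_drop)
print(result)
```

Note that df.drop(...) does not modify df in place by default, so the result has to be assigned if it is needed later.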

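I'm not sure whether there is a better way to do this, but you can iterate over each row, split the string in column two into words, check whether the string in column one is among those words, and append the matching rows to a new DataFrame.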

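Example -

rows = []

for i, row in df.iterrows():
    # keep the row only when one is a whitespace-separated word of two
    if row['one'] in row['two'].split():
        rows.append(row)

# DataFrame.append() was removed in pandas 2.0, so build the frame in one step
newdf = pd.DataFrame(rows)

Example/Demo -

In [38]: rows = []

In [39]: for i, row in df.iterrows():
   ....:     if row['one'] in row['two'].split():
   ....:         rows.append(row)
   ....:

In [40]: newdf = pd.DataFrame(rows)

In [41]: newdf
Out[41]: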
          one                       two
0        word          I like that word
1  other-word  Have you seen other-word
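
It may help to see why splitting on whitespace gives the behaviour the question asks for, while a plain substring test or a regex word boundary does not. The sketch below uses only the strings from the question; the helper name contains_word is hypothetical.

```python
import re

def contains_word(word, text):
    """Return True if word is one of the whitespace-separated tokens of text."""
    return word in text.split()

# A substring test is too permissive: it accepts both rows that should be dropped.
print("banana" in "do you like bananas")                     # True
print("hello" in "hello-kitty doll")                         # True

# Whitespace tokens keep hyphenated compounds intact.
print(contains_word("banana", "do you like bananas"))        # False
print(contains_word("hello", "hello-kitty doll"))            # False
print(contains_word("other-word", "Have you seen other-word"))  # True

# A regex word boundary treats the hyphen as a boundary, so it still matches.
print(bool(re.search(r"\bhello\b", "hello-kitty doll")))     # True
```

One limitation of this approach: punctuation attached to a word (for example "word," at the end of a clause) stays part of the token, so such a row would not match unless the text is cleaned first.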

Answer 2

You can do it like this:

result = []
for x, y in zip(df.one, df.two):
    # True when the word in one is a whitespace-separated word of two
    result.append(x in y.split())

print(df[result])

A better way:

df[[x in y.split() for x, y in zip(df.one, df.two)]]
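
If the filter is needed for other column pairs, the comprehension can be wrapped in a small function. This is only a sketch: the function name filter_rows_by_word and its parameters are hypothetical, and the mask is built as a Series aligned to the frame's index so it can be combined with other boolean conditions.

```python
import pandas as pd

def filter_rows_by_word(df, word_col, text_col):
    """Keep rows whose word_col value is a whitespace-separated token of text_col."""
    mask = pd.Series(
        [w in t.split() for w, t in zip(df[word_col], df[text_col])],
        index=df.index,  # align the mask with the frame's own index labels
    )
    return df[mask]

df = pd.DataFrame({
    'one': ["word", "other-word", "banana", "hello"],
    'two': ["I like that word", "Have you seen other-word",
            "do you like bananas", "hello-kitty doll"],
})

kept = filter_rows_by_word(df, 'one', 'two')
print(kept)
```

The kept rows retain their original index labels; calling reset_index(drop=True) on the result would renumber them if that is preferred.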

Using pandas: drop a row if the word in one column does not appear in the string in another column
