Suppose we have this DataFrame:

    from pandas import *

    d = {'one' : Series(["word", "other-word", "banana", "hello"]),
         'two' : Series(["I like that word", "Have you seen other-word", "do you like bananas", "hello-kitty doll"])}
    df = DataFrame(d)
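How can I drop the rows where the value in column one does not appear as a word in column two? For example, in the third row "banana" does not match "bananas": drop the row. In the fourth row "hello" does not match "hello-kitty": drop it as well. The last case is the important one: compounds built with a hyphen (-) are the obstacle.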
Expected output:

              one                       two
    0        word          I like that word
    1  other-word  Have you seen other-word
Answer 0 (score: 2)

Edit:

Another approach is to collect the index labels of the rows to drop in a list and then pass that list to DataFrame.drop(). Example/demo:
    In [45]: dropseries = []

    In [46]: for i, row in df.iterrows():
       ....:     if row['one'] not in row['two'].split():
       ....:         dropseries.append(i)
       ....:

    In [47]: df.drop(dropseries)
    Out[47]:
              one                       two
    0        word          I like that word
    1  other-word  Have you seen other-word
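I am not sure whether there is a better way to do this, but you can iterate over the rows, split the string in column two on whitespace, check whether the string in column one is among the resulting tokens, and append the matching rows to a new DataFrame.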
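Example (DataFrame.append was removed in pandas 2.0, so here the matching rows are collected in a list and the new DataFrame is built once at the end):

    import pandas as pd
    rows = []
    for i, row in df.iterrows():
        # keep the row only if 'one' is a whole whitespace-separated token of 'two'
        if row['one'] in row['two'].split():
            rows.append(row)
    newdf = pd.DataFrame(rows)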
Example/demo (recorded on an older pandas release in which DataFrame.append still existed):

    In [38]: newdf = pd.DataFrame()

    In [39]: for i, row in df.iterrows():
       ....:     if row['one'] in row['two'].split():
       ....:         newdf = newdf.append(row)
       ....:

    In [40]: newdf
    Out[40]:
              one                       two
    0        word          I like that word
    1  other-word  Have you seen other-word
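
To make explicit why the snippets above split column two on whitespace rather than using a plain substring check, here is a minimal sketch built on the fourth row of the question's data. The variable names (word, sentence, tokens, substring_hit, token_hit) are made up for illustration and do not come from the original posts.

```python
# Values taken from the fourth row of the question's DataFrame.
word = "hello"
sentence = "hello-kitty doll"

# A plain substring test also matches inside the hyphenated compound,
# which is exactly what the question wants to avoid.
substring_hit = word in sentence

# str.split() with no arguments splits on whitespace only,
# so "hello-kitty" stays a single token.
tokens = sentence.split()
token_hit = word in tokens

print(substring_hit)  # True
print(tokens)         # ['hello-kitty', 'doll']
print(token_hit)      # False
```

The same behaviour has a side effect worth keeping in mind: punctuation stays attached to its token, so a value such as "word" would not match a sentence that contains "word," followed by a comma.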
Answer 1 (score: 2)

You can do it like this:
    result = []
    for x, y in zip(df.one, df.two):
        # True when 'one' is a whole whitespace-separated token of 'two'
        if x in y.split():
            result.append(True)
        else:
            result.append(False)
    print(df[result])
A better way:

    df[[x in y.split() for x, y in zip(df.one, df.two)]]
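
For reference, here is a self-contained sketch of the list-comprehension mask that should run as-is on Python 3 with a current pandas. The data is copied from the question; the names mask and filtered are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "one": ["word", "other-word", "banana", "hello"],
    "two": [
        "I like that word",
        "Have you seen other-word",
        "do you like bananas",
        "hello-kitty doll",
    ],
})

# True where the value in 'one' is a whitespace-separated token of 'two'.
mask = [x in y.split() for x, y in zip(df["one"], df["two"])]

# Indexing a DataFrame with a list of booleans keeps the rows marked True.
filtered = df[mask]
print(filtered)
```

Building the mask once and indexing with it avoids growing a DataFrame row by row, which is why this form is usually preferred over the explicit loops.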