Question

Python 2.7

In [3]: import pandas as pd
df = pd.DataFrame(dict(A=['abc','abc','abc','xyz','xyz'],
                       B=['abcdef','abcdefghi','notthisone','uvwxyz','orthisone']))
In [4]: df
Out[4]:
    A   B
0   abc abcdef
1   abc abcdefghi
2   abc notthisone
3   xyz uvwxyz
4   xyz orthisone

In [12]: df[df.B.str.contains(df.A) == True]
# keep only the rows where B contains the string in A

TypeError: 'Series' objects are mutable, thus they cannot be hashed

This is the result I am trying to get:

    A   B
0   abc abcdef
1   abc abcdefghi
3   xyz uvwxyz

I have tried variations of the str.contains statement, but with no luck. Any help is much appreciated.

Answer 1

It doesn't look like str.contains accepts a different pattern for each row, so you may just have to apply a function across the rows:

substr_matches = df.apply(lambda row: row['B'].find(row['A']) > -1, axis=1)

df.loc[substr_matches]
Out[11]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
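
As a possible alternative to a row-wise apply, the same test can be sketched as a plain list comprehension over the two columns, which tends to be faster on larger frames. The name `mask` is only illustrative, and the sketch assumes every cell in both columns is a string.

```python
import pandas as pd

df = pd.DataFrame(dict(A=['abc', 'abc', 'abc', 'xyz', 'xyz'],
                       B=['abcdef', 'abcdefghi', 'notthisone', 'uvwxyz', 'orthisone']))

# Pair each A with the B from the same row and test substring membership.
mask = [a in b for a, b in zip(df['A'], df['B'])]

# A list of booleans with one entry per row can be used directly as a row filter.
result = df[mask]
print(result)
```

Rows 0, 1 and 3 are kept, the same rows selected by the apply version above.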

Answer 2

Apply a lambda function across the rows and test whether A is in B.

>>> df[df.apply(lambda x: x.A in x.B, axis=1)]
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
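
If either column may contain missing values, the bare `in` test raises a TypeError on a NaN cell. A minimal sketch of a guarded version follows; the helper name `a_in_b` and the sample data with a NaN are hypothetical, and the `isinstance(..., str)` check assumes Python 3 strings rather than the Python 2.7 setup in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 has a missing value in B.
df = pd.DataFrame(dict(A=['abc', 'abc', 'xyz'],
                       B=['abcdef', np.nan, 'uvwxyz']))

def a_in_b(row):
    # Treat the row as a non-match unless both cells are strings.
    a, b = row['A'], row['B']
    return isinstance(a, str) and isinstance(b, str) and a in b

mask = df.apply(a_in_b, axis=1)
result = df[mask]
print(result)
```

Rows with a missing value are simply dropped from the result here; whether that is the right treatment depends on the data.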

Answer 3

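You can call unique on column 'A' and then join the values with | to build a pattern for contains. Note that this tests whether B contains any of the values in A, not the value from the same row, and that the values are interpreted as regular expressions; it gives the desired result on this sample data: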

In [15]:
df[df['B'].str.contains('|'.join(df['A'].unique()))]

Out[15]:
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz
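
To illustrate how the joined pattern differs from a row-by-row test, here is a small sketch on hypothetical data in which one row's B contains a value of A that belongs to a different row. The use of `re.escape` is an addition not present in the answer; it keeps any regex metacharacters in A from being interpreted as pattern syntax.

```python
import re
import pandas as pd

# Hypothetical data: row 2 has 'xyz' inside B although its own A is 'abc'.
df = pd.DataFrame(dict(A=['abc', 'abc', 'abc', 'xyz'],
                       B=['abcdef', 'abcdefghi', 'has_xyz_inside', 'uvwxyz']))

# Escape each unique value so it is matched literally, then join with '|'.
pattern = '|'.join(re.escape(a) for a in df['A'].unique())

# True wherever B contains ANY value that appears in column A.
any_match = df['B'].str.contains(pattern)

# True only where B contains the A value from the SAME row.
row_match = df.apply(lambda row: row['A'] in row['B'], axis=1)

print(df[any_match])
print(df[row_match])
```

The joined pattern keeps row 2 while the row-wise test drops it, so the two approaches are interchangeable only when no B can contain an A value from another row.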

Answer 4

How about this?

In [8]: df[df.apply(lambda v: v['A'] in v['B'], axis=1)]
Out[8]: 
     A          B
0  abc     abcdef
1  abc  abcdefghi
3  xyz     uvwxyz

Trouble selecting DataFrame rows by substring
