Question

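If df['col']='a','b','c' and df2['col']='a123','b456','d789', how do I create df2['is_contained']='a','b','no_match', where the df['col'] value is returned if it is found within the df2['col'] value, and 'no_match' is returned if no match is found? Also, I don't expect multiple matches, but in the unlikely case that there are, I'd like to return a string like 'Multiple Matches'.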

Answer 1

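With this toy data set, we want to add a new column to df2 that will contain no_match for the first three rows, while the last row will contain the value 'd', because that row's col value (the letter 'a') appears in df1.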

import numpy as np
import pandas as pd


df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123','b456','d789', 'a']})

In other words, the new column in df2 is filled with a value from df1['col'] only when that row's df2['col'] value appears somewhere in df1['col'].

In [2]: df1
Out[2]:
  col
0   a
1   b
2   c
3   d

In [3]: df2
Out[3]:
    col
0  a123
1  b456
2  d789
3     a

If that is the right way to understand your question, then you can do this with pandas isin:

In [4]: df2.col.isin(df1.col)
Out[4]:
0    False
1    False
2    False
3     True
Name: col, dtype: bool

This evaluates to True only when a value in df2.col is also in df1.col.
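isin tests exact equality, which is why the first three rows come out False even though 'a123' starts with 'a'. The short check below reuses the toy frames above and contrasts isin with a substring test written as a plain apply; the substring variant is an illustration added here, not part of this answer.

```python
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

# isin compares whole values: 'a123' is not equal to any element of df1.col,
# so only the exact string 'a' in the last row matches.
mask = df2.col.isin(df1.col)
print(mask.tolist())  # [False, False, False, True]

# A substring test instead asks whether any df1 value occurs inside each
# df2 value; here every row contains one of 'a', 'b', 'c', 'd'.
substring_mask = df2.col.apply(lambda s: any(v in s for v in df1.col))
print(substring_mask.tolist())  # [True, True, True, True]
```

Which of the two you want depends on whether "contained" in the question means "equal to" or "occurs inside".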

If you're familiar with R, you can then use np.where, which works roughly like R's ifelse.

In [5]: np.where(df2.col.isin(df1.col), df1.col, 'NO_MATCH')
Out[5]:
0    NO_MATCH
1    NO_MATCH
2    NO_MATCH
3           d
Name: col, dtype: object

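For rows where the df2.col value appears in df1.col, the value of df1.col at that row index is returned. If the df2.col value is not a member of df1.col, the default 'NO_MATCH' value is used instead.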
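To actually store the result as the new column the question asks for, the np.where output can be assigned to df2 directly. This is a minimal sketch using the same toy frames; the column name is_contained is taken from the question. Note that np.where works positionally, so it assumes df1 and df2 have the same length.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789', 'a']})

# Where the condition holds, take df1.col from the same row position;
# everywhere else use the 'NO_MATCH' label. Requires len(df1) == len(df2).
df2['is_contained'] = np.where(df2.col.isin(df1.col), df1.col, 'NO_MATCH')
print(df2)
```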

Answer 2

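As of pandas 0.13, you can use str.extract (in current versions, pass expand=False so that a single capture group comes back as a Series rather than a one-column DataFrame):

In [11]: df1 = pd.DataFrame({'col': ['a', 'b', 'c']})

In [12]: df2 = pd.DataFrame({'col': ['d23','b456','a789']})

In [13]: df2.col.str.extract('(%s)' % '|'.join(df1.col), expand=False)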
Out[13]: 
0    NaN
1      b
2      a
Name: col, dtype: object
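The extract approach leaves NaN where nothing matched. A sketch that fills those rows with the question's 'no_match' label and escapes the df1 values before joining them into a pattern (escaping is an addition here, useful only if the values can contain regex metacharacters such as '.' or '+'):

```python
import re
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c']})
df2 = pd.DataFrame({'col': ['d23', 'b456', 'a789']})

# Build one alternation pattern such as '(a|b|c)'; re.escape keeps each
# df1 value literal even if it contains regex special characters.
pattern = '(' + '|'.join(map(re.escape, df1.col)) + ')'

# expand=False returns a Series for a single capture group; rows with no
# match are NaN, which fillna replaces with 'no_match'.
df2['is_contained'] = df2.col.str.extract(pattern, expand=False).fillna('no_match')
print(df2)
```

str.extract returns only the leftmost match in each row, so on its own it cannot detect the 'Multiple Matches' case from the question.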

Answer 3

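You first have to make sure the indexes match. For simplicity, I'll show the columns as if they were in the same DataFrame. The trick is to use the apply method with axis=1, so that the function is applied to each row: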

import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'],
                   'col2': ['a123','b456','d789', 'a']})
# axis=1 passes each row to the lambda: is col1 a substring of col2?
df['contained'] = df.apply(lambda x: x.col1 in x.col2, axis=1)
df
  col1  col2  contained
0    a  a123       True
1    b  b456       True
2    c  d789      False
3    d     a      False
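If the two columns live in separate frames whose indexes do not line up, one option is to check each df2 value against all df1 values instead of only the value in the same row. The sketch below does that and also covers the 'no_match' and 'Multiple Matches' cases from the question; the helper name find_contained is hypothetical, and the approach costs one substring test per (df2 row, df1 value) pair.

```python
import pandas as pd

df1 = pd.DataFrame({'col': ['a', 'b', 'c']})
df2 = pd.DataFrame({'col': ['a123', 'b456', 'd789']})

def find_contained(text, candidates):
    """Hypothetical helper: return the single candidate found inside text,
    'no_match' if none is found, or 'Multiple Matches' if several are."""
    hits = [c for c in candidates if c in text]
    if not hits:
        return 'no_match'
    if len(hits) > 1:
        return 'Multiple Matches'
    return hits[0]

# Drop duplicate df1 values so a repeated value is not counted as a
# second match; no index alignment between df1 and df2 is needed.
candidates = df1['col'].drop_duplicates().tolist()
df2['is_contained'] = df2['col'].apply(find_contained, args=(candidates,))
print(df2)
```

For the question's data this yields 'a', 'b', 'no_match'; a value such as 'ab1' would yield 'Multiple Matches'.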

Check if a Pandas column contains values from another column
