Question

I'm trying to look up values, and parts of values, from one column in another column, and return a third value.

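Basically, I have two dataframes: df and df2. The first has part numbers in 'col1'. The second has either full part numbers or parts of them in 'col1', and in 'col2' the values I want to put into df['col2'].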

import pandas as pd


df = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3',
    '2-1-1', '2-1-2', '2-1-3']})

df2 = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3', '2-1'],
    'col2': ['A', 'B', 'C', 'D']})

Of course, this:

df['col1'].isin(df2['col1'])

only covers exact matches, not the partial ones:

df['col1'].isin(df2['col1'])
Out[27]: 
0     True
1     True
2     True
3    False
4    False
5    False
Name: col1, dtype: bool

I tried:

df[df['col1'].str.contains(df2['col1'])]

but got:

TypeError: 'Series' objects are mutable, thus they cannot be hashed
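For reference, Series.str.contains takes a single pattern string, not a Series of patterns, which is why passing df2['col1'] fails. A minimal sketch, assuming substring matching is acceptable, joins the keys into one regex alternation (escaped with re.escape). Note that it only says whether some key occurs, not which one, so on its own it cannot fill col2.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3',
                            '2-1-1', '2-1-2', '2-1-3']})
df2 = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3', '2-1'],
                    'col2': ['A', 'B', 'C', 'D']})

# str.contains wants ONE pattern, so combine all keys into a single alternation.
# re.escape keeps any regex metacharacters in part numbers literal.
pattern = '|'.join(map(re.escape, df2['col1']))
mask = df['col1'].str.contains(pattern)
print(mask.tolist())  # [True, True, True, True, True, True]
```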

I also tried a dictionary made from df2, using the same approach as above and mapping it, with no luck.

The df result I need is:

 col1     col2
'1-1-1'    'A'
'1-1-2'    'B'
'1-1-3'    'C'
'2-1-1'    'D'  
'2-1-2'    'D'  
'2-1-3'    'D'

I don't know how to get the 'D' value into 'col2', because df2['col1'] contains '2-1', which is only part of the part number.
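One way to state the rule the expected output implies, assuming a df2 key matches when it equals the part number or is a prefix of it ending at a '-' boundary, and that the longest such key wins. The plain-Python sketch below (match_part is a hypothetical helper name) tries the full part number first, then drops one trailing segment at a time.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3',
                            '2-1-1', '2-1-2', '2-1-3']})
df2 = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3', '2-1'],
                    'col2': ['A', 'B', 'C', 'D']})

lookup = dict(zip(df2['col1'], df2['col2']))

def match_part(part, table):
    """Hypothetical helper: return the value for the longest key that equals
    `part` or is a '-'-delimited prefix of it (assumed rule), else None."""
    pieces = part.split('-')
    # Full part number first, then progressively shorter prefixes.
    for end in range(len(pieces), 0, -1):
        key = '-'.join(pieces[:end])
        if key in table:
            return table[key]
    return None

df['col2'] = df['col1'].map(lambda p: match_part(p, lookup))
print(df)
```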

Any help would be greatly appreciated. Thanks in advance.

Answer 1

We can use str.findall:

s=df.col1.str.findall('|'.join(df2.col1.tolist())).str[0].map(df2.set_index('col1').col2)

df['New']=s

df
    col1 New
0  1-1-1   A
1  1-1-2   B
2  1-1-3   C
3  2-1-1   D
4  2-1-2   D
5  2-1-3   D
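One caveat, sketched under assumptions rather than taken from the answer: in a regex alternation the engine takes the first alternative that matches at a given position, not the longest. If df2 ever held overlapping keys, such as a hypothetical '1-1' alongside '1-1-1', the shorter key could win. Sorting the keys longest first before joining them avoids that. The lookup table below is invented for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '2-1-1']})
# Hypothetical lookup table with an overlapping key '1-1' (not in the question).
df2 = pd.DataFrame({'col1': ['1-1', '1-1-1', '2-1'],
                    'col2': ['E', 'A', 'D']})
mapping = df2.set_index('col1')['col2']

def first_match(keys):
    # Join the escaped keys in the given order and map the first hit per row.
    pattern = '|'.join(map(re.escape, keys))
    return df['col1'].str.findall(pattern).str[0].map(mapping)

df['unsorted'] = first_match(df2['col1'])  # '1-1' is tried before '1-1-1'
df['longest_first'] = first_match(sorted(df2['col1'], key=len, reverse=True))
print(df)
```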

Answer 2

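If your df and df2 follow a fixed format like the one in the example, another way is to map with a dictionary and fill the gaps with fillna, using the values obtained from rsplit: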

d = dict(df2[['col1', 'col2']].values)
df['col2'] = df.col1.map(d).fillna(df.col1.str.rsplit('-', n=1).str[0].map(d))

Out[1223]:
    col1 col2
0  1-1-1    A
1  1-1-2    B
2  1-1-3    C
3  2-1-1    D
4  2-1-2    D
5  2-1-3    D
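The rsplit version strips exactly one trailing segment, so it would miss a key that is two or more segments shorter than the part number. A hedged generalization, assuming keys are always '-'-delimited prefixes, repeats the rsplit-and-fillna step until every row is filled or nothing is left to trim. The key '3' and the part number '3-1-1' are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1-1-1', '2-1-1', '3-1-1']})
# '3' -> 'F' is a hypothetical key that is two segments shorter than '3-1-1'.
df2 = pd.DataFrame({'col1': ['1-1-1', '2-1', '3'],
                    'col2': ['A', 'D', 'F']})
d = dict(zip(df2['col1'], df2['col2']))

result = df['col1'].map(d)
prefix = df['col1']
# Trim one trailing '-' segment per pass and fill only rows that are still NaN.
while result.isna().any() and prefix.str.contains('-').any():
    prefix = prefix.str.rsplit('-', n=1).str[0]
    result = result.fillna(prefix.map(d))

df['col2'] = result
print(df)
```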

Otherwise, instead of findall as in Wen's solution, you can also use extract with the dictionary d from above:

df.col1.str.extract('('+'|'.join(df2.col1)+')')[0].map(d)
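Both findall and extract search anywhere in the string, so a key can also match in the middle of a part number. If keys are meant to match only as prefixes (an assumption the question does not state), anchoring the pattern with '^' enforces that. The sketch below compares both forms on a hypothetical part number, '3-2-1'.

```python
import re
import pandas as pd

# '3-2-1' is a hypothetical part number containing '2-1' in the middle.
df = pd.DataFrame({'col1': ['1-1-1', '2-1-1', '3-2-1']})
df2 = pd.DataFrame({'col1': ['1-1-1', '1-1-2', '1-1-3', '2-1'],
                    'col2': ['A', 'B', 'C', 'D']})
d = dict(df2[['col1', 'col2']].values)

alternation = '|'.join(map(re.escape, df2['col1']))

# Unanchored: a key may match anywhere in the part number.
df['anywhere'] = df['col1'].str.extract('(' + alternation + ')')[0].map(d)
# Anchored with '^': a key must match at the start of the part number.
df['prefix'] = df['col1'].str.extract('^(' + alternation + ')')[0].map(d)
print(df)
```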

Pandas: map full and partial column values from another column
