Question

I would really appreciate some help, as I have been struggling to get this working.

I have a DataFrame:

    PagePath                        Source
0   /product/123/sometext           (Other)
1   /product/234?someutminfo        (Other)
2   /product/112?whatever           (Other)

I also have another DataFrame with the short product paths:

    Path           Other stuff
0   /product/123   Foo
1   /product/234   Bar
2   /product/345   Buzz
3   /product/456   Lol

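What I need is a new column in the first df, matched against the second df, that contains the short path (if there is one).

So far I have managed to do the following:

1) Built a Series from the second df by grouping it

2) Checked the first df against the list taken from the second one: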

df1['newcol'] = df1['PagePath'].str.contains('|'.join(list_from_df2))

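This gave me a True / False column, depending on whether a match was found.

I understand that what I need to do is iterate over each row of the first df, loop over each value in the list, and return the value when a match is found. However, I can't work out how to write the proper code for it. I would really appreciate your help.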

Answer 1

Solved it myself:

First we define a function:

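import re

def return_match(row):
    # Return the matched product path, or a placeholder when there is none.
    try:
        return re.search(r'/product/.+-\d+/', row).group(0)
    except (AttributeError, TypeError):
        # AttributeError: no match (re.search returned None);
        # TypeError: the value is not a string, e.g. NaN.
        return 'Not a product'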

Then we apply the function to the relevant column:

df['newcol'] = df['PagePath'].apply(return_match)
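
For the matching against the second DataFrame that the question describes, a vectorized alternative may be worth considering. This is only a sketch and not the method used above: it swaps `str.contains` for `str.extract`, which returns the matched text rather than True / False. The frames are rebuilt from the samples in the question; the names `paths` and `pattern`, and the lookahead that requires a short path to end at `/`, `?`, `#` or the end of the string, are assumptions made for illustration. Note also that the pattern in `return_match` expects paths shaped like `/product/<slug>-<digits>/`, which the sample paths in the question do not show.

```python
import re

import pandas as pd

# Sample frames reconstructed from the question.
df1 = pd.DataFrame({
    "PagePath": [
        "/product/123/sometext",
        "/product/234?someutminfo",
        "/product/112?whatever",
    ],
    "Source": ["(Other)"] * 3,
})
df2 = pd.DataFrame({
    "Path": ["/product/123", "/product/234", "/product/345", "/product/456"],
    "Other stuff": ["Foo", "Bar", "Buzz", "Lol"],
})

# Longest paths first, so a more specific path wins over a shorter path
# that is its prefix; re.escape keeps characters such as "." literal.
paths = sorted(df2["Path"], key=len, reverse=True)

# Assumption: a short path must be followed by "/", "?", "#" or the end of
# the string, so "/product/123" does not match inside "/product/1234".
pattern = "(" + "|".join(re.escape(p) for p in paths) + r")(?=[/?#]|$)"

# str.extract returns the captured text, or NaN where nothing matches.
df1["newcol"] = df1["PagePath"].str.extract(pattern, expand=False)
```

With the sample data, the first two rows get `/product/123` and `/product/234`, and the third row is left as NaN because `/product/112` is not in the second df. If the other columns of the second df are needed as well, something like `df1.merge(df2, left_on='newcol', right_on='Path', how='left')` should bring them in.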
