Question

I have a DataFrame named df_companies.

Output:
    company     brand 
0   VW-Konzern  volkswagen
1   VW-Konzern  audi
2   VW-Konzern  bentley
3   VW-Konzern  bugatti
4   VW-Konzern  lamborghini

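In the next step, I receive two strings inside a for loop after some string formatting. I then try to check whether the 'companyName' string is contained in the 'brand' column of the DataFrame 'df_companies'.

If it is, the logo_url string should be added to the 'image_url' column of df_companies.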

for image in images:
    companyName = image['alt'].lower().split(' ', 1)[0]
    logo_url = image['src']

    df_companies['image_url'] = np.where(df_companies['brand'].str.contains(companyName), logo_url, 'other')

So far this works only for the first row. For all the remaining rows, it just enters the string 'other' defined above.

Output: 
        company     brand       image_url
0       VW-Konzern  volkswagen  https://imgr.volkswagen.png
1       VW-Konzern  audi        Other
2       VW-Konzern  bentley     Other
3       VW-Konzern  bugatti     Other
4       VW-Konzern  lamborghini Other

What I want to achieve is the following output:

Output: 
        company     brand       image_url
0       VW-Konzern  volkswagen  https://imgr.volkswagen.png
1       VW-Konzern  audi        https://imgr.audi.png
2       VW-Konzern  bentley     https://imgr.bentley.png
3       VW-Konzern  bugatti     https://imgr.audi.png
4       VW-Konzern  lamborghini https://imgr.audi.png

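The problem is that the 'companyName' string and the strings in the 'brand' column of 'df_companies' only partially match, which is why I can't use a regular merge.

Any ideas on how to solve this?

Thanks in advance for your help!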

Answer 1

You can use regex=True in the str.contains method, so that even partial names match. For example:

df['image_url'] = np.where(df['brand'].str.contains('au' , regex=True), 'logo_url', 'other')

Output (I just used the text logo_url in place of the URL):

company     brand           image_url
VW-Konzern  volkswagen      other
VW-Konzern  audi            logo_url
VW-Konzern  bentley         other
VW-Konzern  bugatti         other
VW-Konzern  lamborghini     other
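As a small aside on the matching itself: str.contains treats its pattern as a regular expression by default, so passing regex=True explicitly gives the same result as omitting it. The sketch below illustrates this on the brand names from the question; the pattern "a.d" is a made-up example (not from the original post) to show how regex metacharacters behave and how regex=False or re.escape forces a literal match.

```python
import re

import pandas as pd

brands = pd.Series(["volkswagen", "audi", "bentley", "bugatti", "lamborghini"])

# Substring match; regex=True is already the default for str.contains.
default_mask = brands.str.contains("au")
explicit_mask = brands.str.contains("au", regex=True)

# Hypothetical pattern with a regex metacharacter: "." matches any character,
# so "a.d" matches the "aud" in "audi".
regex_mask = brands.str.contains("a.d")

# Two ways to treat the same string literally instead.
literal_mask = brands.str.contains("a.d", regex=False)
escaped_mask = brands.str.contains(re.escape("a.d"))

print(default_mask.tolist())
print(regex_mask.tolist())
print(literal_mask.tolist())
```

If the company names come from scraped text, a literal match is probably the safer choice, since a name containing characters such as "." or "+" would otherwise be interpreted as a pattern.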

Answer 2

I was able to find a solution. I had to put df_companies['image_url'] in the else branch of np.where instead of 'other'.

Then I got the expected output:

df_companies['image_url'] = ''

for image in images:
    companyName = image['alt'].lower().split(' ', 1)[0]
    logo_url = image['src']

    # Keep values set in earlier iterations instead of overwriting them with 'other'
    df_companies['image_url'] = np.where(df_companies['brand'].str.contains(companyName), logo_url, df_companies['image_url'])
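For anyone who wants to try this out, here is a minimal, self-contained sketch of the same idea. The images list is a hypothetical stand-in for the scraped image tags (plain dicts with 'alt' and 'src' keys, with made-up alt texts), and the column is initialised with 'other' rather than an empty string so unmatched rows are easy to spot. The point it shows is that each iteration only overwrites the rows that match and otherwise keeps whatever earlier iterations wrote.

```python
import numpy as np
import pandas as pd

df_companies = pd.DataFrame({
    "company": ["VW-Konzern"] * 5,
    "brand": ["volkswagen", "audi", "bentley", "bugatti", "lamborghini"],
})

# Hypothetical stand-in for the scraped images; alt texts are assumptions.
images = [
    {"alt": "Volkswagen Logo", "src": "https://imgr.volkswagen.png"},
    {"alt": "Audi Logo", "src": "https://imgr.audi.png"},
    {"alt": "Bentley Logo", "src": "https://imgr.bentley.png"},
]

# Default value for rows that never match any image.
df_companies["image_url"] = "other"

for image in images:
    company_name = image["alt"].lower().split(" ", 1)[0]
    logo_url = image["src"]

    # Literal substring match against the brand column.
    matches = df_companies["brand"].str.contains(company_name, regex=False)

    # Matching rows get the new URL; all other rows keep their current value.
    df_companies["image_url"] = np.where(
        matches, logo_url, df_companies["image_url"]
    )

print(df_companies)
```

With this sample data, the first three rows get their own URL and the last two stay at 'other', because no image was supplied for them. An equivalent way to write the update inside the loop is df_companies.loc[matches, "image_url"] = logo_url, which avoids rebuilding the whole column on every pass.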

How to merge two different DataFrame columns based on str.contains()