List comprehension in pandas: if a column contains a string from a list, return that string in a new column

Posted: 2020-04-08 09:07:45

Tags: python pandas list-comprehension

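I'm trying to write more Pythonic code, for example with list comprehensions. Here I want to create a new column "Tag" that holds the element of the list buyer when that element appears in the Pandas column Text, and 'n/a' otherwise, as shown in the DataFrame news_df_output.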

import pandas as pd

news = {'Text':['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'], 'Source':['NYT', 'WP', 'Guardian']}
news_df = pd.DataFrame(news)
buyer = ['Amazon', "Adidas", 'Walmart', 'Children Place', 'Levi',  'VF']

# My attempt (raises SyntaxError):
# news_df['Tag'] = [x for x in buyer if news_df['Text'].str.contains(x) else 'n/a']

# Desired output:
output_news = {'Text':['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'], 'Source':['NYT', 'WP', 'Guardian'], 'Tag':['n/a', 'Adidas', 'n/a']}
news_df_output = pd.DataFrame(output_news)
news_df_output

However, my code raises an invalid syntax error.

What is the problem here?
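A minimal sketch of why the commented-out line fails, using plain Python lists and a single sample text (the variable text is chosen here for illustration only). A trailing if in a comprehension is a filter and cannot take an else; an if ... else conditional expression has to come before the for. Two further problems would remain even with the syntax fixed: news_df['Text'].str.contains(x) returns a boolean Series for the whole column, and pandas raises a ValueError ("the truth value of a Series is ambiguous") when such a Series is used in an if; and iterating over buyer yields six values, not one per row of the three-row frame.

```python
buyer = ['Amazon', "Adidas", 'Walmart', 'Children Place', 'Levi', 'VF']
text = 'Adidas invests in t-shirts'  # one sample row, for illustration

# The original attempt is rejected by the parser: a trailing `if` in a
# comprehension is a filter and cannot have an `else` branch.
try:
    compile("[x for x in buyer if cond else 'n/a']", "<attempt>", "eval")
    attempt_compiles = True
except SyntaxError:
    attempt_compiles = False

# Filter form: keeps only the items that pass the test (no else allowed).
matches = [b for b in buyer if b in text]

# Conditional-expression form: `if ... else` goes BEFORE `for` and
# produces exactly one output per item of buyer.
flags = [b if b in text else 'n/a' for b in buyer]

print(attempt_compiles)  # False
print(matches)           # ['Adidas']
print(flags)             # ['n/a', 'Adidas', 'n/a', 'n/a', 'n/a', 'n/a']
```

To get one value per row, the outer loop must run over news_df['Text'] rather than over buyer, which is what the solutions below do.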

1 Answer:

Answer 0 (score: 1)

You can join the list values with | (the regex "or") into a single pattern and use Series.str.extract:

news_df['Tag'] = news_df['Text'].str.extract('(' + '|'.join(buyer) + ')')

print (news_df)
                         Text    Source     Tag
0       Nike invests in shoes       NYT     NaN
1  Adidas invests in t-shirts        WP  Adidas
2             dog drank water  Guardian     NaN
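A sketch of the same str.extract approach adjusted to reproduce the question's desired news_df_output exactly. It assumes buyer names may in general contain regex metacharacters (none do in this sample), so each name is passed through re.escape; expand=False makes str.extract return a Series for the single capture group, and fillna('n/a') replaces NaN with the question's placeholder.

```python
import re
import pandas as pd

news = {'Text': ['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'],
        'Source': ['NYT', 'WP', 'Guardian']}
news_df = pd.DataFrame(news)
buyer = ['Amazon', "Adidas", 'Walmart', 'Children Place', 'Levi', 'VF']

# Escape each name so characters like '.' or '+' are matched literally,
# then join with '|' and wrap in one capture group for str.extract.
pattern = '(' + '|'.join(map(re.escape, buyer)) + ')'

# expand=False -> Series instead of a one-column DataFrame;
# rows without a match become NaN, which fillna turns into 'n/a'.
news_df['Tag'] = news_df['Text'].str.extract(pattern, expand=False).fillna('n/a')
print(news_df)
```

Like the `in` test, this matches substrings, so 'Levi' would also match inside 'Levis'; wrapping the group in \b word boundaries is one option if only whole words should count.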

To get all matches, your solution can be rewritten as a nested list comprehension:

news_df['Tag'] = [[y for y in buyer if y in x] for x in news_df['Text']]

print (news_df)
                         Text    Source       Tag
0       Nike invests in shoes       NYT        []
1  Adidas invests in t-shirts        WP  [Adidas]
2             dog drank water  Guardian        []
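For the all-matches case, a vectorized alternative is Series.str.findall with the joined pattern; this sketch swaps in that pandas method in place of the nested comprehension. The pattern has no capture group, so each list holds the full matched substrings.

```python
import re
import pandas as pd

news_df = pd.DataFrame({
    'Text': ['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'],
    'Source': ['NYT', 'WP', 'Guardian'],
})
buyer = ['Amazon', "Adidas", 'Walmart', 'Children Place', 'Levi', 'VF']

# Alternation of escaped names, no group: findall returns whole matches.
pattern = '|'.join(map(re.escape, buyer))
news_df['Tag'] = news_df['Text'].str.findall(pattern)
print(news_df)
```

The results differ slightly from the nested comprehension: findall lists matches in the order they occur in the text and repeats a name that appears twice, whereas the comprehension returns each buyer at most once, in the order of the buyer list.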

For the first match only, use next together with iter, which lets you fall back to NaN when there is no match:

import numpy as np

news_df['Tag'] = [next(iter([y for y in buyer if y in x]), np.nan) for x in news_df['Text']]
print (news_df)
                         Text    Source     Tag
0       Nike invests in shoes       NYT     NaN
1  Adidas invests in t-shirts        WP  Adidas
2             dog drank water  Guardian     NaN
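A small variant of the first-match version, sketched here as an alternative rather than the answer's own code: passing a generator expression directly to next removes the need for iter and stops at the first hit instead of building the full list of matches. Using 'n/a' as the default gives the question's expected Tag column.

```python
import pandas as pd

news_df = pd.DataFrame({
    'Text': ['Nike invests in shoes', 'Adidas invests in t-shirts', 'dog drank water'],
    'Source': ['NYT', 'WP', 'Guardian'],
})
buyer = ['Amazon', "Adidas", 'Walmart', 'Children Place', 'Levi', 'VF']

# next() takes the first buyer found in the text; the generator is lazy, so
# scanning stops at that hit. 'n/a' is returned when nothing matches.
news_df['Tag'] = [next((y for y in buyer if y in x), 'n/a') for x in news_df['Text']]
print(news_df)
```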