I have a df,
Name   Description
Ram    Ram is one of the good cricketer
Sri    Sri is one of the member
Kumar  Kumar is a keeper
and a list, my_list = ["one", "good", "ravi", "ball"]
I am trying to get the rows that contain at least one keyword from my_list.
I tried,
mask=df["Description"].str.contains("|".join(my_list),na=False)
I got output_df,
Name   Description
Ram    Ram is one of ONe crickete
Sri    Sri is one of the member
Ravi   Ravi is a player, ravi is playing
Kumar  there is a BALL
I also want to add the keywords found in "Description" and count them in a separate column.
My desired output is,
Name   Description                        pre-keys      keys      count
Ram    Ram is one of ONe crickete         one,good,ONe  one,good  2
Sri    Sri is one of the member           one           one       1
Ravi   Ravi is a player, ravi is playing  Ravi,ravi     ravi      1
Kumar  there is a BALL                    ball          ball      1
Answer 0 (score: 4)
Use str.findall + str.join + str.len:
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')')
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
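For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the three-row df and the English my_list from the question; the names `pattern` and `result` are only illustrative. The capture-group parentheses are left out because findall returns the whole match when the pattern has no groups.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Kumar"],
    "Description": [
        "Ram is one of the good cricketer",
        "Sri is one of the member",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

# Alternation of all keywords: "one|good|ravi|ball".
pattern = "|".join(my_list)

# One list of matched keywords per row (empty list when nothing matches).
extracted = df["Description"].str.findall(pattern)
df["keys"] = extracted.str.join(",")
df["count"] = extracted.str.len()

# Keep only the rows that matched at least one keyword.
result = df[df["count"] > 0]
print(result)
```

Rows without any keyword get an empty string in keys and 0 in count, so filtering on count gives the same rows as the str.contains mask from the question.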
Edit:
import re
my_list=["ONE","good"]
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df['keys'] = extracted.str.join(',')
df['count'] = extracted.str.len()
print(df)
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
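The question also asks for a pre-keys column with the raw matches and a keys column where repeats such as "Ravi,ravi" are counted once. A possible sketch follows, assuming that keys should hold the distinct matches lowercased in order of first appearance; that reading, and the names `matches` and `unique`, are my assumptions rather than something stated in the question. The rows are copied from the question's output_df.

```python
import re
import pandas as pd

# Rows copied verbatim from the question's output_df.
df = pd.DataFrame({
    "Name": ["Ram", "Sri", "Ravi", "Kumar"],
    "Description": [
        "Ram is one of ONe crickete",
        "Sri is one of the member",
        "Ravi is a player, ravi is playing",
        "there is a BALL",
    ],
})
my_list = ["one", "good", "ravi", "ball"]

pattern = "|".join(my_list)

# Raw matches, keeping the casing found in the text.
matches = df["Description"].str.findall(pattern, flags=re.IGNORECASE)

# Lowercase each match and drop repeats; dict.fromkeys keeps first-seen order.
unique = matches.apply(lambda found: list(dict.fromkeys(k.lower() for k in found)))

df["pre-keys"] = matches.str.join(",")
df["keys"] = unique.str.join(",")
df["count"] = unique.str.len()
print(df)
```

With this data every row ends up with a count of 1, because "one"/"ONe" and "Ravi"/"ravi" collapse to a single key each.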
Answer 1 (score: 1)
Take a shot at this using str.findall.
c = df.Description.str.findall('({})'.format('|'.join(my_list)))
df['keys'] = c.apply(','.join) # or c.str.join(',')
df['count'] = c.str.len()
df[df['count'] > 0]
  Name                       Description      keys  count
0  Ram  Ram is one of the good cricketer  one,good      2
1  Sri          Sri is one of the member       one      1
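One caveat with a plain "|".join pattern is that it matches substrings, so "one" is found inside "Someone" and "ball" inside "football". If whole-word matches are wanted, a sketch along these lines may help. It assumes the keywords are plain words; the sample sentences and the names `plain` and `bounded` are made up for illustration. re.escape is there in case a keyword contains regex metacharacters.

```python
import re
import pandas as pd

my_list = ["one", "good", "ravi", "ball"]

# Hypothetical sample text, not from the question.
s = pd.Series(["Someone threw a football", "One good ball"])

# Substring matching, as in the answers above.
plain = s.str.findall("|".join(my_list), flags=re.IGNORECASE)

# Whole-word matching: escape each keyword and wrap the alternation in \b.
pattern = r"\b(?:" + "|".join(re.escape(k) for k in my_list) + r")\b"
bounded = s.str.findall(pattern, flags=re.IGNORECASE)

print(plain.tolist())
print(bounded.tolist())
```

The non-capturing group (?:...) keeps findall returning the full match instead of group contents.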