I have a df,
Name Step Description
Ram 1 Ram is oNe of the good cricketer
Ram 2 gopal one
Sri 1 Sri is one of the member
Sri 2 ravi good
Kumar 1 Kumar is a keeper
Madhu 1 good boy
Vignesh 1 oNe little
Pechi 1 one book
mario 1 good randokm
Roger 1 one milita good
bala 1 looks good
raj 1 more one
venk 1 likes good
and a list,
my_list=["one","good"]
I am trying to get the rows that contain at least one keyword from my_list.
I tried mask=df["Description"].str.contains("|".join(my_list),na=False) and got output_df,
Name Description
Ram Ram is one of the good cricketer
Sri Sri is one of the member
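For reference, here is a minimal sketch of that filter on three rows taken from the df above. The variable names `case_sensitive` and `case_insensitive` are only illustrative, and escaping the keywords with `re.escape` is an extra precaution that the original attempt does not include. It shows that a plain `str.contains` is case-sensitive, so "oNe" is missed unless `case=False` is passed.

```python
import re
import pandas as pd

# Three rows copied from the df in the question (a subset, for brevity).
df = pd.DataFrame({
    "Name": ["Ram", "Kumar", "Vignesh"],
    "Step": [1, 1, 1],
    "Description": [
        "Ram is oNe of the good cricketer",
        "Kumar is a keeper",
        "oNe little",
    ],
})
my_list = ["one", "good"]

# Escape each keyword so regex metacharacters are matched literally
# (an added precaution, not part of the original attempt).
pattern = "|".join(re.escape(k) for k in my_list)

# Default matching is case-sensitive: "oNe little" does not match "one".
case_sensitive = df["Description"].str.contains(pattern, na=False)

# case=False makes the match ignore case, so "oNe" is found as well.
case_insensitive = df["Description"].str.contains(pattern, case=False, na=False)

output_df = df[case_insensitive]
print(output_df)
```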
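I also want to add the keywords found in "Description", and their count, as separate columns.
When a df["Name"] value is not the first occurrence of that name, the keys column should stay empty even if "Description" contains a keyword.
My desired output is,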
Name Step Description keys count
Ram 1 Ram is one of the good cricketer one,good 2
Ram 2 gopal one
Sri 1 Sri is one of the member one 1
Sri 2 ravi good
Kumar 1 Kumar is a keeper
Madhu 1 good boy good 1
Vignesh 1 oNe little oNe 1
Pechi 1 one book one 1
mario 1 good randokm good good 1
Roger 1 one milita good one,good 2
bala 1 looks good good 1
raj 1 more one one 1
venk 1 likes good good 1
Answer 0 (score: 1)
Create a new mask and apply it:
import re
my_list=["one","good"]
#True only for the first row of each Name whose Description contains a keyword
mask=df["Description"].str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
(df.groupby('Name').cumcount() == 0)
print (mask)
0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
dtype: bool
extracted = df['Description'].str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)
Name Step Description keys count
0 Ram 1 Ram is oNe of the good cricketer oNe,good 2.0
1 Ram 2 gopal one NaN NaN
2 Sri 1 Sri is one of the member one 1.0
3 Sri 2 ravi good NaN NaN
4 Kumar 1 Kumar is a keeper NaN NaN
5 Madhu 1 good boy good 1.0
6 Vignesh 1 oNe little oNe 1.0
7 Pechi 1 one book one 1.0
8 mario 1 good randokm good 1.0
9 Roger 1 one milita good one,good 2.0
10 bala 1 looks good good 1.0
11 raj 1 more one one 1.0
12 venk 1 likes good good 1.0
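The steps above can be put together as one self-contained sketch. It uses a six-row subset of the question's df, and it differs from the answer in two small ways that are my own assumptions rather than part of the original: the columns are filled with `Series.where(mask)` instead of `df.loc[mask, ...]`, and `count` is cast to the nullable `Int64` dtype so it prints as 2 rather than 2.0.

```python
import re
import pandas as pd

# Subset of the df from the question.
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar", "Roger"],
    "Step": [1, 2, 1, 2, 1, 1],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "Sri is one of the member",
        "ravi good",
        "Kumar is a keeper",
        "one milita good",
    ],
})
my_list = ["one", "good"]
pattern = "|".join(re.escape(k) for k in my_list)

# True for the first row of each Name, False for later rows.
first_in_group = df.groupby("Name").cumcount() == 0

# True where the Description contains any keyword, ignoring case.
has_keyword = df["Description"].str.contains(pattern, flags=re.IGNORECASE, na=False)
mask = has_keyword & first_in_group

# List of every keyword occurrence per row, in order of appearance.
extracted = df["Description"].str.findall(pattern, flags=re.IGNORECASE)

# Keep values only where mask is True; other rows become missing.
df["keys"] = extracted.str.join(",").where(mask)
df["count"] = extracted.str.len().where(mask).astype("Int64")
print(df)
```

Note that `findall` returns every occurrence, so a keyword that appears twice in one description is listed and counted twice with this version.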
EDIT:
#transform all values if need same size of original
s = df.groupby('Name')['Description'].transform(','.join)
print (s)
0 Ram is oNe of the good cricketer,gopal one
1 Ram is oNe of the good cricketer,gopal one
2 Sri is one of the member,ravi good
3 Sri is one of the member,ravi good
4 Kumar is a keeper
5 good boy
6 oNe little
7 one book
8 good randokm good
9 one milita good
10 looks good
11 more one
12 likes good
Name: Description, dtype: object
#for mask use new Series s
mask=s.str.contains("|".join(my_list),na=False,flags=re.IGNORECASE ) & \
(df.groupby('Name').cumcount() == 0)
print (mask)
0 True
1 False
2 True
3 False
4 False
5 True
6 True
7 True
8 True
9 True
10 True
11 True
12 True
dtype: bool
#extract from new Series s
extracted = s.str.findall('(' + '|'.join(my_list) + ')', flags=re.IGNORECASE).apply(set)
df.loc[mask, 'keys'] = extracted.str.join(',')
df.loc[mask, 'count'] = extracted.str.len()
print (df)
Name Step Description keys count
0 Ram 1 Ram is oNe of the good cricketer good,oNe,one 3.0
1 Ram 2 gopal one NaN NaN
2 Sri 1 Sri is one of the member good,one 2.0
3 Sri 2 ravi good NaN NaN
4 Kumar 1 Kumar is a keeper NaN NaN
5 Madhu 1 good boy good 1.0
6 Vignesh 1 oNe little oNe 1.0
7 Pechi 1 one book one 1.0
8 mario 1 good randokm good good 1.0
9 Roger 1 one milita good good,one 2.0
10 bala 1 looks good good 1.0
11 raj 1 more one one 1.0
12 venk 1 likes good good 1.0
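In the output above, "oNe" and "one" are counted as two different keys for Ram, because `set` compares strings case-sensitively, and the order inside a set is not guaranteed. If case variants should count once, one possible variation (an assumption on my part, not something the question asks for) is to lower-case the matches and de-duplicate them in first-seen order. The helper `unique_keys` below is a hypothetical name introduced only for this sketch.

```python
import re
import pandas as pd

# Subset of the df from the question.
df = pd.DataFrame({
    "Name": ["Ram", "Ram", "Sri", "Sri", "Kumar"],
    "Step": [1, 2, 1, 2, 1],
    "Description": [
        "Ram is oNe of the good cricketer",
        "gopal one",
        "Sri is one of the member",
        "ravi good",
        "Kumar is a keeper",
    ],
})
my_list = ["one", "good"]
pattern = "|".join(re.escape(k) for k in my_list)

# Join all descriptions of the same Name, keeping the original row count.
s = df.groupby("Name")["Description"].transform(",".join)

def unique_keys(text):
    """Return the matched keywords, lower-cased, unique, in first-seen order."""
    found = re.findall(pattern, text, flags=re.IGNORECASE)
    return list(dict.fromkeys(k.lower() for k in found))

extracted = s.apply(unique_keys)

# Only the first row of each Name gets values, and only if something matched.
first_in_group = df.groupby("Name").cumcount() == 0
mask = first_in_group & (extracted.str.len() > 0)

df["keys"] = extracted.str.join(",").where(mask)
df["count"] = extracted.str.len().where(mask).astype("Int64")
print(df)
```

With this variation Ram gets "one,good" with a count of 2 instead of "good,oNe,one" with a count of 3.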