Here is my dataset:
No  Description
1   Paying Google ads
2   Purchasing Facebook Ads
3   Purchasing LinkedIn Ads
4   AirBnB repayment
I have a text file named entity.txt:
0, Google
1, Facebook
2, Ads
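I need to detect every keyword from entity.txt that appears in the dataframe. A row may contain one keyword or several; if no keyword is detected, we label the row Other. So my expected output is:
No  Description              Keyword
1   Paying Google ads        Google
2   Purchasing Facebook Ads  Facebook Ads
3   Purchasing LinkedIn Ads  LinkedIn Ads
4   AirBnB repayment         Other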
This is what I did:
with open('entity.txt') as f:
    content = f.readlines()
content = [x.strip() for x in content]
df['keyword'] = df['description'].apply(lambda x: ' '.join([i for i in content if i in x]))
df['keyword'] = df['keyword'].replace('', 'Other')
However, the result is:
No  Description              Keyword
1   Paying Google ads        Other
2   Purchasing Facebook Ads  Other
3   Purchasing LinkedIn Ads  Other
4   AirBnB repayment         Other
Answer 0 (score: 3)
Use str.findall to extract all values from df1 into lists, then convert empty lists to Other and join every non-empty list with spaces using str.join.
Solution for your data:
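import numpy as np
import pandas as pd
df1 = pd.DataFrame({'entity': ['Google', 'Facebook', 'Ads']})
# Build one alternation pattern, (Google|Facebook|Ads), and collect every match per row
s = df['Description'].str.findall(r'({})'.format('|'.join(df1['entity'])))
# Empty lists are falsy: label them 'Other', otherwise join the matches with spaces
df['Keyword'] = np.where(s.astype(bool), s.str.join(' '), 'Other')
print(df)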
   No              Description       Keyword
0   1        Paying Google ads        Google
1   2  Purchasing Facebook Ads  Facebook Ads
2   3  Purchasing LinkedIn Ads           Ads
3   4         AirBnB repayment         Other
Alternative:
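# Plain substring test per keyword; iterating the column directly (not a set) keeps the keyword order stable
s = df['Description'].apply(lambda x: [i for i in df1['entity'] if i in x])
df['Keyword'] = np.where(s.astype(bool), s.str.join(' '), 'Other')
print(df)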
   No              Description       Keyword
0   1        Paying Google ads        Google
1   2  Purchasing Facebook Ads  Facebook Ads
2   3  Purchasing LinkedIn Ads           Ads
3   4         AirBnB repayment         Other
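A likely reason the attempt in the question labels every row Other is that each stripped line of entity.txt still carries its index (for example "0, Google"), and that string never occurs in a description. The following is a minimal sketch, assuming the file really has the "index, name" layout shown in the question. The names raw_lines, entities, and label are illustrative, and plain re is used here instead of pandas.

```python
import re

# Lines as they would come back from f.readlines() on the entity.txt
# shown in the question (assumed "index, name" layout).
raw_lines = ["0, Google\n", "1, Facebook\n", "2, Ads\n"]

# Keep only the part after the first comma and trim surrounding whitespace.
entities = [line.split(",", 1)[1].strip() for line in raw_lines if "," in line]

# Escape each keyword so regex metacharacters are matched literally.
pattern = re.compile("|".join(re.escape(e) for e in entities))


def label(description):
    """Return the matched keywords joined by spaces, or 'Other' if none match."""
    found = pattern.findall(description)
    return " ".join(found) if found else "Other"


descriptions = [
    "Paying Google ads",
    "Purchasing Facebook Ads",
    "Purchasing LinkedIn Ads",
    "AirBnB repayment",
]
labels = [label(d) for d in descriptions]
print(labels)
```

Matching is case-sensitive, so the lowercase "ads" in the first description is not counted as the keyword Ads.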
Answer 1 (score: 2)
Use findall:
# .str[0] keeps only the first match per row; rows with no match become NaN
df.Description.str.findall('|'.join(s.tolist())).str[0]
0      Google
1    Facebook
2         Ads
3         NaN
Name: Description, dtype: object
df['Keyword'] = df.Description.str.findall('|'.join(s.tolist())).str[0]
Input data:
s
0      Google
1    Facebook
2         Ads
Name: s, dtype: object
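Because .str[0] keeps only the first match and leaves NaN where nothing matched, the result above differs from the Other label the question asks for. The following is a small sketch of one way to close that gap with fillna. The DataFrame is rebuilt here from the question's sample rows so the block runs on its own.

```python
import pandas as pd

# Sample rows from the question, rebuilt so the block is self-contained.
df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": [
        "Paying Google ads",
        "Purchasing Facebook Ads",
        "Purchasing LinkedIn Ads",
        "AirBnB repayment",
    ],
})
s = pd.Series(["Google", "Facebook", "Ads"], name="s")

# First keyword found in each description; NaN when the list of matches is empty.
first = df["Description"].str.findall("|".join(s.tolist())).str[0]

# Replace the missing values with the fallback label.
df["Keyword"] = first.fillna("Other")
print(df)
```

This still reports a single keyword per row, so "Purchasing Facebook Ads" is labelled Facebook rather than Facebook Ads.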
Answer 2 (score: 2)
Use str.extract():
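# df1[1] is the column holding the keywords; expand=False returns a Series with the first match per row
df['Keyword'] = df.Description.str.extract(r'({})'.format('|'.join(df1[1])), expand=False)
print(df)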
   No              Description   Keyword
0   1        Paying Google ads    Google
1   2  Purchasing Facebook Ads  Facebook
2   3  Purchasing LinkedIn Ads       Ads
3   4         AirBnB repayment       NaN
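The answer does not show how df1 was created. One plausible reading, sketched below as an assumption, is that entity.txt was loaded without a header row, which makes the keywords land in the column labelled 1. An in-memory io.StringIO stands in for the file here so the block runs on its own.

```python
import io

import pandas as pd

# Stand-in for open('entity.txt'); contents copied from the question.
entity_txt = io.StringIO("0, Google\n1, Facebook\n2, Ads\n")

# header=None gives integer column labels 0 and 1;
# skipinitialspace=True drops the blank that follows each comma.
df1 = pd.read_csv(entity_txt, header=None, skipinitialspace=True)

df = pd.DataFrame({
    "No": [1, 2, 3, 4],
    "Description": [
        "Paying Google ads",
        "Purchasing Facebook Ads",
        "Purchasing LinkedIn Ads",
        "AirBnB repayment",
    ],
})

# One capturing group, so expand=False yields a Series holding the first match per row.
pattern = r"({})".format("|".join(df1[1]))
df["Keyword"] = df["Description"].str.extract(pattern, expand=False)
print(df)
```

As in the answer, only the first keyword per row is captured and unmatched rows stay NaN.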