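I'm trying to tag some keywords by topic, using regex strings that they might contain. Ideally this would add a "Category" column to the DataFrame holding the tag each keyword belongs to, or "other" if no tag is found.
The data I want to tag looks basically like this: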
| Keyword | Volume |
|:-----------|------------:|
| audi specs | 4000 |
| bmw width | 170 |
| a45 bhp | 30 |
| a1 length | 210 |
| alfa co2 | 10 |
The code I have so far is:
import pandas as pd
import numpy as np
import re
from IPython.display import display

df = pd.read_csv("make-model-keywords.csv")
df = pd.DataFrame(df, columns=['Keyword', 'Volume', 'Keyword Difficulty', 'CPC (USD)', 'SERP Features'])

tags = [
    {
        "name": "Dimensions",
        "regex": "dimension|width|height|length|size"
    },
    {
        "name": "MPG",
        "regex": "mpg|co2|emission|consumption|running|economy|fuel"
    },
    {
        "name": "Specs",
        "regex": "spec|specification|torque|bhp|weight|rpm|62|mph|kmh"
    }
]

def basic_tagging(string, tags):
    for tag in tags:
        if re.match(tag['regex'], row['Keyword']):
            return tag['name']
        else:
            return "other"

df['Category'] = df.apply(lambda x: basic_tagging(x['Keyword'], tags), axis=1)
But it gives me the following error:
---------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-71-31890ef48022> in <module>()
----> 1 df['Category'] = df.apply(lambda row: basic_tagging(row['Keyword'], tags), axis=1)
2 df.head()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
6012 args=args,
6013 kwds=kwds)
-> 6014 return op.get_result()
6015
6016 def applymap(self, func):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
140 return self.apply_raw()
141
--> 142 return self.apply_standard()
143
144 def apply_empty_result(self):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
246
247 # compute the result using the series generator
--> 248 self.apply_series_generator()
249
250 # wrap results
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
275 try:
276 for i, v in enumerate(series_gen):
--> 277 results[i] = self.f(v)
278 keys.append(v.name)
279 except Exception as e:
<ipython-input-71-31890ef48022> in <lambda>(row)
----> 1 df['Category'] = df.apply(lambda row: basic_tagging(row['Keyword'], tags), axis=1)
2 df.head()
<ipython-input-68-1867110ca579> in basic_tagging(string, tags)
1 def basic_tagging(string, tags):
2 for tag in tags:
----> 3 if re.match(tag['regex'], row['Keyword']):
4 return tag['name']
5 else:
NameError: ("name 'row' is not defined", 'occurred at index 0')
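Am I missing something obvious?
Answer 0 (score: 1)
The function body refers to `row`, but its parameter is named `string`, so `row` is not defined inside it. Change the function to take the row itself: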
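def basic_tagging(row):
    # row is the Series that df.apply passes in when axis=1
    for tag in tags:
        # re.search looks anywhere in the keyword; re.match only tests the start
        if re.search(tag['regex'], row['Keyword']):
            return tag['name']
    # reached only after every tag has been tried without a match
    return "other"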
Then:
df['Category'] = df.apply(basic_tagging, axis=1)
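As a quick check of the tagging logic on its own, here is a minimal sketch that runs against the sample keywords from the question without needing the CSV file. The helper name `tag_keyword`, its `default` parameter, and the extra keyword "audi dealer" are assumptions made for illustration and are not part of the original code. It uses `re.search` so that a pattern can match anywhere in the keyword, and it returns the fallback only after all tags have been tried.

```python
import re

# Same tag definitions as in the question.
TAGS = [
    {"name": "Dimensions", "regex": "dimension|width|height|length|size"},
    {"name": "MPG", "regex": "mpg|co2|emission|consumption|running|economy|fuel"},
    {"name": "Specs", "regex": "spec|specification|torque|bhp|weight|rpm|62|mph|kmh"},
]


def tag_keyword(keyword, tags=TAGS, default="other"):
    """Return the name of the first tag whose regex occurs in keyword.

    Hypothetical helper: takes a plain string rather than a DataFrame row.
    """
    for tag in tags:
        # re.search scans the whole string; re.match would only test the start.
        if re.search(tag["regex"], keyword):
            return tag["name"]
    # No tag matched after checking all of them.
    return default


# Keywords from the question's table, plus one made-up keyword with no match.
keywords = ["audi specs", "bmw width", "a45 bhp", "a1 length", "alfa co2", "audi dealer"]
categories = [tag_keyword(k) for k in keywords]
print(categories)
```

With a DataFrame, the same helper could be applied to the column directly, for example `df['Category'] = df['Keyword'].apply(lambda kw: tag_keyword(kw, tags))`, which avoids passing whole rows through `df.apply(..., axis=1)`. Note that tags are tried in list order, so a keyword matching several patterns gets the first matching tag.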