I only want to extract nouns from this dataset:
| Text1 | Text2 |
|---|---|
| see if your area is affected afte... | public health england have confir... |
| 'i had my throat scraped'. | i have been producing some of our... |
| drive-thru testing introduced at w... | “a painless throat swab will be t... |
| live updates as first case confirm... | the first case in ... |
| hampton hill medical centre | love is actually just ... |
| berkshire: public health england a... | an official public health england... |
I need to apply POS tagging to Text2 so that I extract only the adverbs (ADV). I did the following:
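```python
import treetaggerwrapper
from pprint import pprint

ans = []
for x in ...:  # <- the column is missing here (see below)
    tagger = treetaggerwrapper.TreeTagger(TAGLANG="en", TAGDIR='path')
    tags = tagger.tag_text(x)
    ans.append(tags)
    pprint(treetaggerwrapper.make_tags(tags))
```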
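But I didn't fill in the column, because I don't know what I should put there (e.g. `df['Text2'].tolist()`?).
I need to extract the adverbs from the text and add them to a new array/empty list. I hope you can help me.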
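For the TreeTagger route in the question, a minimal sketch of what could go after `for x in`: iterate over `df['Text2'].tolist()`, tag each string, and keep only the tokens whose tag is an adverb tag. It assumes `tag_text()` returns one `'word\tPOS\tlemma'` string per token and that the English parameter file uses the Penn Treebank-style tags `RB`, `RBR`, `RBS` and `WRB` for adverbs. The helper names `adverbs_from_tagged` and `adverbs_in_column` are hypothetical, and tagging is passed in as a callable so the logic can be exercised without TreeTagger installed.

```python
import pandas as pd

# Assumption: adverb tags of TreeTagger's English (Penn Treebank-style) tagset.
ADVERB_TAGS = {"RB", "RBR", "RBS", "WRB"}


def adverbs_from_tagged(lines):
    """Keep the words whose POS tag is an adverb tag.

    `lines` is assumed to have the shape returned by tag_text():
    one 'word\\tPOS\\tlemma' string per token. Lines without a tab
    (e.g. replacement markers) are skipped.
    """
    adverbs = []
    for line in lines:
        parts = line.split("\t")
        if len(parts) >= 2 and parts[1] in ADVERB_TAGS:
            adverbs.append(parts[0])
    return adverbs


def adverbs_in_column(df, column, tag_text):
    """Loop over one DataFrame column; `tag_text` would be tagger.tag_text."""
    ans = []
    # This is the list the question's loop was missing.
    for x in df[column].fillna("").astype(str).tolist():
        ans.append(adverbs_from_tagged(tag_text(x)))
    return ans
```

With TreeTagger installed, the call would look like `adverbs_in_column(df, "Text2", tagger.tag_text)`, with `tagger = treetaggerwrapper.TreeTagger(TAGLANG="en", TAGDIR='path')` created once, outside the loop, rather than once per row.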
Answer (score: 0)
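For this kind of task I generally prefer spaCy, which I usually run in Google Colab.
If you want to try it yourself before reading my answer, start here: https://spacy.io/usage/linguistic-features
If you haven't already, install spaCy and the English model with pip: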
```
# Please open this notebook in playground mode (File -> Open in playground mode) and then run this block first to download the spaCy model you will be using
!pip install spacy
!python -m spacy download en_core_web_sm
```
Only pandas and spaCy are used here; no other packages are needed.
```python
import pandas as pd
import spacy
```
Recreate the DataFrame:
```python
list1 = '''see if your area is affected afte...
'i had my throat scraped'. drive-thru testing introduced at w...
live updates as first case confirm...'''
list2 = '''hampton hill medical centre
berkshire: public health england a...
public health england have confir...
i have been producing some of our...
a painless throat swab will be t...
the first case in ...
love is actually just ...
an official public health england...'''
df = pd.DataFrame([[list1, list2]], columns = ['Text1', 'Text2'])
```
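The DataFrame above packs all texts into a single cell per column. If you would rather keep one row per text, as in the question's table, here is an alternative sketch (the strings, including the truncated `...` endings, are copied from the question; `df_rows` and `texts` are hypothetical names chosen so the single-cell `df` used in the rest of this answer is not overwritten).

```python
import pandas as pd

# One row per text, copied from the question's table (truncated strings kept as-is).
rows = [
    ("see if your area is affected afte...", "public health england have confir..."),
    ("'i had my throat scraped'.", "i have been producing some of our..."),
    ("drive-thru testing introduced at w...", "“a painless throat swab will be t..."),
    ("live updates as first case confirm...", "the first case in ..."),
    ("hampton hill medical centre", "love is actually just ..."),
    ("berkshire: public health england a...", "an official public health england..."),
]
df_rows = pd.DataFrame(rows, columns=["Text1", "Text2"])

# One plain Python string per row of Text2, ready to loop over or tag.
texts = df_rows["Text2"].tolist()
```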
Get the string (the single cell in column `Text2`) and load the spaCy model:
```python
string = df.iloc[0,1]
nlp = spacy.load("en_core_web_sm")
```
Next, I put everything into one function:
```python
def list_adv(string):
    '''
    input: list_adv performs part-of-speech tagging on the input string
    return: adv, a list of all adverbs found in the input
    '''
    # Run the spaCy pipeline (tokenizer, tagger, ...) on the input text
    doc = nlp(string)
    # Empty list to collect adverbs as we go
    adv = []
    # Check every token in the document
    for token in doc:
        # Keep the token if its coarse-grained POS tag is ADV (adverb)
        if token.pos_ == 'ADV':
            adv.append(token.text)
    # Return the final list
    return adv

adv_list = list_adv(string)
```
The result, `adv_list`, is the list of adverbs you asked for!
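`list_adv` works on one string. To get one adverb list per row instead, a sketch could run the whole column through spaCy's batch method `nlp.pipe()` and store the results in a new column. The helper `add_adverb_column` and the default column name `adverbs` are hypothetical; `nlp` is assumed to be the pipeline loaded above with `spacy.load("en_core_web_sm")`, so each token exposes `.text` and `.pos_`.

```python
import pandas as pd


def add_adverb_column(df, text_col, nlp, out_col="adverbs"):
    """Return a copy of df with a column holding each row's adverbs.

    `nlp` is assumed to behave like a loaded spaCy pipeline:
    nlp.pipe(texts) yields one Doc per text, and tokens have .text and .pos_.
    """
    # Missing values become empty strings so every row yields a (possibly empty) list.
    texts = df[text_col].fillna("").astype(str).tolist()
    adverbs = [
        [token.text for token in doc if token.pos_ == "ADV"]
        for doc in nlp.pipe(texts)
    ]
    out = df.copy()
    # dtype=object keeps each cell as a Python list.
    out[out_col] = pd.Series(adverbs, index=out.index, dtype=object)
    return out
```

Usage would be `df_out = add_adverb_column(df, "Text2", nlp)`. With the single-cell `df` from this answer it produces one list; with a DataFrame that has one text per row it produces one list per text. `nlp.pipe()` is used instead of calling `nlp()` in a loop because it processes texts in batches.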