Question

I want to extract only the adverbs from this dataset:

Text1                                     Text2
see if your area is affected afte...      public health england have confir...
'i had my throat scraped'.                i have been producing some of our...
drive-thru testing introduced at w...     “a painless throat swab will be t...
live updates as first case confirm...     the first case  in ...
hampton hill medical centre               love is actually just ...
berkshire: public health england a...     an official public health england...

I need to apply POS tagging to Text2 in order to extract only the ADV tokens. This is what I have so far:

ans=[]
for x in 
    tagger = treetaggerwrapper.TreeTagger(TAGLANG="en", TAGDIR='path')
    tags = tagger.tag_text(x)
    ans.append(tags)
    pprint(treetaggerwrapper.make_tags(tags))

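However, I have not included the column in the loop, because I don't know what I should put there (e.g. df['Text 2'].tolist()).

I need to extract the adverbs from the text and add them to a new array / empty list. I hope you can help me.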

Answer 1

I like to do this kind of work with spaCy in Google Colab. In general, spaCy is my go-to for tasks like this.

If you'd like to try it yourself before reading my answer, have a look here: https://spacy.io/usage/linguistic-features

If you can, install it with pip...

    # Please open this notebook in playground mode (File -> Open in playground mode) and then run this block first to download the spaCy model you will be using
    !pip install spacy
    !python -m spacy download en_core_web_sm

We only use pandas and spaCy here; no other packages are needed.

import pandas as pd
import spacy

Recreate the DataFrame

# Text1 values from the question, one per line
list1 = '''see if your area is affected afte...
'i had my throat scraped'.
drive-thru testing introduced at w...
live updates as first case confirm...
hampton hill medical centre
berkshire: public health england a...'''

# Text2 values from the question, one per line
list2 = '''public health england have confir...
i have been producing some of our...
a painless throat swab will be t...
the first case  in ...
love is actually just ...
an official public health england...'''

df = pd.DataFrame([[list1, list2]], columns = ['Text1', 'Text2'])

Get the string and initialize spaCy

string = df.iloc[0,1]
nlp = spacy.load("en_core_web_sm")

Next, I wrapped everything up in a single function.

def list_adv(string):
    '''
    input: string of text to run part-of-speech (POS) tagging on
    return: adv, a list of all adverbs found in the input
    '''
    # Run the spaCy pipeline (tokenizer, POS tagger, ...) on the input
    doc = nlp(string)

    # Blank list to append adverbs to as we search
    adv = []

    # For every token in the document
    for token in doc:

        # If the token is tagged as an adverb, append its text
        if token.pos_ == 'ADV':
            adv.append(token.text)

    # Return the final product
    return adv

adv_list = list_adv(string)

The final product is the list of adverbs you asked for in your question!
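About the column itself: list_adv tags one big string, so the adverbs from every row end up in a single list. If you would rather have one list of adverbs per row of Text2, something along these lines might work. Treat it as a sketch: adverbs_in, adverbs_per_row, demo and the Text2_adv column are names I made up for illustration, and it assumes every row of the column holds a string (no missing values).

```python
def adverbs_in(doc):
    """Return the text of every token that was tagged as an adverb."""
    return [token.text for token in doc if token.pos_ == "ADV"]


def adverbs_per_row(texts, nlp):
    """Tag each text separately and return one list of adverbs per text.

    texts: an iterable of strings, e.g. df['Text2'].tolist()
    nlp:   a loaded spaCy pipeline, e.g. spacy.load("en_core_web_sm")
    """
    # nlp.pipe processes the texts as a stream and yields one Doc per text,
    # in the same order as the input
    return [adverbs_in(doc) for doc in nlp.pipe(texts)]


def demo():
    """Hypothetical usage; needs pandas, spaCy and the en_core_web_sm model."""
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")

    # One row per text this time (an assumption about how the real data looks)
    df = pd.DataFrame({"Text2": ["love is actually just ...",
                                 "i have been producing some of our..."]})

    # New column holding the list of adverbs found in each row
    df["Text2_adv"] = adverbs_per_row(df["Text2"].tolist(), nlp)
    return df
```

Calling demo() in the Colab notebook should return the DataFrame with the extra column. Passing nlp in as an argument keeps the helper independent of which model you load, and df['Text2'].tolist() is the piece that was missing from the loop in the question.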

POS tagging using columns (in pandas)
