How to mark verbs in a sentence with spaCy in Python?

Time: 2018-06-03 13:18:22

Tags: python-3.x pandas spacy

I would like to mark the verbs in a sentence by appending an 'X' to the end of each verb word, like verbX.

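spaCy assigns tags to sentence elements that a plain Python whitespace split does not separate. For example, spaCy treats a parenthesis '(' or a full stop '.' that follows a word as a separate position, but splitting the string on whitespace does not. Because of this, the tag indices cannot be used to reliably insert X into the sentence. The function below works by rebuilding the sentence from the tokens. However, it only lets me insert X at the beginning of the verb.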
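The following pure-Python sketch shows why token positions and whitespace-split positions disagree. It does not call spaCy. The hypothetical Tok class stands in for a spaCy token and keeps only its text and whitespace_ attributes. The token list is an assumption about how spaCy's English tokenizer splits the example sentence: "Dr." stays whole, while "(", ")" and the final "." become separate tokens.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy Token: text plus trailing whitespace."""
    text: str
    whitespace_: str

    @property
    def text_with_ws(self) -> str:
        # Mirrors spaCy's Token.text_with_ws: the text followed by its trailing whitespace
        return self.text + self.whitespace_


s = "Dr. John (a fictional chartacter) never shakes hands."

# Assumed tokenization of s (not produced by spaCy here)
tokens = [
    Tok("Dr.", " "), Tok("John", " "), Tok("(", ""), Tok("a", " "),
    Tok("fictional", " "), Tok("chartacter", ""), Tok(")", " "),
    Tok("never", " "), Tok("shakes", " "), Tok("hands", ""), Tok(".", ""),
]

words = s.split()               # whitespace split keeps "(a" and "hands." together
print(len(words), len(tokens))  # 8 11

# "shakes" sits at a different index in each list, so token indices
# cannot be reused to position text within s.split()
print(words.index("shakes"), [t.text for t in tokens].index("shakes"))  # 6 8

# Joining each token's text and trailing whitespace reproduces the sentence exactly,
# which is why rebuilding from tokens is the reliable approach
rebuilt = "".join(t.text_with_ws for t in tokens)
print(rebuilt == s)  # True
```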

Is there a way to attach the X to the end of the verb word instead, like verbX, with no space between the verb and X?


Here is my current code:

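import pandas as pd
import spacy

# The 'en' shortcut worked in spaCy 2.x; spaCy 3.x needs the package name
nlp = spacy.load('en_core_web_sm')

s = "Dr. John (a fictional chartacter) never shakes hands."
df = pd.DataFrame({'sentence': [s]})
k = df['sentence']

def marking(row):
    chunks = []
    for token in nlp(row):
        # 'VBZ' = verb, 3rd person singular present (e.g. "shakes")
        if token.tag_ == 'VBZ':
            chunks.append('X')             # X is emitted before the verb's text
        chunks.append(token.text_with_ws)  # token text plus its trailing space, if any
    return "".join(chunks)

x = k.apply(marking)
print(x)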

This gives:

"Dr. John (a fictional chartacter) never Xshakes hands."

How can I get this instead?

"Dr. John (a fictional chartacter) never shakesX hands."

1 answer:

Answer 0 (score: 3)

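The problem is the order in which you perform the operations. To put the X after the verb, append the word first and the 'X' second: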

def marking(row):
    chunks = []
    for token in nlp(row):
        chunks.append(token.text_with_ws)  # append the word (with its trailing space) first
        if token.tag_ == 'VBZ':
            chunks.append('X')             # then append 'X', which lands after that space
    return "".join(chunks)
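To see what this version actually outputs, here is a quick trace. It uses hypothetical stand-in tokens rather than real spaCy objects, and the tags are assumed, with only "shakes" tagged VBZ. Because text_with_ws already contains the verb's trailing space, the 'X' ends up at the start of the next word:

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy Token (text, trailing whitespace, fine-grained tag)."""
    text: str
    whitespace_: str
    tag_: str

    @property
    def text_with_ws(self) -> str:
        return self.text + self.whitespace_


# Assumed tokens and tags for "... never shakes hands."
tokens = [Tok("never", " ", "RB"), Tok("shakes", " ", "VBZ"),
          Tok("hands", "", "NNS"), Tok(".", "", ".")]


def marking_append_after(toks):
    chunks = []
    for token in toks:
        chunks.append(token.text_with_ws)  # "shakes " including its space
        if token.tag_ == 'VBZ':
            chunks.append('X')             # lands after the space
    return "".join(chunks)


print(marking_append_after(tokens))  # never shakes Xhands.
```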

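Because token.text_with_ws already includes the token's trailing space, the version above produces "... never shakes Xhands.". To attach 'X' directly to the verb and move any trailing whitespace after it, use the following logic instead: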

def marking(row):
    chunks = []
    for token in nlp(row):
        if token.tag_ == 'VBZ':
            # glue 'X' to the verb text, then restore the token's trailing whitespace
            chunks.append(token.text + 'X' + token.whitespace_)
        else:
            chunks.append(token.text_with_ws)
    return "".join(chunks)
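The check for 'VBZ' only catches third-person singular present verbs such as "shakes". To mark verbs in general, one option is to match every Penn Treebank verb tag, all of which start with "VB": VB, VBD, VBG, VBN, VBP and VBZ. The sketch below does this using the same glue-then-whitespace approach. As before, Tok is a hypothetical stand-in for a spaCy token, and the example sentence and its tags are assumed for illustration. The hypothetical mark_verbs function only reads text, whitespace_ and tag_, so it should also accept a real spaCy Doc, as in k.apply(lambda row: mark_verbs(nlp(row))). One caveat: with these tags, auxiliaries such as "is" (also VBZ) get marked too. Checking token.pos_ == 'VERB' instead may skip them, depending on the model.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy Token."""
    text: str
    whitespace_: str
    tag_: str


def mark_verbs(toks, suffix="X"):
    """Glue `suffix` to every token whose Penn Treebank tag starts with 'VB'."""
    chunks = []
    for token in toks:
        if token.tag_.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
            chunks.append(token.text + suffix + token.whitespace_)
        else:
            chunks.append(token.text + token.whitespace_)
    return "".join(chunks)


# Assumed tokens and tags for "She walked and never shakes hands."
tokens = [Tok("She", " ", "PRP"), Tok("walked", " ", "VBD"), Tok("and", " ", "CC"),
          Tok("never", " ", "RB"), Tok("shakes", " ", "VBZ"),
          Tok("hands", "", "NNS"), Tok(".", "", ".")]

print(mark_verbs(tokens))  # She walkedX and never shakesX hands.
```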