I have some text in which every word carries a tag. This is what the text looks like:
text = "Wednesday/PROPN evening/NOUN to/PART reject/VERB a/DET no/DET -/PUNCT deal/NOUN Brexit/PROPN under/ADP any/DET circumstances/NOUN ./PUNCT No/DET date/NOUN has/VERB yet/ADV ./PUNCT Saturday/NOUN"
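I want to collect into a list every NOUN and PROPN word that comes directly after a PUNCT tag, so that I can count their frequencies. I have a dictionary, but I want to pick out the required values and add them to a list. So far, the code looks like this: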
text1 = text.split()  # split the tagged string into "word/TAG" tokens
dictionary = {}
for w in text1:
    words = w.split('/')
    dictionary[words[0]] = words[1]
dictlist = []
for key, value in dictionary.items():
    if value == "PUNCT":  # HERE is the problem. I want something like this: if the value is PUNCT and NOUN is the next value, then append it to the list
        temp = [key, value]
        dictlist.append(temp)
I hope you understand my question!
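Answer 0 (score: 0)
Python dictionaries were not guaranteed to keep insertion order before Python 3.7, so even if you put two new key-value pairs into a dictionary one after the other, they might not sit next to each other when you iterate over it. To get an ordered dictionary, you can either use the OrderedDict data structure from the collections module or upgrade Python.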
import collections
dictionary = collections.OrderedDict()
# The rest of your code here
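One caveat worth checking before relying on any dictionary here, ordered or not: keys are unique, so a token that occurs more than once (such as "." in the sample text) is stored only once. The sketch below uses a shortened, hypothetical sample string and illustrative variable names to show the effect; it is not the asker's exact data.

```python
from collections import OrderedDict

# Shortened, hypothetical sample in the same "word/TAG" format as the question.
sample = "a/DET no/DET -/PUNCT deal/NOUN ./PUNCT No/DET ./PUNCT Saturday/NOUN"

ordered = OrderedDict()
for token in sample.split():
    word, tag = token.split('/')
    ordered[word] = tag  # the second "." overwrites the first and keeps its old position

items = list(ordered.items())
# Collect whatever directly follows a PUNCT entry in the dictionary's order.
followers = [items[i + 1] for i in range(len(items) - 1) if items[i][1] == "PUNCT"]
print(followers)
```

Because the second "." collapses into the first, "Saturday" no longer appears right after a PUNCT entry, which is one reason a plain list of tokens may be the safer structure for this task.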
Answer 1 (score: 0)
To get the NOUN and PROPN words that follow a PUNCT tag into lists, you can use the following code, which does not need a dictionary.
word_tag_list = [word.split('/') for word in text.split(' ')]
propn_freq, noun_freq = [], []
for i, word_tag in enumerate(word_tag_list):
    # Only look ahead when the current tag is PUNCT and a next token exists
    if word_tag[1] == "PUNCT" and i + 1 < len(word_tag_list):
        next_tag = word_tag_list[i + 1][1]
        if next_tag == "NOUN":
            noun_freq.append(word_tag_list[i + 1])
        elif next_tag == "PROPN":
            propn_freq.append(word_tag_list[i + 1])
If I understood you correctly, this produces the desired output:
>>> noun_freq
[['deal', 'NOUN'], ['Saturday', 'NOUN']]
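If actual counts are wanted rather than lists of matches, one possible variation is to feed the same look-ahead into collections.Counter. This is only a sketch under the assumption that "frequency" means how often each (word, tag) pair appears right after a PUNCT token; the names pairs, wanted and freq are illustrative.

```python
from collections import Counter

text = "Wednesday/PROPN evening/NOUN to/PART reject/VERB a/DET no/DET -/PUNCT deal/NOUN Brexit/PROPN under/ADP any/DET circumstances/NOUN ./PUNCT No/DET date/NOUN has/VERB yet/ADV ./PUNCT Saturday/NOUN"

# Split each token into [word, tag]; rsplit guards against a "/" inside the word.
pairs = [token.rsplit('/', 1) for token in text.split()]

wanted = {"NOUN", "PROPN"}
freq = Counter()
# Walk over consecutive token pairs: (current, next).
for (_, tag), (next_word, next_tag) in zip(pairs, pairs[1:]):
    if tag == "PUNCT" and next_tag in wanted:
        freq[(next_word, next_tag)] += 1

print(freq)
```

Pairing the list with itself shifted by one (zip(pairs, pairs[1:])) avoids the explicit index bounds check, since the last token simply has no partner.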