I have the following list of tuples:
text =[('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]
The original sentence is "Michael Jordan's legacy in the 90's shows that he was the biggest player ever in the NBA."
I need to remove the elements tagged "PERSON".
What I did:
new_text = [x for x in text if x[1] != "PERSON"]
sentence= " ".join(x[0] for x in new_text)
print(sentence)
The output I get is:
's legacy in the 90 's shows that he was the biggest player ever in the NBA .
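Note the "'s" at the beginning.
Now I'm stuck, because I also need to remove that "'s" element, but only conditionally: when the element right before it is a "PERSON". In this example there are two "'s", but I only want to remove the one that follows a "PERSON". Any suggestions?
Thanks for any input.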
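Answer 0 (score: 2)
One way is to use zip to iterate over text together with a shifted copy of itself, so that each token is paired with its predecessor, and keep strings based on the following conditions: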
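out = []
# zip pairs each token j with its predecessor i, so the first token
# never appears as j; handle it on its own
if text and text[0][1] != 'PERSON':
    out.append(text[0][0])
for i, j in zip(text[:-1], text[1:]):
    if j[1] == 'PERSON':
        continue  # drop PERSON tokens
    if j[0] == "'s" and i[1] == 'PERSON':
        continue  # drop the 's that directly follows a PERSON
    out.append(j[0])
' '.join(out)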
"legacy in the 90 's shows that he was the biggest player ever in the NBA ."
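A variant of the same zip idea, offered as a sketch rather than the answerer's code: because zip(text[:-1], text[1:]) never yields the first token as j, one option is to pad the "previous" sequence with a (None, None) sentinel so that every token, including the first, has a predecessor to compare against. The sentinel value is an assumption; anything whose tag can never equal 'PERSON' would work.

```python
text = [('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]

# Pad the "previous" sequence with a sentinel so that every token,
# including the first one, has a predecessor to compare against.
previous = [(None, None)] + text[:-1]

out = []
for (_, prev_tag), (word, tag) in zip(previous, text):
    if tag == 'PERSON':
        continue  # drop PERSON tokens themselves
    if word == "'s" and prev_tag == 'PERSON':
        continue  # drop an 's that directly follows a PERSON
    out.append(word)

print(' '.join(out))
```

The two sequences have the same length, so zip loses nothing at either end.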
Answer 1 (score: 1)
A simple for loop is easier here. Note that enumerate is used to retrieve the previous element (text[pos-1]), but this is only done when a previous element exists (pos > 0).
#!/usr/bin/env python3
text =[('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]
new_text = []
for pos, (word, type_) in enumerate(text):
    if type_ == "PERSON":
        # we ignore words of type PERSON
        continue
    if word == "'s" and pos > 0 and text[pos-1][1] == "PERSON":
        # ignore 's if the previous word was of type PERSON
        continue
    new_text.append((word, type_))
sentence = " ".join(x[0] for x in new_text)
print(sentence)
Running this script produces the following text:
legacy in the 90 's shows that he was the biggest player ever in the NBA .
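The index-based answers rely on text being a list (text[pos-1]). If the tokens instead arrive from an iterator, a minimal sketch is to remember the previous tag in a variable. drop_persons is a hypothetical helper name, and iter(text) merely stands in for a streaming source such as a tagger's generator.

```python
from typing import Iterable, Iterator, Optional, Tuple

Token = Tuple[str, str]  # (word, tag)


def drop_persons(tokens: Iterable[Token]) -> Iterator[str]:
    """Yield words, skipping PERSON tokens and any 's that directly follows one."""
    prev_tag: Optional[str] = None  # no previous token yet
    for word, tag in tokens:
        if tag != 'PERSON' and not (word == "'s" and prev_tag == 'PERSON'):
            yield word
        prev_tag = tag  # remember the tag even when the token was skipped


text = [('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]

# iter(...) stands in for a tagger that streams tokens (hypothetical source).
sentence = ' '.join(drop_persons(iter(text)))
print(sentence)
```

Because it only looks one token back, the generator never needs the whole token list in memory.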
Answer 2 (score: 1)
You can just iterate with range and, when you find an "'s" tagged O, look back at the previous token:
text =[('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]
filtered_text = []
for idx in range(len(text)):
    if text[idx][1] == "PERSON":
        continue
    # only the 's is dropped; other words that follow a name are kept
    if text[idx][1] == 'O' and text[idx][0] == "'s" and idx > 0 and text[idx-1][1] == 'PERSON':
        continue
    filtered_text.append(text[idx][0])
sentence = " ".join(filtered_text)
print(sentence)
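Answer 3 (score: 1)
# blank out an 's that follows a PERSON (i > 0 avoids wrapping around to text[-1])
text = [('', j[1]) if j[0] == "'s" and i > 0 and text[i-1][1] == 'PERSON' else j for i, j in enumerate(text)]
# skip PERSON tokens and the blanked-out strings, so no stray spaces appear
print(' '.join([word for word, tag in text if tag != 'PERSON' and word]))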
Output:
legacy in the 90 's shows that he was the biggest player ever in the NBA .
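Answer 4 (score: 1)
I'm late to the party, but if this is the only condition you need to handle, this works too. It's a very simple addition to what you already have.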
text =[('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]
new_text = [x for idx, x in enumerate(text) if x[1] != "PERSON" and not (idx > 0 and text[idx - 1][1] == "PERSON" and x[0] == "'s")]
sentence= " ".join(x[0] for x in new_text)
print(sentence)
The output is:
legacy in the 90 's shows that he was the biggest player ever in the NBA .
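As this answer notes, the one-liner covers a single condition. One possible extension, assuming you may also want to drop other tokens that follow a name, is to collect them in a set. DROP_AFTER_PERSON and remove_persons are hypothetical names, and the bare "'" entry assumes a tokenizer that splits "James'" into "James" and "'".

```python
# Hypothetical set of tokens to drop when they directly follow a PERSON.
# The bare "'" entry assumes a tokenizer that splits "James'" into "James", "'".
DROP_AFTER_PERSON = frozenset({"'s", "'"})


def remove_persons(text, drop_after=DROP_AFTER_PERSON):
    """Return the tokens of `text` without PERSON tokens or the listed followers."""
    return [
        x for idx, x in enumerate(text)
        if x[1] != "PERSON"
        and not (idx > 0 and text[idx - 1][1] == "PERSON" and x[0] in drop_after)
    ]


text = [('Michael', 'PERSON'), ('Jordan', 'PERSON'), ("'s", 'O'), ('legacy', 'O'), ('in', 'O'), ('the', 'O'), ('90', 'O'), ("'s", 'O'), ('shows', 'O'), ('that', 'O'), ('he', 'O'), ('was', 'O'), ('the', 'O'), ('biggest', 'O'), ('player', 'O'), ('ever', 'O'), ('in', 'O'), ('the', 'O'), ('NBA', 'ORGANIZATION'), ('.', 'O')]
sentence = " ".join(x[0] for x in remove_persons(text))
print(sentence)
```

A frozenset keeps the default argument immutable, and membership tests stay O(1) as the list of followers grows.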