Question

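I have a small tagging script in Python (2.x). I am trying to tag each line of a corpus using a dictionary. The script generally works well, but I am looking for slightly different output.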

The code is:

def tag_corpus():
    corpus1=open("Corpus1.txt","r")
    dict1=open("Dictnew1.txt","r")
    dictw=dict1.read().lower().split()
    list1=[]
    for line in corpus1:
        linew=line.lower().split()
        for word in linew:
            if word in dictw:
                word_i=dictw.index(word)
                word_i1=word_i+1
                tag=dictw[word_i1]
                str1=word+"/"+tag
                list1.append(str1)
            else:
                str2=word+"/"+"NA"
                list1.append(str2)
    str3=" ".join(list1)
    print str3

The contents of "Corpus1.txt" are:

  London is situtated over Thames . 
  London is a village near Burgundy . 
  London is situated near Ontario .

and "Dictnew1.txt" is:

London LOC Thames LOC Burgundy LOC Ontario LOC

The result comes out as:

london/loc is/NA situtated/NA over/NA thames/loc ./NA london/loc is/NA a/NA village/NA near/NA burgundy/loc ./NA london/loc is/NA situated/NA near/NA ontario/loc ./NA

But I am looking for output where each line is printed, followed by its tagged version, like:

london is situtated over thames .
  london/loc is/NA situtated/NA over/NA thames/loc .

Could anyone suggest how to do this?

Answer 1

Does this produce the output you expect?

def tag_corpus():
    corpus1=open("Corpus1.txt","r")
    dict1=open("Dictnew1.txt","r")
    dictw=dict1.read().lower().split()
    for line in corpus1:
        list1=[]
        linew=line.lower().split()
        for word in linew:
            if word in dictw:
                word_i=dictw.index(word)
                word_i1=word_i+1
                tag=dictw[word_i1]
                str1=word+"/"+tag
                list1.append(str1)
            else:
                str2=word+"/"+"NA"
                list1.append(str2)
        str3=" ".join(list1)
        print line
        print str3
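
As a possible variation (a sketch, not part of the original answer), the word-to-tag lookup could be held in a dict instead of searching the token list with `index`. This assumes the dictionary file always alternates word and tag, as in the sample above. The function names `load_tags`, `tag_line`, and `tag_corpus` are made up for illustration, and the sample strings stand in for the file contents.

```python
def load_tags(dict_text):
    """Build a word -> tag mapping from alternating 'word TAG' tokens."""
    tokens = dict_text.lower().split()
    # Tokens at even positions are words, tokens at odd positions are tags.
    return dict(zip(tokens[0::2], tokens[1::2]))


def tag_line(line, tags):
    """Return the line with each word rewritten as word/tag ('NA' if unknown)."""
    return " ".join(
        "{0}/{1}".format(word, tags.get(word, "NA"))
        for word in line.lower().split()
    )


def tag_corpus(corpus_lines, tags):
    """Yield (lowercased line, tagged line) pairs, skipping blank lines."""
    for line in corpus_lines:
        words = line.lower().split()
        if not words:
            continue
        yield " ".join(words), tag_line(line, tags)


if __name__ == "__main__":
    # In-memory stand-ins for Dictnew1.txt and Corpus1.txt (assumed contents).
    sample_tags = load_tags("London LOC Thames LOC Burgundy LOC Ontario LOC")
    sample_corpus = [
        "London is situtated over Thames . \n",
        "London is a village near Burgundy . \n",
    ]
    for plain, tagged in tag_corpus(sample_corpus, sample_tags):
        print(plain)
        print(tagged)
```

With the real files, `load_tags(open("Dictnew1.txt").read())` and the open corpus file object could be passed in the same way. One side effect of the dict is that a corpus word that happens to equal a tag (for example `loc`) is tagged `NA`, whereas the `index`-based lookup would return whatever token follows the first `loc` in the list.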
