How to find specific words in a list of lists in Python

Posted: 2015-04-04 06:03:58

Tags: python

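I have to find out whether a word is in the list. If it is found, the list should be written to the file with the tag "1"; otherwise it should be written with the tag "0". My Python code below fails with the error: TypeError: can only concatenate list (not "str") to list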

f2 = open("C:/Python26/Semantics.txt",'w')
sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"];
with open("C:/Python26/trigram.txt") as f:
    contents = f.readlines()
for lines in contents:
    tokens = lines.split('$')
    for t in tokens:
        if t.strip() in sem:
            f2.write(tokens+"\t"+"1 \n");
        else:
            f2.write(tokens+"\t"+"0 \n");
f2.close()
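The TypeError comes from tokens+"\t": lines.split('$') returns a list, and Python will not concatenate a list with a str. A minimal sketch of the fix, assuming the tokens should be written back joined by spaces (as in the desired output); format_line is a hypothetical helper name, not part of the original code:

```python
def format_line(tokens, flag):
    """Join a token list with spaces and append a tab plus a 0/1 flag."""
    # tokens + "\t" would raise TypeError (list + str), so join into a str first
    return " ".join(tokens) + "\t" + str(flag) + "\n"


tokens = "IL-2$gene$expression$and".split("$")
try:
    _ = tokens + "\t"  # reproduces the error from the question
except TypeError as exc:
    print("TypeError:", exc)

print(format_line(tokens, 1), end="")
```

Note that the original loop also calls write once per token, so even with this fix each input line would be written several times; the flag has to be decided once per line, after looking at all of its tokens.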

My file looks like this:

IL-2$gene$expression$and
IL-2$gene$expression$and$NF-kappa
IL-2$gene$expression$and$NF-kappa$B
IL-2$gene$expression$and$NF-kappa$B$activation
gene$expression$and$NF-kappa$B$activation$through
expression$and$NF-kappa$B$activation$through$CD28

The output I want:

IL-2 gene expression and    1
IL-2 gene expression and NF-kappa   1
IL-2 gene expression and NF-kappa B   1
IL-2 gene expression and NF-kappa B activation   1
gene expression and NF-kappa B activation through   1
expression and NF-kappa B activation through CD28   0
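One way to get exactly this output is to decide the flag once per line with any(), instead of writing inside the token loop. A sketch, assuming tab-separated output and exact, case-sensitive matching (so NF-kappa does not count as factor); flag_lines is a hypothetical name, and the sample lines are inlined instead of being read from trigram.txt:

```python
SEM = {"cells", "gene", "factor", "alpha", "receptor", "t", "promotor"}


def flag_lines(lines, sem=SEM):
    """Yield the tokens joined by spaces, a tab, then 1 if any token is in sem, else 0."""
    for line in lines:
        tokens = line.strip().split("$")
        flag = 1 if any(t in sem for t in tokens) else 0
        yield " ".join(tokens) + "\t" + str(flag)


sample = [
    "IL-2$gene$expression$and",
    "expression$and$NF-kappa$B$activation$through$CD28",
]
for out in flag_lines(sample):
    print(out)
```

With real files, the input file object can be passed directly, e.g. for out in flag_lines(f): f2.write(out + "\n"), since iterating over a file yields its lines.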

And what if I want to generate this output:

Token                                            cells   gene    factor……. promoter   
IL-2 gene expression and                          0       1       0     ………       0 
IL-2 gene expression and NF-kappa                 0       1       0     ………       0
IL-2 gene expression and NF-kappa B               0       1       0     ………       0
IL-2 gene expression and NF-kappa B activation    0       1       0     ………       0
gene expression and NF-kappa B activation through 0       1       0     ………       0  
expression and NF-kappa B activation through CD28 0       0       0     ………       0
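For the one-column-per-word layout, the same membership test can be applied per search word. A sketch, assuming tab-separated columns and using the full sem list for the header (the dotted columns in the table above are elided, so their exact contents are not known); indicator_table is a hypothetical name:

```python
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]


def indicator_table(lines, sem=SEM):
    """Return a header row plus one row per line, with a 0/1 column per word in sem."""
    rows = [["Token"] + list(sem)]
    for line in lines:
        tokens = line.strip().split("$")
        present = set(tokens)  # set lookup is O(1) per search word
        rows.append([" ".join(tokens)] + [str(int(word in present)) for word in sem])
    return rows


sample = [
    "IL-2$gene$expression$and",
    "expression$and$NF-kappa$B$activation$through$CD28",
]
for row in indicator_table(sample):
    print("\t".join(row))
```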

I think the code needs only a small modification.

1 Answer:

Answer 0 (score: 1)

Try this:

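from __future__ import print_function  # print() behaves the same on Python 2.6+ and 3

sem = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]
# Nested with-statements: "with a, b:" needs Python 2.7+, and the paths suggest 2.6
with open("C:/Python26/trigram.txt") as f:
    with open("C:/Python26/Semantics.txt", 'w') as f2:
        for x in f:
            x = x.strip().split("$")                 # tokens of one line
            n = len(set(sem) & set(x))               # distinct sem words on this line
            print(" ".join(x), n)
            f2.write("{0} {1}\n".format(" ".join(x), n))  # explicit indices for 2.6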

Or, to write to the file instead of printing to the console, keep only:

f2.write("{0} {1}\n".format(" ".join(x), len(set(sem) & set(x))))

Output:

IL-2 gene expression and 1
IL-2 gene expression and NF-kappa 1
IL-2 gene expression and NF-kappa B 1
IL-2 gene expression and NF-kappa B activation 1
gene expression and NF-kappa B activation through 1
expression and NF-kappa B activation through CD28 0

Explanation of " ".join(x), len(set(sem) & set(x)):

" ".join(x): this joins the elements of the list into a single string, separated by spaces.
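For example, for the first line of the sample file (the trailing newline is what the loop would see when iterating over the file):

```python
line = "IL-2$gene$expression$and\n"
x = line.strip().split("$")  # strip the newline, then split on '$'
joined = " ".join(x)         # glue the pieces back together with spaces
print(x)                     # ['IL-2', 'gene', 'expression', 'and']
print(joined)                # IL-2 gene expression and
```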

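len(set(sem) & set(x)): set() gives you a collection without duplicate elements, and set(sem) & set(x) is the same as the mathematical set intersection ("and" operation): it keeps only the elements that appear in both lists. Taking len() of that gives the number of matching words.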
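A small illustration with the sample data. Because the length counts distinct sem words on the line, it can exceed 1 when a line contains, say, both gene and cells; wrapping the intersection in int(bool(...)) (an addition, not part of the answer above) keeps the result strictly 0 or 1 as the question asks. The second line below is made up to show the difference:

```python
sem = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

x = "IL-2$gene$expression$and".split("$")
common = set(sem) & set(x)        # only 'gene' is in both
print(len(common))                # 1

y = "gene$cells$gene".split("$")  # hypothetical line, not from the file
both = set(sem) & set(y)          # the repeated 'gene' collapses to one element
print(len(both))                  # 2, so the count is not a 0/1 flag
print(int(bool(both)))            # 1: any non-empty intersection becomes 1
```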