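I have to check whether a word is in a list. If any word from the list is found in a line, the line should be written to the file with the flag "1"; otherwise it should be written with the flag "0". My Python code below fails with: TypeError: can only concatenate list (not "str") to list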
f2 = open("C:/Python26/Semantics.txt",'w')
sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"];
with open("C:/Python26/trigram.txt") as f:
    contents = f.readlines()
    for lines in contents:
        tokens = lines.split('$')
        for t in tokens:
            if t.strip() in sem:
                f2.write(tokens+"\t"+"1 \n");
            else:
                f2.write(tokens+"\t"+"0 \n");
f2.close()
My file looks like this:
IL-2$gene$expression$and
IL-2$gene$expression$and$NF-kappa
IL-2$gene$expression$and$NF-kappa$B
IL-2$gene$expression$and$NF-kappa$B$activation
gene$expression$and$NF-kappa$B$activation$through
expression$and$NF-kappa$B$activation$through$CD28
The output I want:
IL-2 gene expression and 1
IL-2 gene expression and NF-kappa 1
IL-2 gene expression and NF-kappa B 1
IL-2 gene expression and NF-kappa B activation 1
gene expression and NF-kappa B activation through 1
expression and NF-kappa B activation through CD28 0
And what if I want to generate this output instead:
Token cells gene factor……. promoter
IL-2 gene expression and 0 1 0 ……… 0
IL-2 gene expression and NF-kappa 0 1 0 ……… 0
IL-2 gene expression and NF-kappa B 0 1 0 ……… 0
IL-2 gene expression and NF-kappa B activation 0 1 0 ……… 0
gene expression and NF-kappa B activation through 0 1 0 ……… 0
expression and NF-kappa B activation through CD28 0 0 0 ……… 0
I think the code only needs a slight modification.
Answer 0 (score: 1)
Try it like this:
sem = ["cells", "gene","factor","alpha", "receptor", "t","promotor"]
with open("C:/Python26/trigram.txt") as f, open("C:/Python26/Semantics.txt",'w') as f2:
    for x in f:
        x = x.strip().split("$")
        print " ".join(x), len(set(sem) & set(x))
        f2.write("{} {}\n".format(" ".join(x), len(set(sem) & set(x))))
To write to the file instead of printing to the console, use:
f2.write("{} {}\n".format(" ".join(x), len(set(sem) & set(x))))
Output:
IL-2 gene expression and 1
IL-2 gene expression and NF-kappa 1
IL-2 gene expression and NF-kappa B 1
IL-2 gene expression and NF-kappa B activation 1
gene expression and NF-kappa B activation through 1
expression and NF-kappa B activation through CD28 0
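The TypeError in the question comes from tokens + "\t": tokens is a list, and a list cannot be concatenated with a str, so the list has to be joined into a string first. Here is a minimal sketch of that fix that writes a strict 0/1 flag once per line. The function name flag_line and the in-memory sample list are mine, used only for illustration in place of the real files. It avoids the two-file with statement and the bare {} format fields, which as far as I know both need Python 2.7 or later.

```python
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

def flag_line(line, vocabulary=SEM):
    """Return the tokens joined by spaces, a tab, then 1 or 0 (hypothetical helper)."""
    tokens = [t.strip() for t in line.strip().split("$")]
    # tokens is a list, so join it into a string before concatenating;
    # tokens + "\t" is what raises the TypeError in the question.
    flag = 1 if set(tokens) & set(vocabulary) else 0
    return " ".join(tokens) + "\t" + str(flag)

# Sample lines stand in for the contents of trigram.txt.
sample = [
    "IL-2$gene$expression$and",
    "expression$and$NF-kappa$B$activation$through$CD28",
    "alpha$receptor$gene",
]
flags = [flag_line(s) for s in sample]
```

With real files you would call f2.write(flag_line(line) + "\n") once for each line read from f, rather than once per token as in the question's inner loop.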
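Explanation:
" ".join(x), len(set(sem) & set(x))
" ".join(x): this joins the items of the list into one string, separated by spaces.
len(set(sem) & set(x)): set gives you the elements of a list without duplicates. set(sem) & set(x) is the same as the mathematical set "and" (intersection) operation and gives you only the elements that appear in both lists. I then take the length of that result. Note that this is a count, so a line that matches more than one word in sem gets a number greater than 1.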
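For the second output in the question (one 0/1 column per word in sem), the same set idea can be applied word by word. This is only a sketch under my own assumptions: the columns are tab-separated, the header row starts with "Token", and the name indicator_row is made up for the example.

```python
SEM = ["cells", "gene", "factor", "alpha", "receptor", "t", "promotor"]

def indicator_row(line, vocabulary=SEM):
    """Return the phrase followed by one 0/1 column per vocabulary word (hypothetical helper)."""
    tokens = [t.strip() for t in line.strip().split("$")]
    present = set(tokens)  # set membership tests are fast
    bits = ["1" if word in present else "0" for word in vocabulary]
    return " ".join(tokens) + "\t" + "\t".join(bits)

# Header row: "Token" followed by the vocabulary words, in the same order as the columns.
header = "Token\t" + "\t".join(SEM)

# Sample lines stand in for the contents of trigram.txt.
rows = [indicator_row(s) for s in [
    "IL-2$gene$expression$and",
    "expression$and$NF-kappa$B$activation$through$CD28",
]]
```

Writing header and then each row followed by "\n" to f2 gives a table you can open in a spreadsheet. The column order follows the order of sem, so changing the list changes the columns.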