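I have been trying to solve this all day without success.
I have a 'raw file', let's call it 'infile', which is the file I want to edit. I also have another file that acts as a 'dictionary', let's call it 'inlist'.
Here is an example of infile: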
PRMT6 10505 Q96LA8 HMGA1 02829 NP_665906
WDR77 14387 NP_077007 SNRPE 00548 NP_003085
NCOA3 03570 NP_858045 RELA 01241 NP_068810
ITCH 07565 Q96J02 DTX1 03991 NP_004407
And of inlist:
NP_060607 Q96LA8
NP_001244066 Q96J02
NP_077007 Q9BQA1
NP_858045 Q9Y6Q9
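My current approach is to split each line into columns on the existing tabs. The goal is to read each line of infile and, if the identifier in its third or sixth column appears in the first column of inlist, replace it with the corresponding value from the second column of inlist.
This should produce the output: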
PRMT6 10505 Q96LA8 HMGA1 02829 Q(...)
WDR77 14387 Q9BQA1 SNRPE 00548 Q(...)
NCOA3 03570 Q9Y6Q9 RELA 01241 Q(...)
ITCH 07565 Q96J02 DTX1 03991 Q(...)
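Note: not all codes start with Q.
I tried using a while loop, but without success, and I am too embarrassed to post that code here (I'm new to programming, so I'd rather not lose motivation this early in the 'game'). The ideal way to solve this would be something like: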
    for line in inlist #, infile: <--- THIS PART! Reading both files, splitting both files, replacing both files...
        inlistcolumns = line.split('\t')
        infilecolumns = line.split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]) + "\n")
        elif inlistcolumns[0] in infilecolumns[5]:
            outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]) + "\n")
        else:
            outfile.write('\t'.join(infilecolumns) + '\n')
Any help is greatly appreciated. Thanks!
OK, after the hints from Sephallia and Jlengrand I got this:
    for line in infile:
        try:
            # Read lines in the dictionary
            line2 = inlist.readline()
            inlistcolumns = line.split('\t')
            infilecolumns = line.split('\t')
            if inlistcolumns[0] in infilecolumns[2]:
                outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]))
            elif inlistcolumns[0] in infilecolumns[5]:
                outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]))
            else:
                outfile.write('\t'.join(infilecolumns))
        except IndexError:
            print "End of dictionary reached. Restarting from top."
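The problem is that the if statements are apparently not doing their job, because the output file is still identical to the input file. What could I be doing wrong?
Edit 2:
As some people asked, here is the full code: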
    import os

    def replace(infilename, linename, outfilename):
        # Open original file and output file
        infile = open(infilename, 'rt')
        inlist = open(linename, 'rt')
        outfile = open(outfilename, 'wt')
        # Read lines and find those to be replaced
        for line in infile:
            infilecolumns = line.split('\t')
            line2 = inlist.readline()
            inlistcolumns = line2.split('\t')
            if inlistcolumns[0] in infilecolumns[2]:
                outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(inlistcolumns[1]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(infilecolumns[5]))
            elif inlistcolumns[0] in infilecolumns[5]:
                outfile.write(str(infilecolumns[0]) + "\t" + str(infilecolumns[1]) + "\t" + str(infilecolumns[2]) + "\t" + str(infilecolumns[3]) + "\t" + str(infilecolumns[4]) + "\t" + str(inlistcolumns[1]))
            outfile.write('\t'.join(infilecolumns))
        # Close files
        infile.close()
        inlist.close()
        outfile.close()

    if __name__ == '__main__':
        wdir = os.getcwd()
        outdir = os.path.join(wdir, 'results.txt')
        outname = os.path.basename(outdir)
        original = raw_input("Type the name of the file to be parsed\n")
        inputlist = raw_input("Type the name of the library to be used\n")
        linesdir = os.path.join(wdir, inputlist)
        linesname = os.path.basename(linesdir)
        indir = os.path.join(wdir, original)
        inname = os.path.basename(indir)
        replace(indir, linesdir, outdir)
        print "Successfully applied changes.\nOriginal: %s\nLibrary: %s\nOutput:%s" % (inname, linesname, outname)
The first file to use is hprdtotal.txt: https://www.dropbox.com/s/hohvlcdqvziewte/hprdmap.txt
The second is hprdmap.txt: https://www.dropbox.com/s/9hd0e3a8rt95pao/hprdtotal.txt
Hope this helps.
Answer 0 (score: 1)
Wouldn't something as simple as this work?
(following your snippet)
    for line in infile:  # read file 1 one line after the other
        line2 = inlist.readline()  # read the matching line of file 2
        if not line2:  # readline() returns '' at end of file, it does not raise
            print("End of file 2 reached")
            outfile.write(line)  # copy the remaining lines unchanged
            continue
        inlistcolumns = line2.rstrip('\n').split('\t')
        infilecolumns = line.rstrip('\n').split('\t')
        if inlistcolumns[0] in infilecolumns[2]:
            infilecolumns[2] = inlistcolumns[1]
        elif inlistcolumns[0] in infilecolumns[5]:
            infilecolumns[5] = inlistcolumns[1]
        outfile.write('\t'.join(infilecolumns) + '\n')
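I don't really see why you wouldn't load the files into memory first and then do a simple lookup. Do you have a legitimate reason to read both files at the same time? (Does line 45 of file 1 have to match line 45 of file 2?)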
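To make the point about lockstep reading concrete, here is a minimal sketch that uses the sample rows from the question, held in memory. The function names `lockstep_matches` and `lookup_matches` are invented for this illustration; they only count how many infile rows find a match in inlist under each strategy.

```python
# Sample data copied from the question (tab-separated in the real files).
INFILE_ROWS = [
    ["PRMT6", "10505", "Q96LA8", "HMGA1", "02829", "NP_665906"],
    ["WDR77", "14387", "NP_077007", "SNRPE", "00548", "NP_003085"],
    ["NCOA3", "03570", "NP_858045", "RELA", "01241", "NP_068810"],
    ["ITCH", "07565", "Q96J02", "DTX1", "03991", "NP_004407"],
]
INLIST_ROWS = [
    ["NP_060607", "Q96LA8"],
    ["NP_001244066", "Q96J02"],
    ["NP_077007", "Q9BQA1"],
    ["NP_858045", "Q9Y6Q9"],
]


def lockstep_matches(infile_rows, inlist_rows):
    """Count matches when row i of infile is only compared with row i of inlist."""
    count = 0
    for row, (old, _new) in zip(infile_rows, inlist_rows):
        if old in (row[2], row[5]):
            count += 1
    return count


def lookup_matches(infile_rows, inlist_rows):
    """Count matches when every infile row is checked against the whole inlist."""
    lookup = dict(inlist_rows)  # first column -> second column
    return sum(1 for row in infile_rows if row[2] in lookup or row[5] in lookup)
```

With this sample data the lockstep version finds no matches, while the lookup version finds the two rows (WDR77 and NCOA3) whose third column is a key in inlist.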
Answer 1 (score: 1)
What you need to do is read the inlist file into memory first, so that you can check against it.
    initems = []
    for line in inlist:
        split = line.split()
        t = (split[0], split[1])  # tuple() takes a single iterable, so build the pair directly
        initems.append(t)
    firstItems = dict(initems)  # column 1 -> column 2
    secondItems = [x[1] for x in initems]
This gives you the data. Then open your infile and read through it, checking each line against that data.
    for line in infile:
        # strip the newline so the last column can match a dictionary key
        split = line.rstrip('\n').split('\t')
        if split[2] in firstItems:
            split[2] = firstItems[split[2]]  # proper field position
        if split[5] in firstItems:
            split[5] = firstItems[split[5]]  # proper field position
        outfile.write('\t'.join(split) + '\n')
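Answer 2 (score: 1)
I would suggest loading inlist into memory as a lookup table (a dict in Python), then looping over infile and using the lookup table to decide whether to replace.
I'm not 100% sure my logic is right, but it is a base you can build on.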
    import csv

    lookup = {}
    uniq2nd = set()
    with open('inlist') as f:
        tabin = csv.reader(f, delimiter='\t')
        for c1, c2 in tabin:
            lookup[c1] = c2
            uniq2nd.add(c2)

    # 'wb' is the Python 2 csv convention; on Python 3 use open('outfile', 'w', newline='')
    with open('infile') as f, open('outfile', 'wb') as fout:
        tabin = csv.reader(f, delimiter='\t')
        tabout = csv.writer(fout, delimiter='\t')
        for row in tabin:
            if row[2] not in uniq2nd:  # do nothing if already a col2 value of inlist
                row[2] = lookup.get(row[2], row[2])  # replace or keep same
            # etc...
            tabout.writerow(row)
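The snippet above opens the output file in 'wb' mode, which is the Python 2 convention for the csv module. As a rough sketch of the same idea on Python 3, the function below works on already-open text streams. The name `replace_columns` and its parameters are invented here, the choice of columns 3 and 6 follows the question rather than this answer, and the `uniq2nd` check is left out for brevity.

```python
import csv
import io


def replace_columns(inlist_stream, infile_stream, out_stream, columns=(2, 5)):
    """Copy infile to out, replacing the given columns via the inlist mapping."""
    lookup = {}
    for row in csv.reader(inlist_stream, delimiter="\t"):
        if len(row) >= 2:  # skip blank or malformed lines
            lookup[row[0]] = row[1]

    writer = csv.writer(out_stream, delimiter="\t", lineterminator="\n")
    for row in csv.reader(infile_stream, delimiter="\t"):
        for i in columns:
            if i < len(row):
                row[i] = lookup.get(row[i], row[i])  # replace or keep same
        writer.writerow(row)


# In-memory demo; with real files, open them in text mode with newline="".
inlist = io.StringIO("NP_077007\tQ9BQA1\nNP_858045\tQ9Y6Q9\n")
infile = io.StringIO("WDR77\t14387\tNP_077007\tSNRPE\t00548\tNP_003085\n")
out = io.StringIO()
replace_columns(inlist, infile, out)
result = out.getvalue()
```

Passing streams instead of file names keeps the function easy to try out with `io.StringIO`, as in the demo lines at the end.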
Answer 3 (score: 1)
    #!/usr/bin/python
    inFile = open("file1.txt")
    inList = open("file2.txt")
    oFile = open("output.txt", "w")
    entry = []  # a list keeps row order and rows that share a first column
    dictionary = {}

    # Collects the rows of inFile as lists of stripped columns
    for line in inFile:
        lineData = line.split('\t')
        data = []
        for element in lineData:
            element = element.strip()
            data.append(element)
        entry.append(data)

    # Creates the dict for inList
    for line in inList:
        lineData = line.split('\t')
        dictionary[lineData[0].strip()] = lineData[1].strip()

    # Applies transformation to inFile
    for item in entry:
        if item[2].startswith("-"):
            item[2] = item[2][1:-1]
        # Replace columns 3 and 6 when they appear as keys in the dict
        item[2] = dictionary.get(item[2], item[2])
        item[5] = dictionary.get(item[5], item[5])

    # Writes the output file
    for item in entry:
        oFile.write('\t'.join(item) + '\n')

    inFile.close()
    inList.close()
    oFile.close()
As a note, make sure inFile and inList are properly formatted with the correct delimiter. In this case I used tabs (\t) to split the lines.
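Since a wrong delimiter is an easy way to end up with silently unchanged output, a small check before processing may help. The helper below is only a sketch: the name `find_malformed_lines` is made up, and the expected column count of 2 for inlist (6 for infile) is taken from the sample data in the question, not from the real files. It reports the line numbers that do not split into the expected number of tab-separated fields.

```python
def find_malformed_lines(lines, expected_columns, delimiter="\t"):
    """Return 1-based numbers of non-blank lines with the wrong field count."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped:
            continue  # ignore blank lines
        if len(stripped.split(delimiter)) != expected_columns:
            bad.append(number)
    return bad


sample = [
    "NP_060607\tQ96LA8\n",
    "NP_001244066 Q96J02\n",  # space instead of tab
    "\n",
    "NP_077007\tQ9BQA1\n",
]
bad_lines = find_malformed_lines(sample, expected_columns=2)
```

The function accepts any iterable of lines, so an open file object can be passed in place of the `sample` list.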
Answer 4 (score: 0)
OK, I figured it out. This is what I did:
    data = {}
    for line in inlist:
        k, v = [x.strip() for x in line.split('\t')]
        data[k] = v
    for line in infile:
        infilecolumns = line.strip().split('\t')
        value1 = data.get(infilecolumns[2])
        value2 = data.get(infilecolumns[5])
        if value1:
            infilecolumns[2] = value1
        if value2:
            infilecolumns[5] = value2
        outfile.write('\t'.join(infilecolumns) + '\n')
This produces the desired output simply and cleanly. Thanks for all your answers, they helped me a lot!
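One thing this loop does silently is leave an identifier untouched when it has no entry in inlist. As a sketch of how to spot those cases, the hypothetical helper `unmapped_ids` below collects values from the third and sixth columns that are neither keys nor already-converted values in the mapping. The sample rows and mapping are taken from the question.

```python
def unmapped_ids(rows, mapping, columns=(2, 5)):
    """Return identifiers that are neither a key nor a value of the mapping."""
    converted = set(mapping.values())
    missing = set()
    for row in rows:
        for i in columns:
            if i < len(row) and row[i] not in mapping and row[i] not in converted:
                missing.add(row[i])
    return missing


mapping = {"NP_077007": "Q9BQA1", "NP_858045": "Q9Y6Q9"}
rows = [
    "WDR77\t14387\tNP_077007\tSNRPE\t00548\tNP_003085".split("\t"),
    "NCOA3\t03570\tQ9Y6Q9\tRELA\t01241\tNP_068810".split("\t"),
]
missing = unmapped_ids(rows, mapping)
```

Here `missing` contains the two sixth-column identifiers, which have no entry in the sample mapping and would therefore be written out unchanged.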