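I'm very new to Python, and I've been playing around with writing a script that uses two files. File 1 contains many ID numbers, for example: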
1000012
1000015
1000046
1000047
1000050
1000072
1000076
100008
1000102
100013
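The other file starts with a few lines that hold a single ID number each, followed by lines in which an ID number is followed by other ID numbers, each ending in a + or -: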
951450
8951670
8951800
8951863
8951889
9040311
9255087 147+ 206041- 8852164- 4458078- 1424812- 3631438- 8603144+ 4908786- 4780663+ 4643406+ 3061176- 7523696- 5876052- 163881- 6234800- 395660-
9255088 149+ 7735585+ 6359867+ 620034- 4522360- 2810885- 3705265+ 5966368- 7021344+ 9165926- 2477382+ 4015358- 2497281+ 9166415+ 6837601-
9255089 217+ 6544241+ 5181434+ 4625589+ 7433598+ 7295233+ 3938917+ 4109401+ 2135539+ 4960823+ 1838531+ 1959852+ 5698864+ 1925066+ 8212560+ 3056544+ 82N 1751642+ 4772695+ 2396528+ 2673866+ 2963754+ 5087444+ 977167+ 2892617- 7412278- 6920479- 2539680- 4315259- 8899799- 733101- 5281901- 7055760+ 8508290+ 8559218+ 7985985+ 6391093+ 2483783+ 8939632+ 3373919- 924346+ 1618865- 8670617+ 515619+ 5371996+ 2152211+ 6337329+ 284813+ 8512064+ 3469059+ 3405322+ 1415471- 1536881- 8034033+ 4592921+ 4226887- 6578783-
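I want to build a dictionary from these two files. My script must search file 2 for the ID numbers from file 1 and append the matching lines as values, where the keys are the numbers from file 1. Each key can therefore have multiple values. I only want to search the lines in file 2 that contain more than one number (if len(x) > 1).
The output would look like this: 1000047: 9292540 1000047+ 9126889+ 3490727- 8991434+ 4296324+ 9193432- 3766395+ 9193431+ 8949379- (I need to print every ID number in File1 as a key, and as its values the blocks of lines that contain that ID number as a whole).
Here is my very wrong script: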
#!/usr/bin/python
f = open('file1')
z = open('file2')
d = dict() # d is an empty dictionary
for l in f:
    p = l.rstrip()
    d[p] = list() # sets the keys in the dictionary as p (IDs with newline characters stripped)
y = z.readlines() # retrieves a string from the path file
s = "".join(y) # makes a string from y
x = str.split(s) # splits the path file at white spaces
if len(x) > 1: # only the lines that include contigs IDs that were used to make another contig
    for lines in y:
        k = lines.rstrip()
        w = tuple(x) # convert list x into a tuple called w
        for i in w:
            if i[:-1] in d:
                d[p].append(k)
print d
Answer 0: (score: 0)
Is this what you are looking for? (I haven't tested it...)
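#!/usr/bin/python
f = open('file1')
z = open('file2')
d = dict() # d is an empty dictionary
for l in f.readlines():
    key = l.rstrip() # strip the newline so the key is the bare ID
    for l2 in z.readlines():
        if key in l2.rstrip():
            # append rather than assign, so one key can hold several lines
            d.setdefault(key, []).append(l2.rstrip())
    z.seek(0, 0) # rewind file2 before checking the next ID
f.close()
z.close()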
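Here is a simpler version of the same code, if you don't want to deal with file pointers:
f = open("file1")
z = open("file2")
d = dict() # d is an empty dictionary
file1_lines = f.readlines()
file2_lines = z.readlines()
for l in file1_lines:
    key = l.rstrip() # strip the newline so the key is the bare ID
    for l2 in file2_lines:
        if key in l2.rstrip():
            # append rather than assign, so one key can hold several lines
            d.setdefault(key, []).append(l2.rstrip())
print(d)
f.close()
z.close()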
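One thing to watch with the `in` test on whole lines: it is a substring test, so a short ID such as 100008 also matches inside a longer one. A minimal sketch of a whole-token comparison follows. The helper name `line_mentions` and the sample line are made up for illustration, and it assumes the trailing +, - or N on each token is a marker that should be ignored when comparing.

```python
def line_mentions(id_number, line):
    """Return True if id_number appears in line as a complete token."""
    for token in line.split():
        # Assumption: a trailing +, - or N is a marker, not part of the ID.
        if token.rstrip("+-N") == id_number:
            return True
    return False


# Made-up sample line: it contains 1000080, but not 100008.
line = "9255087 147+ 1000080- 4458078-"
print("100008" in line)               # substring test -> True
print(line_mentions("100008", line))  # whole-token test -> False
```

Swapping the `in` test for a helper like this keeps 100008 from picking up lines that only mention 1000080.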
Answer 1: (score: 0)
Try:
#!/usr/bin/python
f = open('file1')
z = open('file2')
d = dict() # d is an empty dictionary
for l in f:
    p = l.rstrip()
    d[p] = list() # Change #1
f.close()
# Now we have a dictionary with the keys from file1 and empty lists as values
for line in z:
    items = line.split() # items will be a list from 1 line
    if len(items) > 1: # more than initial item in the list
        k = items[0] # First is the key line
        for i in items[1:]: # rest of items
            key = i[:-1] # drop the trailing +, - or N
            if key in d: # is it in the dict
                d[key].append(k) # Add the k value
z.close()
print(d)
N.B. This is untested code, but it shouldn't be too far off.
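If the values should be the whole matching lines, as in the output shown in the question, the same idea could be sketched roughly as below. The function name `build_index` is made up for illustration. It takes any iterables of lines so it can be tried without the real files, and it assumes every token after the first ends in a single marker character (+, - or N). The sample data is a shortened version of the example line from the question.

```python
def build_index(id_lines, path_lines):
    """Map each ID from file 1 to the file-2 lines that mention it."""
    # One empty list per ID; blank lines in file 1 are skipped.
    index = {line.strip(): [] for line in id_lines if line.strip()}
    for line in path_lines:
        tokens = line.split()
        if len(tokens) < 2:  # skip lines that hold a single ID only
            continue
        seen = set()  # avoid storing the same line twice for one key
        for token in tokens[1:]:
            key = token[:-1]  # assumption: last character is +, - or N
            if key in index and key not in seen:
                index[key].append(line.rstrip())
                seen.add(key)
    return index


ids = ["1000047\n", "1000050\n"]
paths = ["8951889\n", "9292540 1000047+ 9126889+ 3490727-\n"]
print(build_index(ids, paths))
```

With the real data, the two open file objects can be passed in directly, since iterating over a file yields its lines.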