Changing a text file based on 2 columns in Python 2

Time: 2018-11-25 23:48:43

Tags: python

I have a tab-separated text file like the following example:

infile

chr1    +   1071396 1271396 LOC
chr12   +   1101483 1121483 MIR200B

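For each line of infile, I want to divide the difference between columns 3 and 4 by 100 and turn that line into 100 lines, written to a new file called newfile. The final file should be tab-separated with 6 columns. The first 5 columns are like those in infile, with columns 3 and 4 holding the start and end of each part, and the 6th column is (column 5)_part followed by the part number (1 to 100). Here is the expected output file: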

expected output

chr1    +   1071396 1073396 LOC LOC_part1
chr1    +   1073396 1075396 LOC LOC_part2
.
.
.
chr1    +   1269396 1271396 LOC LOC_part100
chr12   +   1101483 1101683 MIR200B MIR200B_part1
chr12   +   1101683 1101883 MIR200B MIR200B_part2
.
.
.
chr12   +   1121283 1121483 MIR200B MIR200B_part100

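I wrote the following code to get the expected output, but it does not return the expected result. Specifically, columns 3 and 4 of the output it produces are incorrect. The problem is in the 2nd part of the code.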

file = open('infile.txt', 'rb')
cont = []
for line in file:
    cont.append(list(filter(lambda x: not x.isspace(), line.split('\t'))))
    new = []
    for i in cont:
        new.append([s.replace('\n', '') for s in i])



newfile = []
for i in new:
    diff= (int(i[3])-int(i[2]))/100
    left = int(i[2])
    right = int(i[2]) + diff
    for j in range(100):
        add = [i[0], i[1], left, right, i[4],str(i[4])+'_part' + str(j)]
        newfile.append(add)


    with open('output.txt', 'w') as f:
        for i in newfile:
            for j in i:
                f.write(i + '\n')

Do you know how to fix this?

1 answer:

Answer 0 (score: 0)

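First, you don't need to compute the value of diff on every iteration, because it is always the same for a given input line. Compute it once and reuse it.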
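To make the arithmetic concrete, here is a minimal sketch of computing the step once and deriving every boundary from it. The name part_bounds is hypothetical, and the sketch assumes the span divides evenly by the number of parts, as it does in the sample rows.

```python
def part_bounds(left, right, parts=100):
    """Yield (start, end) pairs that cut [left, right] into `parts` equal pieces."""
    step = (right - left) // parts  # computed once, outside the loop
    for n in range(parts):
        start = left + n * step
        yield start, start + step


bounds = list(part_bounds(1071396, 1271396))
print(bounds[0])   # (1071396, 1073396)
print(bounds[-1])  # (1269396, 1271396)
```

Each start is derived from the original left value rather than from a running total, so an off-by-one in the loop cannot shift every later row.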

Also, there are only two lines of interest here, and you can read them and split them easily with string.split().
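As an illustration of the splitting step, the sketch below turns one line into typed fields. The helper name parse_row is hypothetical, and it assumes every line has exactly five tab-separated fields, as in the sample infile.

```python
def parse_row(line):
    """Split one tab-separated line into (chrom, strand, start, end, name)."""
    chrom, strand, start, end, name = line.rstrip("\n").split("\t")
    return chrom, strand, int(start), int(end), name


print(parse_row("chr12\t+\t1101483\t1121483\tMIR200B\n"))
# ('chr12', '+', 1101483, 1121483, 'MIR200B')
```

Calling split() with no argument would also work here, since none of the fields contains spaces, and it tolerates runs of whitespace between columns.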

Here is a general example:

x = 'chr1    +   1071396 1271396 LOC' # assuming we are reading this from the file


x = x.split() # gives you a list of the fields
left_num = int(x[2]) # convert numbers to int
right_num = int(x[3])
diff = (right_num - left_num) // 100 # width of one part, computed only once

last_column = x[4] + "_part" # prefix for the last column


with open("output.txt", "w") as op_file: # open file to write
    for num in range(1, 101):
        start = left_num + (num - 1) * diff # left edge of this part
        end = start + diff # right edge of this part
        op_file.write('{}\t{}\t{}\t{}\t{}\t{}\n'.format(x[0], x[1], start, end, x[4], last_column + str(num)))

This will give you

chr1    +   1071396 1073396 LOC LOC_part1
chr1    +   1073396 1075396 LOC LOC_part2
chr1    +   1075396 1077396 LOC LOC_part3
.
.
.
chr1    +   1267396 1269396 LOC LOC_part99
chr1    +   1269396 1271396 LOC LOC_part100
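The example above handles a single hard-coded line. A minimal sketch of applying the same idea to every line of the input file could look like the following. The function name split_file is hypothetical, and the sketch assumes at least five whitespace-separated fields per line and a span that divides evenly by the number of parts.

```python
def split_file(in_path, out_path, parts=100):
    """Write `parts` equal-width rows to out_path for every row of in_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < 5:  # skip blank or malformed lines
                continue
            chrom, strand, name = fields[0], fields[1], fields[4]
            start, end = int(fields[2]), int(fields[3])
            step = (end - start) // parts  # computed once per input row
            for n in range(1, parts + 1):
                left = start + (n - 1) * step
                row = [chrom, strand, str(left), str(left + step), name,
                       "{}_part{}".format(name, n)]
                dst.write("\t".join(row) + "\n")
```

With the file names from the question, it would be called as split_file("infile.txt", "output.txt"). Writing each row as it is produced avoids building the whole result in memory first.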