Question

I have 2 text files like the following examples. I named one of them first (comma-separated) and the other second (tab-separated).

first:

chr1,105000000,105310000,2,1,3,2
chr1,5310000,5960000,2,1,5,4
chr1,1580000,1180000,4,1,5,3
chr19,107180000,107680000,1,1,5,4
chr1,7680000,8300000,3,1,1,2
chr1,109220000,110070000,4,2,3,3
chr1,11060000,12070000,6,2,7,4

second:

AKAP8L  chr19   107180100   107650000   transcript
AKAP8L  chr19   15514130    15529799    transcript
AKIRIN2 chr6    88384790    88411927    transcript
AKIRIN2 chr6    88410228    88411243    transcript
AKT3    chr1    105002000   105010000   transcript
AKT3    chr1    243663021   244006886   transcript
AKT3    chr1    243665065   244013430   transcript

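In file first, columns 2 and 3 are start and end. In file second, columns 3 and 4 are start and end, respectively. I want to make a new text file from the first and second files. In fact, I want to select rows from file first if:

1- the 1st column in file first is equal to the 2nd column in file second.
2- the 3rd column in file second is greater than the 2nd column in file first and smaller than the 3rd column in file first.
3- the 4th column in file second is also greater than the 2nd column in file first and smaller than the 3rd column in file first.

In the new file, I will add 2 new columns, ID and count. Basically, I will count the number of rows in file second that satisfy the 3 conditions above. For the ID, I will use the first column of the row in file second that matches the row in file first. In other words, I want to count the number of rows in file second that match each row in file first, based on the 3 conditions above. The expected output for this example would look like this:

expected output:

chr19,107180000,107680000,1,1,5,4,AKAP8L, 1
chr1,105000000,105310000,2,1,3,2, AKT3, 1

In this expected output, columns 1 to 7 come from file first, column 8 is the ID (taken from file second), and column 9 is the count (the number of rows in file second that match that specific row in file first).

I tried to do this in Python, but the code I wrote does not return what I want.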

Answer 1

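I don't really understand the need for count, because as far as I can tell it is just a row index. It is included as comments in the code below (uncomment them if you want to use count). In the if statement, you were comparing the values as strings rather than integers, so you must convert them to integers first. In the argument to append, you were trying to concatenate individual elements, which does not work. Just wrap them in square brackets. Also, there is no need to open plain text files in binary mode.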
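To make the two points above concrete, here is a small sketch. The values are picked from the sample data purely for illustration and this is not the asker's code: it shows how comparing numbers as strings differs from comparing them as integers, and why the extra element has to be wrapped in a list before concatenating.

```python
# Values read from a text file are strings, so "<" compares them character by character.
start_a, start_b = "5310000", "105000000"

as_strings = start_a < start_b               # False: "5" sorts after "1"
as_integers = int(start_a) < int(start_b)    # True: 5310000 < 105000000

# A list can only be concatenated with another list, not with a bare string.
row = ["chr1", "105000000", "105310000"]
try:
    row + "AKT3"
    concat_error = False
except TypeError:
    concat_error = True

# Wrapping the single element in square brackets makes it a one-item list.
extended = row + ["AKT3"]
print(as_strings, as_integers, concat_error, extended)
```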

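I used the csv module, which I think simplifies things. Also, there is nothing wrong with the way you iterate over the data, but it is usually easier to use for item in mylist rather than for i in range(len(mylist)): item = mylist[i]. Give this a try: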

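import csv

# Open both files as plain text; the second file is tab-separated.
with open('first.csv', 'r', newline='') as firstfile, open('second.txt', 'r', newline='') as secondfile:
    first = list(csv.reader(firstfile))
    second = list(csv.reader(secondfile, delimiter='\t'))

final = []
#count = 0

for row1 in first:
    for row2 in second:
        # Same chromosome, and both start and end from second lie strictly
        # between start and end from first (compared as integers).
        if (row1[0] == row2[1] and int(row1[2]) > int(row2[2]) > int(row1[1])
                and int(row1[2]) > int(row2[3]) > int(row1[1])):
            final.append(row1 + [row2[0]])
            # To include count, use the two lines below instead of the append above.
            #count += 1
            #final.append(row1 + [row2[0]] + [count])

# newline='' stops csv.writer from adding blank lines between rows on Windows.
with open('output.txt', 'w', newline='') as outfile:
    outwriter = csv.writer(outfile)
    outwriter.writerows(final)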
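If count is meant the way the question describes it (the number of rows in second that match each row in first, with the ID taken from the first matching row), a per-row count could look roughly like the sketch below. This is one reading of the question, not a confirmed requirement. The function name summarize is hypothetical, and a few sample rows are inlined so the sketch runs without files on disk.

```python
import csv
import io

# A few rows copied from the sample data in the question.
FIRST = (
    "chr1,105000000,105310000,2,1,3,2\n"
    "chr1,5310000,5960000,2,1,5,4\n"
    "chr19,107180000,107680000,1,1,5,4\n"
)
SECOND = (
    "AKAP8L\tchr19\t107180100\t107650000\ttranscript\n"
    "AKAP8L\tchr19\t15514130\t15529799\ttranscript\n"
    "AKT3\tchr1\t105002000\t105010000\ttranscript\n"
    "AKT3\tchr1\t243663021\t244006886\ttranscript\n"
)


def summarize(first_rows, second_rows):
    """Append ID and count to each row of first that has at least one match."""
    result = []
    for row1 in first_rows:
        chrom, start, end = row1[0], int(row1[1]), int(row1[2])
        # Rows of second on the same chromosome whose start and end both
        # fall strictly inside the interval from first.
        matches = [
            row2 for row2 in second_rows
            if row2[1] == chrom
            and start < int(row2[2]) < end
            and start < int(row2[3]) < end
        ]
        if matches:
            # ID comes from the first matching row; count is the number of matches.
            result.append(row1 + [matches[0][0], str(len(matches))])
    return result


first = list(csv.reader(io.StringIO(FIRST)))
second = list(csv.reader(io.StringIO(SECOND), delimiter="\t"))
summary = summarize(first, second)
for row in summary:
    print(",".join(row))
```

With real files, the two io.StringIO objects would be replaced by the opened first.csv and second.txt, and summary could be written out with csv.writer as in the code above. Rows appear in the order of file first, which differs from the order shown in the expected output.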

Summarizing two text files and making a new file in Python
