How to merge files sorted by the first number in Python

Posted: 2018-08-02 12:52:26

Tags: python file sorting

I have several files whose lines look like this:

8 upchimy79 291160.8516853 345706.9991016
9 upchimy79 291160.8516853 345706.9991016
70 upchimy79 291178.7591454 345733.5179607
134 upchimy79 291391.9184244 345688.8950164
190 upchimy79 291511.4331200 345634.4573389

and:

0 eapceou79 289109.1707774 345638.6043512
60 eapceou79 289091.8125863 345656.2855532
120 eapceou79 289041.8477906 345702.7290361
183 eapceou79 288993.3282226 345747.8902265
215 eapceou79 289074.9134241 345759.2455079

I want to merge all of the files so that the first numbers are in ascending order. The output would look like this:

0 eapceou79 289109.1707774 345638.6043512
8 upchimy79 291160.8516853 345706.9991016
9 upchimy79 291160.8516853 345706.9991016
60 eapceou79 289091.8125863 345656.2855532
70 upchimy79 291178.7591454 345733.5179607
120 eapceou79 289041.8477906 345702.7290361
134 upchimy79 291391.9184244 345688.8950164

I have many files to do this for, each about 1400 lines long, so I'm not sure what the best way to do it is.
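At roughly 1400 lines per file, even a few hundred files will likely fit in memory, so the simplest option is to read every line and sort once. A minimal plain-Python sketch, assuming each line is whitespace-separated and starts with an integer; the function name merge_sorted_by_first_number is hypothetical. Unlike a merge, it does not require each input file to be sorted already:

```python
def merge_sorted_by_first_number(paths, out_path):
    """Read every line from every file and write them sorted by the first field.

    Assumes each line is whitespace-separated and starts with an integer.
    The input files do not need to be sorted individually.
    """
    lines = []
    for path in paths:
        with open(path) as f:
            # Skip blank lines so they never reach int() below
            lines.extend(line.rstrip("\n") for line in f if line.strip())

    # list.sort() is stable, so lines with equal numbers keep their input order
    lines.sort(key=lambda line: int(line.split()[0]))

    with open(out_path, "w") as out:
        for line in lines:
            # Re-add the newline, which also fixes files without a trailing one
            out.write(line + "\n")
```

For example, merge_sorted_by_first_number(sorted(glob.glob("data_*.txt")), "merged.txt") after importing glob, where the data_*.txt pattern is a placeholder for your own file names.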

3 answers:

Answer 0 (score: 1)

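When each file is already sorted on its own (as in your example), you can merge them with heapq.merge and its key argument (available since Python 3.5; see the heapq docs). This example uses two files, but the same approach merges any number of files: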

from heapq import merge

with open('f1.txt', 'r', newline='') as f1_in, \
     open('f2.txt', 'r', newline='') as f2_in, \
     open('data_out.txt', 'w', newline='') as f_out:

    # Sort key: the integer in the first whitespace-separated field of each line
    for line in merge(f1_in, f2_in, key=lambda l: int(l.split()[0])):
        f_out.write(line)

The lines in the output file look like this:

0 eapceou79 289109.1707774 345638.6043512
8 upchimy79 291160.8516853 345706.9991016
9 upchimy79 291160.8516853 345706.9991016
60 eapceou79 289091.8125863 345656.2855532
70 upchimy79 291178.7591454 345733.5179607
120 eapceou79 289041.8477906 345702.7290361
134 upchimy79 291391.9184244 345688.8950164
183 eapceou79 288993.3282226 345747.8902265
190 upchimy79 291511.4331200 345634.4573389
215 eapceou79 289074.9134241 345759.2455079
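The two-file example can be extended to an arbitrary list of files with contextlib.ExitStack, which closes however many files get opened. A sketch under the same assumption that every input file is already sorted by its first number; merge_presorted and first_number are illustrative names, not part of the answer:

```python
import heapq
from contextlib import ExitStack


def first_number(line):
    # Sort key: the integer in the first whitespace-separated field
    return int(line.split()[0])


def merge_presorted(paths, out_path):
    """Lazily merge files that are each already sorted by their first number."""
    with ExitStack() as stack, open(out_path, "w") as out:
        # ExitStack closes every input file, however many there are
        files = [stack.enter_context(open(p)) for p in paths]
        # Drop blank lines so the key function never sees an empty line
        streams = [(line for line in f if line.strip()) for f in files]
        for line in heapq.merge(*streams, key=first_number):
            # Guard against a last line that has no trailing newline
            out.write(line if line.endswith("\n") else line + "\n")
```

heapq.merge reads its inputs lazily, so only one pending line per file is held in memory at a time. Call it as, e.g., merge_presorted(sorted(glob.glob("data_*.txt")), "data_out.txt") after importing glob; the file pattern is hypothetical.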

Answer 1 (score: 0)

Pandas is great for this kind of thing:

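import pandas as pd

file1, file2 = 'f1.txt', 'f2.txt'  # the files to merge
d1 = pd.read_csv(file1, delimiter=' ', index_col=0, header=None)
d2 = pd.read_csv(file2, delimiter=' ', index_col=0, header=None)
# Stack the frames and sort on the index (the first number)
df = pd.concat([d1, d2], axis=0).sort_index()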

Answer 2 (score: 0)

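import pandas as pd

all_your_files = ["filename1", "filename2"]  # ... list every file here

# sep=r'\s+' splits on runs of whitespace; header=None because the files have
# no header row, and names= labels the four columns
all_dfs = (pd.read_csv(f, sep=r'\s+', header=None, names=["nr", "name", "d2", "d3"])
           for f in all_your_files)

df = pd.concat(all_dfs)
df.sort_values(by='nr', inplace=True)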

This sorts everything in one go. Then write the result back out as CSV with pandas:

df.to_csv("file_name", sep=" ", index=False, header=False)

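Keeping the first number as an ordinary column, rather than using it as the index, avoids surprises if the same number appears in more than one file.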
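One caveat with both pandas answers: the coordinate columns are parsed as floats, so trailing zeros are lost when writing back (291511.4331200 comes out as 291511.43312). A sketch that reads those columns as strings instead, so each value's original text survives the round trip; the function name merge_with_pandas and the column names are hypothetical:

```python
import pandas as pd


def merge_with_pandas(paths, out_path):
    """Concatenate whitespace-separated files and sort them by the first column.

    The coordinate columns are read as strings so their original text
    (including trailing zeros) is written back unchanged.
    """
    names = ["nr", "name", "x", "y"]  # hypothetical column names
    frames = [
        pd.read_csv(p, sep=r"\s+", header=None, names=names,
                    dtype={"nr": int, "name": str, "x": str, "y": str})
        for p in paths
    ]
    df = pd.concat(frames, ignore_index=True)
    # A stable sort keeps rows with the same number in their input order
    df = df.sort_values("nr", kind="stable")
    df.to_csv(out_path, sep=" ", header=False, index=False)
```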