Question

My directory contains the following files (1_xxx.txt, 2_xxx.txt, 1_yyy.txt, 2_yyy.txt, 1_zzz.txt, 2_zzz.txt). Their contents are shown below:

1_xxx.txt:
-114.265646442 34.0360392257
-112.977603537 31.6338662268
-117.239800991 36.1408246787
-114.716762067 32.958308901
-116.710069802 36.2660863375
-115.412539137 34.5790101356
-117.173651349 36.1032456689
-115.254332318 33.8689615728
-115.225643473 32.8079130497
-113.757416909 32.6491579487

2_xxx.txt:
-121.527298571 38.3074782763
-119.241009725 35.2597437123
-111.993090251 33.1087011262
-119.328464365 35.8944690935
-114.819870325 32.7076471384
-120.041889447 36.4080463723
-121.249592001 38.3951295581
-121.078565259 37.6730108558
-120.523147893 37.2889578323
-119.2383536 35.9028202963

1_yyy.txt:
-109.690156887 34.2072891001
-119.780672722 38.7665894396
-118.557741892 35.6314002547
-118.483411917 36.3579432166
-123.472136838 39.1714120111
-123.485136802 40.0894616596
-109.185105643 33.8647845733
-120.046426359 38.4660843951
-122.929234616 40.1186699391
-123.300682512 39.2757431576

2_yyy.txt:
-120.915282551 37.0468246029
-118.168309521 37.606220824
-111.172152572 35.5687631188
-110.999951025 34.8671827527
-120.375558342 37.7773687622
-121.028079242 36.5374775742
-118.53486589 36.7879815762
-115.771046166 39.1046390941
-117.618352132 39.3133019115
-110.163871705 34.6500104537

1_zzz.txt:
-117.442417987 34.0694542108
-118.624320171 34.3117074054
-111.915932786 33.6893480358
-118.214145399 34.0360392257
-122.189710383 37.6396159347
-122.413202409 37.9443375576
-115.524007077 32.9541312874
-117.735266836 33.9107314118
-110.840774505 32.3734158543
-122.399926026 37.7898915865

2_zzz.txt:
-106.544451063 31.5126888716
-112.728165588 32.3232796291
-117.793575105 34.8128904057
-116.464953895 32.3441697714
-116.206850112 34.2448798952
-121.758363934 37.9819048821
-113.317063698 33.5306154403
-115.999423067 31.4750816387
-115.257632657 37.8817248156
-117.558324417 37.4684639908

I want to merge the files that match xxx, yyy and zzz, and write a separate file for each group:

files = glob.glob('*.txt')
my_classes = ['xxx', 'yyy', 'zzz']

files_dict ={}
for c in my_classes:
    files_dict[c] = [f for f in files if c in f]

all_lons, all_lats ={}, {}

for e, f in files_dict.iteritems():
    all_lons[e], all_lats[e] = [], []
    for x in f:
        fi = open(x, 'r')
        lines = fi.read().splitlines()
        lons = [l.split(' ')[0] for l in lines]
        lats = [l.split(' ')[1] for l in lines]
        all_lons[e].apend(lons)
        all_lats[e].apend(lats)
for g, h in all_lons.iteritems():
    for i, j in all_lats.iteritems():
        with open(g + '_final.txt', 'w') as fo:
            fo.write(str(h) + str(j) + '\n' )

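With my limited knowledge of Python, I could not get this to work. I would like to know the best practice for solving this problem. The number of text files for each of my classes (i.e. xxx, yyy, zzz) is larger than the two shown in this example.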
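For reference, the attempt above fails for a few separate reasons: `apend` is a typo for `append`, appending whole lists would nest them per file rather than merge them, and the nested output loops reopen each `_final.txt` in `'w'` mode and overwrite it. A minimal sketch of the same idea with those points addressed might look like the following. The function name `merge_by_class` and the `*_<class>.txt` pattern are assumptions made for illustration, not part of the original post.

```python
import glob
import os


def merge_by_class(directory, classes):
    """Merge the 'lon lat' lines of every <n>_<class>.txt file in `directory`.

    Writes one <class>_final.txt per class and returns
    {class: [(lon, lat), ...]} with the values kept as strings.
    """
    merged = {}
    for cls in classes:
        # Match only the numbered inputs (e.g. 1_xxx.txt), so that an
        # existing xxx_final.txt is not read back in on a later run.
        pattern = os.path.join(directory, '*_{}.txt'.format(cls))
        pairs = []
        for name in sorted(glob.glob(pattern)):
            with open(name) as fin:
                for line in fin:
                    parts = line.split()
                    if len(parts) == 2:  # skip blank or malformed lines
                        pairs.append((parts[0], parts[1]))
        merged[cls] = pairs
        # Open each output once and write one "lon lat" pair per line.
        with open(os.path.join(directory, cls + '_final.txt'), 'w') as fout:
            for lon, lat in pairs:
                fout.write('{} {}\n'.format(lon, lat))
    return merged
```

Calling `merge_by_class('.', ['xxx', 'yyy', 'zzz'])` in the directory described above would write xxx_final.txt, yyy_final.txt and zzz_final.txt. Splitting each line is only needed if the longitudes and latitudes are to be processed; for plain concatenation the answers below are simpler.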

Answer 1

Since you are on Windows, the shell option may not work for you unless you are using a POSIX-like shell, so here is an equivalent approach in Python.

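import glob

my_classes = ['xxx', 'yyy', 'zzz']

for cls in my_classes:
    # Collect every .txt file whose name contains the class keyword.
    files = glob.glob('*{}*.txt'.format(cls))
    if files:
        # The output is named after the class (no extension), so it is
        # never picked up by the '*.txt' pattern on a later run.
        with open(cls, 'w') as fout:
            for file_name in files:
                with open(file_name, 'r') as fin:
                    fout.write(fin.read())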

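This relies on the fact that you do not actually need to process the contents of each file; you only need to concatenate the files according to a keyword in their names.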
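Because the contents are never inspected, the files can also be copied as raw bytes instead of being read into a string first. The sketch below is one way to do that with `shutil.copyfileobj`, which copies in chunks and so avoids holding a large file in memory. The helper name `concatenate` and its parameters are hypothetical, chosen only for this example.

```python
import glob
import os
import shutil


def concatenate(pattern, out_name):
    """Copy every file matching `pattern` into `out_name`; return the inputs."""
    out_abs = os.path.abspath(out_name)
    # Sort for a reproducible merge order, and leave out the output file
    # itself in case it also matches the pattern.
    inputs = [f for f in sorted(glob.glob(pattern))
              if os.path.abspath(f) != out_abs]
    if inputs:
        with open(out_name, 'wb') as fout:
            for name in inputs:
                with open(name, 'rb') as fin:
                    shutil.copyfileobj(fin, fout)  # chunked byte copy
    return inputs
```

For the files in the question this could be called as `concatenate('*xxx*.txt', 'xxx.txt')`, once per class.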

Answer 2

Not Python, but you can use the power of the shell:

for g in xxx yyy zzz
do
  cat *_$g.txt > $g.txt
done

This concatenates all the *_xxx.txt files into xxx.txt, and likewise for yyy and zzz.
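One detail of the loop above is that the pattern requires an underscore before the class name, so the merged xxx.txt does not match it and is not fed back in on a second run. The short sketch below illustrates the difference using Python's `fnmatch`, which the `glob` module uses for matching file names. The list of names is made up for the example, and xxx_final.txt stands in for any leftover output.

```python
import fnmatch

# Hypothetical directory listing, including outputs from earlier runs.
names = ['1_xxx.txt', '2_xxx.txt', 'xxx.txt', 'xxx_final.txt', '1_yyy.txt']

# Strict pattern, as in the shell loop: underscore, class, extension.
strict = fnmatch.filter(names, '*_xxx.txt')

# Looser pattern: anything containing the class name and ending in .txt.
loose = fnmatch.filter(names, '*xxx*.txt')

print(strict)  # ['1_xxx.txt', '2_xxx.txt']
print(loose)   # ['1_xxx.txt', '2_xxx.txt', 'xxx.txt', 'xxx_final.txt']
```

With the looser pattern, a merged file ending in .txt would be appended to itself on the next run unless it is excluded or given a name that does not match.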

Answer 3

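import glob
import os

# input() is the Python 3 name; on Python 2 use raw_input() instead.
path = input('Enter input path: ')
files = glob.glob(os.path.join(path, '*.*'))
patterns = ['xxx', 'yyy', 'zzz']

for pattern in patterns:
    # Concatenate the contents of every file whose name contains the pattern.
    content = ''
    for filename in files:
        # Test the file name only, not the directory part of the path.
        if pattern in os.path.basename(filename):
            with open(filename) as fread:
                content += fread.read()
    with open('output' + pattern + '.txt', 'w') as fwrite:
        fwrite.write(content)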

Answer 4

import glob

files = glob.glob('*.txt')
my_classes = ['xxx', 'yyy', 'zzz']

# Map each class to the files whose names contain it.
files_dict = {}
for c in my_classes:
    files_dict[c] = [f for f in files if c in f]

# items() works on Python 2 and 3; iteritems() exists only on Python 2.
for e, f in files_dict.items():
    with open(e + '_final.txt', 'w') as fo:
        for x in f:
            with open(x, 'r') as fi:
                fo.write(fi.read())

Combining multiple text files belonging to different groups using Python

4 answers: