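I have several directories whose names are dates, and I want to extract the file with the same name from each of them and then merge those files across all dates. Can anyone tell me the most efficient way to do this in Python?
I have used the glob package to reach the directories, but I don't know how to merge the files: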
import glob
path = '/home/Data/pb/2014-*/ank.txt.gz'
for file in glob.glob(path):
    file.readlines()
Answer 0 (score: 1)
To read a .gz file, you need the gzip module:
import glob
import gzip
path = '/home/Data/pb/2014-*/ank.txt.gz'
# loop for each file *name* matching the glob pattern
for fname in glob.glob(path):
    # open the file as a gzip-compressed file, in text mode
    with gzip.open(fname, 'rt') as f:
        # for each line of the file
        for data in f:
            # do whatever you need here
            # ...
            pass
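To make the "do whatever you need here" step concrete for the merge the question asks about, here is a minimal sketch. The function name `merge_gzip_files` and the plain-text output file are assumptions for illustration, not part of the original answer. It writes the decompressed lines of every matching file into a single output file, in sorted file-name order.

```python
import glob
import gzip


def merge_gzip_files(pattern, out_path):
    """Write the text of every gzip file matching `pattern` into `out_path`.

    Hypothetical helper: returns the input files in the order they were merged.
    """
    # glob.glob() returns matches in arbitrary order, so sort for a stable result.
    fnames = sorted(glob.glob(pattern))
    with open(out_path, 'w') as merged:
        for fname in fnames:
            # 'rt' decompresses and decodes to text on the fly.
            with gzip.open(fname, 'rt') as f:
                for line in f:
                    merged.write(line)
    return fnames
```

With the path from the question, this could be called as `merge_gzip_files('/home/Data/pb/2014-*/ank.txt.gz', 'merged.txt')`. Reading line by line keeps memory use low even if the individual files are large.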
Answer 1 (score: 1)
Assuming a layout like this:
pb/2014-01-01/file_of_interest.txt
pb/2014-02-01/file_of_interest.txt
pb/2014-03-01/file_of_interest.txt
...
First, I create my test environment:
# Created 10 files in 10 directories named
# pb/2014-$i/file_of_interest.txt. Then
# pushed "contents_of_file_2014-$i" into each file.
jon$ for i in $(seq 1 10); do mkdir -p pb/2014-$i; echo contents_of_file_2014-$i > pb/2014-$i/file_of_interest.txt; done
# Run the merge.py (source below)
jon$ python merge.py
# See the output
jon$ cat output.txt
contents_of_file_2014-1
contents_of_file_2014-10
contents_of_file_2014-2
contents_of_file_2014-3
contents_of_file_2014-4
contents_of_file_2014-5
contents_of_file_2014-6
contents_of_file_2014-7
contents_of_file_2014-8
contents_of_file_2014-9
merge.py
jon$ cat merge.py
#!/usr/bin/env python
import glob
import gzip
merged_fname = "output.txt"
# Sort the matches: glob.glob() does not guarantee any particular order.
files = sorted(glob.glob('pb/2014-*/file_of_interest.txt'))
with open(merged_fname, 'w') as merged_file_handle:
    for fname in files:
        # For gzip, use the gzip opener instead.
        # @sylvain
        #with gzip.open(fname, 'rt') as file_handle:
        with open(fname, 'r') as file_handle:
            merged_file_handle.write(file_handle.read())
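Note that the output above lists 2014-10 right after 2014-1, which is plain string order rather than date order. If chronological order matters, one possible variation is sketched below. The names `date_key` and `merge_in_date_order` are hypothetical, and the sketch assumes the only digits in each directory name are the date parts. It also uses `shutil.copyfileobj`, which copies in chunks instead of loading each file fully with `read()`.

```python
import glob
import os
import re
import shutil


def date_key(path):
    """Sort key built from the integers in the parent directory name.

    For example, 'pb/2014-10/file_of_interest.txt' gives (2014, 10).
    """
    dirname = os.path.basename(os.path.dirname(path))
    return tuple(int(n) for n in re.findall(r'\d+', dirname))


def merge_in_date_order(pattern, out_path):
    """Concatenate all files matching `pattern` into `out_path`, oldest first."""
    files = sorted(glob.glob(pattern), key=date_key)
    # Binary mode copies the bytes unchanged, whatever the text encoding is.
    with open(out_path, 'wb') as merged:
        for fname in files:
            with open(fname, 'rb') as src:
                # Stream the file in chunks rather than reading it all at once.
                shutil.copyfileobj(src, merged)
    return files
```

For the test environment above, `merge_in_date_order('pb/2014-*/file_of_interest.txt', 'output.txt')` would write the contents for 2014-9 before those for 2014-10. With zero-padded names such as 2014-01-01, a plain `sorted()` already gives date order and the custom key is unnecessary.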