Looping through files that share a root name in the same folder

Time: 2020-07-07 09:25:28

Tags: python

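I have a folder containing hundreds of files. Judging by their names, the files are not all independent, for example: name1_01.csv, name1_02.csv, ..., name1_10.csv, name2_01.csv, name2_02.csv, and so on. So there are several root names: "name1", "name2", "name3", etc.

I need to loop over the files that share a root name so that I can merge their contents (for example, the contents of all "name1" files), then remove duplicate rows, then move on to the next root name, "name2", and so on. I'm not sure how to do this other than with multiple nested for loops. Is there a better way?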

2 answers:

Answer 0: (score: 0)

This worked for me:

import os

path = os.path.normpath('C:\\Users\\mateo\\Documents\\files')
path_list = os.listdir(path)
main_files = {}  # root name -> set of lines already written to its merged file
for file in path_list:
    name = file.split('_')[0]  # root name is everything before the first '_'
    # The first file of a group creates the merged file; later files append to it.
    mode = 'a' if name in main_files else 'w'
    seen = main_files.setdefault(name, set())
    with open(os.path.join(path, name + '.txt'), mode) as merged, \
         open(os.path.join(path, file), 'r') as source:
        for line in source:
            if line not in seen:  # skip lines already written for this root name
                merged.write(line)
                seen.add(line)


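This gives you one merged file per root name, with duplicate lines removed. Note that for this to work, the folder must contain only the files of interest, and those files must follow the naming convention "nameN_something.extension". For your case, change the extension from .txt to .csv.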
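The caveat about the folder containing only the files of interest can be relaxed by matching a filename pattern and writing the merged files to a separate folder. The following is a minimal sketch of that idea, not the answerer's code: the function name `merge_groups`, the `out_dir` parameter, and the default pattern `*_*.csv` are assumptions made for illustration.

```python
from pathlib import Path


def merge_groups(src_dir, out_dir, pattern="*_*.csv"):
    """Merge files sharing a root name into out_dir/<root>.csv without duplicate lines.

    Hypothetical helper: the names and the default pattern are illustrative only.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    seen = {}  # root name -> set of rows already written
    for file in sorted(src.glob(pattern)):
        if not file.is_file():
            continue
        root = file.name.split("_")[0]  # text before the first '_'
        # The first file of a group truncates the merged file; later ones append.
        mode = "a" if root in seen else "w"
        rows = seen.setdefault(root, set())
        with open(out / (root + ".csv"), mode) as merged, open(file) as source:
            for line in source:
                row = line.rstrip("\n")  # a missing final newline must not hide a duplicate
                if row not in rows:
                    merged.write(row + "\n")
                    rows.add(row)
    return sorted(seen)
```

Calling `merge_groups('files', 'merged')` would leave the source files untouched and return the list of root names it found. Because the pattern requires an underscore, unrelated files in the source folder are skipped.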

Answer 1: (score: 0)

from pathlib import Path
from itertools import groupby
import re

def groupby_fn(filename):
    m = re.match(r'[^_]+_', filename) # match everything up until the first '_'
    return m[0]


dir = sorted(str(f) for f in Path('.').glob('*_*.csv') if f.is_file()) # '*_*.csv' matches names containing an underscore
groups = []
for k, g in groupby(dir, groupby_fn): # groupby only merges adjacent items, hence the sort above
    groups.append(list(g)) # these are your groupings
for group in groups:
    print(group) # this is a list of files with the same prefix, such as 'name1_'
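`itertools.groupby` only groups adjacent items, so the approach above depends on the file list being sorted first. An alternative that does not depend on ordering is to collect names into a dictionary keyed by root name. This is a minimal sketch of that alternative; `group_by_root` is a hypothetical helper name, and the file names in the usage lines are made up.

```python
from collections import defaultdict


def group_by_root(filenames):
    """Map each root name (text before the first '_') to its file names."""
    groups = defaultdict(list)
    for name in filenames:
        root, sep, _ = name.partition("_")
        if sep:  # ignore names that contain no underscore
            groups[root].append(name)
    return dict(groups)


# Illustrative input, deliberately unsorted.
names = ["name2_01.csv", "name1_02.csv", "name1_01.csv", "readme.txt"]
print(group_by_root(names))
# {'name2': ['name2_01.csv'], 'name1': ['name1_02.csv', 'name1_01.csv']}
```

The dictionary keeps the order in which each root name was first seen, so sort the keys or the lists afterwards if a particular order matters.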

In my directory I have these files:

04/10/2019  06:03 PM            28,744 donors1.csv
04/12/2019  12:02 PM             9,821 donors10.csv
04/12/2019  12:15 PM            14,019 donors3.csv
04/12/2019  12:01 PM            15,581 donors5.csv

So I modified the code above:

from pathlib import Path
from itertools import groupby
import re

def groupby_fn(filename):
    m = re.match(r'donors\d', filename) # match 'donors' plus its first digit, so donors1 and donors10 share a group
    return m[0]


dir = sorted(str(f) for f in Path('.').glob('donors*.csv') if f.is_file()) # or glob('*') if you want all files
groups = []
for k, g in groupby(dir, groupby_fn):
    groups.append(list(g)) # these are your groupings
for group in groups:
    print(group) # this is a list of files with the same prefix, such as 'donors1'

Prints:

['donors1.csv', 'donors10.csv']
['donors3.csv']
['donors5.csv']
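This answer stops at grouping, while the question also asks to merge each group and drop duplicate rows. One possible way to finish the job is sketched below; `unique_lines` is a hypothetical helper, and the sketch assumes each file is small enough to read into memory.

```python
from pathlib import Path


def unique_lines(paths):
    """Return the lines of all files in `paths`, in first-seen order, without duplicates."""
    lines = []
    for path in paths:
        lines.extend(Path(path).read_text().splitlines())
    # dict keys are unique and keep insertion order, so this drops later repeats.
    return list(dict.fromkeys(lines))
```

Each list in `groups` can be passed to `unique_lines`, and the result written out with `'\n'.join(...)` under whatever file name suits the group.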