Question

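How can I determine whether every file name listed in a file exists in both of two patterns? If every name has both variants (csv.new and csv), continue to the next step; otherwise exit with an error message.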

Each "abc_package" prefix should have two files, one with the extension "csv.new" and the other with the extension "csv". There may be many file names in "list_of_files.txt".

Example: List_of_files.txt

abc_package.1406728501.csv.new
abc_package.1406728501.csv
abc_package.1406724901.csv.new
abc_package.1406724901.csv

Answer 1

To match file names in Python you can use the fnmatch module. Here is the sample code from the documentation.

import fnmatch
import os

# List every entry in the current directory whose name ends in .txt
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)

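The syntax is fnmatch.fnmatch(filename, pattern). fnmatch.fnmatchcase(filename, pattern) takes the same arguments but always compares case-sensitively.

See the fnmatch documentation for more examples.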
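To apply fnmatch to the question itself, one option is to filter the listed names by pattern and compare the two groups. The following is only a minimal sketch, not part of the original answer: the helper name `find_unpaired` is hypothetical, and it assumes the names have already been read from the list file into a Python list.

```python
import fnmatch


def find_unpaired(names):
    """Return (csv names lacking a .csv.new, base names lacking a .csv)."""
    # fnmatchcase compares case-sensitively on every platform.
    new_files = [n for n in names if fnmatch.fnmatchcase(n, '*.csv.new')]
    csv_files = [n for n in names if fnmatch.fnmatchcase(n, '*.csv')]
    # Strip the trailing ".new" so both groups use the same base name.
    new_bases = {n[:-len('.new')] for n in new_files}
    csv_bases = set(csv_files)
    return sorted(csv_bases - new_bases), sorted(new_bases - csv_bases)


names = [
    'abc_package.1406728501.csv.new',
    'abc_package.1406728501.csv',
    'abc_package.1406724901.csv',
]
missing_new, missing_csv = find_unpaired(names)
print(missing_new)
print(missing_csv)
```

If either returned list is non-empty, the calling script could stop with an error message, for example via `sys.exit`.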

Answer 2

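Assuming the file isn't so ridiculously large that it won't fit in memory, just build a set of all the .csv.new files and a set of all the .csv files, and verify that the two sets are identical. For example: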

csvfiles = set()
newfiles = set()
with open('List_of_files.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.endswith('.csv.new'):
            newfiles.add(line[:-4])
        elif line.endswith('.csv'):
            csvfiles.add(line)
if csvfiles != newfiles:
    raise ValueError('Mismatched files!')

If you want to know which files are mismatched, csvfiles - newfiles gives you the .csv files that have no corresponding .csv.new, and newfiles - csvfiles gives you the opposite.
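As an illustration of those two set differences, here is a small sketch that turns them into readable messages. The function name `describe_mismatches` and the sample data are assumptions made for this example; `newfiles` is expected to hold names with the ".new" suffix already stripped, as in the code above.

```python
def describe_mismatches(csvfiles, newfiles):
    """Build one message per file name that lacks its counterpart."""
    messages = []
    # .csv files with no matching .csv.new
    for name in sorted(csvfiles - newfiles):
        messages.append('missing ' + name + '.new')
    # .csv.new files (stored without ".new") with no matching .csv
    for name in sorted(newfiles - csvfiles):
        messages.append('missing ' + name)
    return messages


csvfiles = {'abc_package.1406728501.csv', 'abc_package.1406724901.csv'}
newfiles = {'abc_package.1406728501.csv'}
for message in describe_mismatches(csvfiles, newfiles):
    print(message)
```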

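(There are ways to make this cleaner and more readable, from using os.path.splitext to using a generic partition-an-iterable-by-filter function, but I think this version should be the easiest for a novice to understand.)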
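For reference, the os.path.splitext variant mentioned above might look roughly like the sketch below. The helper name `split_by_extension` is hypothetical, and it takes a list of names rather than reading the file, purely to keep the example short.

```python
import os.path


def split_by_extension(names):
    """Return (csvfiles, newfiles); newfiles holds names without '.new'."""
    csvfiles, newfiles = set(), set()
    for name in names:
        # splitext only splits off the last extension:
        # 'x.csv.new' -> ('x.csv', '.new'), 'x.csv' -> ('x', '.csv')
        base, ext = os.path.splitext(name)
        if ext == '.new' and base.endswith('.csv'):
            newfiles.add(base)
        elif ext == '.csv':
            csvfiles.add(name)
    return csvfiles, newfiles


csvfiles, newfiles = split_by_extension([
    'abc_package.1406728501.csv.new',
    'abc_package.1406728501.csv',
])
if csvfiles != newfiles:
    raise ValueError('Mismatched files!')
```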

Answer 3

with open("in.txt","r") as fo:
    f = fo.readlines()
    cs_new = set()
    cs = set()
    for ele in f:
        ele = ele.rstrip()
        if not ele.endswith(".new"):
            cs.add(ele)
        else:
            cs_new.add(ele.split(".new")[0])
    diff = cs ^ cs_new
    for fi in diff:
        print(fi)

Since you want the full name of whichever file is unmatched, you need to check both lists:

with open("in.txt", "r") as f:
    lines = [x.rstrip() for x in f]
cs, cs_new, diff = [], [], []
for ind, ele in enumerate(lines):
    if ele.endswith(".csv"):
        cs.append(ele)
    elif ele.endswith(".new"):
        # keep the index so the original name (with .new) can be recovered
        cs_new.append([ele.split(".new")[0], ind])
for ele in cs:
    if not any(ele == base for base, _ in cs_new):
        diff.append(ele)
for base, ind in cs_new:
    if base not in cs:
        diff.append(lines[ind])  # append original element with full extension
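The same idea, reporting unmatched entries under their original full names, can also be sketched with dictionaries keyed by the shared base name, which avoids the nested list scans. This is an alternative illustration rather than the answer's own method, and the function name `unmatched_full_names` is hypothetical.

```python
def unmatched_full_names(names):
    """Return the original names of entries that lack a counterpart."""
    csv_by_base = {}
    new_by_base = {}
    for name in names:
        if name.endswith('.csv.new'):
            # key on the name without ".new", keep the full name as value
            new_by_base[name[:-len('.new')]] = name
        elif name.endswith('.csv'):
            csv_by_base[name] = name
    # Base names present in exactly one of the two groups
    unmatched = csv_by_base.keys() ^ new_by_base.keys()
    return sorted(
        csv_by_base[base] if base in csv_by_base else new_by_base[base]
        for base in unmatched
    )


for name in unmatched_full_names(['a.1.csv', 'a.2.csv.new', 'a.3.csv', 'a.3.csv.new']):
    print(name)
```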

How to match file names in a file using Python
