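Python: check whether the files in a list exist, and run a function only for those that do

Asked: 2011-10-16 06:08:23

Tags: python function csv for-loop

Python noob here, please be gentle. In my current program I have a list of 3 files that may or may not exist in my current directory. If they do reside in my directory, I want to be able to assign values to them for later use in other functions. If a file is not in the directory, it should not be assigned values, because the file does not exist anyway. The code I have so far is below: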

import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
        return fname, hashcolumn, filepathNum


def removedupes(infile, outfile, hashcolumn):
    fname, hashcolumn, filepathNum = chkifexists()
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count to determine
    the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    fname, hashcolumn, filepathNum = chkifexists()
    removedupes(fname, os.path.splitext(fname)[0] + "2.csv", hashcolumn)
    bakcount (fname, os.path.splitext(fname)[0] + "2.csv")


CleanAndPrettify()

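The problem I'm running into is that the code runs through the list and stops at the first valid file it finds.

I'm not sure whether I'm thinking about this completely the wrong way, but I thought I was doing it right.

Current output of this program with A.csv, B.csv, and C.csv present in the same directory: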

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!

The desired output should be:

Removing duplicate rows from input files...
2 duplicate rows removed from A.csv!
5 duplicate rows removed from B.csv!
8 duplicate rows removed from C.csv!

...and then continue with the next part, creating the .bak files. Output of this program with none of the CSV files in the same directory:

UnboundLocalError: local variable 'hashcolumn' referenced before assignment

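3 answers:

Answer 0 (score: 2)

Of course it stops after the first match, because you are doing a return from the function. Instead, you should either populate a list in the loop and return it at the end, or create a generator that uses yield on each iteration and raises StopIteration if nothing is found (a generator that finds nothing also simply ends on its own, and from Python 3.7 on an explicit raise StopIteration inside a generator is turned into a RuntimeError). The first approach is simpler and closer to your solution; here it is: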

import os, csv

def chkifexists():
    files = ['A.csv', 'B.csv', 'C.csv']
    found = []
    for fname in files:
        if os.path.isfile(fname):
            if fname == "A.csv":
                hashcolumn = 7
                filepathNum = 5
            elif fname == "B.csv":
                hashcolumn = 15
                filepathNum = 5
            elif fname == "C.csv":
                hashcolumn = 1
                filepathNum = 0
            found.append({'fname': fname,
                          'hashcolumn': hashcolumn,
                          'filepathNum': filepathNum})
    return found

found = chkifexists()
if not found:
    print 'No files to scan'
else:
    for f in found:
        print f['fname'], f['hashcolumn'], f['filepathNum']
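
The generator alternative mentioned in this answer could look roughly like the sketch below. It is written for Python 3 rather than the Python 2 used in the thread, and the names SETTINGS and iter_existing, as well as the directory parameter, are made up for illustration. It relies on the fact that a generator which yields nothing simply produces an empty sequence.

```python
import os

# Hypothetical lookup table: file name -> (hashcolumn, filepathNum),
# filled with the values from the question.
SETTINGS = {
    "A.csv": (7, 5),
    "B.csv": (15, 5),
    "C.csv": (1, 0),
}


def iter_existing(directory="."):
    """Yield (fname, hashcolumn, filepathNum) for each listed file that exists."""
    for fname, (hashcolumn, filepath_num) in SETTINGS.items():
        if os.path.isfile(os.path.join(directory, fname)):
            yield fname, hashcolumn, filepath_num
    # Falling off the end finishes the generator; no explicit raise is needed.


if __name__ == "__main__":
    for fname, hashcolumn, filepath_num in iter_existing():
        print(fname, hashcolumn, filepath_num)
```

The caller loops over iter_existing() directly, so every existing file is handled instead of only the first one.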

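Answer 1 (score: 2)

The check you are using is not the recommended way to compare two strings in Python. Unless you have explicitly interned the strings, you should not compare them with is, because there is no guarantee that it will return True. Use == instead.

Alternatively, you can do the following: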
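
As a small illustration of the difference (a Python 3 sketch, with variable names chosen only for this example): == compares the characters of the two strings, while is asks whether both names refer to the very same object, which depends on how the interpreter happened to create them.

```python
# Build the same text at runtime so the two names may refer to different objects.
expected = "A.csv"
fname = "".join(["A", ".csv"])

same_value = fname == expected   # compares contents
same_object = fname is expected  # compares identity; the result is implementation-dependent

print("equal by value:", same_value)
print("same object:", same_object)
```

Only the first comparison is something a program should rely on when matching file names.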

import os

files = ['A.csv', 'B.csv', 'C.csv']
# map each file name to (hashcolumn, filepathNum)
filedict = {}
filedict['A.csv'] = (7, 5)
filedict['B.csv'] = (15, 5)
filedict['C.csv'] = (1, 0)
print [(fname, filedict[fname]) for fname in files if fname in filedict and os.path.isfile(fname)]

Answer 2 (score: 1)

There are a few problems in your code.

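First, chkifexists returns as soon as it has found an existing file, so it never checks any of the remaining names; also, if no files are found, hashcolumn and filepathNum are never set, which gives you the UnboundLocalError.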
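
The UnboundLocalError can be reproduced in isolation. The sketch below (Python 3; lookup, lookup_safe, and the file name are hypothetical) assigns a local variable only inside an if branch, as the question's chkifexists does, and then shows one possible guard, a default value set before the branch.

```python
import os


def lookup(fname):
    # hashcolumn is assigned only when the file exists ...
    if os.path.isfile(fname):
        hashcolumn = 7
    # ... so this line fails for a missing file.
    return hashcolumn


def lookup_safe(fname):
    hashcolumn = None  # default chosen for this sketch
    if os.path.isfile(fname):
        hashcolumn = 7
    return hashcolumn


if __name__ == "__main__":
    try:
        lookup("no_such_file_for_sketch.csv")
    except UnboundLocalError as exc:
        print("failed as described:", exc)
    print(lookup_safe("no_such_file_for_sketch.csv"))
```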

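Second, you are calling chkifexists in two places: from removedupes and from CleanAndPrettify. So removedupes would run for every existing file for every existing file, which is not what you want! In fact, since CleanAndPrettify has just verified that the file exists, removedupes should simply work with whatever it is handed.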

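There are at least three ways to handle the case where no files are found: have chkifexists raise an exception; keep a flag in CleanAndPrettify that tracks whether any files were found; or turn the results of chkifexists into a list, which you can then check for emptiness.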
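
The second option, a flag kept by the caller, might be sketched like this in Python 3. The function clean_existing and its process callback are invented for the example and stand in for CleanAndPrettify and the per-file work.

```python
import os


def clean_existing(names, process):
    """Call process(fname) for each existing file; report whether any was found."""
    found_any = False  # the flag that tracks whether at least one file exists
    for fname in names:
        if os.path.isfile(fname):
            found_any = True
            process(fname)
    if not found_any:
        print("no files to clean up")
    return found_any


if __name__ == "__main__":
    clean_existing(["A.csv", "B.csv", "C.csv"], print)
```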

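In the modified code I moved the files into a dictionary, with the name as the key and a tuple of hashcolumn and filepathNum as the value. chkifexists now accepts the file names to look for as a dictionary and yields the values whenever a file is found; if no files are found, a NoFilesFound exception is raised.

Here is the code: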

import os, csv

# store file attributes for easy modifications
# format is 'filename': (hashcolumn, filepathNum)
files = {
        'A.csv': (7, 5),
        'B.csv': (15, 5),
        'C.csv': (1, 0),
        }

class NoFilesFound(Exception):
    "No .csv files were found to clean up"

def chkifexists(somefiles):
    # load all three at once, but only yield them if filename
    # is found
    filesfound = False
    for fname, (hashcolumn, filepathNum) in somefiles.items():
        if os.path.isfile(fname):
            filesfound = True
            yield fname, hashcolumn, filepathNum
    if not filesfound:
        raise NoFilesFound

def removedupes(infile, outfile, hashcolumn, filepathNum):
    # this is now a single-run function
    r1 = file(infile, 'rb')
    r2 = csv.reader(r1)
    w1 = file(outfile, 'wb')
    w2 = csv.writer(w1)
    hashes = set()
    for row in r2:
        if row[hashcolumn] =="": 
            w2.writerow(row)       
            hashes.add(row[hashcolumn])  
        if row[hashcolumn] not in hashes:
            w2.writerow(row)
            hashes.add(row[hashcolumn])
    w1.close()
    r1.close()


def bakcount(origfile1, origfile2):
    '''This function creates a .bak file of the original and does a row count
    to determine the number of rows removed'''
    os.rename(origfile1, origfile1+".bak")
    count1 = len(open(origfile1+".bak").readlines())
    #print count1

    os.rename(origfile2, origfile1)
    count2 = len(open(origfile1).readlines())
    #print count2

    print str(count1 - count2) + " duplicate rows removed from " \
        + str(origfile1) +"!"


def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    try:
        for fname, hashcolumn, filepathNum in chkifexists(files):
            removedupes(
                   fname,
                   os.path.splitext(fname)[0] + "2.csv",
                   hashcolumn,
                   filepathNum,
                   )
            bakcount (fname, os.path.splitext(fname)[0] + "2.csv")
    except NoFilesFound:
        print "no files to clean up"

CleanAndPrettify()

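I couldn't test this, as I don't have the A, B, and C .csv files, but hopefully it will point you in the right direction. As you can see, the raise NoFilesFound option uses the flag method to keep track of whether no files were found; here is the list method: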

def chkifexists(somefiles):
    # load all three at once, but only yield them if filename
    # is found
    for fname, (hashcolumn, filepathNum) in somefiles.items():
        if os.path.isfile(fname):
            yield fname, hashcolumn, filepathNum

def CleanAndPrettify():
    print "Removing duplicate rows from input files..."
    found_files = list(chkifexists(files))
    if not found_files:
        print "no files to clean up"
    else:
        for fname, hashcolumn, filepathNum in found_files:
            removedupes(...)
            bakcount(...)
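
The code in this thread is Python 2 (print statements, file()). For readers on Python 3, removedupes might be ported along the lines below. This is a sketch that has only been checked against small made-up inputs, not against the asker's real CSV files; it keeps the original behaviour of always writing rows whose hash column is empty.

```python
import csv


def removedupes(infile, outfile, hashcolumn):
    """Copy infile to outfile, keeping the first row seen for each hash value.

    Rows whose hash column is empty are always kept, as in the original code.
    """
    hashes = set()
    # newline="" is what the csv module documentation recommends for files
    # handed to csv.reader and csv.writer in Python 3.
    with open(infile, newline="") as src, open(outfile, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            value = row[hashcolumn]
            if value == "" or value not in hashes:
                writer.writerow(row)
                hashes.add(value)
```

Using with blocks closes both files even if a row is shorter than expected and row[hashcolumn] raises an IndexError.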