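Using Python 3.5.1

Time: 2016-05-03 08:17:20

Tags: python sorting filenames

I need to sort a large number (about 20,000) of PDF files by the most common part of their names. All the files have a very similar structure: XXX_1500004898_CommonPART.pdf (some files use "_" as the separator, others use "-").

This is the code I used: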

files = []
for root, dirnames, files in os.walk(r'C:PATH/TO/FILES'):
    for file in fnmatch.filter(files, '*0000*.pdf'):
             print (file)
             files.append(os.path.join(root, file))
time.sleep(2)
sorted_files = sorted(files, key=lambda x: str(x.split('-')[2]))

But when I run it, the only thing I get is a traceback:

Traceback (most recent call last):
  File "C:\PATH\Sorting.py", line 14, in <module>
    sorted_files = sorted(files, key=lambda x: str(x.split('-')[2]))
  File "C:\PATH\Sorting.py", line 14, in <lambda>
    sorted_files = sorted(files, key=lambda x: str(x.split('-')[2]))
IndexError: list index out of range

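I'm new to Python, so this may look inexperienced. I also still don't know how to tell Python to create folders named after these common parts and move the files into them.

Could you help me solve this?

Thank you very much!

Updated code: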

files_result = []
for root, dirnames, files in   os.walk(r'C:\PATH\TESTT'):
    for file in fnmatch.filter(files, '*0000*.pdf'):
            print (file)
            files_result.append(os.path.join(root, file))
time.sleep(2)
sorted_files = sorted(file.replace("_", "-").split("-")[2] for file in files_result if (file.count("-")+file.count("_") == 2))
print (sorted_files)

This is the result:

['ALOISE emma.pdf', 'ALOISEEMMA.pdf', 'ARETEIA.pdf', 'ASSEL.pdf', 'AVV.BELLOMI.pdf', 'BRACI E ABBRACCI.pdf', 'CERRATA D..pdf', 'CERRATA REFRIGERAZIONE.pdf', etc.....]

Here are some typical file names:

ANI-150000000106SD_approvato.pdf
ANI-1500000006-CENTROCHIRURGIAAMBULATORIALEsrl_approvato.pdf
ANI-1500000007-EUROMED ECOLOGICA_APPROVATO.pdf
ANI-1500000008-TELECOM_APPROVATO.pdf
ANI-1500000009-TELECOM_APPROVATO.pdf
ANI-15000000100-ALOISE EMMA_approvato.pdf
ANI-15000000101-centro.chirurgia.ambulatoriale_approvato.pdf
ANI-15000000102-TELECOM_APPROVATO.pdf
ANI-15000000103-MCLINK_APPROVATO.pdf
ani-15000000104-idrafer.pdf
ANI-15000000105EUROMEDECOLOGICA_approvata.pdf
ANI-15000000107LAGSERVICE.pdf
ANI-15000000109TCHR_approvato.pdf
ANI-1500000011-COOPSERVICEn9117011288 approvate (2).pdf
ANI-1500000011-COOPSERVICEn°9117011288.pdf
ANI-15000000110-TELECOM_APPROVATO.pdf
ANI-15000000113-SECURLAB_approvato.pdf
ANI-15000000114-SECURLAB_approvato.pdf
ANI-15000000115-COOPSERVICE_approvato.pdf
ANI-15000000116-COOPSERVICE_approvato.pdf
ANI-15000000117-REPOWER_approvato.pdf
ANI-15000000118-CECCHINIlaura_approvato.pdf
ANI-15000000119-DESENA_approvato.pdf
ANI-1500000012-TCHRSERCICES.R.L._approvato (1).pdf
ANI-15000000121-ALOISE_approvato.pdf
ANI-15000000122-LAGSERVICE.pdf
ANI-15000000123-SECURLAB_approvata.pdf
ANI-15000000125-QUERZOLA_approvato.pdf
ANI-15000000129-TC HR_apprpvato.pdf
ANI-1500000013-TAV_approvato.pdf
ANI-15000000130-LAGSERVICE.pdf
ANI-15000000131EUROMEDecologica_approvato.pdf
ANI-15000000132-LAV.pdf
ANI-15000000133-REPOWER.pdf
ANI-15000000134-MCLINK.pdf
ANI-15000000135-COOPSERVICE_approvato.pdf
ANI-15000000136-COOPSERVICE_approvato.pdf
ANI-15000000138-TCHR._approvatopdf.pdf
ANI-15000000139-ALOISEEMMA.pdf
ANI-1500000014-OFFICEDEPOT_approvato.pdf
ANI-15000000140_TELECOM.pdf
ANI-15000000141-CHIRURGIAAMBULATORIALE_approvato.pdf
ANI-15000000142-LAG.pdf
ANI-15000000143-LAG.pdf
ANI-15000000145-TELECOM.pdf
ANI-15000000146-LAG.pdf
ANI-15000000147-WERFEN.pdf
ani-15000000148-enigas.pdf
ANI-15000000153TCHR_approvato.pdf
ANI-15000000154-ASSEL.pdf
ANI-15000000155-DIGIUSEPPEgiancarlo.pdf
ANI-15000000156-SD.pdf
ANI-15000000157-SAS.pdf
ani-15000000158-energeticSOURCE.pdf
ANI-15000000159-chirurgia ambulatoriale.pdf
ANI-1500000016-THEMIX_approvato.pdf
ANI-15000000160-CERRATA REFRIGERAZIONE.pdf
ANI-15000000162-ALOISE emma.pdf
ANI-1500000017-ASSEL_approvato.pdf
ANI-1500000018-QUERZOLA_approvato.pdf
ANI-1500000019-BDO_approvato.pdf
ANI-1500000020-THEMIXfatt_ approvato.134.pdf
ANI-1500000021-SECURLAB_approvato.pdf
ANI-1500000022-LYRECO+DDT_approvato.pdf
ANI-1500000023-COOPSERVICE approvato (1).pdf
ANI-1500000024-REPOWER135812_approvato.pdf
ANI-1500000025-DR.BRANDIMARTE-fatt.35_approvato (1).pdf
ANI-1500000026-D.SSA AMBRUZZI_approvato.pdf
ANI-1500000027-COOPSERVICE9117034433 approvato (1).pdf
ANI-1500000031-TAVf.314_approvato.pdf
ANI-1500000032-d.ALOISEmaggio2015_approvato.pdf
ANI-1500000033-CENTROchirurgiaAMBULATORIALEf201500306_approvato.pdf
ANI-1500000034-WINDf.7407817176_approvato.pdf
ANI-1500000035-avv.BELLOMI.pdf
ANI-1500000038-TOPCARf._approvato.pdf
ANI-1500000039-TCHRf.000544_approvato.pdf
ANI-1500000040-THEMIX_approvato.pdf
ANI-1500000041-DESENA_approvato.pdf
ANI-1500000042-TCHRSERVICESf.000565_approvato.pdf
ANI-1500000043-QUERZOLAf.109_approvato.pdf
ANI-1500000047-TELEPASS.pdf
ANI-1500000049-WIND_approvato.pdf
ANI-1500000051-MCLINKf.109493_approvato.pdf
ANI-1500000052-MCLINKf.88508_approvato.pdf
ANI-1500000053-OFFICEDEPOT_approvato.pdf
ANI-1500000054-COOPSERVICEapprovatof 9117037004.pdf
ANI-1500000055-COOPSERVICEf 9117039325approvato.pdf
ANI-1500000056-SD_approvato.pdf
ANI-1500000057-REPOWER_approvato.pdf
ANI-1500000058-MCLINK_approvato.pdf
ANI-1500000059-LAG.pdf
ANI-1500000059WERFEN_approvato.pdf
ANI-1500000060WERFEN_approvato.pdf
ANI-1500000063-CENTROCHIRURGIAAMBULATORIALE_approvato.pdf
ANI-1500000064-dott.ALOISEemma_approvato.pdf
ANI-1500000066-MERCURI_approvato.pdf
ANI-1500000067-QUERZOLA_approvato.pdf
ANI-1500000070-TIM_approvato.pdf
ANI-1500000071LIFEBRAIN.pdf
ANI-1500000072-TC HR_approvato.pdf
ANI-1500000073-LAVAGGIO E GOMMISTA_approvato.pdf
ANI-1500000075-THEMIX_approvato.pdf
ANI-1500000076-EUROMEDecologica_approvato.pdf
ANI-1500000077-REPOWER_approvato.pdf
ANI-1500000078-SAS_approvato.pdf
ANI-1500000079-LAGSERVICE.pdf
ANI-1500000080-COOPSERVICE appr.pdf
ANI-1500000081-COOPSERVICE appr.pdf
ANI-1500000083-TAV_approvato.pdf
ANI-1500000084-aloise emma_approvato.pdf
ANI-1500000085-centro.chirurgia.ambulatoriale_approvato.pdf
ANI-1500000088-lagSERVICE.pdf
ANI-1500000089-FARMACIACAMERUCCI.pdf
ANI-1500000091-LAGservice.pdf
ANI-1500000092-ASSEL_approvata.pdf
ANI-1500000093-COOPSERVICE_approvato.pdf
ANI-1500000095-TCHR_approvato.pdf
ANI-1500000097-SAS (2)_approvato.pdf
ANI-1500000099-REPOWER_approvato.pdf
ARE-1500000001SAS_approvato.pdf
ARE-1500000002ACEA_approvato.pdf
ARE-1500000004VERGARI_approvato.pdf
ARE-1500000005PINTO_approvato.pdf
ARE-1500000006COSMOPOL_approvato.pdf
ARE-1500000007LAGSERVICE.pdf
ARE-1500000009 OFFICE DEPOT_ARETEIA.pdf
ARE-1500000010 SERVIZI ABITAZIONE_aqpprovato.pdf
ARE-1500000011 TELECOM_approvato.pdf
ARE-1500000012 TELECOM_approvato.pdf
ARE-1500000013 THEMIX_approvato.pdf
ARE-1500000014 QUERZOLA_approvato.pdf
ARE-1500000015 DA.CA. ESTINTORI_approvato.pdf
ARE-1500000016 COOPSERVICE approvato.pdf
ARE-1500000017-SAS.pdf
ARE-1500000017-SAS_approvato.pdf
ARE-1500000018-DR.BRANDIMARTE_approvato.pdf
ARE-1500000019-COOPSERVICE approvato.pdf
ARE-1500000020-BRACI E ABBRACCI.pdf
ARE-1500000021-COSMOPOL_approvato.pdf
ARE-1500000023-SAS_approvato.pdf
ARE-1500000024-MESCHINI_approvato.pdf
ARE-1500000025-VERGARI_approvato.pdf
ARE-1500000026-AVV.BELLOMI.pdf
ARE-1500000027-PINTO_approvato.pdf
ARE-1500000032-DA.CA_approvato.pdf
ARE-1500000033-SERVIZI ABITAZIONE_approvato.pdf
ARE-1500000034-QUERZOLA_approvato.pdf
ARE-1500000035-CERRATA D_approvato..pdf
ARE-1500000036-SECURLAB_approvata.pdf
ARE-1500000037-COSMOPOL_approvato.pdf
ARE-1500000038-OFFICE DEPOT_approvato.pdf
ARE-1500000039-MONIGEST_approvato.pdf
ARE-1500000040-MONIGEST_approvato.pdf
ARE-1500000041-COOPSERVICE approvato.pdf
ARE-1500000042-COOPSERVICE approvato.pdf
ARE-1500000043-SECURLAB_APPROVATO.pdf
ARE-1500000044-MESCHINI_APPROVATO.pdf
ARE-1500000045-ACEA_approvato.pdf
ARE-1500000047-PINTO_approvato.pdf
ARE-1500000050-VERGARI_approvato.pdf
ARE-1500000052-QUERZOLA_approvato.pdf
ARE-1500000053-CONTI ROSELLA_approvato.pdf.pdf
ARE-1500000057-DE SENA_approvato.pdf
ARE-1500000058-SERVIZI ABITAZIONE_approvato.pdf
ARE-1500000059-SECURLAB_approvato.pdf
ARE_1500000048_TELECOM_approvato.pdf
ARE_1500000049_TELECOM_approvato.pdf
ARE_1500000144_CERRATA D..pdf
BIO_1500000048_GIROLAMO LUCIANA_APPROVATO.pdf
BIO_1500000049_SPORTELLI MARIO_APPROVATO20150505_10081133.pdf
BIO_1500000050_LEGROTTAGLIE BENEDETTO_APPROVATO.pdf
BIO_1500000051_ANTIFORTUNISTICA MERIDIONALE_APPROVATO.pdf
BIO_1500000052_SAIL_APPROVATO.pdf
BIO_1500000053_SAIL_APPROVATO.pdf
BIO_1500000056_PRONTO UFFICIO_APPROVATO.pdf
BIO_1500000057_H3G SPA_APPROVATO.pdf
BIO_1500000060_RITELLA BENEDETTA_APPROVATO.pdf
BIO_1500000061_POSTA 7_APPROVATO.pdf
BIO_1500000062_POSTASETTESAS_APPROVATO.pdf
BIO_1500000063_PIGNATELLI_APPROVATO.pdf
BIO_1500000064_DIALINE SRL_APPROVATO.pdf
BIO_1500000065_L2 SRL SRL_APPROVATO.pdf
BIO_1500000066_FARMACIA TREROTOLI_APPROVATO.pdf
BIO_1500000067_FARMACIA TREROTOLI_APPROVATO.pdf
BIO_1500000068_BIOGROUP_APPROVATO.pdf
BIO_1500000069_VITO RINALDI_APPROVATO.pdf
BIO_1500000070_EUROCOMPUTERS_APPROVATO.pdf
BIO_1500000071_SERVIZI DIAGNOSTICI_APPROVATO.pdf
BIO_1500000072_SERVIZI DIAGNOSTICI_APPROVATO.pdf
BIO_1500000073_SERVIZI DIAGNOSTICI_APPROVATO.pdf

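3 answers:

Answer 0 (score: 2)

You are using the same name for your result list and for the file list that os.walk yields. Here is the code with corrected variable names: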

import os
import fnmatch

files_result = []
for root, dirnames, files in os.walk(r'C:\PATH\TESTT'):
    for f in fnmatch.filter(files, '*0000*.pdf'):
        print(f)
        files_result.append(os.path.join(root, f))

# Sort the collected paths (files_result), not `files`, which only holds
# the names from the last directory visited by os.walk.
#sorted_files = sorted(files_result, key=lambda x: x.split('-')[1])
sorted_files = sorted(files_result, key=lambda x: x.replace("_", "-").split('-')[1])  # as Byte Commander suggested
print(sorted_files)

As Byte Commander suggested, replace the underscores with hyphens before splitting.
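The name clash can be reproduced without touching the disk. The following is only a sketch: fake_walk is a hypothetical stand-in for os.walk, and the directory and file names in it are made up for illustration.

```python
def fake_walk():
    # Hypothetical stand-in for os.walk: yields (root, dirnames, filenames).
    yield "top", ["sub"], ["ANI-1500000008-TELECOM_APPROVATO.pdf"]
    yield "top/sub", [], ["notes.txt"]

files = []  # intended to be the result list
for root, dirnames, files in fake_walk():  # rebinds `files` on every iteration
    for name in list(files):               # iterate over a copy of the names
        files.append(root + "/" + name)    # appends to the per-directory list

# Only the last directory's list survives; the PDF from "top" is gone.
print(files)  # ['notes.txt', 'top/sub/notes.txt']
```

Collecting into a separately named list such as files_result avoids this, because the loop variable no longer overwrites it.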

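Answer 1 (score: 1)

I think this error occurs for files whose name parts are separated by _ rather than -.

So, before splitting, just replace all underscores with hyphens: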

sorted_files = sorted(files, key=lambda x: x.replace("_", "-").split('-')[1])

The string conversion is also unnecessary: you are already picking an element from a list of strings, so it cannot be anything else.
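As a small illustration of why the original key function fails, here are two names taken from the question's list. The helper third_field is a name introduced here for the example only.

```python
names = [
    "ANI-1500000008-TELECOM_APPROVATO.pdf",
    "BIO_1500000052_SAIL_APPROVATO.pdf",
]

def third_field(name):
    """Return the third field after treating "_" like "-"."""
    return name.replace("_", "-").split("-")[2]

for name in names:
    parts = name.split("-")
    # The second name contains no "-", so split("-") yields a single part
    # and indexing it with [2] would raise IndexError.
    print(name, "->", len(parts), "part(s) when splitting on '-' only")

print([third_field(n) for n in names])  # ['TELECOM', 'SAIL']
```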

Update

To exclude file names that do not contain exactly two separator characters ("-" or "_"), I suggest the following filtered generator expression:

sorted_files = sorted( f.replace("_", "-").split("-")[2] 
                       for f in files if (f.count("-")+f.count("_") == 2) )
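To make the effect of that filter concrete, here is a sketch that applies it to four names from the question's list. Note that any name with one or three separators is dropped, which explains why the posted result contains entries like 'ASSEL.pdf' but no 'TELECOM' entry.

```python
names = [
    "ANI-15000000154-ASSEL.pdf",             # two "-"          -> kept
    "ARE_1500000144_CERRATA D..pdf",         # two "_"          -> kept
    "ANI-1500000008-TELECOM_APPROVATO.pdf",  # three separators -> dropped
    "ANI-15000000107LAGSERVICE.pdf",         # one separator    -> dropped
]

def separator_count(name):
    """Count "-" and "_" characters in a file name."""
    return name.count("-") + name.count("_")

kept = sorted(
    f.replace("_", "-").split("-")[2]
    for f in names
    if separator_count(f) == 2
)
dropped = [f for f in names if separator_count(f) != 2]

print(kept)     # ['ASSEL.pdf', 'CERRATA D..pdf']
print(dropped)
```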

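Answer 2 (score: 1)

The code below should sort the full file names correctly, assuming that your updated code (based on the code in the answers by salomonderossi and Byte Commander) really does show the correct common file name parts.

Note that this code is untested. It would be easier for us to write answers if you included a small set of typical file names in your question. That not only helps us understand the task better, it also makes it easier to test our code.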

import os
import fnmatch

files_result = []
for root, dirnames, files in os.walk(r'C:\PATH\TESTT'):
    for fn in fnmatch.filter(files, '*0000*.pdf'):
        if fn.count("-") + fn.count("_") == 2:
            print(fn)
            files_result.append(os.path.join(root, fn))

files_result.sort(key=lambda fn: fn.replace("_", "-").split("-")[2])
print("\nSorted")
for fn in files_result:
    print(fn)

If you change

if fn.count("-") + fn.count("_") == 2:

to

if fn.count("-") + fn.count("_") >= 2:

then it will handle names with 2 or more separators.

If you also need to treat a single space as a separator equivalent to - and _, you can do this:

import os
import fnmatch

files_result = []
for root, dirnames, files in os.walk(r'C:\PATH\TESTT'):
    for fn in fnmatch.filter(files, '*0000*.pdf'):
        if fn.count("-") + fn.count("_") + fn.count(" ") >= 2:
            print(fn)
            files_result.append(os.path.join(root, fn))

files_result.sort(key=lambda fn: fn.replace("_", "-").replace(" ", "-").split("-")[2])
print("\nsorted")
for fn in files_result:
    print(fn)
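The question also asks how to create a folder per common part and move the files there, which none of the code above does. The following is only a rough sketch under several assumptions: the regular expression NAME_RE and the helpers common_part and move_into_folders are names introduced here, the pattern is a heuristic guessed from the sample file names (letters, a separator, a run of digits, an optional separator, then the name part), and the destination folder is assumed to lie outside the source folder. Variants such as "ALOISE EMMA" and "ALOISEEMMA" still end up in different folders, and files with identical names in different source directories would collide.

```python
import os
import re
import shutil

# Heuristic pattern (an assumption): prefix, separator, digits,
# optional separator, the rest of the name, then ".pdf".
NAME_RE = re.compile(r"^[A-Za-z]+[-_ ](\d+)[-_ ]?(.*)\.pdf$", re.IGNORECASE)

def common_part(filename):
    """Return an upper-cased grouping key, or None if the name does not fit."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    # Keep the text up to the next "-" or "_" (drops "_approvato" and similar).
    key = re.split(r"[-_]", match.group(2), maxsplit=1)[0]
    # Trailing dots and spaces are stripped because they are awkward in
    # Windows folder names.
    key = key.strip(" .").upper()
    return key or None

def move_into_folders(src_dir, dest_dir):
    """Move matching PDFs from src_dir into dest_dir/<common part>/."""
    # Collect first, then move, so the walk never sees half-moved files.
    found = [
        (root, fn)
        for root, dirnames, filenames in os.walk(src_dir)
        for fn in filenames
    ]
    moved = {}
    for root, fn in found:
        key = common_part(fn)
        if key is None:
            continue  # leave files that do not match the pattern in place
        target = os.path.join(dest_dir, key)
        os.makedirs(target, exist_ok=True)
        shutil.move(os.path.join(root, fn), os.path.join(target, fn))
        moved.setdefault(key, []).append(fn)
    return moved

print(common_part("ANI-1500000008-TELECOM_APPROVATO.pdf"))  # TELECOM
print(common_part("BIO_1500000052_SAIL_APPROVATO.pdf"))     # SAIL
```

Calling move_into_folders(r'C:\PATH\TESTT', r'C:\PATH\SORTED') would then do the actual moving (the second path is a placeholder). It is worth trying this on a copy of a few files first, since moves are not undone automatically.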