Iterating over dictionary values to replace file names in a directory

Time: 2013-06-01 02:54:42

Tags: python csv os.walk

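Hi all. I have a CSV file in which I laid out the IDs of the DNA samples I sent off to be sequenced in a 96-well plate. This matters for keeping track, because when we get the plate back from the sequencing facility, the chromatogram files have bare titles such as 5-3-13-G-Templates_A01_Primer-G.ab1.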

The CSV is tab-delimited and looks like this (96 wells: 12 columns [1-12], 8 rows [A-H]):

1   2   3   4   5   6   7   8   9   10  11  12
A01 A02 A03 A04 A05_Grammatophyllum_scriptum_ITS1   A06_Eulophia_euglossa_ITS1  A07_Grammatophyllum_scriptum_17SE   A08_Graphorkis_lurida_X502F A09_Cymbidium_kanran_X502F  A10_Claderia_viridiflora_X502F  A11_Grammatophyllum_scriptum_X502F  A12_Eulophia_euglossa_X502F
B01 B02 B03 B04 B05_Grammatophyllum_scriptum_ITS4   B06_Eulophia_euglossa_ITS4  B07_Grammatophyllum_scriptum_1229R  B08_Graphorkis_lurida_X1599R    B09_Cymbidium_kanran_X1599R B10_Claderia_viridiflora_X1599R B11_Grammatophyllum_scriptum_X1599R B12_Eulophia_euglossa_X1599R
C01 C02 C03 C04 C05_Acriopsis_ridleyi_ITS1  C06_Cyrtopodium_polyphyllum_ITS1    C07_Cyrtopodium_polyphyllum_17SE    C08_Graphorkis_scripta_X502F    C09_Dipodium_conduplicatum_X502F    C10_Dipodium_5431_X502F C11_Cyrtopodium_polyphyllum_X502F   C12_Oeceoclades_gracillima_X502F
D01 D02 D03 D04 D05_Acriopsis_ridleyi_641R  D06_Cyrtopodium_polyphyllum_ITS4    D07_Cyrtopodium_polyphyllum_1229R   D08_Graphorkis_scripta_X1599R   D09_Dipodium_conduplicatum_X1599R   D10_Dipodium_5431_X1599R    D11_Cyrtopodium_polyphyllum_X1599R  D12_Oeceoclades_gracillima_X1599R
E01 E02 E03 E04_Dipodium_6052_ITS1  E05_Dipodium_5431_ITS1  E06_Bromheadia_finlaysoniana_ITS1   E07_Dressleria_dilecta_X502F    E08_Cyrtopodium_falciobum_X502F E09_Acriopsis_ridleyi_X502F E10_Dipodium_6052_X502F E11_Thecostele_alata_28_X502F   E12_Thecostele_alata_32_X502F
F01 F02 F03 F04_Dipodium_6052_ITS4  F05_Dipodium_5431_ITS4  F06_Bromheadia_finlaysoniana_641R   F07_Dressleria_dilecta_X1599R   F08_Cyrtopodium_falciobum_X1599R    F09_Acriopsis_ridleyi_X1599R    F10_Dipodium_6052_X1599R    F11_Thecostele_alata_28_X1599R  F12_Thecostele_alata_32_X1599R
G01 G02 G03 G04_Dipodium_6055_ITS1  G05_Dipodium_conduplicatum_ITS1 G06_Claderia_viridiflora_ITS1   G07_Ansellia_africana_X502F G08_Grammangis_ellisii_X502F    G09_Bromheadia_finlaysoniana_X502F  G10_Dipodium_6055_X502F G11_Grammatophyllum_stapeliiflorum_X502F    G12
H01 H02 H03 H04_Dipodium_6055_ITS4  H05_Dipodium_conduplicatum_ITS4 H06_Claderia_viridiflora_641R   H07_Ansellia_africana_X1599R    H08_Grammangis_ellisii_X1599R   H09_Bromheadia_finlaysoniana_X1599R H10_Dipodium_6055_X1599R    H11_Grammatophyllum_stapeliiflorum_X1599R   H12

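Rather than spending time renaming 96 files by hand every time I get a plate back, I'm trying to reuse the file I had prepared in advance to guide me while loading the plate (so I didn't mess up and put the wrong DNA in the wrong well). The idea is to identify each position by its prefix (e.g. A06 ... H06) and match it against the file names in the directory, since both share the same well position. The script would iterate over the entire CSV file and rename every file of the form 5-3-13-G-Templates_A06_Primer-G.ab1, so that, for example, it becomes A06_Eulophia_euglossa_ITS1.ab1.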
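The first building block is pulling the well ID out of a sequencer file name. A minimal sketch, assuming every file name has the well ID (row letter A-H plus a two-digit column 01-12) between underscores, as in 5-3-13-G-Templates_A06_Primer-G.ab1; the helper name well_id_from_filename is hypothetical:

```python
import re

# Assumed pattern: a row letter A-H and a column 01-12,
# delimited by underscores as in "..._A06_Primer-G.ab1"
WELL_RE = re.compile(r"_([A-H](?:0[1-9]|1[0-2]))_")

def well_id_from_filename(filename):
    """Return the well ID (e.g. 'A06') embedded in a file name, or None."""
    match = WELL_RE.search(filename)
    return match.group(1) if match else None

print(well_id_from_filename("5-3-13-G-Templates_A06_Primer-G.ab1"))  # A06
```

Anchoring on the underscores keeps the pattern from matching stray letter-digit pairs elsewhere in a name.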

I've written part of the Python script, but I'm having trouble picturing the next step:

import csv

with open('Template.csv', newline='') as f:
    data = csv.DictReader(f, delimiter='\t')
    for row in data:
        # Each cell starts with its well ID (A01, A02, ...), so sorting
        # the cell values puts the row in order from left to right
        values = sorted(row.values())

This is where I'm stuck. Now that I have these lists, what do I do with them? A for loop? I just can't picture the solution.
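Rather than sorting each row into a list, each row can be folded into a dictionary keyed by well ID. Because every cell in the plate file begins with its three-character well ID (A01 ... H12), the first three characters make a ready-made key. A minimal sketch, assuming that layout and a header row of column numbers; the function name build_label_map is hypothetical:

```python
import csv
import io

def build_label_map(lines):
    """Map each well ID (first 3 characters of a cell) to the full cell text."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row of column numbers
    labels = {}
    for row in reader:
        for cell in row:
            cell = cell.strip()
            if cell:  # skip empty cells
                labels[cell[:3]] = cell
    return labels

# Columns 4 and 5 of rows A and E from the plate layout, for illustration only
sample = io.StringIO(
    "4\t5\n"
    "A04\tA05_Grammatophyllum_scriptum_ITS1\n"
    "E04_Dipodium_6052_ITS1\tE05_Dipodium_5431_ITS1\n"
)
print(build_label_map(sample)["E04"])  # E04_Dipodium_6052_ITS1
```

In real use, an open file object (opened with newline='') can be passed in place of the StringIO.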

I suspect part of the solution will be the following code, adapted from another answer I found:

import os

folder = r"/home/ryan/Desktop/MMEE/plateG" #Make sure only the .ab1 files are in this directory
for root, dirs, filenames in os.walk(folder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        filename_zero, fileext = os.path.splitext(filename)
        # SOMEVARIABLE is a placeholder for the new name taken from the CSV;
        # joining with root keeps the renamed file in the same folder
        os.rename(fullpath, os.path.join(root, SOMEVARIABLE + fileext))

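In the part above, I use os.rename to rename the files with "SOMEVARIABLE", which is where I think the names from the lists above should go into the file name. But how to get there is beyond my skill level. Or maybe I'm just tired.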
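Filling in SOMEVARIABLE amounts to a dictionary lookup: extract the well ID from the old name, look it up in a map from well ID to CSV label, and keep the original extension. A minimal sketch, assuming a labels dict such as {'A06': 'A06_Eulophia_euglossa_ITS1'} has already been built from the CSV; the helper name new_filename is hypothetical:

```python
import os
import re

# Assumed: the well ID sits between underscores in the old file name
WELL_RE = re.compile(r"_([A-H](?:0[1-9]|1[0-2]))_")

def new_filename(old_name, labels):
    """Return the CSV-derived name for old_name, or None if no label matches."""
    base, ext = os.path.splitext(old_name)
    match = WELL_RE.search(base)
    if match is None or match.group(1) not in labels:
        return None
    return labels[match.group(1)] + ext

labels = {"A06": "A06_Eulophia_euglossa_ITS1"}
print(new_filename("5-3-13-G-Templates_A06_Primer-G.ab1", labels))
```

Inside the os.walk loop, the result would take the place of SOMEVARIABLE + fileext. Passing it through os.path.join(root, ...) matters: a bare name given to os.rename is resolved against the current working directory, so the file would be moved there instead of staying in its folder.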

Any help would be greatly appreciated. I hope this is clear enough, but I can clarify if needed. Cheers!

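Edit to add: the old and new file names share only the well ID, e.g. A01, B06, H12. The new file name will be taken from the CSV file, so that a file named 5-3-13-G-Templates_F08_Primer-G.ab1 gets its name from column 8, but only from the cell for "F08". The rows are A to H. Basically, I want to pick the text at row F, column 8 (although I currently have no row headers) and apply that text to the file whose name contains F08. I imagine there might be a way to match each substring A01 to H12 in the generated lists of values and carry each cell's text over to the old file name it should replace, since those names contain the same substrings A01 to H12.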
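Picking "the text at row F, column 8" can also be done by position: the letter gives the row index and the two digits give the column. A minimal sketch, assuming the CSV has been read into a list of 8 data rows (header excluded) with 12 cells each in plate order; the name label_at_well is hypothetical:

```python
ROW_LETTERS = "ABCDEFGH"

def label_at_well(grid, well_id):
    """Look up the cell for a well ID such as 'F08' by row letter and column number."""
    row = ROW_LETTERS.index(well_id[0])  # 'F' -> 5
    col = int(well_id[1:]) - 1           # '08' -> 7 (columns are numbered from 1)
    return grid[row][col]

# A toy 8x12 grid whose cells are just the well IDs, for illustration only
grid = [[f"{letter}{n:02d}" for n in range(1, 13)] for letter in ROW_LETTERS]
grid[5][7] = "F08_Cyrtopodium_falciobum_X1599R"
print(label_at_well(grid, "F08"))  # F08_Cyrtopodium_falciobum_X1599R
```

This relies on the CSV having exactly 8 rows and 12 columns in plate order. Matching on the ID prefix of each cell, as the well IDs in this file allow, keeps working even if the layout shifts.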

I want to rename the files like this (note: A01 to D04 are blank wells, so they have no label other than the ID):

5-3-13-G-Templates_E04_Primer-G.ab1 > E04_Dipodium_6052_ITS1.ab1
5-3-13-G-Templates_F04_Primer-G.ab1 > F04_Dipodium_6052_ITS4.ab1
5-3-13-G-Templates_G04_Primer-G.ab1 > G04_Dipodium_6055_ITS1.ab1
5-3-13-G-Templates_H04_Primer-G.ab1 > H04_Dipodium_6055_ITS4.ab1
5-3-13-G-Templates_A05_Primer-G.ab1 > A05_Grammatophyllum_scriptum_ITS1.ab1
5-3-13-G-Templates_B05_Primer-G.ab1 > B05_Grammatophyllum_scriptum_ITS4.ab1
...
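Before touching the real files, a dry run that lists each old > new pair in the same format as above makes it easy to check the mapping against the plate sheet. A minimal sketch, assuming a labels dict keyed by well ID; plan_renames is a hypothetical name:

```python
import os
import re

# Assumed: the well ID sits between underscores in the old file name
WELL_RE = re.compile(r"_([A-H](?:0[1-9]|1[0-2]))_")

def plan_renames(filenames, labels):
    """Return (old, new) pairs for files whose well ID has a label; skip the rest."""
    plan = []
    for old in sorted(filenames):
        base, ext = os.path.splitext(old)
        match = WELL_RE.search(base)
        if match and match.group(1) in labels:
            plan.append((old, labels[match.group(1)] + ext))
    return plan

labels = {"E04": "E04_Dipodium_6052_ITS1", "F04": "F04_Dipodium_6052_ITS4"}
files = ["5-3-13-G-Templates_F04_Primer-G.ab1", "5-3-13-G-Templates_E04_Primer-G.ab1"]
for old, new in plan_renames(files, labels):
    print(old, ">", new)
```

With os.listdir(folder) as the file list, the printed plan can be reviewed first, and the same pairs then fed to os.rename once it looks right.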

1 answer:

Answer 0 (score: 1)

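  1. Process the CSV file, collect all the new file names, and map each sample ID to its new name.

  2. Walk the directory, find all files, extract the sample ID from each file's base name, and look up the new name in the id_map built in step 1. Rename the file accordingly.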

import csv
import os
import re

# First: map each sample ID (e.g. "E04") to the full label in its CSV cell
id_map = {}
with open('csv.csv', newline='') as f:
    data = csv.DictReader(f, delimiter="\t")
    for row in data:
        for name in row.values():
            if not name:
                continue  # skip empty or missing cells
            # find all sample IDs in the cell; there should be exactly 1
            ids = re.findall(r'[A-H][0-9]{2}', name)
            if len(ids) != 1:
                print("Confused at " + name)
                continue
            id_map[ids[0]] = name

# Second: walk the directory and rename each file using its sample ID
folder = 'files/'
for root, dirs, files in os.walk(folder):
    for filename in files:
        fullname = os.path.join(root, filename)
        basename, extension = os.path.splitext(filename)
        # find all sample IDs in the basename; there should be exactly 1
        ids = re.findall(r'[A-H][0-9]{2}', basename)
        if len(ids) != 1:
            print("Confused at " + fullname)
            continue
        if ids[0] in id_map:
            new_name = id_map[ids[0]] + extension
            os.rename(fullname, os.path.join(root, new_name))
        else:
            print("New name for " + fullname + " not found")
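One risk in a loop like this is that on POSIX systems os.rename silently replaces an existing file with the target name (on Windows it raises FileExistsError instead), so a duplicated label in the CSV could overwrite a chromatogram. A minimal sketch of a guarded rename that skips instead, assuming a flat folder; safe_rename is a hypothetical name, and the exists check is not atomic, which should be acceptable for a one-off script:

```python
import os
import tempfile

def safe_rename(folder, old_name, new_name):
    """Rename folder/old_name to folder/new_name unless the target already exists."""
    src = os.path.join(folder, old_name)
    dst = os.path.join(folder, new_name)
    if src == dst:
        return False  # already has the right name
    if os.path.exists(dst):
        print("Skipping " + src + ": " + dst + " already exists")
        return False
    os.rename(src, dst)
    return True

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as folder:
    for name in ("5-3-13-G-Templates_E04_Primer-G.ab1", "5-3-13-G-Templates_F04_Primer-G.ab1"):
        open(os.path.join(folder, name), "w").close()
    first = safe_rename(folder, "5-3-13-G-Templates_E04_Primer-G.ab1", "E04_Dipodium_6052_ITS1.ab1")
    # A second file mapped to the same new name (e.g. a duplicated label) is skipped
    second = safe_rename(folder, "5-3-13-G-Templates_F04_Primer-G.ab1", "E04_Dipodium_6052_ITS1.ab1")
    after = sorted(os.listdir(folder))
print(first, second, after)
```

In the answer's loop, the os.rename(...) call could be swapped for safe_rename(root, filename, new_name).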