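I am trying to rename a set of files in a directory using Python. The files are currently labelled with a pool number, an AR number and an S number (e.g. Pool1_AR001_S13__fw_paired.fastq.gz). Each file refers to a specific plant sequence name. I would like to rename these files by removing the 'Pool_AR_S' part and replacing it with the sequence name, e.g. 'Lbienne_dor5_GS1', while leaving the suffix (e.g. fw_paired.fastq.gz, rv_unpaired.fastq.gz) in place. I am trying to read the file into a dictionary, but I am stuck on what to do next. I have a .txt file containing the necessary information in the following format: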
Pool1_AR010_S17 - Lbienne_lla10_GS2
Pool1_AR011_S18 - Lbienne_lla10_GS3
Pool1_AR020_S19 - Lcampanulatum_borau4_T_GS1
The code I have so far is:
from optparse import OptionParser
import csv
import os
parser = OptionParser()
parser.add_option("-w", "--wanted", dest="w")
parser.add_option("-t","--trimmed", dest="t")
parser.add_option("-d", "--directory", dest="working_dir", default="./")
(options, args) = parser.parse_args()
wanted_file = options.w
trimmomatic_output = options.t
#Read the wanted file and create a dictionary of index vs species identity
with open(wanted_file, 'rb') as species_sequence:
    species_list = list(csv.DictReader(species_sequence, delimiter='-'))
    print species_list
#Rename the Trimmomatic Output files according to the dictionary
for trimmed_sequence in os.listdir(trimmomatic_output):
    os.rename(os.path.join(trimmomatic_output, trimmed_sequence),
              os.path.join(trimmomatic_output, trimmed_sequence.replace(species_list[0], species_list[1])))
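Could you please help me with the second half? I am very new to Python and Stack Overflow, so I apologize if this has been asked before or if I have asked it in the wrong place.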
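Answer 0: (score: 1)
The first job is to get rid of all those modules. They may be nice, but for a job like yours they are unlikely to make things any easier.
Create a .py file in the directory where those .gz files are located.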
import os

# 'txt_file' is the path of your .txt file containing those conversions
# parse_txt() (body omitted) should parse that .txt file and return a dictionary
dic = parse_txt(txt_file)

files = os.listdir('.')  # list of the names in the current directory
for f in files:
    # Assumes prefix and suffix are separated by a double underscore,
    # e.g. "Pool1_AR001_S13__fw_paired.fastq.gz"
    if '__' not in f:
        continue  # skips this .py script, the .txt file, etc.
    pre, suf = f.split('__', 1)
    if pre in dic:
        os.rename(f, dic[pre] + '__' + suf)
If you need help with the parse_txt() function, let me know.
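Since the body of parse_txt() is left out above, here is a minimal sketch of what it might look like. It assumes every line of the .txt file has the form "old_prefix - new_name" as shown in the question, and that the old prefix itself contains no hyphen. The parse_lines helper is a hypothetical name, introduced only so the parsing can be tried without a file on disk.

```python
def parse_lines(lines):
    """Build {old_prefix: new_name} from lines like
    'Pool1_AR010_S17 - Lbienne_lla10_GS2'."""
    mapping = {}
    for line in lines:
        # Split on the first '-' only, so a hyphen inside the new name survives.
        old, sep, new = line.partition('-')
        if not sep:
            continue  # blank or malformed line: skip it
        mapping[old.strip()] = new.strip()
    return mapping


def parse_txt(txt_file):
    """Read the conversion file and return the dictionary."""
    with open(txt_file) as handle:
        return parse_lines(handle)
```

With the three example lines from the question, parse_txt() would give a dictionary whose keys are the Pool/AR/S prefixes and whose values are the sequence names.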
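Answer 1: (score: 0)
Here is a solution that I tested with Python 2. It is fine if you use your own logic in place of the get_mappings function. See the comments in the code for an explanation.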
import os

def get_mappings():
    mappings_dict = {}
    with open('wanted_file.txt', 'r') as f:
        for line in f:
            # if you have Pool1_AR010_S17 - Lbienne_lla10_GS2
            # it becomes a list, i.e. ['Pool1_AR010_S17 ', ' Lbienne_lla10_GS2']
            # note that there may be spaces before/after the names as shown above
            text = line.split('-')
            # strip() is used to remove the whitespace around the names
            mappings_dict[text[0].strip()] = text[1].strip()
    return mappings_dict

# PROGRAM EXECUTION STARTS FROM HERE
# assuming all files are in the current directory
# if not, replace the dot (.) with the path of the directory where you have the files
files = os.listdir('.')
wanted_names_dict = get_mappings()
for filename in files:
    try:
        # prefix='Pool1_AR010_S17', suffix='fw_paired.fastq.gz'
        prefix, suffix = filename.split('__')
        new_filename = wanted_names_dict[prefix] + '__' + suffix
        os.rename(filename, new_filename)
        print 'renamed', filename, 'to', new_filename
    except (ValueError, KeyError):
        # ValueError: the name has no single '__' separator
        # KeyError: the prefix is not in the mapping
        print 'No new name defined for file:' + filename
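For readers on Python 3, the same idea can be sketched as below. The names new_name_for and rename_all are hypothetical and not part of the answer above, and the dry_run flag is an addition of mine so that the planned renames can be inspected before any file is touched. The mapping argument is assumed to be a dictionary like the one returned by get_mappings().

```python
import os


def new_name_for(filename, mapping):
    """Return the renamed filename, or None if no mapping applies."""
    # Split on the first '__' only; files without it are left alone.
    prefix, sep, suffix = filename.partition('__')
    if not sep or prefix not in mapping:
        return None
    return mapping[prefix] + '__' + suffix


def rename_all(directory, mapping, dry_run=True):
    """Rename every matching file in directory; return the (old, new) pairs."""
    planned = []
    for filename in sorted(os.listdir(directory)):
        target = new_name_for(filename, mapping)
        if target is None:
            continue  # e.g. the script itself or the .txt file
        planned.append((filename, target))
        if not dry_run:
            os.rename(os.path.join(directory, filename),
                      os.path.join(directory, target))
    return planned
```

Calling rename_all('.', mapping) first and printing the result shows what would change; passing dry_run=False then performs the renames.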