Retrieve only the file that matches a pattern

Time: 2015-12-14 17:12:55

Tags: python regex

My directory contains a bunch of files:

chr2.fa      chr2.fa.ann  chr2.fa.fai  chr2.fa.sa
chr2.fa.amb  chr2.fa.bwt  chr2.fa.pac

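The directory I want to search is dir_path, and I want the function to return reference_name as only chr2.fa, including its path.

I tried using r.match('.*\.fa$', filename)

but that did not work. Any suggestions for solving this would be very helpful.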

def searchforfile(dir_path):
    for files in os.listdir(dir_path):
        fileName,fileExtension = os.path.splitext(files)
        if fileExtension=='.fa':
            print 'This file is fa file %s' %files
            reference_name = dir_path + '/' + files
            return reference_name[0]
        elif  fileExtension=='.fasta':
            print 'This file is fasta file %s' %files
            reference_name = dir_path + '/' + files
            return reference_name[0]
        else:
            print 'Format is not valid'

The result I get with this approach is:

    index file /.fai not found, generating...
    terminate called after throwing an instance of 'std::out_of_range'
      what():  vector::_M_range_check
    Format is not valid
    Format is not valid
    Format is not valid
    Format is not valid
    This file is fa file chr2.fa

3 Answers:

Answer 0 (score: 1)

Try using the match object's end function to check whether anything else follows .fa.

import re
import os

def searchforfile(dir_path, pattern=r'.*\.fa'):
    r = re.compile(pattern)
    for f in os.listdir(dir_path):
        m = r.match(f)
        if m and m.end() == len(f):
            print 'This file is a fa file: %s'%f
        elif m:
            print 'This file contains more text after fa: %s'%f
        else:
            print 'This file does not contain the fa extension: %s'%f
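As a variation on the same idea, anchoring the pattern with $ makes the separate end() comparison unnecessary, because the match then only succeeds when .fa (or .fasta) is the very end of the name. The following is a minimal sketch rather than part of the original answer; the names FASTA_PATTERN and find_fa_files are hypothetical, and it filters a list of names instead of printing.

```python
import re

# Hypothetical helper, not from the original answer.
# The trailing $ means the name must END in .fa or .fasta,
# so index files such as chr2.fa.fai or chr2.fa.amb are rejected.
FASTA_PATTERN = re.compile(r'.*\.(fa|fasta)$')

def find_fa_files(names):
    """Return the names that end in .fa or .fasta, preserving input order."""
    return [name for name in names if FASTA_PATTERN.match(name)]
```

Passing the result of os.listdir(dir_path) as names would apply this filter to the directory from the question.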

Answer 1 (score: 1)

Your function is currently returning /, not the whole file name and path. Change your return statement to:

    return reference_name

Why? reference_name is a string. If you return reference_name[0], then you are only returning the first character of the string. For example, the message in the question mentions /.fai, which suggests that only the leading / of the path was returned.
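The difference between indexing a string and returning it whole can be seen in a short sketch. The directory /some/dir is a made-up placeholder and full_reference_path is a hypothetical helper, neither comes from the question.

```python
import os

def full_reference_path(dir_path, filename):
    # Hypothetical helper: join the directory and the file name into one path.
    return os.path.join(dir_path, filename)

# '/some/dir' is a placeholder directory used only for illustration.
reference_name = full_reference_path('/some/dir', 'chr2.fa')

# Indexing a string with [0] yields a single character, not the path.
first_char = reference_name[0]
```

Here first_char is just the leading slash, while reference_name holds the complete path that the caller actually needs.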

Answer 2 (score: -3)

I modified my earlier code, and this also works.

import os

# logfile is assumed to be a file object opened elsewhere for writing.
def searchforfile(dir_path):
    for filename in os.listdir(dir_path):
        logfile.write(filename+"\n")
        if filename.endswith('fa'):
            # The trailing ,'r' makes reference_name a tuple, so [0] is the full path.
            reference_name = dir_path + '/' + filename,'r'
            return reference_name[0]
        elif filename.endswith('fasta'):
            reference_name = dir_path + '/' + filename,'r'
            return reference_name[0]
        else:
            print ("Reference file was not found")
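A tidier version of the same approach might look like the sketch below. It is not the answerer's code: the name search_for_reference, the sorted iteration order, and returning None when nothing matches are all assumptions made for illustration. It relies on str.endswith accepting a tuple of suffixes and on os.path.join to build the path, and it includes the leading dot so that a name merely ending in the letters "fa" is not accepted.

```python
import os

def search_for_reference(dir_path, extensions=('.fa', '.fasta')):
    """Return the path of the first file in dir_path whose name ends with
    one of the given extensions, or None if there is no such file."""
    # sorted() only makes the choice deterministic; os.listdir order is arbitrary.
    for filename in sorted(os.listdir(dir_path)):
        # str.endswith accepts a tuple and checks each suffix in turn.
        if filename.endswith(extensions):
            return os.path.join(dir_path, filename)
    return None
```

Returning None instead of printing inside the loop lets the caller decide how to report a missing reference file, and avoids one message per non-matching file.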