Question

I am trying to search for chromosome names at the start of each new line. Here is the relevant code:

import re

chrp = re.compile(r"^chr[^\t]+", re.MULTILINE)
for v in vcfs:
    vcffile = open(v, "r")
    vcf = vcffile.read()
    last_i = 0
    while chrp.search(vcf, last_i) is not None:
        find = chrp.search(vcf, last_i).group()  #next chrom
        print find
        last_i = vcf.index(find, last_i)  #index of chrom
        print vcf[last_i:10 + last_i]

However, this prints:

chr1
chr19/snps

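The problems are:

1) "chr19/snps..." is not at the start of a line; it is in the middle of a line, right after a slash.

2) Even if it were at the start of a line, the regex only matched "chr1", when it should have matched "chr19/sn..." all the way up to the next tab.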

Here is the fragment of the file where it found this:

4186561/variants/chr19/snps.g

Here are examples of what I want it to find:

chr19 in the line

chr19 18272190

or chrX in the line

chrX 13758375

I tried the pattern at https://pythex.org/ and it works fine there.
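
A possible explanation, sketched under assumptions rather than confirmed against the real file: the regex may be matching correctly at a line start, and the surprising output may come from `vcf.index(find, last_i)`, which looks for the substring "chr1" anywhere in the text and so can land inside the "chr19" of the path. Because `last_i` is set to the start of that substring rather than past the match, the loop also appears never to advance. The sketch below is Python 3; `SAMPLE_VCF`, its `##source=` header, the "12345" position, and `find_chroms` are hypothetical stand-ins, and `\n` was added to the character class so a match cannot run across lines.

```python
import re

# Hypothetical stand-in for the file contents: a header line whose path
# contains "chr19", followed by tab-separated data lines.
SAMPLE_VCF = (
    "##source=4186561/variants/chr19/snps.g\n"
    "chr1\t12345\n"
    "chr19\t18272190\n"
    "chrX\t13758375\n"
)

# Same idea as the original pattern, but the field also stops at a newline.
CHROM_AT_LINE_START = re.compile(r"^chr[^\t\n]+", re.MULTILINE)


def find_chroms(text):
    """Return (offset, name) for every line-initial chr field in text."""
    return [(m.start(), m.group()) for m in CHROM_AT_LINE_START.finditer(text)]


# The regex only matches at a line start, so the first hit is the "chr1" line.
first = CHROM_AT_LINE_START.search(SAMPLE_VCF)

# str.index searches for the substring "chr1" anywhere, so it finds the
# "chr1" that begins "chr19" inside the header path instead.
misleading_offset = SAMPLE_VCF.index(first.group())
print(SAMPLE_VCF[misleading_offset:misleading_offset + 10])

# Taking the offset from the match object avoids the second lookup entirely.
for offset, name in find_chroms(SAMPLE_VCF):
    print(offset, name)
```

Reading the position from `m.start()` (or resuming the search from `m.end()`) keeps the offset tied to the actual match, which would also explain why the pattern looks fine on pythex: the pattern is not where the mismatch comes from.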

Python regex not behaving correctly

0 answers: