Question


Answer 1

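Your question may already have been answered elsewhere, but since you are dealing with very large files, you should use a generator to read the input one line at a time.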

For details, see this question: Lazy Method for Reading Big File in Python?
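A minimal sketch of the line-by-line generator idea, assuming the input is a plain text file processed one line at a time. The helper name iter_nonempty_lines is hypothetical (it is not from the linked question). A Python file object is already a lazy iterator over its lines, so wrapping it in a generator keeps only the current line in memory:

```python
from typing import Iterable, Iterator


def iter_nonempty_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-blank line with surrounding whitespace removed.

    `lines` can be an open file object: iterating over it reads the file
    lazily, one line at a time, so memory use does not grow with file size.
    """
    for line in lines:
        stripped = line.strip()
        if stripped:  # skip empty or whitespace-only lines
            yield stripped
```

You would call it as `for line in iter_nonempty_lines(inf): ...`, where `inf` is the object returned by `open()`. Nothing is read from disk until the loop asks for the next line.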

Answer 2

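Before writing the items out, you can use .strip() to remove any whitespace around them. This keeps the output clean and fixes the inconsistent indentation.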

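For example:

# a: one input line; rfh: the open output file (names from the question's code)
# str.split() returns a list, so strip each piece rather than the list itself.
b = [s.strip() for s in a.split('chr')]         # No white space either side now
c = [s.strip() for s in b[1].split(':')]        # No white space
d = [s.strip() for s in c[1].split('..')]
e = '\t'.join([b[0], c[0], d[0], d[1]]) + '\n'  # tab-separated fields, no trailing tab
rfh.write(e)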

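This removes the leading and trailing whitespace from every piece, so the only whitespace left between fields is the \t separators you add yourself (plus the final newline).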
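The same steps can be wrapped in a self-contained function so you can check the result against a sample line. This is a sketch assuming input lines shaped like +chr6:140302505..140302604, the format implied by the fields in Answer 3; to_tsv_row is a hypothetical name:

```python
def to_tsv_row(line: str) -> str:
    """Turn e.g. '+chr6:140302505..140302604' into '+\t6\t140302505\t140302604\n'.

    Assumes the line contains 'chr', ':' and '..' exactly as in that sample;
    a line without them raises ValueError during unpacking.
    """
    # Split off the strand sign, then the chromosome, then the two coordinates,
    # stripping every piece so the only separators left are the tabs added below.
    strand, location = (part.strip() for part in line.split('chr', 1))
    chrom, span = (part.strip() for part in location.split(':', 1))
    start, end = (part.strip() for part in span.split('..', 1))
    return '\t'.join([strand, chrom, start, end]) + '\n'
```

Passing maxsplit=1 to each split() keeps the unpacking to exactly two names.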

Answer 3

Why not split with a regex?

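import re

# infile: path to your input file
with open(infile) as inf:
    for annot_info in inf:  # iterate over the file object opened above
        # Dots escaped so ".." matches two literal periods;
        # strip() drops the trailing newline before splitting.
        split_array = re.split(r'(\W+)(chr\w+):(\d+)\.\.(\d+)', annot_info.strip())
        # do your SQL processing here.
        # write out to a file if you wish to.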

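For a line such as +chr6:140302505..140302604, this gives you ['', '+', 'chr6', '140302505', '140302604', '']. You can apply the same approach in your current MySQL code.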

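PS: The regex pattern I used gives you empty strings at the beginning and end. Either modify the regex, or change the SQL insert to skip the first and last elements of the array.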
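One way to modify the regex, as the PS suggests: use re.search() with capture groups instead of re.split(), so no empty strings are produced at all. This sketch assumes the strand is always + or - (Answer 4's pattern makes the same assumption); ANNOT_RE and parse_annotation are hypothetical names:

```python
import re
from typing import Optional, Tuple

# Four capture groups: strand sign, chromosome name, start, end.
# The dots are escaped so ".." only matches two literal periods.
ANNOT_RE = re.compile(r'([+-])(chr\w+):(\d+)\.\.(\d+)')


def parse_annotation(line: str) -> Optional[Tuple[str, ...]]:
    """Return (strand, chrom, start, end) for the first match in line, or None."""
    match = ANNOT_RE.search(line)
    if match is None:
        return None
    return match.groups()
```

The returned tuple has exactly the four fields, so it can be passed to the SQL insert without trimming the first and last elements.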

Answer 4

This should work:

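import re  # Regex may be the easiest way to split that line

# infile, outfile: input and output paths from your code
with open(infile) as in_f, open(outfile, 'w') as out_f:
    f = (i for i in in_f if i.rstrip())  # iterate over non-empty lines only
    for line in f:
        # Drop the first tab-separated column; partition() does not raise if there is no tab
        _, _, k = line.partition('\t')
        # Capture the strand sign, chromosome number, start and end coordinates
        x = re.findall(r'^1\.\.100\t([+-])chr(\d+):(\d+)\.\.(\d+).+$', k)
        if not x:
            continue
        out_f.write(' '.join(x[0]) + '\n')  # one space-separated row per match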

Inconsistent indentation in Python after splitting
