Question

I took a break from Python for a while, and now I need your help again :)

My array looks like this:

['>lcl|NC_003078.1_gene_1 [gene=lacE] [locus_tag=SM_b21652] [location=1..1275]\n','>lcl|NC_003078.1_gene_2 [gene=lacF] [locus_tag=SM_b21653] [location=complement(22345..23337)]\n']

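The array contains more entries, all of them similar to the examples shown. I want to use a regex to extract one part of each element. The part I want to extract is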

[location.....]

I built my regular expression with Regexr, and this is what I tried:

locationArray=[]
for entry in storageArray:
    locationArray.append((re.findall("(\[location=\d*|complement\(\d*\.\.\d*\)\]|\.\.\d*\]))",str(entry))))
print(locationArray)

When I tested it with Regexr in the browser, the regular expression seemed to work.

Expected/desired output:

['[location=...]','[location=...]' etc]

Actual output:

[['cE]', '_b21625]','[location=1','..1257]'],

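Compared with the input, some of the matches come from the gene and locus_tag fields. I don't understand why :( Is my array structured incorrectly? Is it my regex?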

Any help is appreciated!

This is not my final desired output, though. Once all the locations are extracted, I want to process them into this final result:

Start:     1 End:  1275
Start: 22345 End: 23337

Since I can't even extract the location part, I'm already asking here.

Thanks for your help. I'd also appreciate different approaches to the problem. Maybe my way isn't the best one?

Answer 1

import re
a = ['>lcl|NC_003078.1_gene_1 [gene=lacE] [locus_tag=SM_b21652] [location=1..1275]\n','>lcl|NC_003078.1_gene_2 [gene=lacF] [locus_tag=SM_b21653] [location=complement(22345..23337)]\n']
for i in a:
    val = re.findall(r"location=.*?\]", i)[0]    # Find the location tag (raw string avoids invalid-escape warnings).
    val = re.findall(r"\d+", val)                # Find the start and end numbers.
    print("Start: {0} End:  {1}".format(val[0], val[1]))

Output:

Start: 1 End:  1275
Start: 22345 End:  23337
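As an alternative, here is a minimal sketch that does the extraction in one pass with named groups and right-aligns the numbers as in the desired output. It assumes every location is either `a..b` or `complement(a..b)`, as in the two sample entries; other location formats would need a wider pattern. The names `LOCATION_RE`, `extract_locations` and `entries` are made up for illustration. Note that in the question's pattern the top-level `|` splits the expression into three independent alternatives, which is why `[location=1` and `..1275]` come back as separate matches.

```python
import re

# Hypothetical pattern: assumes "a..b" or "complement(a..b)" inside [location=...].
LOCATION_RE = re.compile(
    r"\[location=(?:complement\()?(?P<start>\d+)\.\.(?P<end>\d+)\)?\]"
)


def extract_locations(entries):
    """Return (start, end) integer pairs for entries that contain a location tag."""
    pairs = []
    for entry in entries:
        match = LOCATION_RE.search(entry)
        if match:  # entries without a recognizable location are skipped
            pairs.append((int(match.group("start")), int(match.group("end"))))
    return pairs


entries = [
    ">lcl|NC_003078.1_gene_1 [gene=lacE] [locus_tag=SM_b21652] [location=1..1275]\n",
    ">lcl|NC_003078.1_gene_2 [gene=lacF] [locus_tag=SM_b21653] [location=complement(22345..23337)]\n",
]

for start, end in extract_locations(entries):
    # ">5" right-aligns each number in a field five characters wide.
    print(f"Start: {start:>5} End: {end:>5}")
```

Keeping the values as integers rather than strings makes later processing (sorting, computing lengths) easier, and the field width of 5 is only a guess that fits the sample data.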

Python 3.6 - Using re.findall on an array element
