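I am trying to extract the first character, the second number and the third character from each entry on every line of a file, and store them in three variables named FirstChar, SecondNum and ThirdChar.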
Input file (MultiPointMutation.txt):
P1T,C11F,E13T
L7A
E2W
Expected output:
FirstChar="PCELE"
SecondNum="1 11 13 7 2"
ThirdChar="TFTAW"
My code:
import re
import itertools
ns=map(lambda x:x.strip(),open('MultiplePointMutation.txt','r').readlines())  # reading file
for line in ns:
    second="".join(re.findall(r'\d+',line))  # extract second position numbers
    print second  # print second nums
    char="".join(re.findall(r'[a-zA-Z]',line))  # extract all characters
    c=str(char.rstrip())
    First=0
    Third=1
    for index in range(len(c)):
        if index==First:
            FC=c[index]  # here I get all the first characters
            print FC
            First=First+2
        if index==Third:
            TC=c[index]
            print TC
            Third=Third+2  # here I get all the third characters
Output: here I get FirstChar and ThirdChar exactly right:
FirstChar:
P
C
E
L
E
ThirdChar:
T
F
T
A
W
But the problem is getting SecondNum:
SecondNum:
11113
7
2
I want to extract the numbers like this:
1
11
13
7
2
Note: I don't want to just print them one by one. I want to be able to read the values of this SecondNum variable one at a time for later use.
Answer (score: 0)
For SecondNum, you only need to change the line:
second="".join(re.findall(r'\d+',line))#extract second position numbers
to
second="\n".join(re.findall(r'\d+',line))#extract second position numbers
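The 11113 comes from the join itself: re.findall already returns the numbers as separate strings, and "".join glues them together with no separator ("\n".join only changes the separator in the resulting string). Since the question asks to keep the values for later use rather than print them, one option is to skip the join and collect the matches into a list. A minimal sketch, written for Python 3; the sample lines stand in for the contents of MultiPointMutation.txt and the name second_num is illustrative:

```python
import re

lines = ["P1T,C11F,E13T", "L7A", "E2W"]  # stand-in for the file's lines

# "".join puts nothing between the matches, hence 11113
print("".join(re.findall(r'\d+', lines[0])))  # 11113

second_num = []
for line in lines:
    # findall returns e.g. ['1', '11', '13']; convert each to int and keep it
    second_num.extend(int(n) for n in re.findall(r'\d+', line))

print(second_num)  # [1, 11, 13, 7, 2]

# the values can now be read one at a time wherever they are needed
for n in second_num:
    print(n)
```

With the real file, iterate over open('MultiPointMutation.txt') inside a with block instead of over lines.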
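But I don't think your first and third characters are being extracted correctly either. To get the output you asked for at the top, you should have something like this: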
import re
x = """P1T,C11F,E13T
L7A
E2W"""
secondNum = []
firstChar = []
thirdChar = []
for line in x.split('\n'):
    secondNum.extend(re.findall(r'\d+', line))
    firstChar.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))
    # extend() appends each element returned by re.findall to the firstChar list.
    # The regex searches for the start of the string (^) or a comma (,); this is a
    # non-capturing group (it starts with (?: ), meaning the result of this group
    # is not included in the returned result. Finally it captures 1 character
    # [a-zA-Z] after the comma or the start, which should be the first character.
    thirdChar.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))
    # The third character works quite similarly, but the non-capturing group searches
    # for a comma or the start of the string again, followed by 1 character and at
    # least one digit (\d+). After this number comes the third character, which is
    # in the captured group again.
print('firstChar="' + "".join(firstChar) + '"')
print('secondNum="' + " ".join(secondNum) + '"')
print('thirdChar="' + "".join(thirdChar) + '"')
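As an alternative to three separate regexes, a single pattern with three capture groups can pull out all three parts of each entry at once; re.findall then returns one tuple per match. This is a different technique from the answer's code, sketched for Python 3 under the assumption that every entry is exactly one letter, a number and one letter; MUTATION and parse_mutations are hypothetical names:

```python
import re

# letter, one or more digits, letter; each part is its own capture group
MUTATION = re.compile(r'([A-Za-z])(\d+)([A-Za-z])')

def parse_mutations(lines):
    """Split entries like 'C11F' into three parallel lists."""
    first, second, third = [], [], []
    for line in lines:
        # findall yields tuples like ('C', '11', 'F'), one per entry
        for a, num, b in MUTATION.findall(line):
            first.append(a)
            second.append(int(num))
            third.append(b)
    return first, second, third

sample = ["P1T,C11F,E13T", "L7A", "E2W"]
first_char, second_num, third_char = parse_mutations(sample)
print('FirstChar="%s"' % "".join(first_char))                   # FirstChar="PCELE"
print('SecondNum="%s"' % " ".join(str(n) for n in second_num))  # SecondNum="1 11 13 7 2"
print('ThirdChar="%s"' % "".join(third_char))                   # ThirdChar="TFTAW"
```

Because each tuple keeps the three parts of one entry together, the first, second and third lists always stay aligned with each other.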
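However, your "third character" is the 3rd character of L7A (where you want the A), but the 4th character of P1TQ (where you want the Q), so it is not at a fixed position in every entry.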