Question

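I am trying to extract the first character, the second number, and the third character from each line of a file, and store them in three variables named FirstChar, SecondNum, and ThirdChar.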

Input file (MultiPointMutation.txt):

P1T,C11F,E13T
L7A
E2W

Expected output:

FirstChar="PCELE"
SecondNum="1 11 13 7 2"
ThirdChar="TFTAW"

My code:

 import re
 import itertools
 # read the file and strip surrounding whitespace from each line
 ns = map(lambda x: x.strip(), open('MultiplePointMutation.txt', 'r').readlines())
 for line in ns:
     second = "".join(re.findall(r'\d+', line))  # extract the second-position numbers
     print(second)  # print the numbers
     char = "".join(re.findall(r'[a-zA-Z]', line))  # extract all letters
     c = str(char.rstrip())
     First = 0
     Third = 1
     for index in range(len(c)):
         if index == First:
             FC = c[index]  # here I get all the first characters
             print(FC)
             First = First + 2
         if index == Third:
             TC = c[index]  # here I get all the third characters
             print(TC)
             Third = Third + 2

Output: here I get FirstCharacter and ThirdCharacter exactly right:

FirstChar:
          P
          C
          E
          L
          E
ThirdChar:
          T
          F
          T
          A
          W

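But the problem is getting SecondNum:

I want to extract the numbers as follows:

Note: I don't just want to print them one by one. I want to be able to read the values from the SecondNum variable one at a time for later use.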

Answer 1

For secondNum, you only need to change this line:

second="".join(re.findall(r'\d+',line))#extract second position numbers

to

second="\n".join(re.findall(r'\d+',line))#extract second position numbers
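Since the question asks to read the numbers one at a time later on, it may be simpler to keep the list that re.findall returns instead of only joining it into a string. A minimal sketch, assuming a single sample line taken from the input file; the names numbers, joined, and as_ints are illustrative and not part of the original code:

```python
import re

line = "P1T,C11F,E13T"  # sample line from the question's input

# re.findall returns every run of digits as a list of strings
numbers = re.findall(r"\d+", line)

# joined form, one number per line, as in the modified line above
joined = "\n".join(numbers)

# converting to int makes the values usable one by one later
as_ints = [int(n) for n in numbers]

for n in as_ints:
    print(n)
```

Keeping the list means no later step has to split the joined string again.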

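But I don't think your first and third characters work properly. Judging from the output you want to get, you should have something like the following: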

 import re

 x = """P1T,C11F,E13T
 L7A
 E2W"""

 secondNum = []
 firstChar = []
 thirdChar = []
 for line in x.split('\n'):
     # every run of digits on the line
     secondNum.extend(re.findall(r'\d+', line))

     # The regex looks for the start of the line (^) or a comma (,). This is a
     # non-capturing group (it starts with (?:), so what it matches is left out
     # of the returned result. The single captured [a-zA-Z] right after the
     # comma or the start is the first character of each entry.
     firstChar.extend(re.findall(r'(?:^|,)([a-zA-Z])', line))

     # The third character works much the same way: the non-capturing group
     # matches the start of the line or a comma, followed by one character and
     # at least one digit (\d+). The letter after the digits is the third
     # character, which is in the captured group again.
     thirdChar.extend(re.findall(r'(?:^\w\d+|,\w\d+)([a-zA-Z])', line))

 print("firstChar=\"" + str(firstChar) + "\"")
 print("secondNum=\"" + str(secondNum) + "\"")
 print("thirdChar=\"" + str(thirdChar) + "\"")

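Keep in mind that the character you call the third one really is the third character in L7A (where you want the A), but it is the fourth character in something like P1TQ (where you want the Q).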
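The three separate searches can also be folded into one pattern with three capturing groups. This is only a sketch, assuming every entry has the shape letter, digits, letter as in the sample input; the pattern and the lowercase list names are illustrative, and the final strings are built to match the expected output in the question:

```python
import re

text = """P1T,C11F,E13T
L7A
E2W"""

# one letter, one or more digits, one letter; each part is its own group
pattern = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")

first_chars, second_nums, third_chars = [], [], []
for line in text.splitlines():
    # with several groups, findall yields one tuple per matched entry
    for first, num, third in pattern.findall(line):
        first_chars.append(first)
        second_nums.append(num)
        third_chars.append(third)

FirstChar = "".join(first_chars)
SecondNum = " ".join(second_nums)
ThirdChar = "".join(third_chars)

print('FirstChar="%s"' % FirstChar)
print('SecondNum="%s"' % SecondNum)
print('ThirdChar="%s"' % ThirdChar)
```

The second_nums list stays available for reading the numbers one by one, while SecondNum holds the space-separated string.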
