Split a string on at least n spaces

Time: 2019-07-20 10:13:44

Tags: python python-3.x string list split

I have the following data, which I cannot change:

data = """
-5,-2   -52.565           
-5,-1   -48.751           
-5, 0   -47.498           
-5, 1   -48.751          - 
-5, 2   -52.565          
"""

I want to split these columns into two lists, i.e.:

list1 = ['-5,-2','-5,-1','-5, 0','-5, 1','-5, 2']
list2 = ['-52.565','-48.751','-47.498','-48.751','-52.565']

For now, I am interested in splitting each line correctly:

lines = data.splitlines()

print(lines[2].split())
print(lines[3].split())
  

['-5,-1', '-48.751']
['-5,', '0', '-47.498']

As you can see, lines[3] is not split correctly because of the space between '-5,' and '0'. To work around this, I tried the following (based on "python split a string with at least 2 whitespaces"):

import re
print(re.split(r'\s{2,}', lines[3]))

This works for the first column entry '-5, 0', but it also adds an empty entry at the end of the list:

  

['-5, 0', '-47.498', '']

How can I fix this? Or is there perhaps a better way to do the split?

Edit:

If I use

print(re.split(r'\s{2,}', lines[3],  maxsplit = 1))

I get:

  

['-5, 0', '-47.498 ']

5 answers:

Answer 0: (score: 4)

Use re to split each line into columns, then use zip to regroup the columns into two lists:

import re
data = """
-5,-2   -52.565           
-5,-1   -48.751           
-5, 0   -47.498           
-5, 1   -48.751          - 
-5, 2   -52.565          
"""
columns = [re.split(r'\s{2,}', line.strip()) for line in data.splitlines() if line.strip()]
print(columns)
first, second = map(list, zip(*columns))
print(first)
print(second)

Output:

[['-5,-2', '-52.565'], ['-5,-1', '-48.751'], ['-5, 0', '-47.498'], ['-5, 1', '-48.751', '-'], ['-5, 2', '-52.565']]
['-5,-2', '-5,-1', '-5, 0', '-5, 1', '-5, 2']
['-52.565', '-48.751', '-47.498', '-48.751', '-52.565']
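One detail worth noting in the output above: the fourth row carries a stray third entry, '-', and zip stops at the shortest row, so that entry is dropped without any warning. The following is a minimal sketch that makes the choice explicit by keeping only the first two fields of each row. The helper name split_columns is hypothetical and not part of the original answer.

```python
import re

data = """
-5,-2   -52.565
-5,-1   -48.751
-5, 0   -47.498
-5, 1   -48.751          -
-5, 2   -52.565
"""


def split_columns(text):
    """Return (first, second) column lists from space-aligned text.

    Hypothetical helper: fields are separated by two or more whitespace
    characters, and anything after the second field is ignored.
    """
    first, second = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = re.split(r'\s{2,}', line)
        if len(fields) < 2:
            raise ValueError(f"expected at least two columns: {line!r}")
        first.append(fields[0])
        second.append(fields[1])
    return first, second


first, second = split_columns(data)
print(first)
print(second)
```

Raising on rows with fewer than two fields is a design choice for this sketch; silently skipping them would hide malformed input.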

Answer 1: (score: 1)

Try stripping the line before splitting:

print(re.split(r'\s{2,}', lines[3].strip(), maxsplit=1))
# or
print(re.split(r'\s{2,}', lines[3].strip()))
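To see why stripping helps: re.split returns an empty string whenever the separator matches at the very end of the input, and the trailing run of spaces is exactly such a match. Here is a small self-contained check, using a literal copy of the third data line (the variable names are only for illustration):

```python
import re

# A copy of the third data line, including its trailing spaces.
line = "-5, 0   -47.498           "

# The trailing spaces match the separator, leaving an empty last field.
unstripped = re.split(r'\s{2,}', line)

# After strip() there is no trailing separator left to match.
stripped = re.split(r'\s{2,}', line.strip())

print(unstripped)  # ['-5, 0', '-47.498', '']
print(stripped)    # ['-5, 0', '-47.498']
```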

Answer 2: (score: 1)

Try this:

data = """
    -5,-2   -52.565           
    -5,-1   -48.751           
    -5, 0   -47.498           
    -5, 1   -48.751          - 
    -5, 2   -52.565          
"""
lines = [l.strip() for l in data.splitlines()]
list1 = []
list2 = []
for line in lines:
    if not line:
        continue
    columns = line.split(' '*3)
    list1.append(columns[0])
    list2.append(columns[1])
print(list1)
print(list2)

Answer 3: (score: 1)

If you want a non-regex solution that is overly complicated and much more verbose, you can do the following:

data = """
-5,-2   -52.565           
-5,-1   -48.751           
-5, 0   -47.498           
-5, 1   -48.751          - 
-5, 2   -52.565          
"""

jumbled_fields = data.split("\n")

divided = list()
for n in range(len(jumbled_fields)):
    for split_field in jumbled_fields[n].split("   "):
        if split_field != "" and split_field[0] != " ":
            divided.append(split_field)

first = list()
second = list()
for n in range(len(divided)):
    if n % 2 == 0:
        first.append(divided[n])
    else:
        second.append(divided[n])
print(first)  # ['-5,-2', '-5,-1', '-5, 0', '-5, 1', '-5, 2']
print(second)  # ['-52.565', '-48.751', '-47.498', '-48.751', '-52.565']

Answer 4: (score: 0)

import re

data = """
-5,-2   -52.565           
-5,-1   -48.751           
-5, 0   -47.498           
-5, 1   -48.751          - 
-5, 2   -52.565          
"""

patternA = re.compile(r'(-\d+,[\s|-]\d+)')
matches = re.findall(patternA, data)
listA = []

for elem in matches:
    listA.append(elem)


patternB = re.compile(r'\s(-\d+\.\d+)')
matches = re.findall(patternB, data)
listB = []

for elem in matches:
    listB.append(elem)

print(listA)
print(listB)

Explanation of the regular expressions (sorry for the odd formatting, I am typing this on my phone):

In re.compile:
\d+   -   Number with one or more digits
\s     -   Matches single whitespace
[\s|-]  -  Matches whitespace, | or - (inside [ ] the | is a literal character, so [\s-] would be enough)
( )      -  Captures group, this part is returned by the findall
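As a variation on the same idea, both columns could be captured by a single pattern, so that each pair of values stays together. This is a sketch and not the answer's original code: the name row_pattern is made up here, and the pattern assumes every row looks like the sample data (an integer pair, at least two spaces, then a decimal number).

```python
import re

data = """
-5,-2   -52.565
-5,-1   -48.751
-5, 0   -47.498
-5, 1   -48.751          -
-5, 2   -52.565
"""

# Group 1: the "a,b" or "a, b" pair at the start of a line.
# Group 2: the decimal value that follows two or more spaces.
row_pattern = re.compile(r'^(-?\d+,[ ]?-?\d+)[ ]{2,}(-?\d+\.\d+)', re.MULTILINE)

pairs = row_pattern.findall(data)  # list of (first, second) tuples
listA = [a for a, _ in pairs]
listB = [b for _, b in pairs]

print(listA)
print(listB)
```

Using [ ] instead of \s keeps a match from running across a line break, and anything after the second column (such as the stray '-') is simply not captured.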