Python - split() over many spaces

Date: 2017-04-06 17:15:58

Tags: python split

I followed this answer's (Python: Split by 1 or more occurrences of a delimiter) directions to a T and it keeps failing so I'm wondering if it's something simple I'm missing or if I need a new method to solve this.

I have the following .eml file:

[Image: screenshot of the .eml file contents]

My goal is to eventually parse out all the fish stocks and their corresponding weight amounts, but for a test I'm just using the following code:

with open(file_path) as f:
    for line in f:
        if ("Haddock" in line):
            #fish, remainder = re.split(" +", line)
            fish, remainder = line.split()
            print(line.lower().strip())
            print("fish:", fish)
            print("remainder:", remainder)

and it fails on the line fish, remainder = line.split() with the error

ValueError: too many values to unpack (expected 2)

which tells me that Python is failing because it is trying to split on too many spaces, right? Or am I misunderstanding this? I want to get two values back from this process: the name of the fish (a string containing all the text before the many spaces) and the quantity (integer from the right side of the input line).

Any help would be appreciated.

4 answers:

Answer 0 (score: 2)

You can split using the following regular expression:

fish, remainder = re.split(r'(?<=\w)\s+(?=\d)',line.strip())

This splits the line and gives `['GB Haddock West', '22572']`.
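
As a minimal sketch of the above: the sample line below is borrowed from another answer in this thread, and the trailing newline is an assumption about how the line arrives from the file. It first shows why a bare `split()` yields four tokens (the cause of the `ValueError`), then applies the lookaround regex.

```python
import re

# Sample line (assumed): the name and quantity are separated by a run of spaces.
line = "GB Haddock West          22572\n"

# Bare split() breaks on every run of whitespace, so the name is split as well.
tokens = line.split()
print(tokens)  # four tokens, which cannot be unpacked into two names

# Split only on whitespace that follows a word character and precedes a digit.
fish, remainder = re.split(r"(?<=\w)\s+(?=\d)", line.strip())
print("fish:", fish)
print("remainder:", int(remainder))
```

The lookbehind and lookahead keep the single spaces inside the fish name intact, because those spaces are followed by a letter rather than a digit.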

Answer 1 (score: 1)

  

I want fish to be "GB Haddock West" and remainder to be "22572"

You could do something like this:

s = line.split()
fish, remainder = " ".join(s[:-1]), s[-1]

Instead of using split(), you could use rindex() to find the last space and split there.

at = line.rindex(" ")
fish, remainder = line[:at], line[at+1:]

Both will output:

print(fish) # GB Haddock West  
print(remainder) # 22572
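
A related option, not part of the original answer, is `str.rsplit()` with `maxsplit=1`, which splits once from the right on a run of whitespace. A minimal sketch, assuming the same sample line with a trailing newline:

```python
# Sample line (assumed), including the newline that file iteration leaves in place.
line = "GB Haddock West          22572\n"

# With no separator given, rsplit() treats any run of whitespace as one delimiter
# and ignores trailing whitespace, so a single split from the right is enough.
fish, remainder = line.rsplit(maxsplit=1)
print("fish:", fish)
print("remainder:", remainder)
```

This avoids rejoining the name, and it is not affected by the trailing newline.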

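Answer 2 (score: 1)

Yes ... you can split on multiple spaces. However, unless you can specify the exact number of spaces, you will get extra empty fields in the middle, just as you are getting now. For example: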

in_stuff = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "GB Haddock East           3776"
]

for line in in_stuff:
    print(line.split("   "))

Output:

['GB Haddock West', '', '', ' 22572']
['GB Cod West', '', '', '', '', '7207']
['GB Haddock East', '', '', '  3776']

However, a simple change gets you what you want: pick out the first and last fields:

for line in in_stuff:
    fields = line.split("   ")
    print(fields[0], int(fields[-1]))

Output:

GB Haddock West 22572
GB Cod West 7207
GB Haddock East 3776

Does that solve your problem?

Answer 3 (score: 1)

Building on @Vallentin's answer, but using Python 3's extended unpacking:

In [8]: line = "GB Haddock West 22572"

In [9]: *fish, remainder = line.split()

In [10]: print(" ".join(fish))
GB Haddock West

In [11]: print(int(remainder))
22572
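
Putting the extended-unpacking idea to work on several lines, here is a rough sketch of collecting every stock into a dictionary. The function name `parse_stock_lines` and the sample list are hypothetical, and the sketch assumes the last whitespace-separated token on each line is an integer.

```python
def parse_stock_lines(lines):
    """Map each fish stock name to its integer weight (hypothetical helper)."""
    stocks = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines or lines without a name and a quantity
        *name, quantity = parts  # everything except the last token is the name
        stocks[" ".join(name)] = int(quantity)
    return stocks


# Sample data (assumed), modelled on the lines quoted in this thread.
sample = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "GB Haddock East           3776",
]
stocks = parse_stock_lines(sample)
print(stocks)
```

In real use you would pass the open file object instead of `sample` and filter for the lines of interest first, since `int()` raises `ValueError` on a line whose last token is not a number.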