I followed this answer's (Python: Split by 1 or more occurrences of a delimiter) directions to a T and it keeps failing so I'm wondering if it's something simple I'm missing or if I need a new method to solve this.
I have the following .eml file:
My goal is to eventually parse out all the fish stocks and their corresponding weight amounts, but for a test I'm just using the following code:
with open(file_path) as f:
    for line in f:
        if "Haddock" in line:
            # fish, remainder = re.split(" +", line)
            fish, remainder = line.split()
            print(line.lower().strip())
            print("fish:", fish)
            print("remainder:", remainder)
and it fails on the line fish, remainder = line.split()
with the error
ValueError: too many values to unpack (expected 2)
which tells me that Python is failing because it is trying to split on too many spaces, right? Or am I misunderstanding this? I want to get two values back from this process: the name of the fish (a string containing all the text before the many spaces) and the quantity (integer from the right side of the input line).
Any help would be appreciated.
Answer 0 (score: 2)
You can split with the following regular expression:
fish, remainder = re.split(r'(?<=\w)\s+(?=\d)',line.strip())
This splits only where whitespace is preceded by a word character and followed by a digit, and gives `['GB Haddock West', '22572']`.
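To make the difference concrete, here is a minimal sketch comparing a bare split() with the regex split. The sample line is an assumption for illustration, since the actual .eml contents are not shown in the question.

```python
import re

# Sample line assumed for illustration; the real file contents are not shown.
line = "GB Haddock West          22572\n"

# split() with no arguments splits on every run of whitespace, so the
# three-word name becomes three items and the line yields four values.
tokens = line.split()
print(tokens)

# Splitting only where whitespace sits between a word character and a digit
# yields exactly two pieces: the name and the quantity.
fish, remainder = re.split(r"(?<=\w)\s+(?=\d)", line.strip())
print(fish, int(remainder))
```

This is why the unpacking fails: the error is not about too many spaces, but about the name itself containing spaces, so split() returns more than two values.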
Answer 1 (score: 1)
Assuming you want fish to be GB Haddock West and remainder to be 22572, you could do something like:
s = line.split()
fish, remainder = " ".join(s[:-1]), s[-1]
Instead of using split(), you could use rindex() to find the last space and split there.
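line = line.strip()  # drop the trailing newline so it does not end up in remainder
at = line.rindex(" ")
fish, remainder = line[:at].rstrip(), line[at+1:]  # rstrip() removes the padding spaces left after the name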
Both will output:
print(fish) # GB Haddock West
print(remainder) # 22572
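A related option, not part of the answer above, is str.rsplit() with a maxsplit of 1, which splits once on the last run of whitespace. The sketch below assumes every line ends with an integer; the helper name split_stock_line is hypothetical.

```python
def split_stock_line(line):
    """Return (name, weight) from a line such as 'GB Haddock West   22572'.

    Hypothetical helper; assumes the last whitespace-separated field is an integer.
    """
    # sep=None splits on runs of whitespace; maxsplit=1 splits only once,
    # starting from the right, so the name keeps its internal spaces.
    name, weight = line.rsplit(None, 1)
    return name, int(weight)


name, weight = split_stock_line("GB Haddock West          22572\n")
print(name)
print(weight)
```

Because the separator is None, the trailing newline and the padding between the name and the number are both discarded without an explicit strip().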
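Answer 2 (score: 1)
Yes, you can split on multiple spaces. However, unless you can specify the exact number of spaces, you will get extra empty fields in the middle, much like the extra values you are getting now. For example, splitting on three spaces: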
in_stuff = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "GB Haddock East          3776"
]
for line in in_stuff:
    print(line.split("   "))  # separator is three spaces
Output:
['GB Haddock West', '', '', ' 22572']
['GB Cod West', '', '', '', '', '7207']
['GB Haddock East', '', '', ' 3776']
However, one simple change gets you what you want: pick the first and last fields from that result:
for line in in_stuff:
    fields = line.split("   ")  # separator is three spaces
    print(fields[0], int(fields[-1]))  # int() ignores the leading space
Output:
GB Haddock West 22572
GB Cod West 7207
GB Haddock East 3776
Does this solve your problem?
Answer 3 (score: 1)
Building on @Vallentin's answer, but using Python 3's extended unpacking:
In [8]: line = "GB Haddock West 22572"
In [9]: *fish, remainder = line.split()
In [10]: print(" ".join(fish))
GB Haddock West
In [11]: print(int(remainder))
22572
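For the stated goal of collecting every stock with its weight, the extended unpacking can be wrapped in a loop. This is only a sketch under assumptions: the function name parse_stocks is hypothetical, the sample lines (including the non-data "Subject:" line) are made up for illustration, and any line whose last field is not a plain integer is skipped.

```python
def parse_stocks(lines):
    """Map each stock name to its integer weight.

    Hypothetical helper; lines without a trailing integer are skipped.
    """
    stocks = {}
    for line in lines:
        parts = line.split()
        # Need at least one name word plus a numeric last field.
        if len(parts) < 2 or not parts[-1].isdecimal():
            continue
        *name_parts, weight = parts
        stocks[" ".join(name_parts)] = int(weight)
    return stocks


# Sample input assumed for illustration; the real .eml contents are not shown.
sample = [
    "GB Haddock West          22572",
    "GB Cod West               7207",
    "Subject: weekly report",
    "GB Haddock East          3776",
]
result = parse_stocks(sample)
print(result)
```

With a real file, the same function could be fed the open file object directly, since iterating over a file yields its lines.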