I have a mess of nested lists that looks like this, only much longer:
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
Ultimately I want something that looks like this:
neat_fruit = [['watermelon',0,1.0], ['apple',0,1.0], ['pineapple',0,1.0], ['strawberry, banana',0,1.0], ['peach plum pear',0,1.0], ['orange, grape',0,1.0]]
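But I can't figure out how to handle the double quotes inside the quoted strings, or how to split the fruit names from the numbers, especially since some fruit names contain commas. I have tried many things, but everything seems to make it messier. Any suggestions would be greatly appreciated.
Answer 0 (score: 6)
Use the csv module (in the standard library) to handle the double-quoted fruit names that contain commas: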
import csv
import io
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
# flatten the list of lists into a string:
data='\n'.join(item[0].strip() for item in fruit_mess)
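# On Python 3, csv.reader needs text, so wrap the str in io.StringIO
# (io.BytesIO only accepts bytes and raises TypeError for a str):
reader=csv.reader(io.StringIO(data))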
neat_fruit=[[fruit,int(num1),float(num2)] for fruit,num1,num2 in reader]
print(neat_fruit)
# [['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
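As a possible variation on the above, `csv.reader` accepts any iterable that yields one string per row, so the join and the `io` wrapper can probably be skipped. The sketch below assumes every inner list holds exactly one string with three comma-separated fields; the helper name `tidy` and the shortened sample data are only for illustration.

```python
import csv

# Shortened sample in the same shape as the question's data.
fruit_mess = [['watermelon,0,1.0\n'], ['"pineapple",0,1.0\n'],
              ['"strawberry, banana",0,1.0\n']]

def tidy(rows):
    """Parse each one-element row as a CSV line: [name, int, float]."""
    # csv.reader iterates over the strings directly; no file object is needed.
    lines = (row[0] for row in rows)
    return [[name, int(count), float(weight)]
            for name, count, weight in csv.reader(lines)]

neat_fruit = tidy(fruit_mess)
print(neat_fruit)
```

A row with more or fewer than three fields would raise a `ValueError` during unpacking, which may be useful as an early warning about malformed lines.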
Answer 1 (score: 1)
A simpler solution:
fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
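for i,x in enumerate(fruit_mess):
    # split from the right so commas inside the fruit name are left alone
    data = x[0].rstrip('\n').rsplit(',', 2)
    # strip the surrounding double quotes, which rsplit leaves in place
    fruit_mess[i] = [data[0].strip('"'), int(data[1]), float(data[2])]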
Answer 2 (score: 0)
A regex-based solution:
>>> import re
>>> regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')
>>> neat_fruit = []
>>> for item in fruit_mess:
...     match = regex.match(item[0])
...     result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))]
...     neat_fruit.append(result)
...
>>> neat_fruit
[['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]