Cleaning up a nested list

Time: 2011-07-20 13:12:31

Tags: python nested-lists

I have a mess of nested lists that looks like this, only longer:

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]

Ultimately I want something that looks like this:

neat_fruit = [['watermelon',0,1.0], ['apple',0,1.0], ['pineapple',0,1.0], ['strawberry, banana',0,1.0], ['peach plum pear',0,1.0], ['orange, grape',0,1.0]]

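But I can't figure out how to deal with the double quotes inside the strings, or how to split the fruit from the numbers, especially since some of the fruit names themselves contain commas. I've tried a lot of things, but everything seems to make it even messier. Any suggestions would be greatly appreciated.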

3 answers:

Answer 0: (score: 6)

Use the csv module (in the standard library) to handle the double-quoted fruits that have commas in their names:

import csv
import io

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]

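# flatten the list of lists into a single string, one record per line:
data = '\n'.join(item[0].strip() for item in fruit_mess)
# on Python 3 the csv module reads text, so wrap the string in io.StringIO
reader = csv.reader(io.StringIO(data))
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]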

print(neat_fruit)    
# [['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
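
As a variation on the same idea, csv.reader accepts any iterable that yields one string per row, so the inner strings can be passed to it directly without joining them first. This is only a minimal sketch on a shortened version of the list, not a replacement for the answer above:

```python
import csv

# Shortened sample of the list from the question.
fruit_mess = [['watermelon,0,1.0\n'], ['"strawberry, banana",0,1.0\n']]

# Each inner list holds exactly one CSV line, so feed those lines to the reader.
reader = csv.reader(item[0] for item in fruit_mess)
neat_fruit = [[fruit, int(num1), float(num2)] for fruit, num1, num2 in reader]

print(neat_fruit)
# [['watermelon', 0, 1.0], ['strawberry, banana', 0, 1.0]]
```

This assumes every inner list contains a single, complete record, as in the question.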

Answer 1: (score: 1)

A simpler solution:

fruit_mess = [['watermelon,0,1.0\n'], ['apple,0,1.0\n'], ['"pineapple",0,1.0\n'], ['"strawberry, banana",0,1.0\n'], ['peach plum pear,0,1.0\n'], ['"orange, grape",0,1.0\n']]
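for i, x in enumerate(fruit_mess):
    # split from the right so commas inside the fruit name are left alone
    data = x[0].rstrip('\n').rsplit(',', 2)
    # strip the surrounding double quotes, if any
    fruit_mess[i] = [data[0].strip('"'), int(data[1]), float(data[2])]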
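
To see why splitting from the right matters here, the sketch below compares a plain split with rsplit limited to two splits on one of the quoted entries. The helper name parse_line is hypothetical, and stripping the double quotes is done as a separate step to match the desired output in the question:

```python
def parse_line(line):
    # Hypothetical helper, used only for this illustration.
    # The two numeric fields are always last, so split at most twice from the right.
    name, num1, num2 = line.rstrip('\n').rsplit(',', 2)
    return [name.strip('"'), int(num1), float(num2)]

line = '"strawberry, banana",0,1.0\n'

# A plain split also breaks on the comma inside the fruit name.
naive = line.rstrip('\n').split(',')
print(naive)
# ['"strawberry', ' banana"', '0', '1.0']

print(parse_line(line))
# ['strawberry, banana', 0, 1.0]
```

This relies on the assumption that each record ends with exactly two comma-separated numbers.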

Answer 2: (score: 0)

A regex-based solution:

>>> import re
>>> regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')
>>> neat_fruit = []
>>> for item in fruit_mess:
...     match = regex.match(item[0])
...     result = [match.group(1).strip('"'), int(match.group(2)), float(match.group(3))]
...     neat_fruit.append(result)
...
>>> neat_fruit
[['watermelon', 0, 1.0], ['apple', 0, 1.0], ['pineapple', 0, 1.0], ['strawberry, banana', 0, 1.0], ['peach plum pear', 0, 1.0], ['orange, grape', 0, 1.0]]
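
One thing to watch for with the regex approach: regex.match returns None when a line does not fit the pattern, and calling .group on it would then raise an AttributeError. A minimal sketch of a guarded version follows; the function name parse_row and the choice to return None for bad lines are assumptions made for illustration:

```python
import re

# Same pattern as above: a quoted or unquoted name, an integer, then a decimal number.
regex = re.compile(r'("[^"]*"|[^,]*),(\d+),([\d.]+)')

def parse_row(text):
    # Hypothetical helper: returns None instead of failing on lines that do not match.
    match = regex.match(text)
    if match is None:
        return None
    name, num1, num2 = match.groups()
    return [name.strip('"'), int(num1), float(num2)]

print(parse_row('"orange, grape",0,1.0\n'))
# ['orange, grape', 0, 1.0]
print(parse_row('not a fruit row'))
# None
```

Whether to skip, log, or raise on malformed lines depends on how messy the real data is.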