I want to remove certain characters from a Python list

Asked: 2019-01-14 14:07:05

Tags: python parsing

This is my list:

[('11 August 1902\xa0(1902-08-11)Paris, France', None), 
 ('29 July 1991(1991-07-29) (aged\xa088)Paris, France', None), 
 ('\xa0France', None), ('\xa0French Army', None), ('1921-1959', None), 
 ('General de brigade', None), 
 ('Mobile Group 2Mobile Group 1Operational Group North-West', None),
 ('World War IIFirst Indochina War*Battle of Dien Bien Phu', None)]

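I want to remove None and '\xa0' from the list.

A friend told me I need to convert the list to a string, remove the text, and then convert it back to a list. If that is the only way, how do I keep the individual items in the list separate from one another?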

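3 answers:

Answer 0 (score: 3)

You don't have to convert the list to a string (that would be one of the worst ways to do it). You can simply use a list comprehension, for example: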

>>> my_list = [
    ('11 August 1902\xa0(1902-08-11)Paris, France', None),
    ('29 July 1991(1991-07-29) (aged\xa088)Paris, France', None), 
    ('\xa0France', None), 
    ('\xa0French Army', None), 
    ('1921-1959', None), 
    ('General de brigade', None), 
    ('Mobile Group 2Mobile Group 1Operational Group North-West', None), 
    ('World War IIFirst Indochina War*Battle of Dien Bien Phu', None)]
>>> [t[0].replace('\xa0', ' ') for t in my_list]
['11 August 1902 (1902-08-11)Paris, France', '29 July 1991(1991-07-29) (aged 88)Paris, France', ' France', ' French Army', '1921-1959', 'General de brigade', 'Mobile Group 2Mobile Group 1Operational Group North-West', 'World War IIFirst Indochina War*Battle of Dien Bien Phu']

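This takes the first element of each inner tuple (thereby dropping the second element, None) and replaces any '\xa0' characters in it with an ordinary space (" ").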
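Note that entries such as '\xa0France' still start with a space after the replacement. If that is unwanted, one possible variation (a sketch, assuming that trimming whitespace at both ends is acceptable for your data) unpacks each tuple and strips the result. The names `clean` and `cleaned` are hypothetical, and the list is shortened for brevity:

```python
my_list = [
    ('11 August 1902\xa0(1902-08-11)Paris, France', None),
    ('\xa0France', None),
    ('1921-1959', None),
]

def clean(text):
    # Hypothetical helper: swap non-breaking spaces for ordinary spaces,
    # then trim whitespace from both ends of the string.
    return text.replace('\xa0', ' ').strip()

# Unpack each (text, None) tuple and keep only the cleaned text.
cleaned = [clean(text) for text, _ in my_list]
print(cleaned)
```

Unpacking with `text, _` makes it explicit that the second element of each tuple is being discarded on purpose.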

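Answer 1 (score: 0)

Here is a (not very good) example of how to do this. A more elegant approach would probably be to handle the string as ISO 8859-1, which I think is where the \xa0 comes from (in that encoding, 0xA0 is the non-breaking space).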

my_list = [('11 August 1902\xa0(1902-08-11)Paris, France', None), 
           ('29 July 1991(1991-07-29) (aged\xa088)Paris, France', None), 
           ('\xa0France', None),
           ('\xa0French Army', None),
           ('1921-1959', None), 
           ('General de brigade', None),
           ('Mobile Group 2Mobile Group 1Operational Group North-West', None),
           ('World War IIFirst Indochina War*Battle of Dien Bien Phu', None)]

my_new_list = []

for my_item in my_list:
    tuple_first = my_item[0]

    tuple_first = tuple_first.replace('\xa0', ' ') # I think really this should be
                                                   # encoded with the ISO 8859-1 and
                                                   # in this encoding \xa0 is a non
                                                   # breaking space... but for now
                                                   # I just replace it with a space char
    my_new_list.append(tuple_first)

Here is the output (one item per line):

['11 August 1902 (1902-08-11)Paris, France',
'29 July 1991(1991-07-29) (aged 88)Paris, France',
' France',
' French Army',
'1921-1959',
'General de brigade',
'Mobile Group 2Mobile Group 1Operational Group North-West',
'World War IIFirst Indochina War*Battle of Dien Bien Phu'
]
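To check what '\xa0' actually is, the standard library can be asked directly. The sketch below assumes Python 3, where string literals are Unicode; it shows that the character is U+00A0 NO-BREAK SPACE, that it maps to the single byte 0xA0 in ISO 8859-1 (Latin-1), and that NFKC normalization happens to turn it into an ordinary space. NFKC also rewrites other compatibility characters, so whether it suits your data is an assumption to verify.

```python
import unicodedata

ch = '\xa0'

# The official Unicode name of the character.
name = unicodedata.name(ch)

# Its byte representation in two common encodings.
latin1_bytes = ch.encode('latin-1')
utf8_bytes = ch.encode('utf-8')

# NFKC normalization replaces the non-breaking space with a plain space.
normalized = unicodedata.normalize('NFKC', '\xa0French Army')

print(name)
print(latin1_bytes, utf8_bytes)
print(repr(normalized))
```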

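Answer 2 (score: 0)

Here is another way of looking at the list comprehension Selcuk provided.

Note: accept Selcuk's solution, since it is correct. I am only posting this to show how a list comprehension works and looks compared with a for loop.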

my_list = [('11 August 1902\xa0(1902-08-11)Paris, France', None), 
('29 July 1991(1991-07-29) (aged\xa088)Paris, France', None), 
('\xa0France', None), ('\xa0French Army', None), ('1921-1959', None), 
('General de brigade', None), ('Mobile Group 2Mobile Group 1Operational Group North-West', None),
 ('World War IIFirst Indochina War*Battle of Dien Bien Phu', None)]

new_list = []
for t in my_list:
    t = t[0].replace('\xa0',' ')
    new_list.append(t)

Output:

print(new_list)
['11 August 1902 (1902-08-11)Paris, France', '29 July 1991(1991-07-29) (aged 88)Paris, France', ' France', ' French Army', '1921-1959', 'General de brigade', 'Mobile Group 2Mobile Group 1Operational Group North-West', 'World War IIFirst Indochina War*Battle of Dien Bien Phu']
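As a quick sanity check, the loop and the comprehension can be run side by side and compared. This is only an illustrative sketch on a shortened list; `loop_result` and `comp_result` are hypothetical names.

```python
my_list = [('\xa0France', None), ('1921-1959', None)]

# Explicit for loop: build the result one item at a time.
loop_result = []
for t in my_list:
    loop_result.append(t[0].replace('\xa0', ' '))

# Equivalent list comprehension: same logic in a single expression.
comp_result = [t[0].replace('\xa0', ' ') for t in my_list]

print(loop_result == comp_result)
```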