Python unicode: translate table does not remove characters

Time: 2015-03-25 13:21:50

Tags: python python-2.7 unicode

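I have a list of unicode elements from which I want to remove ')' and \n, as well as the surrounding whitespace. Basically, I want to create a "clean" copy of the list.

I tried to follow this SO solution, Remove specific characters from a string in python, and the Python 2.7 docs on strings.

My list is built as follows (the bs4 imports are omitted to keep this short).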

def isNotBlank(myString):
    if myString and myString.strip():
        return True
    return False

names = soup.find_all('span', class_="TextLarge")
bucket_list = []

for name in names:
    for item in name.contents:
        for value in item.split('('):
            if isNotBlank(value):
                bucket_list.append(value)

translation_table = dict.fromkeys(map(ord, ')(@\\n#$'), None)
[x.translate(translation_table) for x in bucket_list ]

So print(names) returns

[<span class="TextLarge">Mossfun (11) (Rtg:103)</span>, <span class="TextLarge">58.0</span>, <span class="TextLarge scratched">Atmospherical (8)
      (Rtg:99)</span>, <span class="TextLarge">56.5</span>, <span class="TextLarge scratched">Chloe In Paris (7)
      (Rtg:97)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Bound For Earth (5) (Rtg:92)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Fine Bubbles (4) (Rtg:91)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Brook Road (9) (Rtg:90)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Shamalia (10) (Rtg:89)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge scratched">Tawteen (6) (Rtg:88)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Ygritte (2) (Rtg:77)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">Tahni Dancer (1) (Rtg:76)</span>, <span class="TextLarge">55.5</span>, <span class="TextLarge">All Salsa (3) (Rtg:72)</span>, <span class="TextLarge">55.5</span>]

and bucket_list returns

[u'Mossfun ', u'11) ', u'Rtg:103)', u'58.0', u'Atmospherical ', u'8) \n      ', u'Rtg:99)', u'56.5', u'Chloe In Paris ', u'7) \n      ', u'Rtg:97)', u'55.5', u'Bound For Earth ', u'5) ', u'Rtg:92)', u'55.5', u'Fine Bubbles ', u'4) ', u'Rtg:91)', u'55.5', u'Brook Road ', u'9) ', u'Rtg:90)', u'55.5', u'Shamalia ', u'10) ', u'Rtg:89)', u'55.5', u'Tawteen ', u'6) ', u'Rtg:88)', u'55.5', u'Ygritte ', u'2) ', u'Rtg:77)', u'55.5', u'Tahni Dancer ', u'1) ', u'Rtg:76)', u'55.5', u'All Salsa ', u'3) ', u'Rtg:72)', u'55.5']

The desired output is

[['Mossfun', 11, 103, 58.0], ['Atmospherical', 8, 99, 56.5]]

Currently, all the characters pass through the translation unchanged.

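1 answer:

Answer 0: (score: 1)

You are ignoring the return value here; the translation itself works (although it does not actually handle the newline, and it deletes the letter n instead):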

>>> bucket_list = [u'Mossfun ', u'11) ', u'Rtg:103)', u'58.0', u'Atmospherical ', u'8) \n      ', u'Rtg:99)', u'56.5', u'Chloe In Paris ', u'7) \n      ', u'Rtg:97)', u'55.5', u'Bound For Earth ', u'5) ', u'Rtg:92)', u'55.5', u'Fine Bubbles ', u'4) ', u'Rtg:91)', u'55.5', u'Brook Road ', u'9) ', u'Rtg:90)', u'55.5', u'Shamalia ', u'10) ', u'Rtg:89)', u'55.5', u'Tawteen ', u'6) ', u'Rtg:88)', u'55.5', u'Ygritte ', u'2) ', u'Rtg:77)', u'55.5', u'Tahni Dancer ', u'1) ', u'Rtg:76)', u'55.5', u'All Salsa ', u'3) ', u'Rtg:72)', u'55.5']
>>> translation_table = dict.fromkeys(map(ord, ')(@\\n#$'), None)
>>> [x.translate(translation_table) for x in bucket_list ]
['Mossfu ', '11 ', 'Rtg:103', '58.0', 'Atmospherical ', '8 \n      ', 'Rtg:99', '56.5', 'Chloe I Paris ', '7 \n      ', 'Rtg:97', '55.5', 'Boud For Earth ', '5 ', 'Rtg:92', '55.5', 'Fie Bubbles ', '4 ', 'Rtg:91', '55.5', 'Brook Road ', '9 ', 'Rtg:90', '55.5', 'Shamalia ', '10 ', 'Rtg:89', '55.5', 'Tawtee ', '6 ', 'Rtg:88', '55.5', 'Ygritte ', '2 ', 'Rtg:77', '55.5', 'Tahi Dacer ', '1 ', 'Rtg:76', '55.5', 'All Salsa ', '3 ', 'Rtg:72', '55.5']

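But the result is stored in a new list; the original strings are not changed in place, because strings are immutable. Assign the result back to bucket_list, and fix the newline problem by using \n instead of \\n (the latter is a backslash followed by the letter n, which is why every n disappeared above):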

translation_table = dict.fromkeys(map(ord, ')(@\n#$'), None)
bucket_list = [x.translate(translation_table) for x in bucket_list ]
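To make the difference between the two tables concrete, here is a minimal sketch. The names `sample`, `wrong_table`, and `right_table` are made up for illustration and are not part of the original code; the `u''` literals are used so the snippet should behave the same on Python 2.7 and Python 3.

```python
# Hypothetical sample value, shaped like one of the scraped items.
sample = u'8) \n' + u' ' * 6

# u'\\n' is two characters: a backslash and the letter 'n'.
wrong_table = dict.fromkeys(map(ord, u')(@\\n#$'), None)
# u'\n' is a single character: the newline itself.
right_table = dict.fromkeys(map(ord, u')(@\n#$'), None)

print(ord(u'n') in wrong_table)    # the letter 'n' is a key of the wrong table
print(ord(u'\n') in wrong_table)   # the newline is not
print(ord(u'\n') in right_table)   # the newline is a key of the fixed table

# translate() returns a new string; `sample` itself is left untouched,
# so the result has to be bound to a name to be of any use.
cleaned = sample.translate(right_table)
print(repr(sample))
print(repr(cleaned))
```

Mapping a code point to None in the table is what tells translate() to delete that character.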

You probably want to throw in str.strip() as well to get rid of the remaining whitespace; the result would then be:

>>> [x.translate(translation_table).strip() for x in bucket_list ]
['Mossfun', '11', 'Rtg:103', '58.0', 'Atmospherical', '8', 'Rtg:99', '56.5', 'Chloe In Paris', '7', 'Rtg:97', '55.5', 'Bound For Earth', '5', 'Rtg:92', '55.5', 'Fine Bubbles', '4', 'Rtg:91', '55.5', 'Brook Road', '9', 'Rtg:90', '55.5', 'Shamalia', '10', 'Rtg:89', '55.5', 'Tawteen', '6', 'Rtg:88', '55.5', 'Ygritte', '2', 'Rtg:77', '55.5', 'Tahni Dancer', '1', 'Rtg:76', '55.5', 'All Salsa', '3', 'Rtg:72', '55.5']

In this specific case, str.strip() would also take care of the newlines.
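The cleaned list is still flat, whereas the question asks for nested rows with numeric values. One possible way to get there is sketched below. It assumes that every runner contributes exactly four consecutive items (name, number, rating, weight) and that the rating always carries the `Rtg:` prefix, as in the sample data; `clean_rows` is a hypothetical helper name, not something from the original post.

```python
def clean_rows(bucket_list):
    """Group a flat list of scraped strings into [name, number, rating, weight] rows.

    Assumption: the input length is a multiple of four and the items
    always appear in the order name, number, rating, weight.
    """
    # Characters to delete; strip() below takes care of newlines and spaces.
    table = dict.fromkeys(map(ord, u')(@#$'), None)
    cleaned = [x.translate(table).strip() for x in bucket_list]

    rows = []
    for i in range(0, len(cleaned), 4):
        # Raises ValueError if the last chunk has fewer than four items.
        name, number, rating, weight = cleaned[i:i + 4]
        rows.append([name,
                     int(number),
                     int(rating.replace(u'Rtg:', u'')),
                     float(weight)])
    return rows


# First two runners from the bucket_list shown in the question.
bucket_list = [u'Mossfun ', u'11) ', u'Rtg:103)', u'58.0',
               u'Atmospherical ', u'8) \n      ', u'Rtg:99)', u'56.5']
rows = clean_rows(bucket_list)
print(rows)
```

If the page ever yields a span with a different number of pieces, the fixed chunk size of four will no longer line up, so real data may need a more defensive grouping step.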