I'm pulling some licensure data and putting it into lists:
rank = ['\r\n\t\t', 'RANK2', 'Rank II', '07', '-', '01', '-', '2016', u'\xa0', '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']
cert = ['\r\n\t\t', 'KEL', 'Professional Certificate For Teaching In Elementary School, Primary Through Grade 5', '07', '-', '01', '-', '2016', u'\xa0', '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']
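I want to strip the whitespace escape sequences (such as '\r\n\t\t') and the non-ASCII characters (such as u'\xa0') out of these lists, and join the date pieces back together, so that my lists end up looking like this: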
rank = ['RANK2', 'Rank II', '07-01-2016', '06-30-2021']
cert = ['KEL', 'Professional Certificate For Teaching In Elementary School, Primary Through Grade 5', '07-01-2016', '06-30-2021']
I've looked at several other questions (remove escape sequences from lists, remove unicode, remove non-ASCII, and a few others), but I couldn't adapt them to my situation.
Some attempts came close, but no cigar:
>>> [word for word in cert if word.isalnum()]
['KEL', '07', '01', '2016', '06', '30', '2021']
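The isalnum() filter fails because str.isalnum() is True only for a non-empty string made entirely of letters and digits, so items containing a space ('Rank II') or consisting of a hyphen ('-') are thrown away along with the whitespace. A quick check on a few sample items taken from the lists:

```python
items = ['KEL', 'Rank II', '-', '07', u'\xa0', '\r\n\t']

# isalnum() is True only for non-empty strings made entirely of letters/digits,
# so 'Rank II' (contains a space) and '-' are dropped along with the whitespace.
kept = [w for w in items if w.isalnum()]
dropped = [w for w in items if not w.isalnum()]

print(kept)  # ['KEL', '07']
print(dropped)
```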
def recursive_map(lst, fn):
    return [recursive_map(x, fn) if isinstance(x, list) else fn(x) for x in lst]

>>> recursive_map(rank, lambda x: x.encode("ascii", "ignore"))
['\r\n\t\t', 'RANK2', 'Rank II', '07', '-', '01', '-', '2016', '', '06', '-', '30', '-', '2021', '', '\r\n\t']
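The encode attempt is closer than it looks: encode('ascii', 'ignore') drops only non-ASCII characters such as u'\xa0', whereas '\r\n\t' is plain ASCII whitespace and survives. The output above shows Python 2 strings; under Python 3, encode returns bytes, so this minimal sketch decodes back to str, trims whitespace, and drops empty items. The helper name ascii_strip is hypothetical.

```python
rank = ['\r\n\t\t', 'RANK2', 'Rank II', '07', '-', '01', '-', '2016', u'\xa0',
        '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']

def ascii_strip(s):
    # Hypothetical helper: drop non-ASCII characters, decode back to str,
    # then trim leading/trailing whitespace such as '\r\n\t'.
    return s.encode('ascii', 'ignore').decode('ascii').strip()

# Keep only items that are non-empty after cleaning.
tokens = [t for t in map(ascii_strip, rank) if t]
print(tokens)
# ['RANK2', 'Rank II', '07', '-', '01', '-', '2016', '06', '-', '30', '-', '2021']
```

What remains is gluing the date pieces back together, which no purely element-wise filter can do on its own.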
At this point I'm stuck... does anyone have any ideas?
Answer (score: 1)
Here's something quick:
rank = ['\r\n\t\t', 'RANK2', 'Rank II', '07', '-', '01', '-', '2016', u'\xa0', '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']
cert = ['\r\n\t\t', 'KEL', 'Professional Certificate For Teaching In Elementary School, Primary Through Grade 5', '07', '-', '01', '-', '2016', u'\xa0', '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']
def clean(L):
    '''Removes non-printable characters and filters out empty strings.
    '''
    # Scrub each item once, keeping only the non-empty results.
    cleaned = [s for s in (scrubbed(x) for x in L) if s]
    # Rejoin with a non-ASCII sentinel ('\xa0'), which scrubbed() has already
    # removed from every item, glue the hyphenated date pieces back together,
    # then split on the sentinel again.
    return '\xa0'.join(cleaned).replace('\xa0-\xa0', '-').split('\xa0')

def scrubbed(s):
    '''Removes control and non-ASCII characters (keeps printable ASCII 32-126).
    '''
    return ''.join([c for c in s if 32 <= ord(c) <= 126])
print(clean(rank))
print(clean(cert))
Output:
['RANK2', 'Rank II', '07-01-2016', '06-30-2021']
['KEL', 'Professional Certificate For Teaching In Elementary School, Primary Through Grade 5', '07-01-2016', '06-30-2021']
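An alternative to the sentinel join/replace/split trick, sketched here with a hypothetical helper clean_tokens, is to scrub each item and then merge tokens across every standalone '-' in a single pass. Like the answer, it assumes that a bare '-' only ever appears between the parts of a date, as it does in this data.

```python
rank = ['\r\n\t\t', 'RANK2', 'Rank II', '07', '-', '01', '-', '2016', u'\xa0',
        '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']
cert = ['\r\n\t\t', 'KEL',
        'Professional Certificate For Teaching In Elementary School, Primary Through Grade 5',
        '07', '-', '01', '-', '2016', u'\xa0', '06', '-', '30', '-', '2021', u'\xa0', '\r\n\t']

def clean_tokens(items):
    """Hypothetical helper: keep printable ASCII, drop empties, glue 'a', '-', 'b' into 'a-b'."""
    tokens = []
    for item in items:
        # Printable ASCII is code points 32-126; this removes \r, \n, \t and u'\xa0'.
        s = ''.join(c for c in item if 32 <= ord(c) <= 126).strip()
        if s:
            tokens.append(s)

    merged = []
    glue_next = False
    for tok in tokens:
        if tok == '-' and merged:
            # Attach the hyphen to the previous token and glue the next token on too.
            merged[-1] += '-'
            glue_next = True
        elif glue_next:
            merged[-1] += tok
            glue_next = False
        else:
            merged.append(tok)
    return merged

print(clean_tokens(rank))
print(clean_tokens(cert))
```

This produces the same two lists as the output above; the trade-off is a few more lines in exchange for not routing everything through one joined string.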