I need to clean up a set of nested lists (nested no more than three levels deep). An example looks like this:
test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
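I would like to run the following:
re.sub(r'[^a-zA-Z\d\[\] ]', '', test)
I know the problem here is that I need to iterate over the nested lists, but I am having trouble preserving the structure while doing so. There may also be a simpler way to solve this. I tried this variation: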
for a in test:
    for b in a:
        if isinstance(b, list):
            for c in b:
                c = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c)
                clean.append(c)
        else:
            print(b)
            b = re.sub(r'[^a-zA-Z\d\[\] ]', ' ', b)
            clean.append(b)
Answer 0 (score: 1)
This script keeps the structure of the list as it is and simply applies the re.sub function to each string:
test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

import re

def clean(lst):
    if not isinstance(lst, list):
        return re.sub(r'[^a-zA-Z\d\[\] ]', '', lst)
    return [clean(v) for v in lst]

print(clean(test))
Prints:
[['qte', 'EKO'], ['eoim', ['35ni', 'mmie']]]
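As a hedged variation on the recursive approach above, the sketch below precompiles the pattern and also passes non-string leaves through unchanged. The names `PATTERN` and `clean_nested` are hypothetical and chosen only for illustration, and tuple support is an assumption that goes beyond what the question asks for.

```python
import re

# PATTERN and clean_nested are illustrative names, not part of the original answer.
PATTERN = re.compile(r'[^a-zA-Z\d\[\] ]')

def clean_nested(item):
    """Recursively strip disallowed characters while keeping the nesting."""
    if isinstance(item, str):
        return PATTERN.sub('', item)
    if isinstance(item, (list, tuple)):
        # Rebuild the same container type so the shape is preserved.
        return type(item)(clean_nested(v) for v in item)
    # Assumption: anything that is not a string or list/tuple is left untouched.
    return item

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]
result = clean_nested(test)
print(result)
```

Because the recursion depends only on the type of each element, the same function works for any nesting depth, not just the three levels mentioned in the question.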
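Answer 1 (score: 0)
Since you only need to collect all the nested lists into one flattened list, you can run a flatten function over the list and then apply the regular expression to each item.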
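import re

def flatten(lst):
    flat = []
    for x in lst:
        # Recurse into iterables, but treat strings as leaves.
        # (Python 2's basestring no longer exists; str/bytes is the Python 3 equivalent.)
        if hasattr(x, '__iter__') and not isinstance(x, (str, bytes)):
            flat.extend(flatten(x))
        else:
            flat.append(x)
    return flat

# test is the nested list from the question
clean = []
for c in flatten(test):
    clean.append(re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c))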
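The flatten-then-clean idea above can also be sketched with a generator in Python 3. This is only one possible formulation: `iter_flat` and `flat_clean` are hypothetical names, and the use of `collections.abc.Iterable` in place of the `__iter__` check is a substitution made for this example.

```python
import re
from collections.abc import Iterable

def iter_flat(lst):
    """Yield the leaves of an arbitrarily nested iterable, treating strings as leaves."""
    for x in lst:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from iter_flat(x)
        else:
            yield x

test = [['qte%#', 'EKO*^'], ['eoim&', ['35ni%', 'mmie']]]

# Same pattern and replacement (a single space) as in the loop above.
flat_clean = [re.sub(r'[^a-zA-Z\d\[\] ]', ' ', c) for c in iter_flat(test)]
print(flat_clean)
# ['qte  ', 'EKO  ', 'eoim ', '35ni ', 'mmie']
```

Note that replacing with a space rather than an empty string leaves trailing spaces where the special characters were, and that the result is a flat list, so the original nesting is lost.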