My Python function is defined as follows:
import re
def name_extractor(dirty_name):
    print(dirty_name)
    # Replace every non-word character with a space
    clean_name = re.sub(r'\W', " ", dirty_name)
    print(clean_name)
Sample dirty names include:
(10) Johny Doe
Eric E. Shelby
(1) Chris Melton - ŗ≤ēŗ≤Ņŗ≤įŗ≤Ņŗ≤ēŗ≥ć ŗ≤ēŗ≥Äŗ≤įŗ≥ćŗ≤§ŗ≤Ņ
Jonas Alexander Bay
Christopher Rockstar - An awesome guy
Jones Collier
I want the output to print only:
Johny Doe
Eric E. Shelby
Chris Melton
Jonas Alexander Bay
Christopher Rockstar
Jones Collier
How can I adjust the regex to print only the names as they are and exclude everything (random characters or normal ASCII characters) after the " - "?
Answer 0 (score: 2)
You don't need a regex. Split each line on ' - ', then filter out the characters you don't want and strip the extra whitespace:
>>> l = '''(10) Johny Doe
... Eric E. Shelby
... (1) Chris Melton - ŗ≤ēŗ≤Ņŗ≤įŗ≤Ņŗ≤ēŗ≥ć ŗ≤ēŗ≥Äŗ≤įŗ≥ćŗ≤§ŗ≤Ņ
... Jonas Alexander Bay
... Christopher Rockstar - An awesome guy
... Jones Collier'''.splitlines()
>>> for line in l:
... print(''.join(c for c in line.split(' - ')[0] if c.isalpha() or c in ' .').strip())
...
Johny Doe
Eric E. Shelby
Chris Melton
Jonas Alexander Bay
Christopher Rockstar
Jones Collier
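The same split-and-filter idea can be wrapped in a reusable function. This is only a sketch: the name extract_name is made up for illustration and does not appear in the question.

```python
def extract_name(dirty_name):
    """Return the name part of one line (hypothetical helper)."""
    # Keep only the text before the first ' - ' separator.
    head = dirty_name.split(' - ')[0]
    # Keep letters, spaces, and periods; digits and parentheses are dropped.
    kept = ''.join(c for c in head if c.isalpha() or c in ' .')
    # Remove the whitespace left behind by a dropped "(N)" prefix.
    return kept.strip()


samples = ['(10) Johny Doe', 'Eric E. Shelby', 'Christopher Rockstar - An awesome guy']
for sample in samples:
    print(extract_name(sample))
```

Note that this filter also drops hyphens and apostrophes inside a name (for example "O'Brien" becomes "OBrien"), so the set of allowed characters may need extending for real data.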
Answer 1 (score: 0)
To exclude all non-ASCII characters and everything after the hyphen -, it is enough to replace them with the empty string "".
A short solution using a specific regex pattern:
import re
dirty_name = '''
(10) Johny Doe
Eric E. Shelby
(1) Chris Melton - ŗ≤ēŗ≤Ņŗ≤įŗ≤Ņŗ≤ēŗ≥ć ŗ≤ēŗ≥Äŗ≤įŗ≥ćŗ≤§ŗ≤Ņ
Jonas Alexander Bay
Christopher Rockstar - An awesome guy
Jones Collier'''
clean_name = '\n'.join(l.lstrip() for l in re.sub(r'[^\x00-\x7f]|[\d()]| - .+\b(?=\n)', "", dirty_name).split('\n'))
print(clean_name)
Output:
Johny Doe
Eric E. Shelby
Chris Melton
Jonas Alexander Bay
Christopher Rockstar
Jones Collier
Edit: removed the leading spaces, because @TigerhawkT3 is too "space-sensitive" (in his own religion).
P.S. \x00-\x7f is the ASCII character range.
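If the input is handled one line at a time, a narrower pattern may be enough. The sketch below is an alternative rather than the pattern used above. It assumes that the only noise is an optional "(N)" counter at the start of the line and a " - " separator followed by arbitrary text. The names NOISE and clean_name are hypothetical.

```python
import re

# Assumed noise: a leading "(digits)" counter, or whitespace + hyphen + whitespace
# followed by everything up to the end of the line.
NOISE = re.compile(r'^\s*\(\d+\)\s*|\s+-\s.*$')


def clean_name(dirty_name):
    """Strip the assumed noise from a single line (hypothetical helper)."""
    return NOISE.sub('', dirty_name).strip()


for line in ['(10) Johny Doe', 'Christopher Rockstar - An awesome guy']:
    print(clean_name(line))
```

Because the separator must have whitespace on both sides of the hyphen, a hyphen inside a name such as "Jean-Luc" is left alone, and the text after the separator is removed whether or not it is ASCII.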