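I'm reading a CSV file with csv.reader(), and I want each value stripped of every kind of symbol and whitespace and converted to lowercase. For example: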
a = " Kingston-(USB) = 1GB, 2GB, 32GB . . . "
should be converted to:
a = "kingstonusb1gb2gb32gb"
I only want letters and digits in my data; symbols of any kind should be removed.
I'm using the .lower() and .strip() methods to do this, but .strip() only removes leading and trailing whitespace, while I want to remove every kind of symbol.
Thanks.
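Because the values come from csv.reader(), the cleanup has to be applied to every field of every row. A minimal sketch, assuming a comma-delimited file and the "lowercase ASCII letters and digits only" rule; `normalize`, `clean_rows` and the in-memory sample are hypothetical names used only for illustration:

```python
import csv
import io
import re

# Any character that is not a lowercase ASCII letter or a digit.
NON_ALNUM = re.compile(r'[^a-z0-9]')

def normalize(value):
    """Hypothetical helper: lowercase value, then drop everything outside a-z and 0-9."""
    return NON_ALNUM.sub('', value.lower())

def clean_rows(csv_file):
    """Hypothetical helper: read rows with csv.reader and normalize every field."""
    return [[normalize(field) for field in row] for row in csv.reader(csv_file)]

# Hypothetical in-memory sample; the quoted first field contains commas.
sample = io.StringIO('" Kingston-(USB) = 1GB, 2GB, 32GB . . . ",Other-Item!\n')
print(clean_rows(sample))  # [['kingstonusb1gb2gb32gb', 'otheritem']]
```

For a real file, open it with `open(path, newline='')`, as the csv module documentation recommends, and pass the file object to `clean_rows`.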
Answer 0 (score: 3)
You can use a regular expression:
>>> import re
>>> a = " Kingston-(USB) = 1GB, 2GB, 32GB . . . "
>>> re.sub(r'[^a-z0-9]', '', a.lower())
'kingstonusb1gb2gb32gb'
Or a generator expression:
>>> import string
>>> allowed = string.ascii_lowercase + string.digits
>>> ''.join(c for c in a.lower() if c in allowed)
'kingstonusb1gb2gb32gb'
Or, faster:
>>> import string
>>> allowed = set(string.ascii_lowercase + string.digits)
>>> ''.join(filter(allowed.__contains__, a.lower()))
'kingstonusb1gb2gb32gb'
Answer 1 (score: 3)
If you are working with ASCII, you can use str.translate to delete everything that is not a letter or a digit from the lowercased string:
>>> from string import ascii_letters, digits
>>> ascii = set(chr(e) for e in range(128))
>>> ascii_sans_alpha = ''.join(ascii.difference(ascii_letters + digits))
>>> a = " Kingston-(USB) = 1GB, 2GB, 32GB . . . "
>>> a.lower().translate(None,ascii_sans_alpha)
'kingstonusb1gb2gb32gb'
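The two-argument form `translate(None, deletechars)` exists only for Python 2 `str`. A minimal sketch of the same idea, assuming Python 3, where `str.translate` takes a single table and `str.maketrans('', '', chars)` builds one that maps every character in `chars` to None (that is, deletes it):

```python
from string import ascii_letters, digits

# Every ASCII character that is neither a letter nor a digit.
ascii_sans_alnum = ''.join(
    chr(i) for i in range(128) if chr(i) not in ascii_letters + digits
)

# Third argument of str.maketrans: characters to delete.
DELETE_TABLE = str.maketrans('', '', ascii_sans_alnum)

a = " Kingston-(USB) = 1GB, 2GB, 32GB . . . "
print(a.lower().translate(DELETE_TABLE))  # kingstonusb1gb2gb32gb
```

As with the Python 2 version, only ASCII symbols are removed; non-ASCII characters such as "é" pass through unchanged.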
How fast is it? Reference timings with timeit:
>>> import re, string, timeit
>>> def mgilson(a, good = set(string.ascii_lowercase + string.digits)):
...     return ''.join(x for x in a.lower() if x in good)
>>> def blender_re(a):
...     return re.sub(r'[^a-z0-9]', '', a.lower())
>>> def blender_gen(a, allowed = string.ascii_lowercase + string.digits):
...     return ''.join(c for c in a.lower() if c in allowed)
>>> def blender_fast(a, allowed = set(string.ascii_lowercase + string.digits)):
...     return ''.join(filter(allowed.__contains__, a.lower()))
>>> def abhijit(a, ascii_sans_alpha = ''.join(set(chr(e) for e in range(128)).difference(ascii_letters + digits))):
...     return a.lower().translate(None, ascii_sans_alpha)
>>> time_dict = {"abhijit": "from __main__ import string, a",
...              "blender_re": "from __main__ import re, a",
...              "blender_gen": "from __main__ import string, a",
...              "blender_fast": "from __main__ import string, a",
...              "mgilson": "from __main__ import string, a"}
>>> for k, v in time_dict.items():
...     t = timeit.Timer(stmt="{}(a)".format(k), setup="{},{}".format(v, k))
...     print "Timing for {} is {}".format(k, t.timeit(100000))
Timing for blender_fast is 0.573348026237
Timing for blender_re is 0.632169556846
Timing for blender_gen is 0.720916486331
Timing for mgilson is 0.586592185393
Timing for abhijit is 0.203489867547
>>>
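The harness above is Python 2 (print statement, two-argument translate). A rough Python 3 equivalent, assuming the same sample string, can pass callables straight to `timeit.timeit` instead of building setup strings. The function names are hypothetical, the regex is precompiled (a small change from blender_re), and no timings are shown because they depend on the machine and interpreter:

```python
import re
import string
import timeit

a = " Kingston-(USB) = 1GB, 2GB, 32GB . . . "
GOOD = set(string.ascii_lowercase + string.digits)
NON_ALNUM = re.compile(r'[^a-z0-9]')
ALNUM = string.ascii_letters + string.digits
# Python 3 deletion table: every ASCII char that is not a letter or digit.
DELETE_TABLE = str.maketrans(
    '', '', ''.join(chr(i) for i in range(128) if chr(i) not in ALNUM)
)

def with_regex(s):
    return NON_ALNUM.sub('', s.lower())

def with_generator(s):
    return ''.join(c for c in s.lower() if c in GOOD)

def with_filter(s):
    return ''.join(filter(GOOD.__contains__, s.lower()))

def with_translate(s):
    return s.lower().translate(DELETE_TABLE)

candidates = [with_regex, with_generator, with_filter, with_translate]

# Timing is only meaningful if every approach gives the same result.
assert len({f(a) for f in candidates}) == 1

for func in candidates:
    # The lambda is called immediately inside this iteration, so it sees the current func.
    seconds = timeit.timeit(lambda: func(a), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```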
Answer 2 (score: 2)
I would probably do it like this:
import string
good = set(string.ascii_lowercase + string.digits)
a = ''.join(x for x in a.lower() if x in good)
This may not be the most efficient approach, but it should work, and it is easy to read and understand.