I want to convert everything in column 2 of a CSV file to lowercase, remove all punctuation, and save the file. How can I do this?
import re
import csv
with open('Cold.csv', 'rb') as f_input1:
    with open('outing.csv', 'wb') as f_output:
        reader = csv.reader(f_input1)
        writer = csv.writer(f_output)
        for row in reader:
            row[1] = re.sub('[^a-z0-9]+', ' ', str(row[1].lower()))
            writer.writerow(row)
f_input1.close()
How can I add:
re.sub('[^A-Za-z0-9]+', ' ', str(row))
filewriter.writerow([new_row.lower()])
or .lower to this code?
Answer 0 (score: 2)
You can add code to modify your cell as follows:
import re
import csv
with open('in.csv', 'rb') as f_input, open('out.csv', 'wb') as f_output:
    csv_output = csv.writer(f_output)
    for row in csv.reader(f_input):
        row[1] = re.sub('[^A-Za-z0-9]+', '', row[1].lower())
        csv_output.writerow(row)
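.lower() is applied first to convert the string to lowercase. Using with ensures that your files are closed automatically afterwards.
Note that your regular expression sub should replace any invalid characters with an empty string, i.e. '', whereas you currently have it set to a single space.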
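The code above targets Python 2, where CSV files are opened in binary mode. On Python 3 the csv module expects text-mode files opened with newline=''. A minimal sketch of the same idea for Python 3 follows; the function name clean_second_column and the guard for short rows are assumptions added for illustration, not part of the original answer.

```python
import csv
import re


def clean_second_column(in_path, out_path):
    """Lowercase column 2 of a CSV file and strip non-alphanumeric characters."""
    # Python 3: text mode with newline='' so the csv module handles line endings.
    with open(in_path, newline='') as f_input, \
            open(out_path, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)
        for row in csv.reader(f_input):
            if len(row) > 1:  # assumption: rows without a second column pass through unchanged
                row[1] = re.sub('[^a-z0-9]+', '', row[1].lower())
            csv_output.writerow(row)


# Example call, using the file names from the answer:
# clean_second_column('in.csv', 'out.csv')
```

Because the pattern also matches spaces, replacing with an empty string joins the words of a cell into a single token.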
Answer 1 (score: 1)
Just edit the appropriate row and write it back:
import re
import csv
with open('Cold.csv', 'rb') as f_input1, open('outing.csv', 'wb') as f_output:
    reader = csv.reader(f_input1)
    writer = csv.writer(f_output)
    for row in reader:
        row[1] = re.sub('[^a-z0-9]+', ' ', str(row[1].lower()))
        writer.writerow(row)
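This answer keeps the question's single-space replacement, while Answer 0 suggests an empty string. A small sketch of how the two choices differ; the helper clean_cell and the sample string are made up for illustration.

```python
import re


def clean_cell(text, replacement):
    """Lowercase text and replace each run of non-alphanumeric characters."""
    return re.sub('[^a-z0-9]+', replacement, text.lower())


sample = 'Hello, World!'  # made-up input for illustration
with_space = clean_cell(sample, ' ')    # words stay separated
without_space = clean_cell(sample, '')  # words are joined together
```

With a space as the replacement, punctuation at the start or end of a cell becomes a leading or trailing space, so a final .strip() may be worth adding. With an empty string, spaces between words are removed along with the punctuation.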
Answer 2 (score: -1)
The simplest solution is to combine these two approaches.
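import string
# s is the cell text to clean, e.g. s = row[1] inside the CSV loop
# Python 2 only: remove punctuation in one call
s = s.translate(None, string.punctuation)
# Alternative, if speed is not an issue
exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)
# Convert to lowercase (apply this to the cell string, not to the row list)
s = s.lower()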
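The two-argument s.translate(None, ...) form only exists for Python 2 byte strings. On Python 3 the same effect goes through a table built with str.maketrans. A minimal sketch, assuming a hypothetical helper named strip_punctuation:

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punctuation(s):
    """Return s lowercased with ASCII punctuation removed (spaces are kept)."""
    return s.translate(_PUNCT_TABLE).lower()
```

Inside the CSV loop this would be used as row[1] = strip_punctuation(row[1]). Note that string.punctuation covers ASCII punctuation only, so other characters such as typographic quotes are left in place.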