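I have a CSV file that I'm trying to read into Python, manipulate, and then write out to another CSV file.
My current problem is that although the file is comma-delimited, not all of the commas are delimiters.
Only commas that are NOT preceded or followed by a space count as delimiters (i.e. only "," itself, not ", " or " ,").
Here is what my code looks like: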
import csv
#open file for reading
with open(mypath, 'r', encoding = 'utf_8') as csvfile:
    myfile = list(csv.reader(csvfile, dialect = 'excel', delimiter = ','))
#specifying columns to be deleted (highest index first, so each pop doesn't shift the rest)
BadCols = [29,28,27,25,21,20,19,18,16,15,14,13,12,11,8,7,4,3]
#Loop through column indices to be deleted
for col in BadCols:
    #Loop through each row to delete columns
    for i, row in enumerate(myfile):
        #Delete Column, which is basically a list item at that row
        myfile[i].pop(col)
#Open file for writing (the with block closes the file, so no explicit close is needed)
with open(mypath2, "w", encoding = 'utf_8', newline='') as csvfile:
    csv_file = csv.writer(csvfile, dialect = 'excel', delimiter = ',')
    for i, row in enumerate(myfile):
        for j, col in enumerate(row):
            csvfile.write('%s, ' %col)
        csvfile.write('\n')
Here is what my data looks like:
Date,Name,City
May 30, 2016,Ryan,Boston
Here is what I want to see when I open the file in Excel:
Date Name City
May 30, 2016 Ryan Boston
Here is what I actually see in Excel:
Date [Blank column name] Name City
May 30 2016 Ryan Boston
So the date is being treated as two elements instead of one.
Any help is greatly appreciated.
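A minimal sketch of why this happens, using an in-memory string (`io.StringIO`) as a stand-in for the real file: `csv.reader` treats every comma as a delimiter unless the field is wrapped in quotes, so the unquoted date splits into two fields, while a quoted date would stay intact.

```python
import csv
import io

# The unquoted comma inside the date is treated as a delimiter.
raw = "Date,Name,City\nMay 30, 2016,Ryan,Boston\n"
rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['May 30', ' 2016', 'Ryan', 'Boston']

# If the date field were quoted, csv.reader would keep it as one field.
quoted = 'Date,Name,City\n"May 30, 2016",Ryan,Boston\n'
rows_quoted = list(csv.reader(io.StringIO(quoted)))
print(rows_quoted[1])  # ['May 30, 2016', 'Ryan', 'Boston']
```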
Answer
A regex is probably your best bet:
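import re
patt = re.compile(r"\b,\b")
with open("in.csv") as f:
    for row in map(patt.split, f):
        print(row)
Which would give you:
['Date', 'Name', 'City\n']
['May 30, 2016', 'Ryan', 'Boston']
You will have to deal with the trailing whitespace (the '\n' at the end of each line), but that shouldn't be a big problem. Obviously, if you had "foo,bar" as a name you would also run into problems, but if not, the re approach is fine.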
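A variant sketch that encodes the question's rule directly with a negative lookbehind and lookahead, `(?<!\s),(?!\s)`, instead of `\b,\b`, and strips the trailing newline before splitting. The helper name `split_line` is hypothetical. One difference worth knowing: `\b,\b` requires word characters on both sides of the comma, so it will not split empty fields such as `a,,b`, while the lookaround version does. Like the original, it still splits a name such as "foo,bar".

```python
import re

# A "bare" comma: no whitespace immediately before (?<!\s) or after (?!\s) it.
DELIM = re.compile(r"(?<!\s),(?!\s)")

def split_line(line):
    """Hypothetical helper: split one raw line on bare commas only."""
    # Strip the trailing newline first so the last field is clean.
    return DELIM.split(line.rstrip("\r\n"))

print(split_line("May 30, 2016,Ryan,Boston\n"))  # ['May 30, 2016', 'Ryan', 'Boston']
print(split_line("a,,b\n"))                      # ['a', '', 'b']
print(re.split(r"\b,\b", "a,,b"))                # ['a,,b']
```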
Another option might be to just replace ", " or " ," with a space:
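So, if in.csv contains:
Date,Name,City
May 30, 2016,Ryan,Boston
May 31 ,2016,foo,Narnia
then:
import csv
import re
patt = re.compile(r"\s(,)|(,)\s")
with open("in.csv") as f:
    for line in csv.reader(map(lambda s: patt.sub(" ", s), f)):
        print(line)
would give you:
['Date', 'Name', 'City']
['May 30 2016', 'Ryan', 'Boston']
['May 31 2016', 'foo', 'Narnia']
Note that this approach drops the comma from the date itself.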
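A sketch of how the bare-comma split could be combined with the question's column deletion, writing the result through `csv.writer` rather than manual `write` calls. With the default `QUOTE_MINIMAL` quoting, `csv.writer` wraps any field containing a comma in quotes, so Excel should show "May 30, 2016" in a single cell. The helper `convert` and the `bad_cols=[1]` value in the demo are hypothetical; the demo uses in-memory strings in place of `mypath`/`mypath2`.

```python
import csv
import io
import re

# Bare-comma delimiter: a comma with no whitespace directly before or after it.
DELIM = re.compile(r"(?<!\s),(?!\s)")

def convert(src, dst, bad_cols):
    """Hypothetical helper: split src lines on bare commas, drop bad_cols, write CSV to dst."""
    writer = csv.writer(dst, dialect="excel")
    for line in src:
        row = DELIM.split(line.rstrip("\r\n"))
        # Pop from the highest index down so earlier pops don't shift later ones.
        for col in sorted(bad_cols, reverse=True):
            row.pop(col)
        # QUOTE_MINIMAL (the default) quotes fields that contain a comma.
        writer.writerow(row)

src = io.StringIO("Date,Name,City\nMay 30, 2016,Ryan,Boston\n")
dst = io.StringIO()
convert(src, dst, bad_cols=[1])  # drop the Name column
print(repr(dst.getvalue()))  # 'Date,City\r\n"May 30, 2016",Boston\r\n'
```

For the real files, `src` and `dst` would be file objects opened as in the question, with the output file opened using `newline=''`, which the csv module expects. Rows with fewer columns than the largest index in `bad_cols` would raise `IndexError`, just as in the original loop.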