How to make Python treat only commas with no space before or after them as delimiters

Time: 2016-06-22 22:19:39

Tags: python csv delimiter python-3.5

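I have a CSV file that I am trying to read into Python, manipulate, and then write out to another CSV file.

My current problem is that although the file is comma-delimited, not every comma is a delimiter.

Only commas that are NOT preceded and/or followed by a space count as delimiters (only "," and not ", " or " ,").

This is what my code looks like: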

import csv

# Open the input file and read every row into a list of lists
with open(mypath, 'r', encoding='utf_8', newline='') as csvfile:
    myfile = list(csv.reader(csvfile, dialect='excel', delimiter=','))

# Column indices to delete, in descending order so that earlier pops
# do not shift the positions of the columns still to be removed
BadCols = [29, 28, 27, 25, 21, 20, 19, 18, 16, 15, 14, 13, 12, 11, 8, 7, 4, 3]
for col in BadCols:
    for row in myfile:
        # Each row is a list, so deleting a column means popping one item
        row.pop(col)

# Open the output file and write the remaining columns
with open(mypath2, 'w', encoding='utf_8', newline='') as csvfile:
    csv_file = csv.writer(csvfile, dialect='excel', delimiter=',')
    csv_file.writerows(myfile)

Here is my data:

Date,Name,City
May 30, 2016,Ryan,Boston

Here is what I want to see when I open the file in Excel:

Date            Name    City
May 30, 2016    Ryan    Boston

Here is what I actually see in Excel:

Date     [Blank column name]    Name   City
May 30   2016                   Ryan   Boston

So the date is treated as two elements instead of one.

Any help is much appreciated.

1 answer:

Answer 0 (score: 2)

A regular expression is probably your best bet:

import re

# \b,\b matches a comma only when it sits directly between two word characters
patt = re.compile(r"\b,\b")
with open("in.csv") as f:
    for row in map(patt.split, f):
        print(row)

Which would give you:

['Date', 'Name', 'City\n']
['May 30, 2016', 'Ryan', 'Boston']

You will have to take care of the trailing whitespace, but that should not be a big problem. Obviously, if you had something like "foo,bar" as a name you would also run into trouble, but if not, the re approach will do fine.
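A variant of the regular-expression approach, as a minimal sketch: it uses lookarounds instead of `\b`, so the rule "a comma with no whitespace on either side" is stated directly and empty fields or fields that begin or end with punctuation are still split. The names `DELIM`, `split_line`, and `sample` are hypothetical and only serve the illustration; the sample lines stand in for the lines of the input file.

```python
import re

# A delimiter is a comma that is neither preceded nor followed by whitespace.
DELIM = re.compile(r"(?<!\s),(?!\s)")

def split_line(line):
    """Split one line of text into fields (hypothetical helper)."""
    # Strip the line ending first so it does not stick to the last field.
    return DELIM.split(line.rstrip("\r\n"))

# Sample lines standing in for the contents of the input file.
sample = [
    "Date,Name,City\n",
    "May 30, 2016,Ryan,Boston\n",
    "May 31 ,2016,foo,Narnia\n",
]
rows = [split_line(line) for line in sample]
for row in rows:
    print(row)
```

Like the `\b,\b` pattern, this still splits a value such as "foo,bar", because that comma has no whitespace around it.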

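Another option could be to simply replace ", " and " ," with a space before passing the lines to csv.reader. Note that this drops the comma from the date itself.

So, for this input:

Date,Name,City
May 30, 2016,Ryan,Boston
May 31 ,2016,foo,Narnia

import csv
import re

# Match a comma that has whitespace directly before or after it
patt = re.compile(r"\s,|,\s")

with open("in.csv") as f:
    # Replace every such comma (and its adjacent whitespace) with one space,
    # then let csv.reader split on the commas that remain
    for line in csv.reader(map(lambda s: patt.sub(" ", s), f)):
        print(line)

You would get:

['Date', 'Name', 'City']
['May 30 2016', 'Ryan', 'Boston']
['May 31 2016', 'foo', 'Narnia']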
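To tie this back to the original goal of dropping columns and writing a new file, here is a rough sketch under a few assumptions. It splits each line on commas that have no whitespace around them, removes the unwanted column indices, and writes the result with `csv.writer`, which by default quotes any field that contains the delimiter, so a value such as "May 30, 2016" should stay in a single column when the output is opened as CSV. The names `convert`, `drop_cols`, `src`, and `dst` are hypothetical, and `io.StringIO` objects stand in for the real input and output files.

```python
import csv
import io
import re

# A delimiter is a comma that is neither preceded nor followed by whitespace.
DELIM = re.compile(r"(?<!\s),(?!\s)")

def convert(src, dst, drop_cols=()):
    """Copy rows from src to dst, leaving out the column indices in drop_cols."""
    drop = set(drop_cols)
    writer = csv.writer(dst, dialect="excel")
    for line in src:
        fields = DELIM.split(line.rstrip("\r\n"))
        # Keep only the columns that are not marked for removal.
        writer.writerow([f for i, f in enumerate(fields) if i not in drop])

# In-memory stand-ins for the input and output files.
src = io.StringIO("Date,Name,City\nMay 30, 2016,Ryan,Boston\n")
dst = io.StringIO()
convert(src, dst, drop_cols=[1])  # drop the "Name" column as an example
result = dst.getvalue()
print(result)
```

With real files, both would be opened with `newline=''`, as the `csv` module documentation recommends.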