I have a .csv file in which every line looks like this:
,11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01,
,11:00:15,4,5.,94.7,0.04,0.5,7,20,0.005,10,49.5,0.04,
It should look like this:
11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01
11:00:15,4,5.,94.7,0.04,0.5,7,20,0.005,10,49.5,0.04
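I think the leading and trailing commas are why pandas is not building the DataFrame correctly. How can I remove them?
The code that generates the original csv file is: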
import csv

def tsv2csv():
    # read the tab-delimited file
    with open(file_location + tsv_file, 'r') as fin:
        cr = csv.reader(fin, delimiter='\t')
        filecontents = [line for line in cr]
    # write the comma-delimited file (comma is the default delimiter)
    # give the exact location of the file
    # newline='' stops blank lines from appearing between rows
    with open(new_csv_file, 'w', newline='') as fou:
        cw = csv.writer(fou, quotechar='', quoting=csv.QUOTE_NONE)
        cw.writerows(filecontents)
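The empty first and last fields could also be dropped during the conversion itself. The following is only a sketch: it assumes the tab-delimited input starts and ends each line with a tab (which would explain the stray commas), and `strip_empty_edges`, `tsv_to_csv` and the sample data are hypothetical names and values, not part of the original code.

```python
import csv
import io

def strip_empty_edges(row):
    """Drop empty fields at the start and end of a row, keeping inner ones."""
    start, end = 0, len(row)
    while start < end and row[start] == '':
        start += 1
    while end > start and row[end - 1] == '':
        end -= 1
    return row[start:end]

def tsv_to_csv(fin, fou):
    """Copy tab-delimited rows from fin to fou as comma-delimited rows."""
    reader = csv.reader(fin, delimiter='\t')
    writer = csv.writer(fou)
    for row in reader:
        writer.writerow(strip_empty_edges(row))

# Hypothetical sample: each line starts and ends with a tab.
sample_tsv = "\t11:00:14\t4\t5.\t93.7\t\n\t11:00:15\t4\t5.\t94.7\t\n"
out = io.StringIO(newline='')
tsv_to_csv(io.StringIO(sample_tsv, newline=''), out)
converted = out.getvalue()
print(converted)
```

Note that this also removes a first or last value that is genuinely missing, so it only fits files where every line carries the extra delimiters.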
Answer 0 (score: 2)
You can use usecols to specify which columns to import, like this:
import pandas as pd
csv_df = pd.read_csv('temp.csv', header=None, usecols=range(1,13))
This skips the empty first and last columns.
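To check the effect without touching a file, here is a small self-contained sketch that feeds the two sample lines from the question through `io.StringIO` (an assumption made for illustration; the answer itself reads 'temp.csv'):

```python
import io
import pandas as pd

# The two sample lines from the question, with leading and trailing commas.
raw = (
    ",11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01,\n"
    ",11:00:15,4,5.,94.7,0.04,0.5,7,20,0.005,10,49.5,0.04,\n"
)

# Without usecols, the empty edge fields become two extra columns.
all_cols = pd.read_csv(io.StringIO(raw), header=None)

# Columns 1 through 12 hold the actual values.
csv_df = pd.read_csv(io.StringIO(raw), header=None, usecols=range(1, 13))

print(all_cols.shape, csv_df.shape)
print(list(csv_df.columns))
```

The column labels stay 1 to 12 because usecols keeps the original positions as names when header=None.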
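Answer 1 (score: 2)
The leading and trailing commas correspond to missing data. When the file is loaded into a DataFrame they show up as columns of NaN, so you only need to remove those columns, either with dropna or by slicing: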
df = pd.read_csv('file.csv', header=None).dropna(how='all', axis=1)
Alternatively,
df = pd.read_csv('file.csv', header=None).iloc[:, 1:-1]
df
          1   2    3     4     5    6   7   8      9  10    11    12
0  11:00:14   4  5.0  93.7  0.01  0.0   7  20  0.001  10  49.3  0.01
1  11:00:15   4  5.0  94.7  0.04  0.5   7  20  0.005  10  49.5  0.04
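The two variants can be compared side by side. This sketch assumes the sample lines from the question and reads them from an in-memory string instead of 'file.csv':

```python
import io
import pandas as pd

raw = (
    ",11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01,\n"
    ",11:00:15,4,5.,94.7,0.04,0.5,7,20,0.005,10,49.5,0.04,\n"
)

full = pd.read_csv(io.StringIO(raw), header=None)

# Variant 1: drop every column in which all values are NaN.
by_dropna = full.dropna(how='all', axis=1)

# Variant 2: drop the first and last column by position.
by_slice = full.iloc[:, 1:-1]

same = by_dropna.equals(by_slice)
print(full.shape, by_dropna.shape, by_slice.shape, same)
```

The two differ in what they assume: dropna(how='all', axis=1) would also remove an interior column that happens to be completely empty, while the iloc slice removes exactly one column on each side whether or not it is empty.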
Answer 2 (score: -1)
You can use strip to remove characters from the beginning and end of a string. Pass it a string containing the characters you want removed.
x = ',11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01,'
print(x.strip(','))
>11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01
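Applied to a whole file, the same idea might look like the sketch below. The list `raw_lines` stands in for the lines of the csv file and is an assumption for illustration. The line ending has to be removed first, because strip(',') stops at the newline and would otherwise leave the trailing comma in place.

```python
import io
import pandas as pd

# Stand-in for the lines read from the csv file (sample data from the question).
raw_lines = [
    ",11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01,\n",
    ",11:00:15,4,5.,94.7,0.04,0.5,7,20,0.005,10,49.5,0.04,\n",
]

# Remove surrounding whitespace (including the newline), then the edge commas.
cleaned = [line.strip().strip(',') for line in raw_lines]

# Parse the cleaned text as a regular comma-delimited file.
df = pd.read_csv(io.StringIO("\n".join(cleaned)), header=None)
print(df.shape)
```

Because strip(',') removes every comma at either end, a line whose first or last real value is missing would lose a field and its columns would shift.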
Answer 3 (score: -1)
I am not sure whether this applies to your case, but have you tried importing it like this:
df = pd.read_csv('filename', sep=';')
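For the sample lines shown in the question this is unlikely to help, since they contain no semicolons. A quick check, assuming that sample data and an in-memory string in place of 'filename':

```python
import io
import pandas as pd

raw = (
    ",11:00:14,4,5.,93.7,0.01,0.0,7,20,0.001,10,49.3,0.01,\n"
    ",11:00:15,4,5.,94.7,0.04,0.5,7,20,0.005,10,49.5,0.04,\n"
)

# With ';' as the separator, each whole line is read as a single field.
df = pd.read_csv(io.StringIO(raw), sep=';', header=None)
print(df.shape)
```

The suggestion would only be relevant if the real file used semicolons as its delimiter.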