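I am using PyCharm, the pandas library, and Python 3.3 for my project. I start with 200 .csv.gz files, and I want to create one large .csv file from them. Here is the code for that, and it runs: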
import gzip
import csv
import glob
import os

pathname = os.path.expanduser("~/Desktop/downloaded/*.csv.gz")
path = os.path.expanduser("~/Desktop/downloaded")
#path = pathname+"/*.csv.gz" #folder containing all .csv.gz files
counter = 1  # counts total number of files read
files = glob.glob(pathname)
try:
    for file in files:  # read files one by one
        with gzip.open(file, 'rt', encoding="utf-8", errors="ignore") as mycsvread:
            filecontent = csv.reader(mycsvread)
            if counter > 1:
                header = mycsvread.readline()  # skip header file
            with open(path + "/combinedcsv.csv", 'a') as mycsvwrite:
                datawriter = csv.writer(mycsvwrite)
                for row in filecontent:
                    datawriter.writerow(row)
except IOError:
    print("File not found")
except EOFError:
    print("No input")
else:
    counter = counter + 1
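The problem is that every column ends up with the "object" data type. I noticed this when I created a DataFrame and read the large .csv file into it. This is the output of dataframe.dtypes: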
BEGIN_YEARMONTH object
BEGIN_DAY object
BEGIN_TIME object
END_YEARMONTH object
END_DAY object
END_TIME object
EPISODE_ID object
EVENT_ID object
STATE object
STATE_FIPS object
YEAR object
MONTH_NAME object
EVENT_TYPE object
CZ_TYPE object
CZ_FIPS object
CZ_NAME object
WFO object
BEGIN_DATE_TIME object
CZ_TIMEZONE object
END_DATE_TIME object
INJURIES_DIRECT object
INJURIES_INDIRECT object
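For reference, pandas typically stops treating a column as numeric as soon as it contains a non-numeric string, for example a header row repeated in the middle of the file. Below is a small sketch of that effect, assuming made-up two-column data (the sample values are hypothetical and not taken from the files above):

```python
import io

import pandas as pd

# Hypothetical sample: the header row appears a second time mid-file,
# as it would if each source file's header were appended along with its data.
sample = "YEAR,STATE\n1950,OKLAHOMA\nYEAR,STATE\n1951,TEXAS\n"

df = pd.read_csv(io.StringIO(sample))
print(df.dtypes)  # YEAR is not inferred as a number because of the stray "YEAR" string

# Drop the repeated header rows, then convert the column back to numbers.
clean = df[df["YEAR"] != "YEAR"].copy()
clean["YEAR"] = pd.to_numeric(clean["YEAR"])
print(clean.dtypes)
```

The same check (`df[col] == col` for some column name `col`) can be run against the combined file to see whether stray header rows are present.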
Could you please tell me how to keep the columns' original data types?
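One possible cause, judging from the code as posted: `counter` is only incremented in the `else` clause of the `try`, which runs once after the whole loop has finished. That would mean `counter > 1` is never true, no header line is skipped, and each file's header row lands in the combined file as data. A minimal sketch of a merge that writes the header only once follows, assuming every input file starts with the same header row. The function name `combine_gz_csvs` is hypothetical, and the sketch opens the output in write mode rather than append mode so that repeated runs do not accumulate rows.

```python
import csv
import glob
import gzip


def combine_gz_csvs(pattern, out_path):
    """Merge gzipped CSV files matching `pattern` into one CSV at `out_path`.

    Assumes every input file starts with the same header row. The header is
    written once, taken from the first file. Returns the number of files read.
    """
    files = sorted(glob.glob(pattern))
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for index, name in enumerate(files):
            with gzip.open(name, "rt", encoding="utf-8",
                           errors="ignore", newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)  # first row of each file
                if index == 0 and header is not None:
                    writer.writerow(header)  # keep the header only once
                writer.writerows(reader)     # copy the remaining data rows
    return len(files)
```

With the variables from the question, this could be called as `combine_gz_csvs(pathname, path + "/combinedcsv.csv")`. If the combined file then contains a single header row, reading it with pandas should let numeric columns be inferred as numbers again, although columns with genuinely mixed or missing values may still come back as object.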