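I wrote some code that reads every .xls file in a directory, converts each one to CSV form, and concatenates them, so the rest of the program runs on that single combined CSV.
The same code works for .txt files if .xls is replaced with .txt. I assumed .xlsx would behave the same way, but I was wrong: for some reason it raises an error.
The code is: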
import glob
import io
import pandas as pd

path = "C:\\Users\\AD\\Downloads\\Excess data"  # Change this to the location of your directory.
allFiles = glob.glob(path + "\\*.xls")  # Finds every file ending in .xls (use "\\*.txt" for .txt).
list_ = []
for file in allFiles:
    print(file)
    with open(file, 'rb') as fh:
        raw = fh.read()  # Raw bytes of the file.
    # Decode the bytes as UTF-8 text and parse them as tab-separated values.
    df = pd.read_csv(io.StringIO(raw.decode('utf-8')), sep='\t', parse_dates=['Time'])
    list_.append(df)
Source = pd.concat(list_)
Source.head()
This code runs successfully for .xls and .txt files, but for .xlsx I get an error along these lines:
*UTF-8 can't decode ... at position ... (something like this)*
Thanks for your help!
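A likely explanation for the error, offered as a sketch rather than a diagnosis of these particular files: an .xlsx workbook is a ZIP container of XML parts, not text, so its bytes are generally not valid UTF-8. The fact that `read_csv` with `sep='\t'` works on the .xls files suggests those are really tab-separated text saved with an .xls extension. The helper name `is_zip_container` and the in-memory sample data below are hypothetical, made up for illustration only.

```python
import io
import zipfile


def is_zip_container(raw: bytes) -> bool:
    """Return True if the bytes form a ZIP archive, which is how .xlsx is packaged."""
    return zipfile.is_zipfile(io.BytesIO(raw))


# Stand-in for an .xlsx payload: any ZIP archive has the same outer shape.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", "<workbook/>")
zip_bytes = buffer.getvalue()

# Stand-in for a tab-separated text file like the ones the question reads.
text_bytes = "Time\tValue\n2020-01-01\t1\n".encode("utf-8")

print(is_zip_container(zip_bytes))   # True: binary container, needs an Excel reader
print(is_zip_container(text_bytes))  # False: plain text, safe to decode and parse
```

Running the same check on the raw bytes of one of the failing files would show whether it is a binary workbook rather than text.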
Answer 0 (score: 2)
I suggest using read_excel with a list comprehension:
import glob
import pandas as pd

# Change this to the location of your directory.
path = "C:\\Users\\AD\\Downloads\\Excess data"
# Finds every file ending in .xlsx.
allFiles = glob.glob(path + "\\*.xlsx")
list_ = [pd.read_excel(file) for file in allFiles]
Source = pd.concat(list_, ignore_index=True)
print(Source)
# Convert to CSV.
Source.to_csv('out.csv', index=False)
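If the folder mixes both kinds of file, one option is to pick the reader from the extension. This is only a sketch under stated assumptions: it assumes the .xls and .txt files are tab-separated UTF-8 text (as the question's code implies) and that only .xlsx files are real workbooks. The function names `load_table` and `load_folder` are hypothetical, and the `parse_dates=['Time']` argument from the question is left out here for simplicity.

```python
import glob
import os

import pandas as pd


def load_table(file_path):
    """Read one file with a reader chosen from its extension."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension == ".xlsx":
        # Binary workbook: let pandas parse it with its Excel reader.
        return pd.read_excel(file_path)
    # Assumption: .xls/.txt files here are tab-separated text, as in the question.
    return pd.read_csv(file_path, sep="\t", encoding="utf-8")


def load_folder(folder):
    """Concatenate every .xls, .txt and .xlsx file found in the folder."""
    paths = []
    for pattern in ("*.xls", "*.txt", "*.xlsx"):
        paths.extend(glob.glob(os.path.join(folder, pattern)))
    frames = [load_table(p) for p in sorted(paths)]
    # Note: pd.concat raises ValueError if no files were found.
    return pd.concat(frames, ignore_index=True)
```

Usage would look like `Source = load_folder(path)` followed by `Source.to_csv('out.csv', index=False)`. Reading .xlsx through `read_excel` needs an Excel engine such as openpyxl to be installed.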