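I have 15 xlsx files, and each file contains multiple worksheets. I want to loop over them dynamically so that every sheet from every workbook is read into a single DataFrame.
I tried using pd.read_excel as follows: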
import glob
import pandas as pd

filenames = glob.glob("*.xlsx")
dfList = []
colnames = ['dummy', 'dummy1', 'dummy2']
for a in filenames:
    df = pd.read_excel(a, sheet_name=None, header=None, encoding="ISO-8859-1")
    dfList.append(df)
df = pd.concat(dfList, axis=0, ignore_index=True)
df.columns = colnames
The error raised when pd.concat runs is:
TypeError: cannot concatenate object of type "<class 'collections.OrderedDict'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
I also tried:
for a in filenames:
    df = [pd.read_excel(a, sheet_name=None, header=None, encoding="ISO-8859-1").values()]
    dfList.append(df)
but I still get an error.
Answer 0 (score: 0)
Why do you need concat?
Could you use df = pd.DataFrame(dfList) to convert the list into a DataFrame?
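For context on the error in the question: with sheet_name=None, pd.read_excel returns one dict per workbook (sheet name mapped to DataFrame), so dfList ends up as a list of dicts rather than a list of DataFrames. The following is a minimal sketch of how pd.concat treats that structure. It uses made-up in-memory frames in place of a real workbook, so the sheet names and values are assumptions, not data from the question.

```python
import pandas as pd

# Stand-in for pd.read_excel(path, sheet_name=None, header=None):
# a dict mapping sheet name -> DataFrame (names and values are made up).
sheets = {
    "Sheet1": pd.DataFrame([[1, 2, 3], [4, 5, 6]]),
    "Sheet2": pd.DataFrame([[7, 8, 9]]),
}

# A list whose elements are dicts is rejected by pd.concat,
# which is the situation in the question.
try:
    pd.concat([sheets])
    list_of_dicts_ok = True
except TypeError:
    list_of_dicts_ok = False

# A single dict of DataFrames is accepted directly.
one_workbook = pd.concat(sheets, ignore_index=True)

print(list_of_dicts_ok)    # False
print(one_workbook.shape)  # (3, 3)
```

So whichever constructor is used, the dicts probably need to be unpacked into their DataFrames first.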
Answer 1 (score: 0)
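I think you need to change the loop so that the individual sheet DataFrames, not the dict that sheet_name=None returns, are added to the list:
filenames = glob.glob("*.xlsx")
dfList = []
colnames = ['dummy', 'dummy1', 'dummy2']
for a in filenames:
    # sheet_name=None returns a dict of {sheet name: DataFrame}
    # (encoding is dropped here; recent pandas versions reject it for read_excel)
    sheets = pd.read_excel(a, sheet_name=None, header=None)
    # add each sheet's DataFrame to the list, not the dict itself
    dfList.extend(sheets.values())
df = pd.concat(dfList, axis=0, ignore_index=True)
df.columns = colnames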
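If it matters which workbook and sheet each row came from, one possible variation is to tag every frame before concatenating. This is only a sketch under stated assumptions: load_workbooks and combine_sheets are hypothetical helper names, the source_file and source_sheet columns are not part of the question, and the demo data below is made up so the example runs without any Excel files.

```python
import glob

import pandas as pd


def load_workbooks(pattern="*.xlsx"):
    """Hypothetical helper: read every sheet of every matching workbook.

    Returns {filename: {sheet name: DataFrame}}.
    """
    return {
        path: pd.read_excel(path, sheet_name=None, header=None)
        for path in glob.glob(pattern)
    }


def combine_sheets(workbooks):
    """Hypothetical helper: stack all sheets and record where each row came from."""
    frames = []
    for filename, sheets in workbooks.items():
        for sheet_name, frame in sheets.items():
            # assign returns a copy with two extra (assumed) provenance columns
            frames.append(
                frame.assign(source_file=filename, source_sheet=sheet_name)
            )
    return pd.concat(frames, axis=0, ignore_index=True)


# Made-up stand-in for load_workbooks(): two workbooks, three sheets in total.
demo = {
    "a.xlsx": {
        "Sheet1": pd.DataFrame([[1, 2, 3], [4, 5, 6]]),
        "Sheet2": pd.DataFrame([[7, 8, 9]]),
    },
    "b.xlsx": {"Sheet1": pd.DataFrame([[10, 11, 12]])},
}

combined = combine_sheets(demo)
print(combined.shape)  # (4, 5)
```

With real files, combine_sheets(load_workbooks()) would be the equivalent call, and the three data columns could then be renamed as in the code above.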