Question

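I have 15 xlsx files, and each file has multiple worksheets. I want to loop over them dynamically so that every sheet of every workbook is read into a single DataFrame.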

I tried using pd.read_excel as follows:


import glob
import pandas as pd

filenames = glob.glob("*.xlsx")
dfList=[]
colnames =['dummy','dummy1','dummy2']
for a in filenames:
    # sheet_name=None reads every sheet of the workbook
    df=pd.read_excel(a, sheet_name=None, header = None, encoding = "ISO-8859-1")
    dfList.append(df)

df= pd.concat(dfList, axis=0, ignore_index= True)
df.columns= colnames

Running pd.concat raises this error:

TypeError: cannot concatenate object of type "<class 'collections.OrderedDict'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

I also tried the following, which raised an error as well:

for a in filenames:
    df=[pd.read_excel(a, sheet_name=None, header = None, encoding = "ISO-8859-1").values()]
    dfList.append(df)
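With sheet_name=None, pd.read_excel returns a dict (an OrderedDict in the pandas version behind the error above) that maps each sheet name to its own DataFrame, so dfList ends up holding dicts, which pd.concat rejects. The sketch below reproduces that with small in-memory DataFrames standing in for a workbook (the sheet names and values are made up) and then stacks the dict's values instead.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None):
# a dict mapping each sheet name to that sheet's DataFrame.
workbook = {
    "Sheet1": pd.DataFrame([[1, 2, 3], [4, 5, 6]]),
    "Sheet2": pd.DataFrame([[7, 8, 9]]),
}

# Appending the dict itself, as in the question, hands pd.concat a list of dicts.
concat_error = None
try:
    pd.concat([workbook], axis=0, ignore_index=True)
except TypeError as exc:
    concat_error = exc
    print("TypeError:", exc)

# Appending the DataFrames stored in the dict works.
dfList = []
dfList.extend(workbook.values())
df = pd.concat(dfList, axis=0, ignore_index=True)
df.columns = ["dummy", "dummy1", "dummy2"]
print(df)
```

In the question's loop, this corresponds to dfList.extend(df.values()) in place of dfList.append(df).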

Answer 1

Why do you need concat? Could you convert the list to a DataFrame with df = pd.DataFrame(dfList)?
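Passing dfList to pd.DataFrame will not stack the sheets' rows, because each element of dfList is a dict of sheet DataFrames rather than a row of values. pd.concat remains the usual tool, and it also accepts a mapping: with tuple keys, the keys become outer index levels, which records the file and sheet each row came from. A minimal sketch, assuming the workbooks have already been read into a dict of {file name: {sheet name: DataFrame}}; the function name stack_with_source and the demo data are hypothetical.

```python
import pandas as pd


def stack_with_source(workbooks):
    """Stack every sheet of every workbook, keeping file and sheet in the index.

    `workbooks` maps a file name to what pd.read_excel(path, sheet_name=None)
    returns, i.e. a dict of {sheet name: DataFrame}.
    """
    frames = {
        (file_name, sheet_name): frame
        for file_name, sheets in workbooks.items()
        for sheet_name, frame in sheets.items()
    }
    # Tuple keys become two outer index levels; "row" names each sheet's own index.
    return pd.concat(frames, names=["file", "sheet", "row"])


# In-memory stand-ins for two workbooks, so no Excel files are needed.
workbooks = {
    "a.xlsx": {
        "Sheet1": pd.DataFrame([[1, 2, 3], [4, 5, 6]]),
        "Sheet2": pd.DataFrame([[7, 8, 9]]),
    },
    "b.xlsx": {"Sheet1": pd.DataFrame([[10, 11, 12]])},
}
combined = stack_with_source(workbooks)
print(combined)

# Optionally move file and sheet into ordinary columns.
flat = combined.reset_index(level=["file", "sheet"]).reset_index(drop=True)
print(flat)
```

With real files, the input could be built as {path: pd.read_excel(path, sheet_name=None, header=None) for path in filenames}.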

Answer 2

I think you need to change:

filenames = glob.glob("*.xlsx")
dfList=[]
colnames =['dummy','dummy1','dummy2']
for a in filenames:
    df=pd.read_excel(a, sheet_name=None, header = None, encoding = "ISO-8859-1")
    dfList.append(df)

df = []
df = pd.concat(dfList, axis=0, ignore_index= True)
df.columns= colnames
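A complete loop along these lines might look like the sketch below. The function name read_all_sheets is hypothetical. It assumes every sheet has the same three columns so that colnames fits, sorts the glob results only to make the row order predictable, and leaves out encoding, because .xlsx files carry their own encoding and recent pandas releases of read_excel do not accept that argument. Reading .xlsx also assumes an Excel engine such as openpyxl is installed; the demo writes two small workbooks to a temporary folder so the sketch runs without the real files.

```python
import glob
import os
import tempfile

import pandas as pd


def read_all_sheets(pattern="*.xlsx", colnames=None):
    """Stack every sheet of every workbook matching `pattern` into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        # sheet_name=None returns {sheet name: DataFrame} for this workbook,
        # so add the DataFrames, not the dict itself, to the list.
        sheets = pd.read_excel(path, sheet_name=None, header=None)
        frames.extend(sheets.values())
    if not frames:
        # pd.concat would fail on an empty list; report the real cause instead.
        raise FileNotFoundError(f"no workbooks match {pattern!r}")
    df = pd.concat(frames, axis=0, ignore_index=True)
    if colnames is not None:
        df.columns = colnames  # length must match the number of columns
    return df


# Demo: two workbooks with two sheets each, written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.xlsx", "b.xlsx"):
        with pd.ExcelWriter(os.path.join(folder, name)) as writer:
            for sheet in ("Sheet1", "Sheet2"):
                pd.DataFrame([[1, 2, 3], [4, 5, 6]]).to_excel(
                    writer, sheet_name=sheet, header=False, index=False
                )
    demo = read_all_sheets(
        os.path.join(folder, "*.xlsx"), colnames=["dummy", "dummy1", "dummy2"]
    )

print(demo.shape)
```

Raising FileNotFoundError when nothing matches is a design choice of this sketch; it turns an empty glob into a clearer message than the one pd.concat gives for an empty list.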

How to read multiple xlsx files, each with multiple sheets, into a Python DataFrame using a loop
