
Date: 2017-03-12 07:00:11

Tags: r excel import xlsx readxl


ATVI  49.02   0.44   0.91   7193022   .3 
ADBE  119.91  0.31   0.26   1984225   .1 
AKAM  64.2    0.65   1.02   1336622   .1 
ALXN  126.55  0.86   0.67   2182253   .2
GOOG  838.68  3.31   0.4    1261517  1.0 
AMZN  853     2.5    0.29   2048187  1.0


Keyword             $AAPL -
Total tweets        166631
Total audience      221363515
Contributors        42738
Original tweets     91614
Replies             4964
RTs                 70053
Images and links    43361


Keyword                        $AAPL -
Total audience                 221363515
Contributors                   42738
Total tweets                   166631
Total potential impressions    1.250.920.501
Measured data from             2016-04-02 18:06
Measured data to               2016-06-15 12:23
Tweets per contributor         3,90
Impressions / Audience         5,65
Measured time in seconds       6373058
Measured time in minutes       106218
Measured time in hours         1770
Measured time in days          74
Tweets per second              0.026146161
Tweets per minute              1.568769655
Tweets per hour                94.1261793
Tweets per day                 2259.028303
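This sheet mixes number formats. `1.250.920.501` uses dots as thousands separators, `3,90` and `5,65` use a comma as the decimal mark, and `0.026146161` uses a dot as the decimal mark. If these cells arrive in R as text exactly as displayed (an assumption: it depends on how the cells are stored in Excel), a plain `as.numeric()` returns `NA` for the first two kinds. Below is a minimal sketch of a heuristic converter. It assumes that more than one dot means thousands separators and that a comma is always a decimal mark. `parse_mixed_number()` is a hypothetical helper, not part of readxl.

```r
# Hypothetical helper (not from readxl): convert text values with mixed
# number formats to numeric. Heuristic assumptions:
#   * more than one "." in a value means the dots are thousands separators
#   * a "," is a decimal mark
# Anything that still cannot be parsed (e.g. dates) becomes NA.
parse_mixed_number <- function(x) {
  x <- trimws(as.character(x))
  # count the dots in each value
  n_dots <- lengths(regmatches(x, gregexpr(".", x, fixed = TRUE)))
  # drop the dots where there is more than one (thousands separators)
  x[n_dots > 1] <- gsub(".", "", x[n_dots > 1], fixed = TRUE)
  # treat a comma as the decimal mark
  x <- sub(",", ".", x, fixed = TRUE)
  suppressWarnings(as.numeric(x))
}

# returns 1250920501, 3.9, 0.026146161 and NA
parse_mixed_number(c("1.250.920.501", "3,90", "0.026146161", "2016-04-02 18:06"))
```

The heuristic misreads a value with exactly one thousands dot (e.g. `1.250`), so it only suits data where that case does not occur.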



library(readxl)

#create empty dataframe to assemble all the rows
cdf <- data.frame()

#constructing list of all .xlsx files in current directory (pattern is a regular expression, not a glob)
file.list <- list.files(pattern = '\\.xlsx$')

#using read_excel to read each file in the list; the result is a list of data frames
df.list <- lapply(file.list, read_excel)

#stacking the list of data frames into a single 77x2 dataframe
df <- as.data.frame(do.call(rbind, df.list), stringsAsFactors = FALSE)

#transposing to prepare to stack multiple companies' data in a single dataframe (t() returns a character matrix)
df <- t(df)

#trying to make sure that the values are numeric
df <- transform(df, as.numeric)

#appending the 2nd row with the actual data into the dataframe that will have all companies' data
cdf <- rbind(cdf, df[2,])


> cdf[,1:8]
            X1        X2    X3    X4   X5    X6    X7        X8
$AAL      6507  14432722  1645  5211  459   837   938  14432722
$AAPL - 166631 221363515 42738 91614 4964 70053 43361 221363515



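The problem is that the columns end up as factors:

> cdf[,2]
     $AAL   $AAPL -
 14432722 221363515
Levels: 14432722 Total audience 221363515

According to the documentation, stringsAsFactors is not an argument of read_excel. Is there a way to keep using read_excel but avoid these levels?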
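The `Levels:` line means the columns of `cdf` are factors. `transform(df, as.numeric)` never applies `as.numeric` to the columns: the unnamed argument is ignored, and the call only wraps the transposed character matrix in `data.frame()`. Before R 4.0.0, `data.frame()` defaulted to `stringsAsFactors = TRUE`, which turned the strings into factors. The short sketch below shows the factor pitfall and one way around it, using values copied from the output above. It keeps strings as character and converts each column explicitly.

```r
# Values copied from the cdf[,2] output above, stored as a factor
f <- factor(c("14432722", "221363515"))

# as.numeric() on a factor returns the internal level codes: 1 2
as.numeric(f)

# Going through character first recovers the actual values: 14432722 221363515
as.numeric(as.character(f))

# Avoiding factors altogether: keep the strings as character when building
# the data frame, then convert the value row explicitly
m <- rbind(c("Total tweets", "Total audience"),   # row 1: keys
           c("166631", "221363515"))               # row 2: values for $AAPL
df <- as.data.frame(m, stringsAsFactors = FALSE)
names(df) <- m[1, ]
aapl_row <- df[2, ]
aapl_row[] <- lapply(aapl_row, as.numeric)
str(aapl_row)
```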



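The full loop over the subdirectories looks like this:

dir.list <- list.dirs(recursive = FALSE)
for (subdir in dir.list) {
  # list the .xlsx files inside this subdirectory (full paths so read_excel can find them)
  file.list <- list.files(path = subdir, pattern = '\\.xlsx$', full.names = TRUE)
  df.list <- lapply(file.list, read_excel)
  df <- as.data.frame(do.call(rbind, df.list), stringsAsFactors = FALSE)
  df <- t(df)
  df <- transform(df, as.numeric)
  cdf <- rbind(cdf, df[2,])
}

I know none of this code is elegant or compact (and rbind inside a for loop is ill-advised), but it is what I was able to piece together. I am very receptive to style corrections and alternative approaches, but I would greatly appreciate it if they were explained in the context of the overall problem/solution described here (i.e., not just "use package xyz" or "read the ldply() documentation").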
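One way to avoid growing `cdf` with `rbind()` inside the loop is to turn each sheet into a one-row data frame, collect those rows in a list, and bind them once at the end. This is a minimal sketch under several assumptions: every sheet has the keys in column 1 and the values in column 2, column 2's header looks like `$AAPL -`, and every file has the same set of keys (`rbind()` fails otherwise). `sheet_to_row()` and `read_all_companies()` are hypothetical helper names.

```r
# Hypothetical helper: one two-column sheet (keys in column 1, values in
# column 2) -> one-row data frame with one numeric column per key
sheet_to_row <- function(sheet) {
  # values that as.numeric() cannot parse become NA
  vals <- suppressWarnings(as.numeric(sheet[[2]]))
  names(vals) <- sheet[[1]]
  out <- data.frame(t(vals), check.names = FALSE)
  # company label taken from the second column's header, e.g. "$AAPL -" -> "AAPL"
  out$company <- gsub("[^A-Z]", "", names(sheet)[2])
  out
}

# Hypothetical driver: read every .xlsx file below `root` and bind once
read_all_companies <- function(root = ".") {
  files <- list.files(root, pattern = "\\.xlsx$", recursive = TRUE, full.names = TRUE)
  rows <- lapply(files, function(f) sheet_to_row(readxl::read_excel(f)))
  do.call(rbind, rows)
}
```

Because `do.call(rbind, rows)` runs once, the data frame is not copied on every iteration, as it is when `cdf <- rbind(cdf, ...)` runs inside the loop.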


2 Answers:

Answer 0 (score: 1)


df.list <- lapply(file.list, function(x) {
  as.data.frame(read_excel(x), stringsAsFactors = FALSE)
})


Answer 1 (score: 1)



library(readxl)
library(data.table)

# Get list of files
file.list <- list.files(path = ".", pattern = "\\.xlsx$")

# Iterate over files
dt_list <- lapply(seq_along(file.list), function(x) {
  # Read sheet 1 as data.table
  dt <- data.table(read_excel(file.list[x], sheet = 1))
  # Get company based on name of second column
  company <- gsub(colnames(dt)[2], pattern = "[^A-Z]*", replacement = "")
  # Set company and file_name (optional for debugging)
  dt[, ":="(company = company, file_name = file.list[x])]
  setnames(dt, c("key", "value", "company", "file_name"))
  dt
})

# Stack the per-file tables into one long table
dt <- rbindlist(dt_list, use.names = TRUE)

# Get rid of file_name and remove duplicates
dt[, file_name := NULL]
dt <- unique(dt)

# Optional filtering on key
# dt <- dt[key %in% c("Total tweets", "Total audience")]

# Use dcast to make wide format table with one row per company
dt_wide <- dcast(dt, formula = company ~ key, value.var = "value")
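The two central steps here can be tried without any files: deriving the company from the second column's header, and pivoting the long key/value table to one row per company. The sketch below uses base R's `reshape()` in place of `data.table::dcast()`, which is a swapped-in alternative rather than what the answer uses. `company_from_header()` is a hypothetical name, and the values are taken from the `cdf` output in the question.

```r
# Hypothetical helper wrapping the answer's gsub() call:
# keep only the capital letters of the header, e.g. "$AAPL -" -> "AAPL"
company_from_header <- function(header) gsub("[^A-Z]*", "", header)

# Long table: one row per (company, key) pair
long <- data.frame(
  company = company_from_header(c("$AAPL -", "$AAPL -", "$AAL", "$AAL")),
  key     = c("Total tweets", "Total audience", "Total tweets", "Total audience"),
  value   = c(166631, 221363515, 6507, 14432722),
  stringsAsFactors = FALSE
)

# Wide table: one row per company, one column per key
wide <- reshape(long, idvar = "company", timevar = "key", direction = "wide")
# reshape() names the new columns "value.<key>"; strip the prefix
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
wide
```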


    company Average contributor followers Average contributor following Contributor followers median ...
 1:    AAPL                       5197,58                        832,06                       141,00 ...
 2:    ATVI                       9769,01                       1389,17                       562,00 ...


To convert the result to a standard data.frame:

df <- as.data.frame(dt_wide)