Importing specific cells from an Excel spreadsheet

Time: 2017-12-17 17:20:19

Tags: r web-scraping

I have the following spreadsheet, uploaded to: https://files.fm/u/6uhc3qwr

I am trying to import specific cells of a balance sheet. In the Assets section (around row 19) there is a Total Current Assets row, and I want to import all of the values in that row.

31/12/2016  31/12/2015  31/12/2014  31/12/2013  31/12/2012  31/12/2011  31/12/2010  31/12/2009  31/12/2008  31/12/2007
th USD      th USD      th USD      th USD      th USD      th USD      th USD      th USD      th USD      th USD
21.855.481  23.407.658  30.740.856  35.002.444  34.819.795  36.161.838  24.317.544  20.191.164  51.185.242  22.041.144

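The hard part for me is importing these values together with the date row a few lines above them. I do not want to import row 19 of the data file as such; I want the values that correspond to the row name Total Current Assets. I have many of these balance sheets, and the Excel row numbers vary slightly between them.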

1 Answer:

Answer 0: (score: 2)

 library(xlsx)
 library(tidyverse)

 # Read the data from excel file
 df <- read.xlsx('~/Downloads/balance_upload_stack.xlsx',
                 sheetIndex = 1, stringsAsFactors = FALSE)

 # Identify the rows of interest based on the name and subset the original data.frame
 rows_of_interest <- which(df[,1] %in% c("Annual report/Consolidated"," Total Current Assets"))
 new_df <- df[rows_of_interest,]
 new_df <- new_df[!(duplicated(new_df[,1])),]

 # Drop columns that are entirely NA, then align the values, which are spread across different columns
 new_df <- new_df[colSums(!is.na(new_df)) > 0]
 new_df <- cbind(Index = rev(new_df[,1]), 
                 new_col = na.omit(unlist(new_df[,-1]))) %>% as.data.frame()

Output data frame:

                      Index          new_col
       Total Current Assets 21855481.4195938
 Annual report/Consolidated       31/12/2016
       Total Current Assets    23407658.3146
 Annual report/Consolidated       31/12/2015
       Total Current Assets 30740855.9858115
 Annual report/Consolidated       31/12/2014
       Total Current Assets 35002443.8754019
 Annual report/Consolidated       31/12/2013
       Total Current Assets 34819794.6592976
 Annual report/Consolidated       31/12/2012
       Total Current Assets 36161837.8907298
 Annual report/Consolidated       31/12/2011
       Total Current Assets 24317543.6967938
 Annual report/Consolidated       31/12/2010
       Total Current Assets 20191164.1378803
 Annual report/Consolidated       31/12/2009
       Total Current Assets 51185242.2723579
 Annual report/Consolidated       31/12/2008
       Total Current Assets 22041143.7373581
 Annual report/Consolidated       31/12/2007
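
The output above interleaves dates and values in one column. If one row per date is preferred, the same label-based lookup can be sketched in base R as below. This is only a sketch under assumptions: `sheet` is a small made-up stand-in for the imported data frame (its layout is guessed from the description, not taken from the real file), `row_values` is a hypothetical helper, and the dates and values are paired purely by position, so it only works if both rows have the same number of non-empty cells in the same order.

```r
# Toy stand-in for the imported sheet (layout is an assumption):
# labels in the first column, dates and values in different columns.
sheet <- data.frame(
  label = c("Balance sheet", "Annual report/Consolidated", NA,
            "Assets", " Total Current Assets"),
  c1 = c(NA, "31/12/2016", "th USD", NA, NA),
  c2 = c(NA, NA, NA, NA, "21855481"),
  c3 = c(NA, "31/12/2015", "th USD", NA, NA),
  c4 = c(NA, NA, NA, NA, "23407658"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: non-NA cells of the first row whose label,
# after trimming surrounding whitespace, equals `label`.
row_values <- function(df, label) {
  hit <- which(trimws(df[[1]]) == label)[1]
  if (is.na(hit)) stop("label not found: ", label)
  vals <- unlist(df[hit, -1], use.names = FALSE)
  vals[!is.na(vals)]
}

dates  <- row_values(sheet, "Annual report/Consolidated")
assets <- row_values(sheet, "Total Current Assets")

# Pairing is positional, so refuse to continue if the lengths differ.
stopifnot(length(dates) == length(assets))

result <- data.frame(
  date = as.Date(dates, format = "%d/%m/%Y"),
  total_current_assets = as.numeric(assets)
)
print(result)
```

Because the lookup goes through `trimws()`, the leading space in " Total Current Assets" does not need to be typed into the label, and the row number can differ from file to file.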