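I have the following spreadsheet, uploaded at: https://files.fm/u/6uhc3qwr
I am trying to import specific cells from a balance sheet. Under the Assets section (around row 19) there is a Total Current Assets row, and I want to import all the values in that row.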
31/12/2016 31/12/2015 31/12/2014 31/12/2013 31/12/2012 31/12/2011 31/12/2010 31/12/2009 31/12/2008 31/12/2007
th USD th USD th USD th USD th USD th USD th USD th USD th USD th USD
21.855.481 23.407.658 30.740.856 35.002.444 34.819.795 36.161.838 24.317.544 20.191.164 51.185.242 22.041.144
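The hard part for me is importing this data together with the row of dates a few lines above it. I don't want to import row 19 of the data file; I want the values that correspond to the row name Total Current Assets. I have many of these balance sheets, and the Excel row numbers vary slightly between them.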
Answer 0: (score: 2)
library(xlsx)
library(tidyverse)
# Read the data from excel file
df <- read.xlsx('~/Downloads/balance_upload_stack.xlsx',
                sheetIndex = 1, stringsAsFactors = FALSE)
# Identify the rows of interest based on the name and subset the original data.frame
rows_of_interest <- which(df[,1] %in% c("Annual report/Consolidated"," Total Current Assets"))
new_df <- df[rows_of_interest,]
new_df <- new_df[!(duplicated(new_df[,1])),]
# Remove the column with NA and align the data which are spread across different columns
new_df <- new_df[colSums(!is.na(new_df)) > 0]
new_df <- cbind(Index = rev(new_df[,1]),
                new_col = na.omit(unlist(new_df[,-1]))) %>% as.data.frame()
Output data frame
Index new_col
Total Current Assets 21855481.4195938
Annual report/Consolidated 31/12/2016
Total Current Assets 23407658.3146
Annual report/Consolidated 31/12/2015
Total Current Assets 30740855.9858115
Annual report/Consolidated 31/12/2014
Total Current Assets 35002443.8754019
Annual report/Consolidated 31/12/2013
Total Current Assets 34819794.6592976
Annual report/Consolidated 31/12/2012
Total Current Assets 36161837.8907298
Annual report/Consolidated 31/12/2011
Total Current Assets 24317543.6967938
Annual report/Consolidated 31/12/2010
Total Current Assets 20191164.1378803
Annual report/Consolidated 31/12/2009
Total Current Assets 51185242.2723579
Annual report/Consolidated 31/12/2008
Total Current Assets 22041143.7373581
Annual report/Consolidated 31/12/2007
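
If one row per date is more convenient than the interleaved output above, the same lookup-by-label idea can be sketched in base R. This is only a minimal illustration on a toy data frame: the column names, the toy values, and the helper row_by_label are hypothetical and not part of the original file or the xlsx package, and it assumes the dates and values sit in the same columns, which may not hold for the real sheet.

```r
# Toy stand-in for a sheet read from Excel; the real layout may differ.
sheet <- data.frame(
  label = c("Annual report/Consolidated", "Assets", " Total Current Assets"),
  y2016 = c("31/12/2016", NA, "21855481"),
  y2015 = c("31/12/2015", NA, "23407658"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the non-label cells of the first row whose
# trimmed label equals `name`, so the row number can vary between files.
row_by_label <- function(df, name) {
  hit <- which(trimws(df[[1]]) == name)[1]
  if (is.na(hit)) stop("label not found: ", name)
  unlist(df[hit, -1], use.names = FALSE)
}

# Pair each date with the value found in the same column.
result <- data.frame(
  date = row_by_label(sheet, "Annual report/Consolidated"),
  total_current_assets = as.numeric(row_by_label(sheet, "Total Current Assets")),
  stringsAsFactors = FALSE
)
print(result)
```

Trimming the label with trimws() means the leading space in " Total Current Assets" no longer has to be typed exactly, and stopping with an error when a label is missing makes it easier to spot a balance sheet that uses a different row name.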