R: Conditional formatting across Excel files

Date: 2018-08-16 22:15:41

Tags: r excel conditional-formatting openxlsx

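I am trying to highlight rows in one Excel file based on matches against columns in a separate Excel file. Basically, I want to highlight a row in file1 if any cell in that row matches a cell in file2.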

I see that the conditionalFormatting() function in the R package openxlsx offers some of this functionality, but I can't figure out how to use it.

I think the pseudocode would look something like this:

library(readxl)

file1 <- read_excel("file1")
file2 <- read_excel("file2")

# pseudocode: the rule describes what I want, it is not valid syntax
conditionalFormatting(file1, sheet = 1, cols = 1:end, rows = 1:22,
                      rule = "number in file1 is found in a specific column of file 2")

Please let me know if this makes sense or if I need to clarify anything.

Thanks!

1 answer:

Answer 0 (score: 0)

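The conditionalFormatting() function embeds live conditional formatting rules in the Excel document, but that is probably more complicated than you need for one-off highlighting. I suggest loading each file into a data frame, determining which rows contain matching cells, creating a highlight style (yellow background), loading the file as a workbook object, applying the highlight style to the matching rows, and saving the updated workbook object.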

The following function determines which rows match. The magrittr package provides the %>% pipe, and the data.table package provides the transpose() function.

library(magrittr)    # %>% pipe
library(data.table)  # transpose()

find_matched_rows <- function(df1, df2) {
  # A data frame is a list of columns, which makes it much easier and faster
  # to search by column than by row. Transpose the file1 data frame so that
  # each of its rows becomes a column (mixed column types are coerced to a
  # common type, typically character).
  df1_transposed <- data.table::transpose(df1)

  # Assuming the location of the match in the second file is irrelevant,
  # unlist the file2 data frame so that each value in file1 can be searched
  # for in a single vector. Drop NAs so that empty cells in file1 do not
  # "match" empty cells in file2.
  df2_as_vector <- unlist(df2)
  df2_as_vector <- df2_as_vector[!is.na(df2_as_vector)]

  # For each transposed column (i.e. each original row), return TRUE if one
  # or more of its cells are found in file2.
  match_map <- lapply(df1_transposed, FUN = `%in%`, df2_as_vector) %>%
    vapply(any, logical(1))

  # Use the logical match_map vector to subset the row numbers.
  matched_rows <- seq_len(nrow(df1))[match_map]
  return(matched_rows)
}
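If you would rather not depend on magrittr and data.table, the same check can be done column by column in base R, with no transpose: test every column of file1 against the file2 values and OR the results across columns. This is a minimal sketch, not the answer's original method; find_matched_rows_base() is a hypothetical name, and like the function above it compares values as character strings, so 10 in one file matches 10 in the other regardless of column type.

```r
# Base-R sketch of the row-matching check (no magrittr/data.table needed).
# find_matched_rows_base() is a hypothetical helper name.
find_matched_rows_base <- function(df1, df2) {
  # All values from df2 as character, with NAs dropped so that empty
  # cells in df1 never "match" empty cells in df2.
  lookup <- unlist(lapply(df2, as.character), use.names = FALSE)
  lookup <- lookup[!is.na(lookup)]

  # Check each df1 column against the lookup vector, then OR the results
  # across columns: a row matches if any of its cells is found in df2.
  hits <- Reduce(`|`, lapply(df1, function(col) as.character(col) %in% lookup))

  # Data frame row numbers; add the header row offset to get Excel rows.
  which(hits)
}

# Usage with the answer's example data
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
                      lname = c("Smith", "Johnson", "Samson"),
                      wage = c(10, 15.23, 137.38),
                      stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
                      b = c("Billy", "Bill", "Billy-Bob"),
                      c = c("Samson", "Johansson", NA),
                      stringsAsFactors = FALSE)
find_matched_rows_base(tst_df1, tst_df2)
# [1] 1 3
```

Working column-wise keeps each comparison vectorised over a whole column, which is the same reason the original function transposes the data first.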

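The following code loads the data, finds the matching rows, applies the highlighting, and saves over the original file1.xlsx. The second pair of tst_df1 and tst_df2 definitions provides an easy way to test the find_matched_rows() function. As expected, it finds that the first and third rows of the first data frame contain cells that match cells in the second data frame.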

# Header row positions, used to make sure the correct rows are highlighted.
# Unlike Excel, the data frame does not count the header as a row of its own.
file1_header_row <- 1
file2_header_row <- 1

# skipEmptyRows = FALSE keeps blank rows, so data frame row i stays at
# worksheet row file1_header_row + i
tst_df1 <- openxlsx::read.xlsx("./file1.xlsx",
                               startRow = file1_header_row,
                               skipEmptyRows = FALSE)
tst_df2 <- openxlsx::read.xlsx("./file2.xlsx",
                               startRow = file2_header_row)

# Example data for testing (overwrites the data read above; remove this
# block when running on the real files)
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
                      lname = c("Smith", "Johnson", "Samson"),
                      wage = c(10, 15.23, 137.38),
                      stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
                      b = c("Billy", "Bill", "Billy-Bob"),
                      c = c("Samson", "Johansson", NA),
                      stringsAsFactors = FALSE)

df_matched_rows <- find_matched_rows(tst_df1, tst_df2)

# Any colour name from colours() or a hex colour starting with "#" can be used here
highlight_style <- openxlsx::createStyle(fgFill = "yellow")

file1_wb <- openxlsx::loadWorkbook(file = "./file1.xlsx")
openxlsx::addStyle(wb = file1_wb, 
                   sheet = 1, 
                   style = highlight_style,
                   rows = file1_header_row + df_matched_rows,
                   cols = 1:ncol(tst_df1),
                   stack = TRUE,
                   gridExpand = TRUE)
openxlsx::saveWorkbook(wb = file1_wb, 
                       file = "./file1.xlsx",
                       overwrite = TRUE)