Creating charts in Excel worksheets with R

Time: 2016-08-11 06:20:34

Tags: r excel

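I'm using the openxlsx package to generate Excel files from my R output. I haven't found a way to add Excel charts to the workbook. I see that Python has a module for creating Excel files, with a class for adding Excel charts. Is there a way to do this with R?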

2 answers:

Answer 0 (score: 3)

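Here is a solution using the XLConnect package. Two caveats: it relies on a chart template that you need to create in advance, and it produces new files rather than appending sheets or charts to an existing file.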

It consists of two stages:

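  1. Prepare an Excel template for the chart type you want to use.
  2. Each time, update the template file as needed with data from R.

    Step one: prepare the template in Excel according to the chart type you need. You can keep all templates in the same file (on different sheets) or in several separate files. When preparing a template, put the required chart type on the sheet, but instead of referencing specific cells, use "named ranges". You can also use the sample file I created. Note the use of named ranges in the file and in the chart's data references (Sheet1!bar_names and Sheet1!values instead of Sheet1!$A$2:$A$4 and Sheet1!$B$2:$B$4).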

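    A side note on named ranges in Excel: a named range means you give a name to the data you want to use in the chart and then "tell the chart" to use that name instead of an absolute location. You can open Excel's Name Manager from the Formulas menu. We use named ranges because XLConnect can modify them, and when a named range is modified, the chart updates dynamically.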
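The indirection that makes this work can be pictured with a small model. This is a conceptual sketch only, assuming a plain Python dict stands in for Excel's Name Manager; it is not how Excel or XLConnect store names internally. The chart refers only to names, so redefining a name (which is what createName(..., overwrite = TRUE) does) changes what the chart plots without touching the chart definition.

```python
# Conceptual model of named-range indirection (not Excel's real storage format).
# The "name manager" maps a defined name to an absolute cell reference.
name_manager = {
    "bar_names": "Sheet1!$A$2:$A$4",
    "values": "Sheet1!$B$2:$B$4",
}

# The chart stores only names, never absolute references.
chart = {"categories": "bar_names", "values": "values"}


def resolve(chart_def, names):
    """Return the absolute references the chart would read right now."""
    return {role: names[name] for role, name in chart_def.items()}


print(resolve(chart, name_manager))

# Redefine the names for a data set with one more row
# (the analogue of createName(..., overwrite = TRUE)).
name_manager["bar_names"] = "Sheet1!$A$2:$A$5"
name_manager["values"] = "Sheet1!$B$2:$B$5"

# The chart definition is unchanged, but it now resolves to the new ranges.
print(resolve(chart, name_manager))
```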

    Step two: adapt the following code to your needs. Mainly, use your own data frame and update the references in the createName calls.

    library(XLConnect)  # load the library
    wb1 <- loadWorkbook(filename = "edit_chart_via_R_to_excel.xlsx")
    new.df <- data.frame(Type = c("Ford", "Hyundai", "BMW", "Other"),
                         Number = c(45, 35, 25, 15))  # sample data
    writeWorksheet(wb1, data = new.df, sheet = "Sheet1",
                   startRow = 1, startCol = 1, header = TRUE)
    # Update the named ranges used by the chart.
    # With a header in row 1, the data occupies rows 2 to nrow(new.df) + 1,
    # i.e. "Sheet1!$A$2:$A$5" and "Sheet1!$B$2:$B$5" for this 4-row sample.
    last.row <- nrow(new.df) + 1
    createName(wb1, "bar_names", paste0("Sheet1!$A$2:$A$", last.row), overwrite = TRUE)
    createName(wb1, "values", paste0("Sheet1!$B$2:$B$", last.row), overwrite = TRUE)
    saveWorkbook(wb1)  # saves back to the file the workbook was loaded from
    

    This should do the trick.
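The ranges passed to createName must match the size of the data frame: with a header in row 1 and n data rows, the data occupies rows 2 through n + 1. A small sketch of that arithmetic follows; column_ref is a hypothetical helper written for illustration, not part of XLConnect.

```python
def column_ref(sheet, column, n_rows, header_row=1):
    """Build an absolute reference such as 'Sheet1!$A$2:$A$5'.

    Assumes the data for this column starts directly below a single
    header row and spans n_rows cells.
    """
    first_row = header_row + 1
    last_row = header_row + n_rows
    return f"{sheet}!${column}${first_row}:${column}${last_row}"


# The sample data frame in the R code has 4 rows (Ford, Hyundai, BMW, Other).
print(column_ref("Sheet1", "A", 4))  # Sheet1!$A$2:$A$5
print(column_ref("Sheet1", "B", 4))  # Sheet1!$B$2:$B$5
```

With three rows the same helper gives Sheet1!$A$2:$A$4, the range used in the sample template.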

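    Note that if you want to deliver the result as a new file (keeping the original template instead of overwriting it), you can copy the template and save it under a new name before you start modifying it.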
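For example, the template can be copied to a new file first and only the copy modified. A minimal sketch using the standard library's shutil.copyfile; copy_template is a hypothetical helper and the output file name is made up. In the R code you would then pass the copy's name to loadWorkbook (base R's file.copy() can make the copy as well).

```python
import shutil
from pathlib import Path


def copy_template(template, output):
    """Copy an Excel template to a new file and return the new path.

    The template itself is only read, so it stays intact.
    Refuses to overwrite an existing output file.
    """
    template, output = Path(template), Path(output)
    if output.exists():
        raise FileExistsError(f"{output} already exists")
    shutil.copyfile(template, output)
    return output


# Hypothetical usage:
# report = copy_template("edit_chart_via_R_to_excel.xlsx", "report.xlsx")
# ...then load and modify report.xlsx, not the template.
```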

Answer 1 (score: 3)

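I looked into using reticulate to write the .xlsx file from scratch, with native Excel charts based on the data and without having to make a template. The script below generates some data, saves it to an .xlsx file, and then builds a line chart below the data. For other chart types, see the documentation at https://xlsxwriter.readthedocs.io/chart.html.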

Also note that if reticulate cannot find an existing Python installation, it will prompt you to install one.

The code is available in this gist: https://gist.github.com/jsavn/cbea4b35d73cea6841489e72a221c4e9

Python script write_xlsx_and_chart_to_file.py

(This file name is used later in the source_python() call in the R script.)

import pandas as pd
from xlsxwriter.utility import xl_range, xl_rowcol_to_cell

# The skeleton of the function below is based on the example at:
# https://xlsxwriter.readthedocs.io/example_pandas_chart.html#ex-pandas-chart
# We pass the function a pandas dataframe, which is written to an .xlsx spreadsheet.
# We note the number of rows and columns and use them to position the chart below the data.
# We then iterate over the rows of the data and add each row as a separate line (series) in the line chart.


def save_time_series_as_xlsx_with_chart(pandas_df, filename):
    if not filename.endswith('.xlsx'):
        print("Warning: added .xlsx to filename")
        filename = filename + '.xlsx'

    # Dimensions of the data frame, used to locate the series and position the chart.
    pandas_df_nrow, pandas_df_ncol = pandas_df.shape

    # Layout written by to_excel(index=False), in 0-based (row, col) notation:
    # row 0 holds the header, column 0 holds the series names,
    # and the values occupy columns 1 .. ncol-1 of rows 1 .. nrow.
    first_col_with_data = 1
    last_col_in_sheet = pandas_df_ncol - 1

    # Categories (x axis) come from the header row, e.g. the month names.
    range_of_categories = xl_range(0, first_col_with_data, 0, last_col_in_sheet)
    formula_for_categories = '=Sheet1!' + range_of_categories

    # Create a Pandas Excel writer using XlsxWriter as the engine.
    # The context manager closes the writer and saves the file on exit
    # (ExcelWriter.save() was removed in pandas 2.0).
    with pd.ExcelWriter(filename, engine='xlsxwriter') as writer:
        # Write the dataframe to the sheet.
        pandas_df.to_excel(writer, sheet_name='Sheet1', index=False)

        # Get the xlsxwriter workbook and worksheet objects.
        workbook = writer.book
        worksheet = writer.sheets['Sheet1']

        # Create a line chart object.
        chart = workbook.add_chart({'type': 'line'})

        # One series per data row; data rows start at sheet row 1 (0-based).
        for row_in_data in range(pandas_df_nrow):
            row_in_sheet = row_in_data + 1
            range_of_series = xl_range(row_in_sheet, first_col_with_data,
                                       row_in_sheet, last_col_in_sheet)
            name_of_series = '=Sheet1!' + xl_rowcol_to_cell(row_in_sheet, 0)
            chart.add_series({
                'values': '=Sheet1!' + range_of_series,
                'name': name_of_series,
                'categories': formula_for_categories,
            })

        # Insert the chart below the data, leaving one empty row, starting in column C.
        worksheet.insert_chart(pandas_df_nrow + 2, 2, chart)
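The loop's index arithmetic (0-based positions in Python versus 1-based rows and lettered columns in Excel) can be checked without Excel. The sketch below is a plain-Python re-implementation of the idea behind xl_rowcol_to_cell and xl_range, written for illustration only; it is not XlsxWriter's code, and col_to_letters, cell, and series_specs are hypothetical names. It assumes the layout produced by to_excel(..., index=False): a header row, series names in column A, and values from column B onward.

```python
def col_to_letters(col):
    """Convert a 0-based column index to Excel letters: 0 -> 'A', 26 -> 'AA'."""
    letters = ""
    col += 1  # switch to 1-based for the base-26 conversion
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell(row, col):
    """Relative A1 reference for a 0-based (row, col) pair: (1, 1) -> 'B2'."""
    return f"{col_to_letters(col)}{row + 1}"


def series_specs(n_rows, n_cols, sheet="Sheet1"):
    """One chart series per data row, mirroring the loop in the script above."""
    last_col = n_cols - 1
    categories = f"={sheet}!{cell(0, 1)}:{cell(0, last_col)}"  # header row
    specs = []
    for row_in_data in range(n_rows):
        row_in_sheet = row_in_data + 1  # row 0 holds the header
        specs.append({
            "name": f"={sheet}!{cell(row_in_sheet, 0)}",
            "values": f"={sheet}!{cell(row_in_sheet, 1)}:{cell(row_in_sheet, last_col)}",
            "categories": categories,
        })
    return specs


# 4 years x (Year column + 12 month columns), as in the R example.
for spec in series_specs(4, 13):
    print(spec["name"], spec["values"], spec["categories"])
```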

R script

library(tidyverse)
library(reticulate)

set.seed(19)  # random seed fixed

# check if packages are available, otherwise install
for (package in c("pandas","xlsxwriter")) {
  if (py_module_available(package)) {
    message(package, " already installed! Proceeding...")
  } else {
    py_install(packages = package)  
  }
}

## generate some time series data for month & year
tbl <- expand_grid(Year=2017:2020, Month=month.name) %>% mutate(N=sample(1:100, size=nrow(.), replace=TRUE))

## ggplot2 plot of the data so we know what to expect
fig <- 
  ggplot(data=tbl) +
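  # factor() keeps calendar order (a character axis would sort alphabetically);
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(aes(x=factor(Month, levels=month.name), y=N, group=Year, colour=factor(Year)), linewidth=1) +
  labs(x="Month", colour="Year") +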
  theme_minimal() +
  NULL
print(fig)  # see a ggplot2 version of same plot

# convert data to wide format to put in excel
tbl_wide_format <- tbl %>%
  pivot_wider(names_from=Month, values_from=N)

# convert wide format data to pandas dataframe, to pass to python script
tbl_pandas <- r_to_py(tbl_wide_format)

## import python script
source_python("write_xlsx_and_chart_to_file.py")

## save chart using python script
save_time_series_as_xlsx_with_chart(tbl_pandas, "reticulate_pandas_writexlsx_excel_line_chart.xlsx")
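The Python function expects the wide layout that pivot_wider produces here: one row per year, the Year column first, then one column per month in calendar order. The plain-Python sketch below shows that reshaping on a tiny hand-made table; long_to_wide is a hypothetical helper and the numbers are invented for illustration, not output of the R script.

```python
def long_to_wide(rows, id_key, names_key, values_key):
    """Reshape long records into a header row plus one row per id.

    Column order follows first appearance of each name, which matches
    pivot_wider's default (names_sort = FALSE). Missing cells become None.
    """
    names, table = [], {}
    for rec in rows:
        if rec[names_key] not in names:
            names.append(rec[names_key])
        table.setdefault(rec[id_key], {})[rec[names_key]] = rec[values_key]
    header = [id_key] + names
    body = [[id_] + [cells.get(n) for n in names] for id_, cells in table.items()]
    return header, body


long = [
    {"Year": 2017, "Month": "January", "N": 12},
    {"Year": 2017, "Month": "February", "N": 40},
    {"Year": 2018, "Month": "January", "N": 7},
    {"Year": 2018, "Month": "February", "N": 93},
]
header, body = long_to_wide(long, "Year", "Month", "N")
print(header)  # ['Year', 'January', 'February']
print(body)    # [[2017, 12, 40], [2018, 7, 93]]
```

Column 0 of each row then supplies the series name and the header row supplies the chart categories, which is exactly what the Python script reads.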