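I am using the openxlsx package to generate Excel files from my R output.
I have not found a way to add Excel charts to an Excel workbook.
I see that Python has a module for creating Excel files, which includes a class for adding Excel charts.
Is there a way to do this in R?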
Answer 0 (score: 3)
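Here is a solution using the XLConnect package.
Note, however, that it relies on a chart template that you need to create in advance, and that it produces a new file rather than appending sheets or charts to an existing file.
It consists of two steps: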
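Step 1: Depending on the chart type you need, prepare a template in Excel. You can keep all templates in one file (on different worksheets) or in several files. When preparing a template, include the desired chart on the worksheet, but do not let it reference specific cells; use named ranges instead.
See this example. You can also use the sample file I created. Note the use of named ranges in the file and in the chart's data references (Sheet1!bar_names and Sheet1!values instead of Sheet1!$A$2:$A$4 and Sheet1!$B$2:$B$4).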
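A side note on named ranges in Excel: a named range means that you give a name to the data you want to use in the chart and then tell the chart to use that name instead of an absolute cell location. You can open the Name Manager from the Formulas menu in Excel. We use named ranges because XLConnect can control them, so when we redefine a named range, the chart updates to match.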
Step 2: Adapt the following code to your needs, mainly by using your own data frame and updating the references in the createName calls.
library(XLConnect) # load library
wb1 <- loadWorkbook(filename = "edit_chart_via_R_to_excel.xlsx")
new.df <- data.frame(Type = c("Ford", "Hyundai", "BMW", "Other"),
                     Number = c(45, 35, 25, 15)) # sample data
writeWorksheet(wb1, data = new.df, sheet = "Sheet1",
               startRow = 1, startCol = 1, header = TRUE)
# update named ranges for the chart's use.
# Note that
# "Sheet1!$A$2:$A$5" and "Sheet1!$B$2:$B$5"
# should change according to the data you are updating
createName(wb1, "bar_names", "Sheet1!$A$2:$A$5", overwrite = TRUE)
createName(wb1, "values", "Sheet1!$B$2:$B$5", overwrite = TRUE)
saveWorkbook(wb1)
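That should do the trick.
Note that if you want to deliver the result as a new file (and keep the original template intact rather than overwriting it), you can copy the template first, for example with file.copy(), and load the copy before making any changes.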
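The range strings passed to createName above depend on how many data rows are written: with the header in row 1, n rows of data occupy rows 2 to n + 1. A minimal sketch of that arithmetic, written in Python rather than R; the helper name column_range is hypothetical and is not part of XLConnect:

```python
def column_range(sheet, column, n_rows, header_row=1):
    """Return an absolute single-column reference for n_rows of data below the header.

    Hypothetical helper for illustration only; it is not an XLConnect function.
    """
    if n_rows < 1:
        raise ValueError("n_rows must be at least 1")
    first = header_row + 1       # first data row sits directly below the header
    last = header_row + n_rows   # last data row
    return f"{sheet}!${column}${first}:${column}${last}"


n_rows = 4  # number of rows in the sample data frame new.df
print(column_range("Sheet1", "A", n_rows))  # Sheet1!$A$2:$A$5
print(column_range("Sheet1", "B", n_rows))  # Sheet1!$B$2:$B$5
```

For the four-row sample data this reproduces the two references used in the createName calls.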
Answer 1 (score: 3)
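I looked into using reticulate to write an .xlsx file from scratch, with native Excel charts based on the data, without having to make a template. The script below generates some data, saves it to an .xlsx file, and then builds a line chart below the data. For other chart types, see the documentation at https://xlsxwriter.readthedocs.io/chart.html
Also note that reticulate will prompt you to install Python if it cannot find an existing installation.
The code is available in this gist: https://gist.github.com/jsavn/cbea4b35d73cea6841489e72a221c4e9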
write_xlsx_and_chart_to_file.py
(this file name is used later in the source_python() call in the R script)
import pandas as pd
import xlsxwriter as xw

# The skeleton of the function below is based on this example:
# https://xlsxwriter.readthedocs.io/example_pandas_chart.html#ex-pandas-chart
# We pass the function a pandas dataframe.
# The dataframe is inserted into an .xlsx spreadsheet.
# We take note of the number of rows and columns, and use those to position the chart below the data.
# We then iterate over the rows of the data and insert each row as a separate line (series) in the line chart.
def save_time_series_as_xlsx_with_chart(pandas_df, filename):
    if not filename.endswith('.xlsx'):
        print("Warning: added .xlsx to filename")
        filename = filename + '.xlsx'

    # Get the dimensions of the data frame; used later to position the chart.
    pandas_df_nrow, pandas_df_ncol = pandas_df.shape

    # Create a Pandas Excel writer using XlsxWriter as the engine.
    writer = pd.ExcelWriter(filename, engine='xlsxwriter')

    # Convert the dataframe to an XlsxWriter Excel object.
    pandas_df.to_excel(writer, sheet_name='Sheet1', index=False)

    # Get the xlsxwriter workbook and worksheet objects.
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']

    # Create a chart object.
    chart = workbook.add_chart({'type': 'line'})

    # Configure the series of the chart from the dataframe data.
    # The coordinates of each series are the positions of the data in the Excel file.
    # In the sheet, the data starts at row 2, column 1 (row 1 holds the header),
    # but Python counts rows and columns from 0, so the indices are adjusted below.
    last_col_in_sheet = pandas_df_ncol - 1  # number of columns minus one in 0-notation
    first_col_with_data = 1                 # 2nd column in 0-notation
    col_with_series_name = 0                # first column holds the series names

    # The categories (month headers) are the same for every series.
    range_of_categories = xw.utility.xl_range(
        first_row=0,  # header row only
        first_col=first_col_with_data,
        last_row=0,
        last_col=last_col_in_sheet
    )
    formula_for_categories = '=Sheet1!' + range_of_categories

    for row_in_data in range(pandas_df_nrow):
        row_in_sheet = row_in_data + 1  # data starts on the 2nd row
        range_of_series = xw.utility.xl_range(
            first_row=row_in_sheet,  # read from the current row only
            first_col=first_col_with_data,
            last_row=row_in_sheet,
            last_col=last_col_in_sheet
        )
        formula_for_series = '=Sheet1!' + range_of_series
        name_of_series = '=Sheet1!' + xw.utility.xl_rowcol_to_cell(row=row_in_sheet, col=col_with_series_name)
        chart.add_series({'values': formula_for_series, 'name': name_of_series, 'categories': formula_for_categories})

    # Insert the chart into the worksheet, two rows below the data.
    worksheet.insert_chart(pandas_df_nrow + 2, 2, chart)

    # Close the Pandas Excel writer, which writes the Excel file.
    # (writer.save() was removed in pandas 2.0; close() also works in older versions.)
    writer.close()
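The index arithmetic in the function above (0-based row and column numbers turned into A1-style references) can be sketched in plain Python. This is an illustration only, not XlsxWriter's implementation; col_to_letters and a1_range are hypothetical names, and the 13-column layout (Year plus twelve months, four data rows) is assumed from the example data used with this script:

```python
def col_to_letters(col):
    """Convert a 0-based column index to Excel column letters (0 -> A, 26 -> AA)."""
    if col < 0:
        raise ValueError("col must be non-negative")
    letters = ""
    col += 1  # switch to 1-based so that the base-26 arithmetic has no zero digit
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def a1_range(first_row, first_col, last_row, last_col):
    """Build a relative A1-style range from 0-based row and column indices."""
    start = f"{col_to_letters(first_col)}{first_row + 1}"
    end = f"{col_to_letters(last_col)}{last_row + 1}"
    return f"{start}:{end}"


n_rows, n_cols = 4, 13  # assumed shape: 4 years, Year column + 12 months
print("categories:", "=Sheet1!" + a1_range(0, 1, 0, n_cols - 1))
for row_in_data in range(n_rows):
    row_in_sheet = row_in_data + 1  # row 0 of the sheet holds the header
    print("series:", "=Sheet1!" + a1_range(row_in_sheet, 1, row_in_sheet, n_cols - 1))
```

With that shape, the categories come from B1:M1 and the four series from B2:M2 through B5:M5, while the series names sit in column A.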
library(tidyverse)
library(reticulate)
set.seed(19) # random seed fixed
# check if packages are available, otherwise install
for (package in c("pandas", "xlsxwriter")) {
  if (py_module_available(package)) {
    message(package, " already installed! Proceeding...")
  } else {
    py_install(packages = package)
  }
}
## generate some time series data for month & year
tbl <- expand_grid(Year=2017:2020, Month=month.name) %>% mutate(N=sample(1:100, size=nrow(.), replace=TRUE))
## ggplot2 plot of the data so we know what to expect
fig <-
  ggplot(data=tbl) +
  geom_line(aes(x=Month, y=N, group=Year, colour=factor(Year)), size=1) +
  theme_minimal() +
  NULL
print(fig) # see a ggplot2 version of same plot
# convert data to wide format to put in excel
tbl_wide_format <- tbl %>%
pivot_wider(names_from=Month, values_from=N)
# convert wide format data to pandas dataframe, to pass to python script
tbl_pandas <- r_to_py(tbl_wide_format)
## import python script
source_python("write_xlsx_and_chart_to_file.py")
## save chart using python script
save_time_series_as_xlsx_with_chart(tbl_pandas, "reticulate_pandas_writexlsx_excel_line_chart.xlsx")
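The Python function expects the wide layout produced by pivot_wider: one row per series, the series name in the first column, and one column per category. A minimal plain-Python sketch of that reshaping step follows; the function name widen is hypothetical and the numbers are made-up illustrative values, not output of the R script:

```python
def widen(records, id_key, names_key, values_key):
    """Reshape long records into a header row plus one wide row per id value."""
    names = []  # category names in order of first appearance
    rows = {}   # id value -> {category name: value}
    for record in records:
        name = record[names_key]
        if name not in names:
            names.append(name)
        rows.setdefault(record[id_key], {})[name] = record[values_key]
    header = [id_key] + names
    table = [[row_id] + [values.get(name) for name in names]
             for row_id, values in rows.items()]
    return header, table


# Made-up long-format data: one record per (Year, Month) pair.
long_data = [
    {"Year": 2017, "Month": "January", "N": 12},
    {"Year": 2017, "Month": "February", "N": 47},
    {"Year": 2018, "Month": "January", "N": 30},
    {"Year": 2018, "Month": "February", "N": 8},
]

header, table = widen(long_data, "Year", "Month", "N")
print(header)  # ['Year', 'January', 'February']
for row in table:
    print(row)  # [2017, 12, 47] then [2018, 30, 8]
```

Each wide row then becomes one line in the chart, with the header cells after the first column serving as the categories.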