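I'm working on a small project in R, and my goal is to create multiple Excel files from my data frame, one per Site. The data frame consists of comments from a survey, where each row represents a response for a given site. There are 10 columns: the first is Site, and the other 9 hold the comments for each topic.
These comment columns can be grouped into the following blocks -
Block 1: Overall = Seating + Decor + Reception + Toilets
Block 2: Comfort & Speed = Comfort + Speed
Block 3: Operations = Efficiency + Courtesy + Responsiveness
A reproducible data frame looks like this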
#Load libraries
library(dplyr)
library(xlsx)
#Reproducible Data Frame
df=data.frame(Site=c("Tokyo Harbor","Tokyo Harbor","Tokyo Harbor","Arlington","Arlington","Cairo Skyline","Cairo Skyline"),
Seating=c("comfy never a problem to find","difficult","ease and quick","nobody to help","nice n comfy","old seats","nt bad"),
Decor=c("very beautiful","i loved it!!!","nice","great","nice thanks","no response","yea nice"),
Reception=c("always neat","I wasn't happy with the decor on this site","great!","immaculate","happy very helpful","","I wont bother again"),
Toilets=c("well maintained","nicely managed","long queues could do better","","cleaner toilets needed!","no toilet roll in the mens loo","flush for god's sake!!!"),
Comfort=c("very comfortable and heated","I felt like I was home","","couldn't be better","very nice and kush","not comment","fresh eyes needed"),
Speed=c("rapid service","no delays ever got everything I needed on time","","","I have grown accustomed to the speed of service","machines","super duper quick"),
Efficiency=c("very efficient, the servers were great","spot on","","I was quite disappointed in the efficiency","clockwork","parfait",""),
Courtesy=c("Staff were very polite","smiling faces everywhere, loved it","very welcoming and kind","the hostess was a bit rude","trés impoli","noo",""),
Responsiveness=c("On the ball all the time","super quick whenever help was needed","","","","want more service like this",""))
#Transform all columns with empty cells to NAs
df[df==""] <- NA
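My goal
For each site, create an Excel file in which the comments are grouped into blocks (as defined above). Each sheet in the Excel file represents one block, so there are three sheets in total.
In more detail:
Step 1 - For each site, group the comments into the three blocks and filter out the missing (NA) comments.
Step 2 - Write an Excel file with three sheets, one sheet per block
I'd like the Excel files to be saved with names in the following format -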
COMMENTS_ SITENAME _2017.xlsx
So for this df, the desired output would be three Excel files, since there are three sites...
COMMENTS_Tokyo Harbor_2017.xlsx
COMMENTS_Arlington_2017.xlsx
COMMENTS_Cairo Skyline_2017.xlsx
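As a small sketch, these file names could be derived from the distinct values of Site with base R. The vector `sites` below is an illustrative stand-in for df$Site, and the fixed 2017 suffix is an assumption taken from the format above:

```r
# Illustrative stand-in for df$Site (sites repeat, one entry per response)
sites <- c("Tokyo Harbor", "Tokyo Harbor", "Arlington", "Cairo Skyline")

# One file name per distinct site, in order of first appearance
file_names <- paste0("COMMENTS_", unique(sites), "_2017.xlsx")
print(file_names)
```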
My attempt
I started by defining my blocks, which I use later to pull out the comments
###########################
#STEP 1: Define the blocks
#Block 1: Overall = Seating + Decor + Reception + Toilets
BlockOverall=c(names(df)[2],names(df)[3],names(df)[4],names(df)[5])
#Block 2: Comfort & Speed = Comfort + Speed
BlockComfortSpeed=c(names(df)[6],names(df)[7])
#Block 3: Operations = Efficiency + Courtesy + Responsiveness
BlockOps=c(names(df)[8],names(df)[9],names(df)[10])
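Selecting by position depends on the column order staying fixed. One possible alternative, sketched here rather than taken from the code above, is to keep the blocks in a named list of column names; `blocks` and `expected_cols` are illustrative names:

```r
# Blocks defined by column name rather than by position (illustrative alternative)
blocks <- list(
  Overall      = c("Seating", "Decor", "Reception", "Toilets"),
  ComfortSpeed = c("Comfort", "Speed"),
  Operations   = c("Efficiency", "Courtesy", "Responsiveness")
)

# The full set of columns the data frame is expected to have: Site plus the 9 topics
expected_cols <- c("Site", unlist(blocks, use.names = FALSE))
print(expected_cols)
```

With this layout, `blocks$Overall` would play the role of BlockOverall, and a check such as `all(expected_cols %in% names(df))` could catch a renamed or missing column early.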
Then I group the comments according to these blocks and filter out the missing values
###############################################
#STEP 2: Group comments based on defined blocks
#Group Overall
Data_Overall= df %>%
select(BlockOverall)
Data_Overall = Data_Overall %>%
do(.,data.frame(Comments_Overall=unlist(Data_Overall,use.names = F))) %>%
filter(complete.cases(.))
#Group Comfort & Speed
Data_ComfortSpeed= df %>%
select(BlockComfortSpeed)
Data_ComfortSpeed = Data_ComfortSpeed %>%
do(.,data.frame(Comments_ComfortSpeed=unlist(Data_ComfortSpeed,use.names = F))) %>%
filter(complete.cases(.))
#Group Operations
Data_Operations= df %>%
select(BlockOps)
Data_Operations = Data_Operations %>%
do(.,data.frame(Comments_Operations=unlist(Data_Operations,use.names = F))) %>%
filter(complete.cases(.))
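The three groups above repeat the same two operations: stack the columns of a block into a single column, then drop the NAs. A minimal base R sketch of that pattern follows. `stack_block` is a hypothetical helper and `toy` is made-up data; neither appears in the original code:

```r
# Stack the columns of one block into a single comment column and drop NAs.
# `stack_block` is a hypothetical helper, not part of the original code.
stack_block <- function(data, cols, out_name) {
  # unlist() on a data frame concatenates its columns one after another
  comments <- as.character(unlist(data[cols], use.names = FALSE))
  comments <- comments[!is.na(comments)]
  out <- data.frame(comments, stringsAsFactors = FALSE)
  names(out) <- out_name
  out
}

# Made-up data with the same shape as one block of df
toy <- data.frame(
  Comfort = c("very comfortable", NA, "fine"),
  Speed   = c("rapid service", "slow", NA),
  stringsAsFactors = FALSE
)

result <- stack_block(toy, c("Comfort", "Speed"), "Comments_ComfortSpeed")
print(result)
```

The output keeps all Comfort comments first and the Speed comments after them, which matches the column-by-column order that unlist() produces in the code above.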
Finally, I write the data to Excel
#Write each group to an individual tab in an Excel file
library(xlsx)
write.xlsx(Data_Overall,"Comments_Global_2017.xlsx",sheetName =
'Overall',row.names = F) #Tab 1
write.xlsx(Data_ComfortSpeed,"Comments_Global_2017.xlsx",sheetName =
'Comfort_&_Speed',row.names = F,append = T) #Tab 2
write.xlsx(Data_Operations,"Comments_Global_2017.xlsx",sheetName =
'Operations',row.names = F,append = T) #Tab 3
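At the global level (all sites together), this works fine. What I can't figure out is how to turn it into a for loop that iterates over all the sites in the data frame and produces site-level Excel files.
As a novice programmer, any pointers or suggestions would be highly appreciated!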
Answer 0: (score: 1)
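If you use purrr from the tidyverse, you can avoid the for loop.
If you take your code above and wrap it in a basic function, you can use purrr::map to apply that function to each site name.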
#Load libraries
library(dplyr)
library(xlsx)
library(purrr)
#Reproducible Data Frame
df=data.frame(Site=c("Tokyo Harbor","Tokyo Harbor","Tokyo Harbor","Arlington","Arlington","Cairo Skyline","Cairo Skyline"),
Seating=c("comfy never a problem to find","difficult","ease and quick","nobody to help","nice n comfy","old seats","nt bad"),
Decor=c("very beautiful","i loved it!!!","nice","great","nice thanks","no response","yea nice"),
Reception=c("always neat","I wasn't happy with the decor on this site","great!","immaculate","happy very helpful","","I wont bother again"),
Toilets=c("well maintained","nicely managed","long queues could do better","","cleaner toilets needed!","no toilet roll in the mens loo","flush for god's sake!!!"),
Comfort=c("very comfortable and heated","I felt like I was home","","couldn't be better","very nice and kush","not comment","fresh eyes needed"),
Speed=c("rapid service","no delays ever got everything I needed on time","","","I have grown accustomed to the speed of service","machines","super duper quick"),
Efficiency=c("very efficient, the servers were great","spot on","","I was quite disappointed in the efficiency","clockwork","parfait",""),
Courtesy=c("Staff were very polite","smiling faces everywhere, loved it","very welcoming and kind","the hostess was a bit rude","trés impoli","noo",""),
Responsiveness=c("On the ball all the time","super quick whenever help was needed","","","","want more service like this",""))
#Transform all columns with empty cells to NAs
df[df==""] <- NA
export_site_data <- function(site.name){
###########################
#STEP 0: filter by block site
df <- df %>% filter(Site %in% site.name)
###########################
#STEP 1: Define the blocks
#Block 1: Overall = Seating + Decor + Reception + Toilets
BlockOverall=c(names(df)[2],names(df)[3],names(df)[4],names(df)[5])
#Block 2: Comfort & Speed = Comfort + Speed
BlockComfortSpeed=c(names(df)[6],names(df)[7])
#Block 3: Operations = Efficiency + Courtesy + Responsiveness
BlockOps=c(names(df)[8],names(df)[9],names(df)[10])
###############################################
#STEP 2: Group comments based on defined blocks
#Group Overall
Data_Overall= df %>%
select(BlockOverall)
Data_Overall = Data_Overall %>%
do(.,data.frame(Comments_Overall=unlist(Data_Overall,use.names = F))) %>%
filter(complete.cases(.))
#Group Comfort & Speed
Data_ComfortSpeed= df %>%
select(BlockComfortSpeed)
Data_ComfortSpeed = Data_ComfortSpeed %>%
do(.,data.frame(Comments_ComfortSpeed=unlist(Data_ComfortSpeed,use.names = F))) %>%
filter(complete.cases(.))
#Group Operations
Data_Operations= df %>%
select(BlockOps)
Data_Operations = Data_Operations %>%
do(.,data.frame(Comments_Operations=unlist(Data_Operations,use.names = F))) %>% filter(complete.cases(.))
library(xlsx)
write.xlsx(Data_Overall, paste0("Comments_",site.name,"_2017.xlsx"), sheetName =
'Overall',row.names = F) #Tab 1
write.xlsx(Data_ComfortSpeed, paste0("Comments_",site.name,"_2017.xlsx"), sheetName =
'Comfort_&_Speed',row.names = F,append = T) #Tab 2
write.xlsx(Data_Operations, paste0("Comments_",site.name,"_2017.xlsx"), sheetName =
'Operations',row.names = F,append = T) #Tab 3
}
site.name <- unique(df$Site)
site.name %>% map(export_site_data )
list.files(pattern = "Comments_")
[1] "Comments_Arlington_2017.xlsx" "Comments_Cairo Skyline_2017.xlsx"
[3] "Comments_Tokyo Harbor_2017.xlsx"
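Since the question asked about a for loop, the same idea can also be sketched with base R's split() and a plain loop. This is only a sketch under stated assumptions: `toy`, `blocks`, `build_sheets`, `out_dir` and `targets` are illustrative names, the toy data has one column per block to keep it short, and the files go to a temporary folder. The write step is skipped if the xlsx package cannot be loaded.

```r
# Toy data with the same layout as df: Site plus comment columns, NA = no comment
toy <- data.frame(
  Site       = c("Tokyo Harbor", "Tokyo Harbor", "Arlington"),
  Seating    = c("difficult", NA, "nice n comfy"),
  Comfort    = c(NA, "I felt like I was home", "very nice and kush"),
  Efficiency = c("spot on", NA, "clockwork"),
  stringsAsFactors = FALSE
)

# Hypothetical block definition: sheet name -> columns stacked into that sheet
blocks <- list(Overall = "Seating", Comfort_Speed = "Comfort", Operations = "Efficiency")

# Build one data frame of non-missing comments per block for a single site
build_sheets <- function(site_df) {
  lapply(blocks, function(cols) {
    comments <- as.character(unlist(site_df[cols], use.names = FALSE))
    data.frame(Comments = comments[!is.na(comments)], stringsAsFactors = FALSE)
  })
}

out_dir <- tempdir()              # assumption: write to a temporary folder
by_site <- split(toy, toy$Site)   # one data frame per site, named by site
targets <- character(0)           # file names the loop aims to produce

for (site in names(by_site)) {
  sheets <- build_sheets(by_site[[site]])
  path <- file.path(out_dir, paste0("Comments_", site, "_2017.xlsx"))
  if (requireNamespace("xlsx", quietly = TRUE)) {
    for (i in seq_along(sheets)) {
      # The first sheet creates the file; later sheets are appended to it
      xlsx::write.xlsx(sheets[[i]], path, sheetName = names(sheets)[i],
                       row.names = FALSE, append = i > 1)
    }
  }
  targets <- c(targets, basename(path))
}
print(targets)
```

Compared with the function above, the loop splits the data once instead of filtering inside each call, and adding a block only requires a new entry in `blocks`.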