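I want to group a data frame by two columns (department and line) and output a new data frame that holds, for each department and line, the count of rows satisfying selected logical conditions. The original data is structured like this: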
product department line date
apple A big 201707
cherry A midlle 201609
potato B midlle 201801
peach C small 201807
pear B big 201807
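date is numeric; the other variables are character.
I want to add two columns, x and y, where x counts the rows whose date falls in 2018 and y counts the rows whose date is 201807, grouped by department and line and sorted in descending order. The output data frame should look like: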
department line x y
A big 0 0
A midlle 0 0
B big 1 1
B midlle 1 0
C small 1 1
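I tried dplyr. First, I subset the original data to keep only the department, line and date columns. Then I converted department and line to factors with factor(). When I run str(subdata), I can see that department and line are both factors.
Finally, I used group_by and summarise to get the data frame I want, but the result is not what I expected.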
DF <- subdata %>%
group_by(department, line) %>%
summarise(x = sum(data$date >= 201800, na.rm = TRUE),
y = sum(data$date == 201807, na.rm = TRUE))
Am I doing something wrong? I also tried the reshape2 package, but could not get what I want with it either. My data has 2936 rows. This is what I got:
str(DF)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 4 variables:
$ department : chr department
$ line : chr line
$ x : int 220
$ y : int 29
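I think the problem may lie in how the department and line variables were converted to factors, because after group_by and summarise their class is "character" rather than "factor". But I don't know how to fix it.
Can anyone help?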
Answer 0 (score: 0)
I suggest using ifelse on the original data frame beforehand to create the x and y columns, like this:
df$x <- ifelse(df$date > 201800, 1, 0)
df$y <- ifelse(df$date == 201807, 1, 0)
Now summarise with dplyr:
library(dplyr)
df_new <- df %>% group_by(department, line) %>% summarise(X = sum(x), Y = sum(y))
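The same counts can also be computed inside summarise without the helper columns, as long as the column is referenced by its bare name. Writing data$date, as in the question, reads the entire column of the data object and ignores the grouping. A minimal sketch, assuming the five sample rows from the question (the name counts is only illustrative):

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  product    = c("apple", "cherry", "potato", "peach", "pear"),
  department = c("A", "A", "B", "C", "B"),
  line       = c("big", "midlle", "midlle", "small", "big"),
  date       = c(201707, 201609, 201801, 201807, 201807)
)

counts <- df %>%
  group_by(department, line) %>%
  # Bare `date` is evaluated per group; `df$date` would use all rows every time
  summarise(x = sum(date >= 201800, na.rm = TRUE),
            y = sum(date == 201807, na.rm = TRUE)) %>%
  ungroup()

counts
```

Summing a logical vector counts its TRUE values, so no ifelse step is needed here.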
Answer 1 (score: 0)
Try this:
library(tidyverse)
df<-data.frame(product=as.character(c("apple","cherry","potato","peach","pear")),
department=as.character(c("A","A","B","C","B")),
line=c("big","midlle","midlle","small","big"),
date=as.character(c("201707","201609","201801","201807","201807")))
df%>%
mutate(yr= as.numeric(str_sub(date,1,4)),
x=ifelse(yr==2018,1,0),
y=ifelse(date=="201807",1,0))%>%
group_by(department,line)%>%
summarise(x=sum(x,na.rm = T),
y=sum(y,na.rm = T))
# A tibble: 5 x 4
# Groups: department [?]
department line x y
<fct> <fct> <dbl> <dbl>
1 A big 0 0
2 A midlle 0 0
3 B big 1 1
4 B midlle 1 0
5 C small 1 1
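The question also asks for the result in descending order without saying which column defines the order. A sketch that assumes sorting by x and then y, using arrange() and desc() from dplyr; the small counts data frame simply retypes the output above and is not part of the original answer:

```r
library(dplyr)

# Counts retyped from the output above
counts <- data.frame(
  department = c("A", "A", "B", "B", "C"),
  line       = c("big", "midlle", "big", "midlle", "small"),
  x          = c(0, 0, 1, 1, 1),
  y          = c(0, 0, 1, 0, 1)
)

# Assumption: "descending" means largest x first, ties broken by largest y
sorted <- counts %>% arrange(desc(x), desc(y))

sorted
```

If the intended order is by department or line instead, the same call works with desc(department) or desc(line).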
Answer 2 (score: 0)
Here is another approach using grepl:
library(tidyverse)
result <- data %>%
group_by(department, line) %>%
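summarise(x = sum(as.numeric(grepl("^2018", date))),    # sum so a group with several rows yields one count
y = sum(as.numeric(grepl("^201807", date))))  # "^" anchors the match at the start of the date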
result
## A tibble: 5 x 4
## Groups: department [?]
# department line x y
# <fct> <fct> <dbl> <dbl>
#1 A big 0 0
#2 A midlle 0 0
#3 B big 1 1
#4 B midlle 1 0
#5 C small 1 1
data <- read.table(header = TRUE, text = "
product department line date
apple A big 201707
cherry A midlle 201609
potato B midlle 201801
peach C small 201807
pear B big 201807")