I am trying to find the total number of users in df1 who meet the conditions specified in df2, but I keep getting errors.
df1 looks like this:
id step1 step2
1 session_start NA
2 session_start NA
3 session_start sign_up
4 session_start sign_up
5 session_start sign_up
6 sign_up session_start
Output of str(df1):
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 6 obs. of 3 variables:
$ id : chr "1" "2" "3" "4" ...
$ step1: chr "session_start" "session_start" "session_start" "session_start" ...
$ step2: chr NA NA "sign_up" "sign_up" ...
df2 looks like this:
step1 step2 count
session_start sign_up 0
sign_up in_screen 0
in_screen click_banner 0
session_stop session_stop 0
df2 <- structure(c("session_start", "sign_up", "0", "sign_up",
"in_screen", "0", "in_screen", "click_banner", "0", "session_stop",
"session_stop", "0"), .Dim = c(3L, 4L), .Dimnames = list(c("step1", "step2",
"count"), NULL))
In the df2$count column, I want to show how many users in total completed df2$step1 followed by df2$step2, in that order. In the example data above, the first row of df2$count should be 3, because 3 users in df1 have session_start as df1$step1 and sign_up as df1$step2.
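When I do this manually with the following code, everything works:
count <- sum(df1$step1 == "session_start" & df1$step2 == "sign_up", na.rm = TRUE)
However, when I replace "session_start" and "sign_up" with dynamic values, I get the error "Error in test8$step1 : $ operator is invalid for atomic vectors":
df2$count <- sum(df1$step1 == df2$step1 & df1$step2 == df2$step2, na.rm = TRUE)
I tried replacing "$" with "[]", but I still get "Error: Column session_start, sign_up, in_screen, click_banner, session_stop not found".
I'd like to be able to add the count column to df2 as shown in the df2 table above. Can you help? If so, thanks a lot!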
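The first error suggests that the object being indexed (test8, presumably df2 under another name) is not a data frame. The dput() output above builds df2 as a 3 x 4 character matrix with step1, step2 and count as row names, and a matrix is an atomic vector with dimensions, so $ cannot be used on it. The sketch below assumes df2 really is that matrix; the names df2_mat and err are illustrative only. It reproduces the error and then transposes the matrix so that each step pair becomes a data frame row.

```r
# df2 exactly as in the question's dput(): a 3 x 4 character matrix,
# with step1/step2/count as ROW names (one column per step pair).
df2_mat <- structure(c("session_start", "sign_up", "0", "sign_up",
  "in_screen", "0", "in_screen", "click_banner", "0", "session_stop",
  "session_stop", "0"), .Dim = c(3L, 4L),
  .Dimnames = list(c("step1", "step2", "count"), NULL))

# A matrix is an atomic vector with a dim attribute, so `$` fails on it.
err <- tryCatch(df2_mat$step1, error = function(e) conditionMessage(e))
print(err)  # "$ operator is invalid for atomic vectors"

# Transpose so each step pair becomes a row, then convert to a data frame.
df2 <- as.data.frame(t(df2_mat), stringsAsFactors = FALSE)
# The matrix stored the counts as the strings "0"; make them integers.
df2$count <- as.integer(df2$count)
str(df2)
```

After this conversion, df2$step1 and df2$step2 work as ordinary character columns, and either answer below can fill in df2$count.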
Answer 0 (score: 3)
You can use mapply to count, for each row of df2, the number of rows in df1 whose step1 and step2 values match that pair.
df2$count <- mapply(function(x, y)
sum(df1$step1 == x & df1$step2 == y, na.rm = TRUE), df2$step1, df2$step2)
df2
# step1 step2 count
#1 session_start sign_up 3
#2 sign_up in_screen 0
#3 in_screen click_banner 0
#4 session_stop session_stop 0
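To see why the question's one-line attempt gives the wrong result even when df2 is a proper data frame, the sketch below compares it with the mapply approach on the sample data from this answer (the variable names naive and per_row are just illustrative). With ==, R compares the columns element by element and recycles the 4-row df2 columns against the 6-row df1 columns, so each df2 pair is never checked against all of df1; sum() then collapses everything into a single number.

```r
df1 <- data.frame(
  id = as.character(1:6),
  step1 = c("session_start", "session_start", "session_start",
            "session_start", "session_start", "sign_up"),
  step2 = c(NA, NA, "sign_up", "sign_up", "sign_up", "session_start"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  step1 = c("session_start", "sign_up", "in_screen", "session_stop"),
  step2 = c("sign_up", "in_screen", "click_banner", "session_stop"),
  stringsAsFactors = FALSE
)

# The question's attempt: element-wise comparison with recycling
# (R warns that 6 is not a multiple of 4), summed into one value.
naive <- suppressWarnings(
  sum(df1$step1 == df2$step1 & df1$step2 == df2$step2, na.rm = TRUE)
)
print(naive)  # 1, which would be copied into every row of df2$count

# mapply() evaluates the condition once per df2 row, comparing that
# row's pair against all rows of df1.
per_row <- mapply(function(x, y)
  sum(df1$step1 == x & df1$step2 == y, na.rm = TRUE), df2$step1, df2$step2)
print(unname(per_row))  # 3 0 0 0
```

Note that mapply names its result after df2$step1 (its first argument is an unnamed character vector); the names are harmless when the result is assigned to df2$count.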
Data
df1 <- structure(list(id = c("1", "2", "3", "4", "5", "6"),
step1 = c("session_start", "session_start", "session_start",
"session_start", "session_start",
"sign_up"), step2 = c(NA, NA, "sign_up", "sign_up", "sign_up",
"session_start")), .Names = c("id", "step1", "step2"), row.names = c(NA,
-6L), class = "data.frame")
df2 <- structure(list(step1 = c("session_start", "sign_up", "in_screen",
"session_stop"), step2 = c("sign_up", "in_screen", "click_banner",
"session_stop")), .Names = c("step1", "step2"), row.names = c(NA,
-4L), class = "data.frame")
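As a further option that is not part of either answer, the pairs can be counted without looping over the rows of df2: build one key per (step1, step2) pair, match the df1 keys against the df2 keys, and tabulate the matches. This is a base R sketch; count_pairs is a hypothetical helper name, and it assumes that the "\r" separator never occurs inside a step name and that the pairs in df2 are unique (match() only reports the first of any duplicates).

```r
df1 <- data.frame(
  id = as.character(1:6),
  step1 = c("session_start", "session_start", "session_start",
            "session_start", "session_start", "sign_up"),
  step2 = c(NA, NA, "sign_up", "sign_up", "sign_up", "session_start"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  step1 = c("session_start", "sign_up", "in_screen", "session_stop"),
  step2 = c("sign_up", "in_screen", "click_banner", "session_stop"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each (step1, step2) pair in `pairs`, count how
# many rows of `events` contain exactly that pair.
count_pairs <- function(events, pairs) {
  # Drop rows with a missing step so NA never becomes part of a key.
  ok <- !is.na(events$step1) & !is.na(events$step2)
  # Assumes "\r" does not appear inside any step name.
  event_keys <- paste(events$step1[ok], events$step2[ok], sep = "\r")
  pair_keys  <- paste(pairs$step1, pairs$step2, sep = "\r")
  # Position of the matching df2 pair for every event row (NA if none) ...
  idx <- match(event_keys, pair_keys)
  # ... then count how often each position occurs; tabulate() ignores NAs.
  tabulate(idx, nbins = length(pair_keys))
}

df2$count <- count_pairs(df1, df2)
print(df2)
```

This does one match() over df1 instead of one full scan of df1 per df2 row, which may matter if both tables are large; for data of the size shown, the mapply version is just as good and easier to read.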
Answer 1 (score: 3)
Here is a tidyverse solution.
library(tidyverse)
df2 %>%
group_by(step1, step2) %>%
mutate(count = sum(step1 == df1$step1 & step2 == df1$step2, na.rm = TRUE))
## A tibble: 4 x 3
## Groups: step1, step2 [4]
# step1 step2 count
# <chr> <chr> <int>
#1 session_start sign_up 3
#2 sign_up in_screen 0
#3 in_screen click_banner 0
#4 session_stop session_stop 0
Note that you can also use summarise instead of mutate, but the rows of the output will be in a different order.