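I am trying to understand how the pivot_wider function in tidyr works. I have bookings data and properties data, and I am trying to determine whether properties appeal to both business travelers and tourists.
The steps I am trying to complete are:
Convert for_business to a factor with the levels "business" and "tourist".
Code: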
bookings %>%
mutate(for_business = factor(for_business, labels = c("business", "tourist"))) %>%
select(property_id, for_business) %>%
mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
ungroup() %>%
pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
mutate(diff = business - tourist) %>%
summarise(avg_diff = mean(diff, na.rm = TRUE))
At this point I get the following error:
Error: Problem with `mutate()` input `avg_review_score`.
x object 'review_score' not found
i Input `avg_review_score` is `mean(review_score, na.rm = TRUE)`.
> dput(head(bookings))
structure(list(booker_id = c("215934017ba98c09f30dedd29237b43dad5c7b5f",
"7f590fd6d318248a48665f7f7db529aca40c84f5", "10f0f138e8bb1015d3928f2b7d828cbb50cd0804",
"7b55021a4160dde65e31963fa55a096535bcad17", "6694a79d158c7818cd63831b71bac91286db5aff",
"d0358740d5f15e85523f94ab8219f25d8c017347"), property_id = c(2668,
4656, 4563, 4088, 2188, 4171), room_nights = c(4, 5, 6, 7, 4,
2), price_per_night = c(91.4669561442773, 106.504997616816, 86.9913739625713,
92.3656155139053, 104.838941902747, 109.981876495045), checkin_day = c("mon",
"tue", "wed", "fri", "tue", "fri"), for_business = c(FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE), status = c("cancelled", "cancelled",
"stayed", "stayed", "stayed", "cancelled"), review_score = c(NA,
NA, 6.25812265672399, 5.953597754693, 6.43474489539585, NA)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
> dput(head(properties))
structure(list(property_id = c(2668, 4656, 4563, 4088, 2188,
4171), destination = c("Brisbane", "Brisbane", "Brisbane", "Brisbane",
"Brisbane", "Brisbane"), property_type = c("Hotel", "Hotel",
"Apartment", "Apartment", "Apartment", "Apartment"), nr_rooms = c(32,
39, 9, 9, 4, 5), facilities = c("airport shuttle,free wifi,garden,breakfast,pool,on-site restaurant",
"on-site restaurant,pool,airport shuttle,breakfast,bbq,free wifi,spa",
"laundry", "kitchen,laundry,free wifi", "parking,kitchen,bbq,free wifi,game console",
"kitchen,pool,laundry,parking,free wifi,garden")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Answer 0 (score: 2)
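The error comes from the select step: it keeps only two columns (property_id and for_business), while the following mutate step needs review_score, which is no longer present in the selected data. Instead, it is better to include that column in select as well: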
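bookings %>%
  # Assumes for_business = TRUE means a business trip, so FALSE maps to "tourist"
  mutate(for_business = factor(for_business, levels = c(FALSE, TRUE),
                               labels = c("tourist", "business"))) %>%
  # Keep review_score so that it is still available for the mean below
  select(property_id, for_business, review_score) %>%
  # Average per property and traveler type, giving one row per combination
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))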
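To see what pivot_wider does in a pipeline of this kind, here is a minimal sketch on a made-up data frame. The toy tibble and its values are hypothetical and are not taken from the bookings data. The sketch assumes that for_business = TRUE means a business trip, and it averages per property and traveler type before pivoting, so that each property ends up as a single row with one column per traveler type.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not from the question): two properties,
# each with both business and tourist bookings
toy <- tibble(
  property_id  = c(1, 1, 1, 2, 2, 2),
  for_business = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  review_score = c(8, 6, 9, 9, 7, NA)
)

result <- toy %>%
  # Assumption: TRUE means a business trip, FALSE means a tourist trip
  mutate(for_business = factor(for_business, levels = c(FALSE, TRUE),
                               labels = c("tourist", "business"))) %>%
  # One average score per property and traveler type
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE),
            .groups = "drop") %>%
  # Spread the traveler types into columns: one row per property
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist)

print(result)
```

Without the grouping step, each property would have several rows per traveler type, and pivot_wider would not be able to place a single value in each cell.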