Question

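I am trying to understand how tidyr's pivot_wider function works. I have bookings data and properties data, and I am trying to determine whether properties appeal to both business travelers and tourists.

The steps I am trying to carry out are:

First, convert the column for_business to a factor with the levels "business" and "tourist".
For each property, calculate the average review score separately for business travelers and tourists.
Then, calculate the average difference in review score between business travelers and tourists.

Code: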

bookings %>%
  mutate(for_business = factor(for_business, labels = c("business", "tourist"))) %>%
  select(property_id, for_business) %>%
  mutate(avg_review_score = mean(review_score, na.rm = TRUE)) %>%
  ungroup() %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))

This gives me the following error:

Error: Problem with `mutate()` input `avg_review_score`.
x object 'review_score' not found
i Input `avg_review_score` is `mean(review_score, na.rm = TRUE)`.

> dput(head(bookings))
structure(list(booker_id = c("215934017ba98c09f30dedd29237b43dad5c7b5f", 
"7f590fd6d318248a48665f7f7db529aca40c84f5", "10f0f138e8bb1015d3928f2b7d828cbb50cd0804", 
"7b55021a4160dde65e31963fa55a096535bcad17", "6694a79d158c7818cd63831b71bac91286db5aff", 
"d0358740d5f15e85523f94ab8219f25d8c017347"), property_id = c(2668, 
4656, 4563, 4088, 2188, 4171), room_nights = c(4, 5, 6, 7, 4, 
2), price_per_night = c(91.4669561442773, 106.504997616816, 86.9913739625713, 
92.3656155139053, 104.838941902747, 109.981876495045), checkin_day = c("mon", 
"tue", "wed", "fri", "tue", "fri"), for_business = c(FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE), status = c("cancelled", "cancelled", 
"stayed", "stayed", "stayed", "cancelled"), review_score = c(NA, 
NA, 6.25812265672399, 5.953597754693, 6.43474489539585, NA)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

> dput(head(properties))
structure(list(property_id = c(2668, 4656, 4563, 4088, 2188, 
4171), destination = c("Brisbane", "Brisbane", "Brisbane", "Brisbane", 
"Brisbane", "Brisbane"), property_type = c("Hotel", "Hotel", 
"Apartment", "Apartment", "Apartment", "Apartment"), nr_rooms = c(32, 
39, 9, 9, 4, 5), facilities = c("airport shuttle,free wifi,garden,breakfast,pool,on-site restaurant", 
"on-site restaurant,pool,airport shuttle,breakfast,bbq,free wifi,spa", 
"laundry", "kitchen,laundry,free wifi", "parking,kitchen,bbq,free wifi,game console", 
"kitchen,pool,laundry,parking,free wifi,garden")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

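The error comes from the select step: only two columns are selected there, and the following mutate step needs a column (review_score) that no longer exists in the selected data. It is better to include that column in select as well.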

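library(dplyr)
library(tidyr)

bookings %>%
  # labels follow the order of levels: FALSE -> "tourist", TRUE -> "business"
  mutate(for_business = factor(for_business, levels = c(FALSE, TRUE),
      labels = c("tourist", "business"))) %>%
  select(property_id, for_business, review_score) %>%
  # one average per property and traveler type; without grouping,
  # mean() would return a single overall average for every row
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist) %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))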
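
To see what pivot_wider does in this kind of pipeline, here is a minimal sketch on made-up data. The tibble `toy` and its values are hypothetical and not taken from the real bookings table. It assumes that for_business = TRUE means a business booking, and that the scores are averaged per property and traveler type before pivoting, so each property ends up on one row with one column per traveler type.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, only for illustration
toy <- tibble(
  property_id  = c(1, 1, 1, 2, 2, 2),
  for_business = c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE),
  review_score = c(8, 6, 9, 5, 7, NA)
)

wide <- toy %>%
  # assumption: TRUE marks a business booking
  mutate(for_business = factor(for_business, levels = c(FALSE, TRUE),
                               labels = c("tourist", "business"))) %>%
  # average score per property and traveler type
  group_by(property_id, for_business) %>%
  summarise(avg_review_score = mean(review_score, na.rm = TRUE),
            .groups = "drop") %>%
  # long to wide: one row per property_id, one column per factor level
  pivot_wider(names_from = for_business, values_from = avg_review_score) %>%
  mutate(diff = business - tourist)

wide

result <- wide %>%
  summarise(avg_diff = mean(diff, na.rm = TRUE))

result
```

If one traveler type never occurs in the data (as in the six rows shown by dput(head(bookings)), which are all FALSE), the corresponding column is not created and `business - tourist` fails. With the full data this should not happen as long as both types are present.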
