I have the following dataset:
Month Year Hotel_Name Reviewer_Score
8 2015 ambassador 7.711111
9 2015 ambassador 8.400000
12 2015 ambassador 9.457890
2 2016 ambassador 8.398733
4 2015 nh hotel 8.934023
7 2015 nh hotel 7.345532
11 2015 nh hotel 6.893445
1 2016 nh hotel 8.834923
I want to find the difference between the first and the last score for each hotel, and collect the results in a new table grouped by Hotel_Name.
Answer 0 (score: 1)
I'm not sure whether you want the difference between the first and last scores by date (latest minus earliest) or by value (max minus min).
The first, by date (abs() reports the size of the change, not its direction):
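library(dplyr)
library(lubridate)
ans1 <- df %>%
  group_by(Hotel_Name) %>%
  # sort chronologically within each hotel: "my" parses "month year"
  arrange(Hotel_Name, parse_date_time(paste(Month, Year), "my")) %>%
  # absolute difference between the latest and the earliest score
  summarise(Diff = abs(last(Reviewer_Score) - first(Reviewer_Score)))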
# A tibble: 2 x 2
# Hotel_Name Diff
# <fctr> <dbl>
# 1 ambassador 0.687622
# 2 nh_hotel 0.099100
The second, by value:
ans2 <- df %>%
  group_by(Hotel_Name) %>%
  summarise(Diff = max(Reviewer_Score) - min(Reviewer_Score))
# A tibble: 2 x 2
# Hotel_Name Diff
# <fctr> <dbl>
# 1 ambassador 1.746779
# 2 nh_hotel 2.040578
Your data:
df <- read.table(text="Month Year Hotel_Name Reviewer_Score
8 2015 ambassador 7.711111
9 2015 ambassador 8.400000
12 2015 ambassador 9.457890
2 2016 ambassador 8.398733
4 2015 nh_hotel 8.934023
7 2015 nh_hotel 7.345532
11 2015 nh_hotel 6.893445
1 2016 nh_hotel 8.834923", header=TRUE)
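If the direction of the change matters, a signed variant of the by-date answer might look like the sketch below. It is not part of the original answer: the names signed, First and Last are made up for illustration, and it assumes a month index of Year * 12 + Month is enough to order the rows, so no date parsing is needed. The data is repeated so the snippet runs on its own.

```r
library(dplyr)

# Same data as above, repeated so this sketch is self-contained
df <- read.table(text = "Month Year Hotel_Name Reviewer_Score
8 2015 ambassador 7.711111
9 2015 ambassador 8.400000
12 2015 ambassador 9.457890
2 2016 ambassador 8.398733
4 2015 nh_hotel 8.934023
7 2015 nh_hotel 7.345532
11 2015 nh_hotel 6.893445
1 2016 nh_hotel 8.834923", header = TRUE)

signed <- df %>%
  group_by(Hotel_Name) %>%
  summarise(
    # order_by picks the earliest / latest row without sorting the data first
    First = first(Reviewer_Score, order_by = Year * 12 + Month),
    Last  = last(Reviewer_Score, order_by = Year * 12 + Month),
    # signed difference: positive means the score went up over time
    Diff  = Last - First
  )

print(signed)
```

With this data, ambassador keeps the same Diff as in ans1, while nh_hotel comes out negative because its latest score is slightly below its earliest one.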