Is there an R function to subdivide data based on two columns?

Time: 2019-11-26 04:21:08

Tags: r

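I have a dataset containing the postcode and the price of each house. I need to split it into three datasets based on the mean price per postcode: for example, one set each for the postcodes with the highest, middle, and lowest prices.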

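My idea was to sort the dataset by price from lowest to highest, split it into thirds, and then see which third each postcode appears in most often, but that feels very inefficient. Is there a better way?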

1 answer:

Answer 0 (score: 1)

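Here is a solution using dplyr. It is a little verbose, but it gets the job done. Using group_by lets you compute the mean price for each postcode, so you can split more precisely into expensive, average, and cheap postcodes.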

library(dplyr)
# Generate sample data (the seed is arbitrary; it only makes the example reproducible)
set.seed(42)
dat <- tibble(postcode = sample(c("5432", "5654", "2342", "1231", "8543", "4324"), 1000, replace = TRUE),
              price = rnorm(1000, 400000, 50000))

# Work out mean price for each postcode
mean_prices <- dat %>%  
    group_by(postcode) %>% 
    summarise(mean_price = mean(price))

# Find split points for the mean postcode price 
split_points <- quantile(unique(mean_prices$mean_price), (1:3)/3)

# Get the postcodes that are within cheap, middle, or expensive price ranges
cheap_postcodes <- mean_prices %>%     
    filter(mean_price <= split_points[1]) %>%
    pull(postcode)

middle_postcodes <- mean_prices %>%     
    filter(mean_price > split_points[1] & mean_price <= split_points[2]) %>%
    pull(postcode)

expensive_postcodes <- mean_prices %>%     
    filter(mean_price > split_points[2]) %>%
    pull(postcode)

# Create the three datasets 
cheap_third <- dat %>% filter(postcode %in% cheap_postcodes)

middle_third <- dat %>% filter(postcode %in% middle_postcodes)

expensive_third <- dat %>% filter(postcode %in% expensive_postcodes)
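
If the verbosity bothers you, the same idea can probably be written more compactly. The following is only a sketch, not a drop-in replacement: it swaps the quantile() cut points for dplyr's ntile(), which buckets postcodes by rank into three roughly equal-sized groups, so the assignment may differ slightly from the version above when mean prices are tied or the number of postcodes is not divisible by three. The names tiers, dat_tiered and thirds, the tier labels, and the seed are made up for illustration.

```r
library(dplyr)

# Same kind of sample data as above; the seed is an arbitrary choice
set.seed(1)
dat <- tibble(
  postcode = sample(c("5432", "5654", "2342", "1231", "8543", "4324"), 1000, replace = TRUE),
  price = rnorm(1000, 400000, 50000)
)

# Mean price per postcode, then rank the postcodes into three buckets
tiers <- dat %>%
  group_by(postcode) %>%
  summarise(mean_price = mean(price)) %>%
  mutate(tier = factor(ntile(mean_price, 3), levels = 1:3,
                       labels = c("cheap", "middle", "expensive")))

# Attach each postcode's tier to every house, then split into a named list
dat_tiered <- dat %>%
  left_join(select(tiers, postcode, tier), by = "postcode")

thirds <- split(dat_tiered, dat_tiered$tier)
```

The three datasets are then available as thirds$cheap, thirds$middle and thirds$expensive. Keeping them in one list (or simply keeping the tier column on dat_tiered) tends to be easier to work with than three separate variables, but that is a matter of taste.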