I have a list of home sales data for my neighborhood:
address, listingdate, saledate
101 Street, 2017/01/01, 2017/06/06
106 Street, 2017/03/01, 2017/08/11
102 Street, 2017/05/04, 2017/06/13
109 Street, 2017/07/04, 2017/11/24
...
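For each listing date, I want to count the homes that are listed for sale but not yet sold on that date, so that I can see how sales and inventory change over the year.
For the example above, the result would be: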
address, listingdate, saledate, inventory
101 Street, 2017/01/01, 2017/06/06, 1
106 Street, 2017/03/01, 2017/08/11, 2
102 Street, 2017/05/04, 2017/06/13, 3
109 Street, 2017/07/04, 2017/11/24, 2
...
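For example, when 109 Street was listed, only 106 Street and 109 Street were for sale, so its inventory is 2.
Is there a simple one-step R expression that computes this?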
Answer 0 (score: 1)
I think this takes 3 simple steps. I'll set the par, and I'm sure others will be able to get under it.
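library(data.table)
library(lubridate)

# Sample data from the question, keyed (sorted) by listing date
dt <- data.table(
  address = paste(c(101, 106, 102, 109), "Street"),
  listing_date = ymd(c("2017/01/01", "2017/03/01", "2017/05/04", "2017/07/04")),
  saledate = ymd(c("2017/06/06", "2017/08/11", "2017/06/13", "2017/11/24")),
  key = "listing_date")

# Step 1: one event per row, +1 for a listing and -1 for a sale
dt2 <- rbind(dt[, .(date = listing_date, x = 1)], dt[, .(date = saledate, x = -1)])
# Step 2: net change per date, then a running total gives the inventory
dt3 <- dt2[, .(x = sum(x)), keyby = date][, .(date, inventory = cumsum(x))]
# Step 3: look up the inventory on each listing date
dt[, inventory := dt3[dt, on = c("date" = "listing_date"), inventory]]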
Or as a one-liner:
dt[,inventory:=dt[,.(d=listing_date:saledate),.(address)][,.N,key=d][dt,N]]
dt[]
#>       address listing_date   saledate inventory
#> 1: 101 Street   2017-01-01 2017-06-06         1
#> 2: 106 Street   2017-03-01 2017-08-11         2
#> 3: 102 Street   2017-05-04 2017-06-13         3
#> 4: 109 Street   2017-07-04 2017-11-24         2
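As a cross-check on the numbers above, the definition from the question can also be applied directly: for each listing date, count the homes that are already listed and not yet sold. The following is only a minimal base R sketch on the four sample rows. The name `homes` is made up for illustration, and it assumes that a home sold on a given day no longer counts as inventory on that day (the sample data has no such tie, so this choice does not affect the result here).

```r
# Sample rows from the question, with dates parsed as Date objects.
homes <- data.frame(
  address = c("101 Street", "106 Street", "102 Street", "109 Street"),
  listing_date = as.Date(c("2017-01-01", "2017-03-01", "2017-05-04", "2017-07-04")),
  saledate = as.Date(c("2017-06-06", "2017-08-11", "2017-06-13", "2017-11-24"))
)

# For each listing date d, count homes with listing_date <= d and saledate > d.
# Assumption: a home sold on day d is not counted as inventory on day d.
homes$inventory <- vapply(
  seq_len(nrow(homes)),
  function(i) {
    d <- homes$listing_date[i]
    sum(homes$listing_date <= d & homes$saledate > d)
  },
  integer(1)
)

homes$inventory
#> [1] 1 2 3 2
```

This compares every row against every other row, so it is quadratic in the number of homes. The cumulative-sum approach above should scale better on a large data set.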
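Answer 1 (score: 0)
Because of incompatibilities between data.table and tibbles, I could not use that specific solution, but the general algorithm was very instructive. With a few changes I was able to translate the overall idea into the tidyverse: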
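library(readr)
library(dplyr)
library(tibble)

# import data from data file
# (the format string must match the file; the sample rows in the question
# would need "%Y/%m/%d" instead)
homesale_file <- "Home sales data.csv"
homesales <- read_csv(homesale_file,
  col_types = cols(listingdate = col_date(format = "%m/%d/%Y"),
                   saledate = col_date(format = "%m/%d/%Y")))
#
# calculation for inventory
#
# one event per row: +1 for each listing, -1 for each sale
listingdate <- tibble(address = homesales$address, listingdate = homesales$listingdate, type = "listing", y = 1)
saledate <- tibble(address = homesales$address, listingdate = homesales$saledate, type = "sale", y = -1)
# running total over all events in date order, kept only for the listing rows
summation <- bind_rows(listingdate, saledate) %>%
  arrange(listingdate) %>%
  mutate(inventory = cumsum(y)) %>%
  select(-y) %>%
  filter(type == "listing")
# attach the inventory to the original rows (joins on the shared columns)
homesales <- homesales %>% inner_join(summation) %>% select(-type)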
@pseudopin, thank you for your help!
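For the second part of the question (seeing how inventory changes over the year), the same +1/-1 event idea can be extended to every date on which something happens, not just the listing dates. Below is a minimal base R sketch on the question's sample rows. The names `homes`, `events` and `daily` are made up for illustration, and it assumes that a listing and a sale on the same day simply net out.

```r
# Sample rows from the question, with dates parsed as Date objects.
homes <- data.frame(
  address = c("101 Street", "106 Street", "102 Street", "109 Street"),
  listing_date = as.Date(c("2017-01-01", "2017-03-01", "2017-05-04", "2017-07-04")),
  saledate = as.Date(c("2017-06-06", "2017-08-11", "2017-06-13", "2017-11-24"))
)

# One event per listing (+1) and one per sale (-1).
events <- data.frame(
  date = c(homes$listing_date, homes$saledate),
  change = rep(c(1L, -1L), each = nrow(homes))
)

# Running total in date order.
events <- events[order(events$date), ]
events$inventory <- cumsum(events$change)

# Keep the last row of each date, so same-day events are netted out.
daily <- events[!duplicated(events$date, fromLast = TRUE), c("date", "inventory")]

daily$inventory
#> [1] 1 2 3 2 1 2 1 0
```

`daily` then holds one row per event date and can be passed to something like `plot(daily$date, daily$inventory, type = "s")` to see the inventory as a step curve across the year.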