我希望按加权数据的组计算两种频率表。
您可以使用以下代码生成可重现的数据:
Data <- data.frame(
country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
gender = sample (c("men", "women"), 100, replace = TRUE),
wgt = sample(100),
year = sample(2006:2007)
)
首先,我尝试按国家和年份计算移民身份的频率表(Native VS Foreign-born)。我使用包questionr
和plyr
编写了以下代码:
db2006 <- subset (Data, year == 2006)
db2007 <- subset (Data, year == 2007)
result2006 <- as.data.frame(cprop(wtd.table(db2006$migrant, db2006$country, weights=db2006$wgt),total=FALSE))
result2007 <- as.data.frame(cprop(wtd.table(db2007$migrant, db2007$country, weights=db2007$wgt),total=FALSE))
result2006<-rename (result2006, c(Freq = "y2006"))
result2007<-rename (result2007, c(Freq = "y2007"))
result <- merge(result2006, result2007, by = c("Var1","Var2"))
在我的真实数据库中,我有10年的时间,所以这些代码需要多年才能应用。有谁知道更快的方法吗?
我还希望按国家和年份计算移民身份中的男女比例。我正在寻找类似的东西:
Var1 Var2 Var3 y2006 y2007
Foreign born France men 52 55
Foreign born France women 48 45
Native France men 51 52
Native France women 49 48
Foreign born UK men 60 65
Foreign born UK women 40 35
Native UK men 48 50
Native UK women 52 50
有没有人知道如何获得这些结果?
答案 0 :(得分:1)
你可以这样做:用你已经写过的代码制作一个函数;使用lapply
在数据中的所有年份迭代该函数;然后使用Reduce
和merge
将结果列表折叠为一个数据框。像这样:
# let's make your code into a function called 'tallyho'
tallyho <- function(yr, data) {
require(dplyr)
require(questionr)
DF <- filter(data, year == yr)
result <- with(DF, as.data.frame(cprop(wtd.table(migrant, country, weights = wgt), total = FALSE)))
# rename the last column by year
names(result)[length(names(result))] <- sprintf("y%s", year)
return(result)
}
# now iterate that function over all years in your original data set, then
# use Reduce and merge to collapse the resulting list into a data frame
NewData <- lapply(unique(Data$year), function(x) tallyho(x, Data)) %>%
Reduce(function(...) merge(..., all=T), .)