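I'm trying to solve this problem as efficiently as possible, and I don't know whether what I have so far is the best option. Do you have any alternatives?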
library(data.table)
tr <- data.table(industry = as.factor(c("a", "a", "a", "b", "b", "b")),
                 country = c("ch", "gb", "us", "gb", "us", "us"),
                 rat1 = c(11, 41, 3, 2, 5, 7),
                 rat2 = c(5, 4, 2, 77, 2, 3))
SummaryStat <- function(tab, rat, var, val) {
  # subset the table according to the function parameters, if both are given
  if (!missing(var) && !missing(val)) {
    tab <- tab[eval(var) == val]
  } else {
    var <- "NA"
    val <- "NA"
  }
  # keep only the ratio column
  tab <- tab[, get(rat)]
  # summarise the ratio: count, minimum and maximum
  summary.result <- data.frame(N = length(tab),
                               min = min(tab),
                               max = max(tab),
                               row.names = rat)
  # return the summary of the ratio
  return(summary.result)
}
# names of all ratio columns
rats <- grep("rat", names(tr), value = TRUE)
for (rat in rats) {
  # LOOP ALL THE INDUSTRIES
  for (ind in levels(tr[, industry])) {
    # append the summary of the ratio for the industry to a .csv file
    write.table(SummaryStat(tr, rat = rat, var = quote(industry), val = ind),
                file = "test.csv", sep = ";", col.names = NA, append = TRUE)
  }
  # LOOP ALL THE COUNTRIES (country is a character column, so use unique()
  # rather than levels(), which would return NULL here)
  for (cou in unique(tr[, country])) {
    # append the summary of the ratio for the country to a .csv file
    write.table(SummaryStat(tr, rat = rat, var = quote(country), val = cou),
                file = "test.csv", sep = ";", col.names = NA, append = TRUE)
  }
}
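The output I get is exactly what I want (although it would be nice if the column names were not repeated for every function call), but I would like to know whether there is a better way to do this, in particular where the for loops should go.
(The function above is only an example. My real one takes the same arguments but is more complex, and I would like to avoid changing anything in it.)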
Answer 0 (score: 1)
I would try to do this all in one go and then save the output. I believe this fits your needs; if not, please let me know :)
# try converting to long format, and then using the by conditions to get
# aggregate views
# melt is used to convert wide to long, splitting columns over combinations
# of the id.vars
tr2 <- melt(tr, id.vars = c("industry", "country"))
# do the aggregations, at (1) industry level, (2) at country level
sol1 <- tr2[, .(N=.N, min=min(value), max=max(value)), by=.(variable, industry)]
sol2 <- tr2[, .(N=.N, min=min(value), max=max(value)), by=.(variable, country)]
# sense check
sol1[]
sol2[]
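Since the question also asks about saving the result without repeated column names, here is a minimal sketch of one way the two summaries could be stacked and written once. It assumes the same tr as in the question; the names group_by, group, combined and out_file are hypothetical, and the temporary directory is only a stand-in for wherever the file should go.

```r
library(data.table)

tr <- data.table(
  industry = as.factor(c("a", "a", "a", "b", "b", "b")),
  country  = c("ch", "gb", "us", "gb", "us", "us"),
  rat1     = c(11, 41, 3, 2, 5, 7),
  rat2     = c(5, 4, 2, 77, 2, 3)
)

# Long format: one row per original row and ratio column
tr2 <- melt(tr, id.vars = c("industry", "country"))

sol1 <- tr2[, .(N = .N, min = min(value), max = max(value)), by = .(variable, industry)]
sol2 <- tr2[, .(N = .N, min = min(value), max = max(value)), by = .(variable, country)]

# Give both tables the same layout: which column was grouped on, and its value
# (group_by and group are hypothetical column names)
sol1 <- sol1[, .(variable, group_by = "industry", group = as.character(industry), N, min, max)]
sol2 <- sol2[, .(variable, group_by = "country", group = country, N, min, max)]

combined <- rbindlist(list(sol1, sol2))

# Write everything in one call, so the header row appears only once
# (out_file is a hypothetical path)
out_file <- file.path(tempdir(), "summary.csv")
fwrite(combined, out_file, sep = ";")
```

Writing a single combined table avoids the append = TRUE pattern, which is what repeats the header in the original loop.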
Edit: sorry, I forgot the N column. In data.table, .N is the special symbol that gives the number of rows (per group when by is used).
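To illustrate the point about .N, here is a small sketch on a toy table. The table dt and its columns g and v are made up for this example and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy table, only for illustration
dt <- data.table(g = c("x", "x", "y"), v = c(1, 2, 3))

# Without `by`, .N is the total number of rows
total <- dt[, .N]

# With `by`, .N is the number of rows in each group
counts <- dt[, .(N = .N), by = g]
```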
Edit: in response to the comments...
SummaryStat <- function(table, ids) {
  # convert to long format once, keeping the id columns
  table <- melt(table, id.vars = ids)
  # one summary table per id column
  output <- lapply(ids, function(index) {
    table[, .(N = .N, min = min(value), max = max(value)), by = c("variable", index)]
  })
  names(output) <- ids
  return(output)
}
SummaryStat(tr, c("industry", "country"))
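The function returns a named list with one summary table per id column. Below is a rough sketch of how that list might be used, assuming the same tr as in the question. The function body is repeated only so the snippet runs on its own; out_dir and the summary_*.csv file names are hypothetical.

```r
library(data.table)

tr <- data.table(
  industry = as.factor(c("a", "a", "a", "b", "b", "b")),
  country  = c("ch", "gb", "us", "gb", "us", "us"),
  rat1     = c(11, 41, 3, 2, 5, 7),
  rat2     = c(5, 4, 2, 77, 2, 3)
)

# Same logic as the function above, repeated to keep the sketch self-contained
SummaryStat <- function(table, ids) {
  table <- melt(table, id.vars = ids)
  output <- lapply(ids, function(index) {
    table[, .(N = .N, min = min(value), max = max(value)), by = c("variable", index)]
  })
  names(output) <- ids
  output
}

res <- SummaryStat(tr, c("industry", "country"))

# Each element is a data.table, accessible by the id column's name
res$industry
res[["country"]]

# Write one file per id column (out_dir and the file names are hypothetical)
out_dir <- tempdir()
for (id in names(res)) {
  fwrite(res[[id]], file.path(out_dir, paste0("summary_", id, ".csv")), sep = ";")
}
```

Adding another grouping column should then only require extending the ids vector, with no extra loop.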