d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)
for (i in 1:3) {
  table <- lapply(my.list, function(data, count) {
    sql <-
      #sqldf(
      paste0(
        "select *,count(col_one) from data where col_one = ",
        count, " group by col_one"
      )
    #)
    print(sql)
  },
  count = i)
}
Output:
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
Expected:
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
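How can I fix this? I want to run the SQL to create the new dataset I need, but it does not work: each SQL statement should use the value that corresponds to the index of the data frame in the list. Is there a simpler way?
Here is one of the approaches I tried.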
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(3, 2, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 2, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)
seq_along(x)
#for (i in 1:3) {
table <- lapply(seq_along(my.list), function(index) {
  sql <-
    sqldf(
      paste0(
        "select *,count(col_one) from my.list where col_one = ",
        index, " group by col_one"
      )
    )
  print(sql)
})
#}
Output:
[1] "select *,count(col_one) from my.list where col_one = 1 group by col_one"
[1] "select *,count(col_one) from my.list where col_one = 2 group by col_one"
[1] "select *,count(col_one) from my.list where col_one = 3 group by col_one"
However, it cannot find the dataset to run the SQL against.
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)
table <- mapply(function(data, count) {
  sql <-
    sqldf(
      paste0(
        "select *,count(col_one) from data where col_one = ",
        count, " group by col_one"
      )
    )
  print(sql)
}, my.list, 1
)
Answer 0 (score: 1)
You need to iterate over both data and counts at the same time. In the tidyverse I would suggest using purrr::map2(), but in base R you can simply do:
table <- mapply(function(data, count) {
  sql <-
    #sqldf(
    paste0(
      "select *,count(col_one) from data where col_one = ",
      count, " group by col_one"
    )
  #)
  print(sql)
}, my.list, 1:3
)
[1] "select *,count(col_one) from data where col_one = 1 group by col_one"
[1] "select *,count(col_one) from data where col_one = 2 group by col_one"
[1] "select *,count(col_one) from data where col_one = 3 group by col_one"
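The same paired iteration can return data frames instead of printed strings. The following is only a minimal sketch in base R, assuming the goal is to count the rows of each data frame whose col_one equals that data frame's position in the list. It replaces the SQL with plain subsetting, and the names counts, matched, value, n and result are made up for illustration.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Map() walks my.list and its indices in parallel and always returns a list
counts <- Map(function(data, value) {
  # Keep only the rows whose col_one equals the paired index (assumed intent)
  matched <- data[data$col_one == value, , drop = FALSE]
  data.frame(col_one = value, n = nrow(matched))
}, my.list, seq_along(my.list))

# Stack the one-row results into a single data frame
result <- do.call(rbind, counts)
print(result)
```

Unlike a SQL query with group by, this sketch reports a count of 0 when no row matches, rather than returning no row at all.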
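Answer 1 (score: 1)
If I understand correctly, the OP wants to create contingency tables of col_one for each data.frame in my.list, i.e., he wants to know how many times each of the values 1, 2, or 3 appears in col_one of each data.frame.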
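As a quick illustration of what such contingency tables look like, here is a minimal base R sketch using table() on each list element. It assumes the my.list from the question (the third variant, with d2 holding three 1s), and the name tables is introduced only for this example.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# One contingency table of col_one per data frame in the list
tables <- lapply(my.list, function(d) table(d$col_one))
print(tables)
```

Each element of tables is named by the distinct values of col_one, so the value 7 in d3 gets its own entry.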
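As suggested in my answer to another question of the OP, and by G. Grothendieck, it is almost always better to combine data.frames with identical structure into one large data.table than to keep them separate in a list. By the way, the OP has a third question ("how to loop the dataframe using sqldf?") asking for help with a list of data.frames.
To combine the data.frames into one large data.table, use the rbindlist() function. Note that the added id column df identifies the originating data.frame of each row.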
library(data.table)
rbindlist(my.list, idcol = "df")
   df col_one col_two
1:  1       1       4
2:  1       2       5
3:  1       3       6
4:  2       1       6
5:  2       1       5
6:  2       1       4
7:  3       7       8
8:  3       1       5
9:  3       1       4
Now we can easily compute the aggregate:
rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][]
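   df col_one col_two count_col_one
1:  1       1       4             1
2:  1       2       5             1
3:  1       3       6             1
4:  2       1       6             3
5:  2       1       5             3
6:  2       1       4             3
7:  3       7       8             1
8:  3       1       5             2
9:  3       1       4             2
This data.table statement counts the number of occurrences of each individual value of col_one within each df by using the special symbol .N and grouping by df and col_one.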
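For readers without data.table, the same grouped count can be sketched in base R. This is not the approach recommended in this answer, just an assumed equivalent for comparison: do.call(rbind, ...) stands in for rbindlist(), and ave() with FUN = length stands in for .N. The name combined is made up for the example.

```r
d1 <- data.frame(col_one = c(1, 2, 3), col_two = c(4, 5, 6))
d2 <- data.frame(col_one = c(1, 1, 1), col_two = c(6, 5, 4))
d3 <- data.frame(col_one = c(7, 1, 1), col_two = c(8, 5, 4))
my.list <- list(d1, d2, d3)

# Stack the data frames and record which list element each row came from
combined <- do.call(rbind, my.list)
combined$df <- rep(seq_along(my.list), times = vapply(my.list, nrow, integer(1)))

# ave() applies length() within each (df, col_one) group and writes the
# group size back to every row of that group
combined$count_col_one <- ave(combined$col_one, combined$df, combined$col_one,
                              FUN = length)
print(combined)
```

The count_col_one column should match the data.table result, although the column order differs because df is appended last here.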
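In the question, the OP only asked for the number of occurrences of 1, 2, or 3 in col_one. If that is really intended, the value 7 needs to be removed. This can be done by filtering the result: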
rbindlist(my.list, idcol = "df")[, count_col_one := .N, by = .(df, col_one)][
col_one %in% 1:3]