Iterating over an R data frame and passing its rows as arguments to a function

Time: 2019-08-31 12:22:23

Tags: r dplyr purrr

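I want to iterate over a data frame and pass each of its rows as arguments to a function that computes summary totals for a data frame named df3.

I tried to write this with a traditional for loop, but it produced no result.

I looked at pmap in https://adv-r.hadley.nz/functionals.html#pmap

but I can't see how to apply that example to my code.

Here is a sample of the original data: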

dput(head(df3,n=3))
structure(list(id = c("81", "83", "85"), look_work = c("yes", 
"yes", "yes"), current_work = c("no", "yes", "no"), hf_l5k = c("", 
"", ""), ac_l5k = c("", "", ""), hf_5_10k = c("", "1", "1"), 
    ac_5_10k = c("", "1", "1"), hf_11_20k = c("", "", ""), ac_11_20k = c("", 
    "", ""), hf_21_50k = c("", "", ""), ac_21_50k = c("", "", 
    ""), hf_51_100k = c("", "", ""), ac_51_100k = c("", "", ""
    ), hf_m100k = c("", "", ""), ac_m100k = c("", "", ""), s_l1000 = c("", 
    "", ""), se_l1000 = c("", "", "1"), s_1001_1500 = c("", "1", 
    "1"), se_1001_1500 = c("", "", ""), s_2001_3000 = c("", "", 
    ""), se_2001_3000 = c("", "1", ""), s_3001_4000 = c("", "", 
    ""), se_3001_4000 = c("", "", ""), s_4001_5000 = c("", "", 
    ""), se_4001_5000 = c("", "", ""), s_5001_6000 = c("", "", 
    ""), se_5001_6000 = c("", "", ""), s_m6000 = c("", "", ""
    ), se_m6000 = c("", "", ""), s_n_ans = c("", "", ""), se_n_ans = c("", 
    "", ""), before_work = c("no", "NULL", "yes"), keen_move = c("yes", 
    "yes", "no"), city_size = c("village", "more than 500k inhabitants", 
    "more than 500k inhabitants"), gender = c("male", "female", 
    "female"), age = c("18 - 24 years", "18 - 24 years", "more than 50 years"
    ), education = c("secondary", "vocational", "secondary")), row.names = c(NA, 
3L), class = "data.frame")

Here is the data frame of arguments, hf_names:

structure(list(hf_names = c("hf_l5k", "hf_5_10k", "hf_11_20k", 
"hf_21_50k", "hf_51_100k", "hf_m100k"), job = c("hf_l5k_job", 
"hf_5_10k_job", "hf_11_20k_job", "hf_21_50k_job", "hf_51_100k_job", 
"hf_m100k_job"), tot = c("hf_l5k_tot", "hf_5_10k_tot", "hf_11_20k_tot", 
"hf_21_50k_tot", "hf_51_100k_tot", "hf_m100k_tot")), class = "data.frame", row.names = c(NA, 
-6L))

Here is the code I tried, using a traditional for loop:

library(dplyr)

tot_function <- function(df, filter_tot, col_name1, col_name2) {
  # filter desired columns for all jobs
  filter_tot <- df %>% filter(col_name1=="1") %>% 
  summarise(col_name2 = n()) 
}

for (i in seq_along(hf_names3)) {
  tot_function(df3, hf_names3$tot[i], hf_names3$hf_names[i], hf_names3$job[i])

}

The expected result would be a data frame or a vector:

hf_l5k_jobs hf_l5_10k_jobs
10               193

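However, this code produces no output, and the pmap example does not help me because it only covers simple functions such as trim and runif.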

1 answer:

Answer 0: (score: 0)

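I don't think you need to overcomplicate this. You can take the column names from hf_names, extract each of those columns from df3, and count the number of 1s in that column.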

sapply(hf_names$hf_names, function(x) sum(df3[[x]] == 1))

#    hf_l5k   hf_5_10k  hf_11_20k  hf_21_50k hf_51_100k   hf_m100k 
#         0          2          0          0          0          0 
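If the counts should be labelled with the names from the job column, as in the expected output of the question, the same idea can be extended slightly. The following is only a minimal sketch: the small df3 and hf_names below are toy stand-ins for the real data, and the na.rm = TRUE guard is an added assumption for columns that might contain NA.

```r
# Toy stand-ins for df3 and hf_names (illustrative values only)
df3 <- data.frame(
  hf_l5k   = c("", "", ""),
  hf_5_10k = c("", "1", "1"),
  stringsAsFactors = FALSE
)
hf_names <- data.frame(
  hf_names = c("hf_l5k", "hf_5_10k"),
  job      = c("hf_l5k_job", "hf_5_10k_job"),
  stringsAsFactors = FALSE
)

# Count the "1" entries in each listed column.
# vapply() checks that every result is a single integer.
counts <- vapply(
  hf_names$hf_names,
  function(x) sum(df3[[x]] == "1", na.rm = TRUE),
  integer(1)
)

# Relabel the counts with the job names and reshape to a one-row data frame
names(counts) <- hf_names$job
result <- as.data.frame(as.list(counts))
print(result)
```

Comparing against the string "1" rather than the number 1 makes the intent explicit, since the columns in the dput output are character vectors.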

If you prefer the tidyverse, you can replace sapply with one of the map_* variants:

purrr::map_int(hf_names$hf_names, ~sum(df3[[.]] == 1))
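For completeness, here is a sketch of how the row-wise approach from the question could be made to work with dplyr and purrr::pmap. The original attempt appears to fail for several reasons: filter(col_name1 == "1") compares the string stored in col_name1 with "1" instead of looking up the column of that name, summarise(col_name2 = n()) creates a column literally called col_name2, the function's result is never stored or printed, and seq_along() on a data frame iterates over its columns rather than its rows. The helper name count_ones and the toy data below are assumptions made for illustration, not part of the original post.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for df3 and hf_names (illustrative values only)
df3 <- data.frame(
  hf_l5k   = c("", "", ""),
  hf_5_10k = c("", "1", "1"),
  stringsAsFactors = FALSE
)
hf_names <- data.frame(
  hf_names = c("hf_l5k", "hf_5_10k"),
  job      = c("hf_l5k_job", "hf_5_10k_job"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows where the column named by `col_name`
# equals "1", and store the count in a column named by `out_name`.
count_ones <- function(df, col_name, out_name) {
  df %>%
    filter(.data[[col_name]] == "1") %>%   # .data[[...]] looks up a column by string
    summarise(!!out_name := n())           # !! and := set the output name from a string
}

# pmap() walks the rows of the argument table and matches its column
# names to the helper's argument names; df3 is passed as a fixed argument.
totals <- hf_names %>%
  select(col_name = hf_names, out_name = job) %>%
  pmap(count_ones, df = df3) %>%
  bind_cols()

print(totals)
```

For a simple count like this the sapply or map_int one-liner is shorter; the pmap form is mainly useful when each row carries several arguments that the function needs.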