Modify a column by iterating through a list of tibbles

Time: 2020-04-09 22:11:04

Tags: r loops iteration tibble

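I have created 8 tibbles from CSV files. Each tibble has a common column, person_id. The values in person_id are integers, and I want them to be factors.

I am using the tidyverse.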

Import:

drugs <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_drugs.csv"))
flowsheet_dirty <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_flowsheet_dirty.csv"))
measurements_clean <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_measurements_clean.csv"))
measurements_dirty <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_measurements_dirty.csv"))
procedures_cpt <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_procedures_cpt.csv"))
vent_dirty <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_vent_dirty.csv"))
visits <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_visits.csv"))
person <- as_tibble(read.csv("../raw_data/icu_covid_sample/sample_icu_person.csv"))

Create the list:

data_list <- list(drugs = drugs, flowsheet_dirty = flowsheet_dirty, measurements_clean = measurements_clean, measurements_dirty = measurements_dirty, procedures_cpt = procedures_cpt, vent_dirty = vent_dirty, visits = visits, person = person)

Every tibble in the list has a person_id column, for example drugs$person_id, visits$person_id, and so on.

Some output:

> summary(data_list)
                   Length Class  Mode
drugs              12     tbl_df list
flowsheet_dirty    38     tbl_df list
measurements_clean 13     tbl_df list
measurements_dirty 13     tbl_df list
procedures_cpt      4     tbl_df list
vent_dirty         12     tbl_df list
visits             18     tbl_df list
person             39     tbl_df list

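I want to iterate with a for loop to convert each person_id column from integer to factor. More generally, I would like to know how to apply a function to a set of tibbles by putting them in a list.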

> seq_along(data_list)
[1] 1 2 3 4 5 6 7 8

The loop that fails:

for (i in seq_along(data_list)) {
  data_list[i]$person_id <- as.factor(data_list[i]$person_id)
}

Resulting message (repeated eight times, once per iteration):

number of items to replace is not a multiple of replacement length

A test on a single tibble (it has to be run before the failing loop):

data_list$drugs$person_id <- as.factor(data_list$drugs$person_id)

> is.factor(data_list$drugs$person_id)
[1] TRUE
> is.factor(data_list$visit$person_id)
[1] FALSE

This also doesn't work:

for (i in seq_along(data_list)) {
  data_list[[i]]$person_id <- as.factor(data_list[[i]]$person_id)
}

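So I know that with 8 separate commands I can convert the person_id columns to factors, but I'm having trouble doing it in a loop. Perhaps mutate() could help here, but I was hoping iteration would be the easier route. I'm also not sure whether data_list should be a list at all; maybe it should be a vector or something else. Any help is appreciated.

1 Answer:

Answer 0: (score: 1)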

Here we need to extract with [[ rather than [, because [ returns a list of length 1 and does not extract the data frame stored inside it:

for (i in seq_along(data_list)) {
  data_list[[i]][["person_id"]] <- as.factor(data_list[[i]][["person_id"]])
}
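To make the difference between [ and [[ concrete, here is a minimal base R sketch. The object toy_list and its contents are made up for illustration (plain data frames standing in for the tibbles in the question).

```r
# Hypothetical toy list; plain data frames stand in for the real tibbles
toy_list <- list(a = data.frame(person_id = 1:3),
                 b = data.frame(person_id = 4:6))

# Single brackets keep the list wrapper: the result is a one-element list
class(toy_list[1])

# Double brackets return the element itself, here a data frame
class(toy_list[[1]])

# `$` on the one-element list looks for a list element named person_id,
# finds none, and returns NULL
is.null(toy_list[1]$person_id)

# `$` on the extracted data frame reaches the column
toy_list[[1]]$person_id
```

This is why assigning through data_list[i]$person_id ends up trying to put a two-element list into a one-element slot, which is what the "number of items to replace" message is complaining about.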

Based on the error shown, some of the datasets may not have a person_id column. In that case, we can check for person_id and convert it only when it exists:

for (i in seq_along(data_list)) {
  i1 <- 'person_id' %in% names(data_list[[i]])
  if (i1) {
    data_list[[i]]$person_id <- as.factor(data_list[[i]]$person_id)
  }
}

As a reproducible example:

library(dplyr)  # provides %>%, mutate() and as_tibble()

lst1 <- list(as_tibble(head(mtcars)) %>%
               mutate(person_id = 1:6),
             as_tibble(head(iris)) %>%
               mutate(person_id = 1:6))
for (i in seq_along(lst1)) lst1[[i]]$person_id <- as.factor(lst1[[i]]$person_id)
is.factor(lst1[[1]]$person_id)
#[1] TRUE
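The check above only looks at the first element. A small base R sketch for checking every element at once could look like the following; the list name lst and the use of plain data frames instead of tibbles are assumptions made to keep the example self-contained.

```r
# Hypothetical list of plain data frames built from built-in datasets
lst <- list(cars = head(mtcars), flowers = head(iris))

# Add an integer person_id column to each element
lst <- lapply(lst, function(d) {
  d$person_id <- 1:6
  d
})

# Convert the column in every element, using [[ to reach each data frame
for (i in seq_along(lst)) {
  lst[[i]]$person_id <- as.factor(lst[[i]]$person_id)
}

# One logical value per list element: is person_id a factor there?
converted <- vapply(lst, function(d) is.factor(d$person_id), logical(1))
all(converted)
```

vapply is used rather than sapply so that the result is guaranteed to be a logical vector with one entry per list element.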

Or this can be done with lapply:
data_list <- lapply(data_list, transform, person_id = as.factor(person_id))
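On the more general question of applying a function to every data frame in a list, one possible pattern is to write a small helper and pass it to lapply. The sketch below is base R only; the helper name to_factor_if_present and the toy data are hypothetical and not part of the original post.

```r
# Hypothetical helper: convert one column to a factor if the column exists,
# otherwise return the data frame unchanged
to_factor_if_present <- function(df, col = "person_id") {
  if (col %in% names(df)) {
    df[[col]] <- as.factor(df[[col]])
  }
  df
}

# Toy list: one element has person_id, the other does not
toy_list <- list(
  with_id    = data.frame(person_id = c(2L, 1L, 2L), dose = c(5, 10, 15)),
  without_id = data.frame(code = c("x", "y"))
)

# lapply returns a new list, so assign the result back
toy_list <- lapply(toy_list, to_factor_if_present)

is.factor(toy_list$with_id$person_id)
names(toy_list$without_id)
```

Because lapply returns a new list rather than modifying its input, the result has to be assigned back, just as in the one-liner above.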

Or another option is map:

library(dplyr)
library(purrr)
data_list <- map(data_list, ~ .x %>%
                   mutate(person_id = factor(person_id)))