I have created 8 tibbles from CSV files. Each tibble has a common column, person_id. The values in person_id are integers, and I would like them to be factors.
I am using the tidyverse.
Import:
drugs <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_drugs.csv"))
flowsheet_dirty <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_flowsheet_dirty.csv"))
measurements_clean <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_measurements_clean.csv"))
measurements_dirty <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_measurements_dirty.csv"))
procedures_cpt <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_procedures_cpt.csv"))
vent_dirty <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_vent_dirty.csv"))
visits <- as_tibble(read.csv("../raw_data/icu_covid_sample/icu_sample_visits.csv"))
person <- as_tibble(read.csv("../raw_data/icu_covid_sample/sample_icu_person.csv"))
Create the list:
data_list <- list(drugs = drugs, flowsheet_dirty = flowsheet_dirty, measurements_clean = measurements_clean, measurements_dirty = measurements_dirty, procedures_cpt = procedures_cpt, vent_dirty = vent_dirty, visits = visits, person = person)
Each tibble in the list has a person_id column, e.g. drugs$person_id, visits$person_id, etc.
Some output:
> summary(data_list)
                   Length Class  Mode
drugs              12     tbl_df list
flowsheet_dirty    38     tbl_df list
measurements_clean 13     tbl_df list
measurements_dirty 13     tbl_df list
procedures_cpt      4     tbl_df list
vent_dirty         12     tbl_df list
visits             18     tbl_df list
person             39     tbl_df list
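I would like to iterate with a for loop to convert each person_id column from integer to factor. More generally, I would like to know how to apply a function to a set of tibbles by putting them in a list.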
> seq_along(data_list)
[1] 1 2 3 4 5 6 7 8
For loop attempt:
for (i in seq_along(data_list)) {
  data_list[i]$person_id <- as.factor(data_list[i]$person_id)
}
Error output (the same message, once per list element):
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
number of items to replace is not a multiple of replacement length
A test (which must have been run before the loop error):
data_list$drugs$person_id <- as.factor(data_list$drugs$person_id)
> is.factor(data_list$drugs$person_id)
[1] TRUE
> is.factor(data_list$visit$person_id)
[1] FALSE
This does not work either:
for (i in seq_along(data_list)) {
  data_list[[i]]$person_id <- as.factor(data_list[[i]]$person_id)
}
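So I know that I could convert the person_id columns to factors with 8 separate commands, but I am having trouble doing it in a loop. Maybe mutate() could help me, but I was hoping iteration would be easier. I am also not sure whether my data_list should be a list at all; perhaps it should be a vector or something else. Any help is appreciated.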
Answer 0 (score: 1)
Here, we need to use [[ instead of [ for extraction, because [ returns a list of length 1 and does not extract the data itself:
for (i in seq_along(data_list)) {
  data_list[[i]][["person_id"]] <- as.factor(data_list[[i]][["person_id"]])
}
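The difference between the two operators can be checked directly. Below is a minimal sketch in base R; toy_list and its contents are hypothetical stand-ins for data_list, not the asker's data.

```r
# Hypothetical toy list standing in for data_list
toy_list <- list(
  a = data.frame(person_id = 1:3, x = c(2.5, 3.1, 4.0)),
  b = data.frame(person_id = 4:6, y = c("u", "v", "w"))
)

# `[` keeps the list wrapper: the result is still a list, of length 1
single_bracket <- toy_list[1]
class(single_bracket)
length(single_bracket)
# The wrapper has no element called person_id, so this is NULL
single_bracket$person_id

# `[[` extracts the element itself, here a data frame
double_bracket <- toy_list[[1]]
class(double_bracket)
double_bracket$person_id
```

Because data_list[i]$person_id is NULL, the first loop in the question never reaches the column inside the tibble, which is consistent with the message it produced.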
Based on the error shown, it is possible that some datasets do not have a person_id column. In that case, we can check for person_id and convert it only if it exists:
for (i in seq_along(data_list)) {
  # TRUE only when this element has a person_id column
  i1 <- 'person_id' %in% names(data_list[[i]])
  if (i1) {
    data_list[[i]]$person_id <- as.factor(data_list[[i]]$person_id)
  }
}
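As a reproducible example:
library(dplyr)  # provides as_tibble(), mutate() and %>%
lst1 <- list(
  as_tibble(head(mtcars)) %>% mutate(person_id = 1:6),
  as_tibble(head(iris)) %>% mutate(person_id = 1:6)
)
# Convert person_id to a factor in every element of the list
for (i in seq_along(lst1)) lst1[[i]]$person_id <- as.factor(lst1[[i]]$person_id)
is.factor(lst1[[1]]$person_id)
#[1] TRUE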
Or this can be done with lapply:
data_list <- lapply(data_list, transform, person_id = as.factor(person_id))
Another option is map:
library(dplyr)
library(purrr)
data_list <- map(data_list, ~ .x %>%
    mutate(person_id = factor(person_id)))
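On the more general question of applying a function to a set of tibbles held in a list, one possible pattern is to wrap the per-table step in a small function and pass it to lapply. The sketch below uses base R only; convert_column and toy_list are hypothetical names introduced for illustration, and the helper skips any element that lacks the column.

```r
# Hypothetical helper: apply `fun` to one named column, if that column exists
convert_column <- function(df, column, fun) {
  if (column %in% names(df)) {
    df[[column]] <- fun(df[[column]])
  }
  df
}

# Hypothetical toy list; the second element has no person_id column
toy_list <- list(
  with_id = data.frame(person_id = c(3L, 1L, 3L), x = c(2.5, 3.1, 4.0)),
  without_id = data.frame(z = c(TRUE, FALSE, TRUE))
)

# lapply returns a new list with the same names, so reassign the result
toy_list <- lapply(toy_list, convert_column, column = "person_id", fun = as.factor)

is.factor(toy_list$with_id$person_id)
names(toy_list$without_id)
```

A plain list is a reasonable container for this, since lapply and map both take a list and return one, so the result has to be assigned back to data_list as in the examples above.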