I have a data frame, reproduced here with dput:
structure(list(railroad = c("bnsf railway company", "bnsf railway company",
"bnsf railway company", "bnsf railway company", "bnsf railway company",
"bnsf railway company", "bnsf railway company", "bnsf railway company",
"union pacific railroad", "union pacific railroad", "union pacific railroad",
"union pacific railroad", "union pacific railroad", "union pacific railroad",
"union pacific railroad", "union pacific railroad"), measure =
c("cars.owned.by",
"cars.owned.by", "cars.type", "cars.type", "cars.type", "train.speed",
"train.speed", "terminal.dwell", "cars.owned.by", "cars.owned.by",
"cars.type", "cars.type", "cars.type", "train.speed", "train.speed",
"terminal.dwell"), category = c("system", "private", "box", "intermodal",
"total", "intermodal", "all.trains", "entire.railroad", "system",
"private", "box", "intermodal", "total", "intermodal", "all.trains",
"entire.railroad"), irm = c(201510L, 201510L, 201510L, 201510L,
201510L, 201510L, 201510L, 201510L, 201510L, 201510L, 201510L,
201510L, 201510L, 201510L, 201510L, 201510L), mean = c(66623,
149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267, 57618.333,
195764.667, 22229.333, 14135.333, 293164.333, 31.933, 26.6, 27.6
)), row.names = c(1L, 3L, 6L, 9L, 14L, 15L, 20L, 32L, 127L, 129L,
132L, 135L, 140L, 141L, 146L, 160L), class = "data.frame")
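What I would like to do is the following:
Create a separate data frame for each combination of measure and category, named by pasting measure and category together with "." as the separator. So the first data frame would be called cars.owned.by.system, and so on.
Rename the fifth column, mean, of each data frame to the name of the data frame itself. So for the first data frame, that would be colnames(df)[5] <- "cars.owned.by.system".
The desired output is 8 separate data frames, as described above.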
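I tried the following:
cars.owned.by.system <- df[df$category == "system",]
colnames(cars.owned.by.system)[5] <- "cars.owned.by.system"
It gets the job done, but I don't want to repeat this for every combination. I imagine there is a canonical "split-apply-combine" approach that would work. Any suggestions or help would be greatly appreciated. Thanks.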
Answer 0 (score: 1)
Assuming df is your data frame, I think this does it.
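# Key every row by its measure.category combination
keys <- paste(df$measure, df$category, sep = ".")
for (key in unique(keys)) {
  sub <- df[keys == key, ]
  colnames(sub)[5] <- key  # rename the mean column after the data frame
  assign(key, sub)         # create an object named e.g. cars.owned.by.system
}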
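If separate objects in the global environment really are required, a variation is to build a named list first and export it in one step with list2env(). The sketch below is only an illustration: toy is a made-up, shortened stand-in for df (hypothetical abbreviated railroad names, four rows), not the original data.

```r
# Hypothetical, shortened stand-in for the question's df
toy <- data.frame(
  railroad = c("bnsf", "bnsf", "up", "up"),
  measure  = c("cars.type", "train.speed", "cars.type", "train.speed"),
  category = c("box", "all.trains", "box", "all.trains"),
  irm      = 201510L,
  mean     = c(11395, 24.5, 22229.333, 26.6),
  stringsAsFactors = FALSE
)

# One list element per measure.category key
pieces <- split(toy, paste(toy$measure, toy$category, sep = "."))

# Rename the fifth column of each piece to the piece's own name
pieces <- Map(function(d, nm) { names(d)[5] <- nm; d }, pieces, names(pieces))

# Export every list element as an object named after its key
invisible(list2env(pieces, envir = globalenv()))

names(cars.type.box)
```

Keeping the pieces list around as well makes it easy to loop over all the data frames later.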
Answer 1 (score: 1)
How about a classic for loop?
# first create the pasted name to iterate over
df$name <- paste(df$measure, df$category, sep = '.')
# an empty list to hold all the data frames
list_df <- list()
# the loop, run once per distinct name
for (i in unique(df$name)){
  data <- df[which(df$name == i),]  # select the rows for this name
  colnames(data)[5] <- i            # rename the mean column
  data <- data[,-6]                 # remove the helper name column
  list_df[[i]] <- data              # store in the list
}
# here you can see all the data frames in a list
> list_df
$cars.owned.by.system
                  railroad       measure category    irm cars.owned.by.system
1     bnsf railway company cars.owned.by   system 201510             66623.00
127 union pacific railroad cars.owned.by   system 201510             57618.33

$cars.owned.by.private
                  railroad       measure category    irm cars.owned.by.private
3     bnsf railway company cars.owned.by  private 201510              149937.3
129 union pacific railroad cars.owned.by  private 201510              195764.7

... and so on
# you can select each data frame, for example by its name
> list_df$cars.type.box
                  railroad   measure category    irm cars.type.box
6     bnsf railway company cars.type      box 201510      11395.00
132 union pacific railroad cars.type      box 201510      22229.33
# and you can check that it is a data frame
> class(list_df$cars.type.box)
[1] "data.frame"
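Answer 2 (score: 1)
Consider using split to subset the data frame by the two factors, then Map (a wrapper for mapply) to iterate element-wise over the subset data frames and the list names.
Also consider setNames(), the functional counterpart of the assignment form of colnames(), which returns the renamed object in a single call.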
# CREATES NAMED LIST (drop = TRUE OMITS MEASURE/CATEGORY PAIRS WITH NO ROWS)
df_list <- split(df, list(df$measure, df$category), drop = TRUE)
# RETURNS SAME LIST WITH RENAMED FIFTH COLUMN
df_list <- Map(function(sub, nm) setNames(sub, c("railroad", "measure", "category", "irm", nm)),
               df_list, names(df_list))
# OUTPUT DFs
df_list$cars.owned.by.system
df_list$cars.type.box
df_list$terminal.dwell.entire.railroad
...
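One detail worth checking when splitting on a list of factors: by default split() builds an element for every pairing of levels, including pairings that never occur in the data, and those come back as zero-row data frames. A small sketch of the difference drop = TRUE makes, using a made-up four-row stand-in (toy, with hypothetical abbreviated railroad names) rather than the original df:

```r
# Hypothetical, shortened stand-in for the question's df
toy <- data.frame(
  railroad = c("bnsf", "bnsf", "up", "up"),
  measure  = c("cars.type", "train.speed", "cars.type", "train.speed"),
  category = c("box", "all.trains", "box", "all.trains"),
  irm      = 201510L,
  mean     = c(11395, 24.5, 22229.333, 26.6),
  stringsAsFactors = FALSE
)

# Default: one element per measure x category pairing, occupied or not
all_combos <- split(toy, list(toy$measure, toy$category))

# drop = TRUE: only pairings that actually appear in the data
used_combos <- split(toy, list(toy$measure, toy$category), drop = TRUE)

length(all_combos)        # 4 pairings, two of them with zero rows
length(used_combos)       # 2 pairings, both with rows
sapply(all_combos, nrow)  # shows which pairings are empty
```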
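Answer 3 (score: 1)
This gives you a named list of data frames, which is almost certainly a better choice than having them all as separate objects in the global environment: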
library(purrr)  # provides imap() and re-exports the %>% pipe
lst <- split(df, paste(df$measure, df$category, sep = ".")) %>%
  purrr::imap(~`names<-`(.x, c(names(.x)[1:4], .y)))
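To illustrate why a list tends to be easier to work with, here is a minimal base R sketch that applies the same operation to every piece at once. It uses a made-up four-row stand-in (toy, with hypothetical abbreviated railroad names) instead of the original df, and the summaries chosen are only examples.

```r
# Hypothetical, shortened stand-in for the question's df
toy <- data.frame(
  railroad = c("bnsf", "bnsf", "up", "up"),
  measure  = c("cars.type", "train.speed", "cars.type", "train.speed"),
  category = c("box", "all.trains", "box", "all.trains"),
  irm      = 201510L,
  mean     = c(11395, 24.5, 22229.333, 26.6),
  stringsAsFactors = FALSE
)

# Named list: one data frame per measure.category key
lst <- split(toy, paste(toy$measure, toy$category, sep = "."))

# The same summary for every data frame, without naming each one by hand
row_counts <- vapply(lst, nrow, integer(1))
col_means  <- vapply(lst, function(d) mean(d[[5]]), numeric(1))

row_counts
col_means
```

With separate global objects, each of these summaries would need get() or one line per data frame.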