R - Create and name data frames from existing column values

Time: 2018-09-07 20:24:46

Tags: r dataframe

I have a data frame, shown here as dput output:

structure(list(railroad = c("bnsf railway company", "bnsf railway company", 
"bnsf railway company", "bnsf railway company", "bnsf railway company", 
"bnsf railway company", "bnsf railway company", "bnsf railway company", 
"union pacific railroad", "union pacific railroad", "union pacific railroad", 
"union pacific railroad", "union pacific railroad", "union pacific railroad", 
"union pacific railroad", "union pacific railroad"), measure = 
c("cars.owned.by", 
"cars.owned.by", "cars.type", "cars.type", "cars.type", "train.speed", 
"train.speed", "terminal.dwell", "cars.owned.by", "cars.owned.by", 
"cars.type", "cars.type", "cars.type", "train.speed", "train.speed", 
"terminal.dwell"), category = c("system", "private", "box", "intermodal", 
"total", "intermodal", "all.trains", "entire.railroad", "system", 
"private", "box", "intermodal", "total", "intermodal", "all.trains", 
"entire.railroad"), irm = c(201510L, 201510L, 201510L, 201510L, 
201510L, 201510L, 201510L, 201510L, 201510L, 201510L, 201510L, 
201510L, 201510L, 201510L, 201510L, 201510L), mean = c(66623, 
149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267, 57618.333, 
195764.667, 22229.333, 14135.333, 293164.333, 31.933, 26.6, 27.6
)), row.names = c(1L, 3L, 6L, 9L, 14L, 15L, 20L, 32L, 127L, 129L, 
132L, 135L, 140L, 141L, 146L, 160L), class = "data.frame")

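What I would like to do is the following:

  1. Create a separate data frame for each combination of measure and category, named by pasting measure and category together with "." as the separator. So the first data frame would be called cars.owned.by.system, and so on.

  2. Rename the fifth column, mean, of each data frame to the name of the data frame itself. So for the first data frame it would be colnames(df)[5] <- "cars.owned.by.system"

The desired output is 8 separate data frames, as described above.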
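To make the target concrete, here is a minimal sketch that lists the names the 8 data frames would get. It rebuilds the data compactly with rep() rather than the dput output above, so the row names differ from the original (an assumption made only to keep the snippet self-contained); target_names is an illustrative variable name.

```r
# Compact rebuild of the data above (row names will be 1..16, unlike the dput)
df <- data.frame(
  railroad = rep(c("bnsf railway company", "union pacific railroad"), each = 8),
  measure  = rep(c("cars.owned.by", "cars.owned.by", "cars.type", "cars.type",
                   "cars.type", "train.speed", "train.speed", "terminal.dwell"), 2),
  category = rep(c("system", "private", "box", "intermodal", "total",
                   "intermodal", "all.trains", "entire.railroad"), 2),
  irm      = 201510L,
  mean     = c(66623, 149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267,
               57618.333, 195764.667, 22229.333, 14135.333, 293164.333,
               31.933, 26.6, 27.6),
  stringsAsFactors = FALSE
)

# One name per distinct measure/category pair, joined with "."
target_names <- unique(paste(df$measure, df$category, sep = "."))
target_names
length(target_names)
```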

I tried the following:

cars.owned.by.system <- df[df$category == "system",]
colnames(cars.owned.by.system)[5] <- "cars.owned.by.system"

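It does the job, but I don't want to repeat this for every combination. I imagine there is a canonical "split-apply-combine" approach that would work. Any suggestions or help would be greatly appreciated. Thanks.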

4 Answers:

Answer 0: (score: 1)

Assuming df is your data frame, I think this does it.

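# one data frame per unique measure/category combination
combos <- unique(df[c("measure", "category")])
for (i in seq_len(nrow(combos))) {
  m   <- combos$measure[i]
  ctg <- combos$category[i]
  newdf <- paste(m, ctg, sep = ".")                      # e.g. "cars.owned.by.system"
  sub <- df[df$measure == m & df$category == ctg, ]
  colnames(sub)[5] <- newdf                              # rename the mean column
  assign(newdf, sub)                                     # create the object by name
}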
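As a side note, the same separate objects can be produced without a loop by building a named list first and copying it into an environment with list2env(). This is only a sketch under assumptions: it uses a four-row excerpt of the question's data, pieces and target are illustrative names, and it writes into a fresh environment rather than the global one.

```r
# Four-row excerpt of the question's data (assumption: enough to show the idea)
df <- data.frame(
  railroad = rep(c("bnsf railway company", "union pacific railroad"), each = 2),
  measure  = rep(c("cars.owned.by", "train.speed"), 2),
  category = rep(c("system", "all.trains"), 2),
  irm      = 201510L,
  mean     = c(66623, 24.5, 57618.333, 26.6),
  stringsAsFactors = FALSE
)

# Named list: one element per measure.category value
pieces <- split(df, paste(df$measure, df$category, sep = "."))

# Rename column 5 of each piece to the piece's own name
pieces <- Map(function(sub, nm) { names(sub)[5] <- nm; sub }, pieces, names(pieces))

# Copy every list element into an environment as its own object.
# Use envir = globalenv() instead to get top-level objects.
target <- new.env()
list2env(pieces, envir = target)

ls(target)
names(target$cars.owned.by.system)
```

Keeping the list around and exporting it only at the end means the renaming logic never has to build code as strings.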

Answer 1: (score: 1)

How about a classic for loop?

# first build the pasted name used to drive the loop
df$name <- paste(df$measure, df$category, sep = ".")

# an empty list to hold all the data frames
list_df <- list()

# the loop: one pass per unique name
for (i in unique(df$name)) {
  data <- df[df$name == i, ]   # select the rows for this name
  colnames(data)[5] <- i       # rename the mean column
  data <- data[, -6]           # drop the helper name column
  list_df[[i]] <- data         # store in the list
}

# here you can see all the data frames in the list
list_df

> list_df
$cars.owned.by.system
                  railroad       measure category    irm cars.owned.by.system
1     bnsf railway company cars.owned.by   system 201510             66623.00
127 union pacific railroad cars.owned.by   system 201510             57618.33

$cars.owned.by.private
                  railroad       measure category    irm cars.owned.by.private
3     bnsf railway company cars.owned.by  private 201510              149937.3
129 union pacific railroad cars.owned.by  private 201510              195764.7
... and so on

# you can select each data frame, for example by its name
list_df$cars.type.box
                  railroad   measure category    irm cars.type.box
6     bnsf railway company cars.type      box 201510      11395.00
132 union pacific railroad cars.type      box 201510      22229.33

# and you can confirm it's a data frame
class(list_df$cars.type.box)
[1] "data.frame"

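Answer 2: (score: 1)

Consider subsetting the data frame by the two factors with split, then using Map (a wrapper around mapply) to iterate elementwise over the subset data frames and the list names.

Also consider setNames() in place of the left-hand-side (replacement) form colnames(x) <- ..., since it returns the newly named object in a single call.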

# CREATES NAMED LIST (drop = TRUE SKIPS EMPTY measure/category COMBINATIONS)
df_list <- split(df, list(df$measure, df$category), drop = TRUE)

# RETURNS SAME LIST WITH RENAMED FIFTH COLUMN
df_list <- Map(function(sub, nm) setNames(sub, c("railroad", "measure", "category", "irm", nm)), 
               df_list, names(df_list))

# OUTPUT DFs 
df_list$cars.owned.by.system

df_list$cars.type.box

df_list$terminal.dwell.entire.railroad
...
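One thing to watch with split() on a list of two factors: by default it returns an element for every combination of levels, including combinations that have no rows. The sketch below compares the default with drop = TRUE. It rebuilds the question's data compactly with rep() (so row names differ from the dput), and all_combos and used_combos are illustrative names.

```r
# Compact rebuild of the question's data (row names will be 1..16)
df <- data.frame(
  railroad = rep(c("bnsf railway company", "union pacific railroad"), each = 8),
  measure  = rep(c("cars.owned.by", "cars.owned.by", "cars.type", "cars.type",
                   "cars.type", "train.speed", "train.speed", "terminal.dwell"), 2),
  category = rep(c("system", "private", "box", "intermodal", "total",
                   "intermodal", "all.trains", "entire.railroad"), 2),
  irm      = 201510L,
  mean     = c(66623, 149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267,
               57618.333, 195764.667, 22229.333, 14135.333, 293164.333,
               31.933, 26.6, 27.6),
  stringsAsFactors = FALSE
)

# Default: every measure level crossed with every category level
all_combos  <- split(df, list(df$measure, df$category))

# drop = TRUE: only the combinations that actually occur in the data
used_combos <- split(df, list(df$measure, df$category), drop = TRUE)

length(all_combos)    # 4 measures x 7 categories
length(used_combos)   # combinations present in df

# A combination that never occurs is kept as a zero-row data frame by default
nrow(all_combos[["cars.owned.by.all.trains"]])
```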

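Answer 3: (score: 1)

This gives you a named list of data frames, which is almost certainly a better choice than having them all as separate objects in the global environment: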

library(purrr)  # imap(); purrr also re-exports the %>% pipe
lst <- split(df, paste(df$measure, df$category, sep = ".")) %>% 
  purrr::imap(~`names<-`(.x, c(names(.x)[1:4], .y)))
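To illustrate why a list tends to be easier to work with, here is a small sketch that checks and summarises every piece in one call each. It is an assumption-laden stand-in rather than the code above: the list is built with base R (Map and setNames) so the snippet runs without purrr, the data is rebuilt compactly with rep(), and fifth and avg are illustrative names.

```r
# Compact rebuild of the question's data (row names will be 1..16)
df <- data.frame(
  railroad = rep(c("bnsf railway company", "union pacific railroad"), each = 8),
  measure  = rep(c("cars.owned.by", "cars.owned.by", "cars.type", "cars.type",
                   "cars.type", "train.speed", "train.speed", "terminal.dwell"), 2),
  category = rep(c("system", "private", "box", "intermodal", "total",
                   "intermodal", "all.trains", "entire.railroad"), 2),
  irm      = 201510L,
  mean     = c(66623, 149937.333, 11395, 16499, 236866, 33.3, 24.5, 25.267,
               57618.333, 195764.667, 22229.333, 14135.333, 293164.333,
               31.933, 26.6, 27.6),
  stringsAsFactors = FALSE
)

# Base R construction of the same kind of named list
lst <- split(df, paste(df$measure, df$category, sep = "."))
lst <- Map(function(x, nm) setNames(x, c(names(x)[1:4], nm)), lst, names(lst))

# Check that each piece's fifth column carries the piece's own name
fifth <- vapply(lst, function(x) names(x)[5], character(1))
all(fifth == names(lst))

# Summarise every piece at once, e.g. the average across railroads
avg <- vapply(lst, function(x) mean(x[[5]]), numeric(1))
avg

# Pull out a single data frame by name when needed
lst[["train.speed.all.trains"]]
```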