Suppose I have the following data frame:
df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))
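I want to create a list of data frames, splitting the columns by the first part of their names: the columns starting with "BR" would be one element of the list, the columns starting with "USA" would be another, and so on.
I can use strsplit to get the column names and split them, but I am not sure what the best way is to iterate over the result and separate the data frame.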
strsplit(names(df), "\\.")
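gives me a list whose top-level elements correspond to the column names and whose second level holds each name split on ".".
How can I iterate over this list to get the index numbers of the columns that start with the same substring, and group those columns as elements of another list?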
Answer 0 (score: 3)
This will only work if the column names always have the form you show (split on "."), and you want to group by the identifier before the first ".".
df <- data.frame(BR.a=rnorm(10), BR.b=rnorm(10), BR.c=rnorm(10),
USA.a=rnorm(10), USA.b = rnorm(10), FRA.a=rnorm(10), FRA.b=rnorm(10))
## Grab the component of the names we want
nm <- do.call(rbind, strsplit(colnames(df), "\\."))[,1]
## Create list with custom function using lapply
## (drop = FALSE keeps a one-column group as a data frame rather than a vector)
datlist <- lapply(unique(nm), function(x){df[, nm == x, drop = FALSE]})
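For what it's worth, the same grouping can also be sketched in base R with split.default, which splits a data frame by column when given one grouping value per column. This is an alternative to the lapply approach above, not what this answer itself uses. The names prefix, grp and by_prefix are only illustrative, and the sketch assumes the group is everything before the first ".".

```r
set.seed(42)  # only so the example is reproducible
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

## Everything before the first "." is treated as the group label
prefix <- sub("\\..*$", "", names(df))

## A factor with levels in order of first appearance keeps the BR, USA, FRA order
## (a plain character vector would be sorted alphabetically by split)
grp <- factor(prefix, levels = unique(prefix))

## split.default treats the data frame as a list of columns,
## so each element of the result is a data frame holding one group's columns
by_prefix <- split.default(df, grp)

names(by_prefix)
```

A side benefit is that the result is a named list, so a group can be pulled out with by_prefix$USA or by_prefix[["USA"]].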
Answer 1 (score: 3)
Dason beat me to it, but here is a different flavor of the same conceptual approach:
library(plyr)
# Use regex to get the prefixes
# Pulls any word characters (letters, digits or underscores, "\\w*") from the
# beginning of the string ("^") to the first period ("\\.") into a group, then
# matches all the remaining characters (".*"). Then replaces with the first
# group ("\\1" = "(\\w*)").
# In other words, it matches the whole string but replaces with only the prefix.
prefixes <- unique(gsub(pattern = "^(\\w*)\\..*",
replace = "\\1",
x = names(df)))
# Subset to the variables that match the prefix
# Iterates over the prefixes and subsets based on the variable names that
# match that prefix
llply(prefixes, .fun = function(x){
y <- subset(df, select = names(df)[grep(names(df),
pattern = paste("^", x, sep = ""))])
})
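If you would rather not load plyr, the same loop can be sketched with base R's lapply. This is only a rough equivalent under the same assumptions as above (prefix = everything before the first "."); wrapping the result in setNames is my own addition so that each list element is labelled with its prefix, and by_prefix is a made-up name.

```r
set.seed(42)  # only so the example is reproducible
df <- data.frame(BR.a = rnorm(10), BR.b = rnorm(10), BR.c = rnorm(10),
                 USA.a = rnorm(10), USA.b = rnorm(10),
                 FRA.a = rnorm(10), FRA.b = rnorm(10))

# Same regex idea as above: keep only the part before the first period
prefixes <- unique(sub("^(\\w*)\\..*", "\\1", names(df)))

# For each prefix, keep the columns whose names start with it.
# Single-bracket indexing with a logical vector selects columns and
# always returns a data frame, even when only one column matches.
by_prefix <- setNames(
  lapply(prefixes, function(p) df[grepl(paste0("^", p), names(df))]),
  prefixes
)

str(by_prefix, max.level = 1)
```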
I think these regular expressions will still give you the correct result even if there is a "." later in the variable name:
unique(gsub(pattern = "^(\\w*)\\..*",
replace = "\\1",
x = c(names(df), "FRA.c.blahblah")))
Or if a prefix shows up later in a variable name:
# Add a USA variable with "FRA" in it
df2 <- data.frame(df, USA.FRANKLINS = rnorm(10))
prefixes2 <- unique(gsub(pattern = "^(\\w*)\\..*",
replace = "\\1",
x = names(df2)))
llply(prefixes2, .fun = function(x){
y <- subset(df2, select = names(df2)[grep(names(df2),
pattern = paste("^", x, sep = ""))])
})
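One case the "^" anchor alone does not cover, as far as I can tell, is a prefix that is itself the start of a longer prefix (say "BR" and "BRA"). The sketch below uses a small made-up data frame, df3, to show the difference between anchoring only at the start and also requiring the period, plus an exact comparison of prefixes as another option. All names here are hypothetical.

```r
# Hypothetical data: "BR" is also the beginning of the longer prefix "BRA"
df3 <- data.frame(BR.a = 1:3, BRA.a = 4:6, BR.b = 7:9)

# Anchoring only at the start also picks up the BRA column
loose <- grep("^BR", names(df3), value = TRUE)

# Requiring the literal period right after the prefix excludes it,
# but would miss a column that is named just "BR" with no period
strict <- grep("^BR\\.", names(df3), value = TRUE)

# Comparing the extracted prefix for equality avoids both problems
col_prefix <- sub("\\..*$", "", names(df3))
exact <- names(df3)[col_prefix == "BR"]

loose
strict
exact
```

Here loose contains all three column names, while strict and exact contain only BR.a and BR.b.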