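I am looking for a function that splits a data frame into several data frames based on the ending of the column names. For example:
Year | hour | LOT | S123_AA | S135_AA | S1763_BB | S173_BB | ...
I would like to split it into 2 data frames, like this:
Year | hour | LOT | S123_AA | S135_AA |
and
Year | hour | LOT | S1763_BB | S173_BB |
The point is to keep the first 3 columns and append all columns whose names end in _AA and _BB, respectively.
Thank you for your time.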
Answer 0: (score: 6)
You can use grep to get the right subsets.
df_AA = df[,c(1:3, grep("_AA$", colnames(df)))]
df_BB = df[,c(1:3, grep("_BB$", colnames(df)))]
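If more suffixes are involved, the same indexing can be wrapped in a small helper. The following is only a sketch: the function name split_by_suffix, its n_keep argument, and the toy data are hypothetical and not part of the original answer. The suffix is pasted into a regular expression, so it is assumed to contain no regex metacharacters.

```r
# Hypothetical helper: keep the first n_keep columns plus every column
# whose name ends in the given suffix.
split_by_suffix <- function(df, suffix, n_keep = 3) {
  keep <- seq_len(n_keep)
  hits <- grep(paste0(suffix, "$"), colnames(df))
  # drop = FALSE guarantees a data frame even if only one column is selected
  df[, c(keep, hits), drop = FALSE]
}

# Toy data mirroring the column layout in the question
df <- data.frame(Year = 1:2, hour = 3:4, LOT = 5:6,
                 S123_AA = 7:8, S135_AA = 9:10,
                 S1763_BB = 11:12, S173_BB = 13:14)

df_AA <- split_by_suffix(df, "_AA")
df_BB <- split_by_suffix(df, "_BB")
names(df_AA)
names(df_BB)
```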
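Answer 1: (score: 3)
This is a basic answer; your use case may need a more sophisticated regular expression in the grepl() call, but this should get you on the right track: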
# make some sample data
x <- data.frame(
  Year = rnorm(3), hour = rnorm(3), LOT = rnorm(3),
  S123_AA = rnorm(3), S135_AA = rnorm(3),
  S1763_BB = rnorm(3), S173_BB = rnorm(3)
)
#list the common columns
common_cols <- c("Year", "hour", "LOT")
#use grepl() to subset the columns that contain AA or BB
aa_cols <- names(x)[grepl("AA", names(x))]
bb_cols <- names(x)[grepl("BB", names(x))]
#create two new data frames
x_a <- x[, c(common_cols, aa_cols)]
x_b <- x[, c(common_cols, bb_cols)]
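As an illustration of why a stricter pattern may be needed: the unanchored pattern "AA" matches anywhere in a name, whereas the question asks about the ending. The sketch below compares it with an anchored regex and with base R's endsWith(). The column name AA_total is a made-up example, not something from the question.

```r
# "AA_total" is a hypothetical column that contains AA but does not end in _AA
cols <- c("Year", "hour", "LOT", "S123_AA", "AA_total", "S173_BB")

loose    <- grepl("AA", cols)      # matches anywhere in the name
anchored <- grepl("_AA$", cols)    # matches only at the end
suffix   <- endsWith(cols, "_AA")  # same idea without a regex

cols[loose]
#> [1] "S123_AA"  "AA_total"
cols[anchored]
#> [1] "S123_AA"
identical(anchored, suffix)
#> [1] TRUE
```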
Answer 2: (score: 2)
One way is to exclude the columns you don't want.
i <- grep("_AA$", names(df1))
j <- grep("_BB$", names(df1))
dfA <- df1[, -j] # Exclude the 'BB' columns
dfB <- df1[, -i] # Exclude the 'AA' columns
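One caveat with negative indexing, sketched here on a hypothetical data frame df_no_bb that has no _BB columns: when grep() finds nothing it returns integer(0), and negating that still selects zero columns rather than all of them. A logical mask from grepl() does not have this problem.

```r
# Hypothetical toy data with no "_BB" columns at all
df_no_bb <- data.frame(Year = 1:2, hour = 3:4, LOT = 5:6, S123_AA = 7:8)

j <- grep("_BB$", names(df_no_bb))   # integer(0): nothing matched
dropped <- df_no_bb[, -j]            # -integer(0) is still integer(0)
ncol(dropped)
#> [1] 0

# A logical mask keeps every column when nothing matches
kept <- df_no_bb[, !grepl("_BB$", names(df_no_bb)), drop = FALSE]
ncol(kept)
#> [1] 4
```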
The same exclusion principle, but using the tidyverse.
library(tidyverse)
df1 %>%
select(names(.)[!grepl("_BB$", names(.))])
df1 %>%
select(names(.)[!grepl("_AA$", names(.))])
As suggested in user NColl's comment, this can be written in a more tidyverse-like way.
df1 %>% select(-ends_with('_BB'))
df1 %>% select(-ends_with('_AA'))
Data.
df1 <- as.data.frame(matrix(1:49, ncol = 7))
nms <- scan(what = character(), sep = "|",
text = "Year | hour | LOT | S123_AA | S135_AA | S1763_BB | S173_BB ")
names(df1) <- trimws(nms)
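As a complement to the exclusion approach in this answer, the selection can also be phrased positively, which matches the question's wording (keep the first 3 columns, then add the matching ones). This is a sketch that assumes the dplyr package is installed; the names dfA and dfB are reused from above only for illustration.

```r
library(dplyr)

# Same layout as the data above, with the names written out directly
df1 <- as.data.frame(matrix(1:49, ncol = 7))
names(df1) <- c("Year", "hour", "LOT", "S123_AA", "S135_AA", "S1763_BB", "S173_BB")

# Keep columns 1 to 3 by position, then add the columns with the wanted suffix
dfA <- df1 %>% select(1:3, ends_with("_AA"))
dfB <- df1 %>% select(1:3, ends_with("_BB"))

names(dfA)
names(dfB)
```

Unlike the exclusion form, this version keeps working if a third suffix group is later added to the data.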
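Answer 3: (score: 1)
If you have a bunch of groups (as the ... in the question suggests), you may want to use lapply to avoid supplying a regular expression for each group: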
# Sample data
df <- data.frame(
Year = rnorm(3),
hour = rnorm(3),
LOT = rnorm(3),
S123_AA = rnorm(3),
S135_AA = rnorm(3),
S1763_BB = rnorm(3),
S173_BB = rnorm(3)
)
# Our groups
groups <- unique(gsub(".*_", "", names(df[grep("_", colnames(df))])))
groups
#> [1] "AA" "BB"
# Our group regex's
groupx <- paste0("_", groups, "$")
groupx
#> [1] "_AA$" "_BB$"
lapply(groupx, function(x) df[, c(1:3, grep(x, colnames(df)))])
#> [[1]]
#> Year hour LOT S123_AA S135_AA
#> 1 0.07940092 -1.2628189 1.629389 -1.376438 -0.94292025
#> 2 -2.04122298 0.7471061 0.291170 -2.126642 0.24355149
#> 3 0.11448519 0.1710263 -0.736140 -1.087515 -0.07720119
#>
#> [[2]]
#> Year hour LOT S1763_BB S173_BB
#> 1 0.07940092 -1.2628189 1.629389 -0.3593335 0.64176748
#> 2 -2.04122298 0.7471061 0.291170 1.7928938 0.36021859
#> 3 0.11448519 0.1710263 -0.736140 -0.7853338 0.01439278
Created on 2018-12-31 by the reprex package (v0.2.1)
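The list returned by lapply above is unnamed, so its elements can only be reached by position. A possible refinement, sketched here with a hypothetical variable parts and deterministic toy data in place of rnorm(), is to name each element after its group:

```r
# Toy data with the same column layout (fixed values instead of rnorm)
df <- data.frame(Year = 1:3, hour = 4:6, LOT = 7:9,
                 S123_AA = 10:12, S135_AA = 13:15,
                 S1763_BB = 16:18, S173_BB = 19:21)

# Group labels: everything after the last underscore in names that contain one
groups <- unique(sub(".*_", "", grep("_", names(df), value = TRUE)))

# One data frame per group, each with the first 3 columns prepended
parts <- lapply(groups, function(g) {
  df[, c(1:3, grep(paste0("_", g, "$"), names(df))), drop = FALSE]
})
names(parts) <- groups

names(parts)
names(parts$BB)
```

Each piece can then be retrieved by label, for example parts$AA or parts[["BB"]].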
Answer 4: (score: 1)
Use ! and grepl() to filter the columns.
A <- ! grepl("BB", names(df))
B <- ! grepl("AA", names(df))
df[, A]
# Year hour LOT S123_AA S135_AA
# 1 1 8 15 22 29
# 2 2 9 16 23 30
# 3 3 10 17 24 31
df[, B]
# Year hour LOT S1763_BB S173_BB
# 1 1 8 15 36 43
# 2 2 9 16 37 44
# 3 3 10 17 38 45