I have a large matrix of groceries. Some columns represent the same item under different names. For example:
Ketchup Ketchupwithgarlic Ketchupspicy Chips Chipsorganic
0 1 0 0 1
1 0 0 0 0
0 0 0 1 0
1 0 0 0 0
I want to combine columns into one whenever a column name begins with the exact name of another column, so that the output looks like this:
Ketchup Chips
1 1
1 0
0 1
1 0
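How can I do this?
Answer 0 (score: 3)
I believe this does what you want, at least with the dataset you provided, and it does not rely on hard-coded column names.
Read the data in with the code from @MKR's answer, then: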
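# Column names of the data.frame read in with @MKR's code
nms <- names(df)
# Indices of "base" names: names that some other column name starts with
inx <- which(sapply(seq_along(nms), function(i) any(grepl(paste0("^", nms[i]), nms[-i]))))
# For each base name, sum every column whose name starts with it
result <- sapply(inx, function(i) rowSums(df[, grep(paste0("^", nms[i]), nms)]))
colnames(result) <- nms[inx]
result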
# Ketchup Chips
#[1,] 1 1
#[2,] 1 0
#[3,] 0 1
#[4,] 1 0
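The prefix test in inx builds a regular expression from each column name, which could misbehave if a name contained regex metacharacters such as "." or "(". As an alternative that is not part of the original answer, base R's startsWith() (available since R 3.3.0) does a literal prefix comparison. A minimal sketch, assuming the same five column names as in the question:

```r
# Column names from the question
nms <- c("Ketchup", "Ketchupwithgarlic", "Ketchupspicy", "Chips", "Chipsorganic")

# A name is a "base" name if at least one *other* name starts with it;
# startsWith() compares literally, so no regex escaping is needed
is_base <- vapply(seq_along(nms),
                  function(i) any(startsWith(nms[-i], nms[i])),
                  logical(1))
nms[is_base]
# -> "Ketchup" "Chips"
```

The resulting logical vector could stand in for the grepl() test inside which() in the answer's code.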
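Answer 1 (score: 2)
After converting the matrix to a data.frame, the dplyr::coalesce option can be used. Cells with a value of 0 also need to be changed to NA so that coalesce can be applied.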
library(dplyr)
# First convert the matrix to a data.frame. The data below is already created
# as a data.frame, so this step can be skipped
df <- as.data.frame(df)
# Replace 0 with NA
df[df==0] <- NA
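Why the zeros have to become NA: coalesce() returns, position by position, the first value that is not NA, and 0 is a valid non-missing value. The sketch below uses a hypothetical base-R helper, coalesce_base(), as a stand-in for dplyr::coalesce() so the effect can be seen without loading dplyr; the two vectors are the Ketchup and Ketchupwithgarlic columns from the question.

```r
# Hypothetical base-R stand-in for dplyr::coalesce():
# keep the first non-NA value at each position
coalesce_base <- function(...) Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))

ketchup <- c(0, 1, 0, 1)
garlic  <- c(1, 0, 0, 0)

# 0 is not missing, so the first column wins everywhere
coalesce_base(ketchup, garlic)
# -> 0 1 0 1

# After replacing 0 with NA, the second column fills the gaps
ketchup[ketchup == 0] <- NA
garlic[garlic == 0] <- NA
coalesce_base(ketchup, garlic)
# -> 1 1 NA 1
```

The remaining NA in row 3 is why Option #1 and Option #2 show NA where the other answers show 0.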
Option #1: If there are only a few column names and they are known in advance, a one-off approach:
bind_cols(Chips = coalesce(!!!select(df, starts_with("Chips"))),
Ketchup = coalesce(!!!select(df, starts_with("Ketchup"))) )
# # A tibble: 4 x 2
# Chips Ketchup
# <int> <int>
# 1 1 1
# 2 NA 1
# 3 1 NA
# 4 NA 1
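Option #2: A generic approach can be written as:
library(stringr)
# Names that occur within more than one column name (i.e. the shared base names)
overlapName <- names(df)[mapply(function(x) sum(str_detect(names(df), x)), names(df)) > 1]
# Coalesce all columns that start with each base name
mapply(function(x) coalesce(!!!select(df, starts_with(x))), overlapName)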
# Ketchup Chips
# [1,] 1 1
# [2,] 1 NA
# [3,] NA 1
# [4,] 1 NA
Data:
df <- read.table(text =
"Ketchup Ketchupwithgarlic Ketchupspicy Chips Chipsorganic
0 1 0 0 1
1 0 0 0 0
0 0 0 1 0
1 0 0 0 0",
header = TRUE, stringsAsFactors = FALSE)
Answer 2 (score: 2)
Here is another base R alternative. I think Rui Barradas's answer is probably better, but it can help to see several approaches.
# save column names
cnms <- colnames(myMat)
# build a matrix that groups on column names using col and grepl
grps <- col(diag(length(cnms))) * sapply(cnms[order(cnms)], grepl, x=cnms)
# run through the groups and perform rowSums to collapse groups into one column
# (drop = FALSE keeps single-column groups as matrices so rowSums still works)
sapply(split(seq_len(ncol(myMat)),
             colnames(grps)[apply(grps, 1, FUN=function(x) min(x[x != 0]))]),
       function(y) rowSums(myMat[, y, drop = FALSE]))
Returns:
Chips Ketchup
[1,] 1 1
[2,] 0 1
[3,] 1 0
[4,] 0 1
Data:
myMat <-
structure(c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 0L), .Dim = 4:5, .Dimnames = list(NULL,
c("Ketchup", "Ketchupwithgarlic", "Ketchupspicy", "Chips",
"Chipsorganic")))
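For comparison, the same grouping can be expressed by assigning each column to the shortest column name it starts with and then summing with base R's rowsum(), which adds up rows by group (hence the transposes). This is a swapped-in technique rather than the answer's own code, sketched on the same myMat data and assuming the names are plain literal prefixes:

```r
# Same example matrix as in the Data section of this answer
myMat <- structure(c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
                     0L, 1L, 0L, 1L, 0L, 0L, 0L), .Dim = 4:5,
                   .Dimnames = list(NULL, c("Ketchup", "Ketchupwithgarlic",
                                            "Ketchupspicy", "Chips", "Chipsorganic")))
cnms <- colnames(myMat)

# Group label for each column: the shortest column name that it starts with
# (a column with no shorter prefix among the names forms its own group)
grp <- vapply(cnms, function(nm) {
  prefixes <- cnms[startsWith(nm, cnms)]
  prefixes[which.min(nchar(prefixes))]
}, character(1))

# rowsum() sums rows that share a group label, so transpose, sum, and transpose back
res <- t(rowsum(t(myMat), grp))
res
```

Unlike the grepl() grouping, startsWith() only matches at the beginning of a name, so a hypothetical column such as "PotatoChips" would not be pulled into the Chips group.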