I have a data.frame like this:
name value1 value2 value3
a 0.10 0.9 0.10
b 0.00 0.3 0.67
c 0.01 0.1 0.10
d 0.12 0.10 0.2
e 0.10 0.001 0.1
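For each "value*" column, I want the names whose value is 0.10. In other words, the output would be a data.frame with three columns containing elements of "name". My first idea was to bind "name" to each "value*" column and then subset it, but that didn't work: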
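my_subset <- list()  # the list must exist before the loop assigns into it
for (i in seq_along(mydf)) {
  # pairs each row name with column i, but does not filter on 0.10 yet
  my_subset[[i]] <- cbind(rownames(mydf), mydf[[i]])
}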
The "name" column is actually the row names of mydf. In total I have 10,000 rows and 45 columns.
Desired output:
name value1 value2 value3
a a NA a
b NA NA NA
c NA NA c
d NA d NA
e e NA NA
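Can anyone help me? I know there are some "apply" functions that could probably do the job, but I need to learn how to do it with loops.
Thanks in advance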
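The question keeps "name" as row names, while most answers below assume a regular name column. A minimal sketch of moving the row names into such a column first (mydf is rebuilt here from the sample values so the snippet runs on its own):

```r
# Sample data with the names stored as row names, as in the question
mydf <- data.frame(
  value1 = c(0.10, 0.00, 0.01, 0.12, 0.10),
  value2 = c(0.90, 0.30, 0.10, 0.10, 0.001),
  value3 = c(0.10, 0.67, 0.10, 0.20, 0.10),
  row.names = c("a", "b", "c", "d", "e")
)

# Copy the row names into a character column placed first;
# row.names = NULL resets the row names to 1, 2, 3, ...
mydf <- data.frame(name = rownames(mydf), mydf,
                   row.names = NULL, stringsAsFactors = FALSE)
mydf
```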
Answer 0 (score: 3)
Is this what you want?
a = structure(list(value1 = c("0.10", "0.00", "0.01", "0.12", "0.10"
), value2 = c("0.9", "0.3", "0.1", "0.10", "0.001"), value3 = c("0.10",
"0.67", "0.10", "0.2", "0.1")), row.names = c("a", "b", "c",
"d", "e"), class = "data.frame")
val = "0.10"
apply(a,2,function(x) rownames(a)[which(x==val)])
$`value1`
[1] "a" "e"
$value2
[1] "d"
$value3
[1] "a" "c"
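Because this answer stores the values as character strings, the comparison is textual: "0.1" does not equal "0.10". That is why c in value2 and e in value3 are not returned, which happens to match the desired output in the question, whereas the numeric answers below do return them. If the NA-filled layout from the question is wanted instead of a list, a minimal sketch on the same character data:

```r
# Same data as this answer: every value is stored as a character string
a <- data.frame(
  value1 = c("0.10", "0.00", "0.01", "0.12", "0.10"),
  value2 = c("0.9", "0.3", "0.1", "0.10", "0.001"),
  value3 = c("0.10", "0.67", "0.10", "0.2", "0.1"),
  row.names = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)
val <- "0.10"

# Keep the table layout: the row name where the text equals val, NA elsewhere
out <- a
out[] <- lapply(a, function(x) ifelse(x == val, rownames(a), NA))
out
```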
Answer 1 (score: 2)
Using base R lapply:
cols <- grep("^value", names(df))
df[cols] <- lapply(df[cols], function(x) ifelse(x == 0.1, df$name, NA))
df
# name value1 value2 value3
#1 a a <NA> a
#2 b <NA> <NA> <NA>
#3 c <NA> c c
#4 d <NA> d <NA>
#5 e e <NA> e
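The x == 0.1 test works here because the values are typed-in literals. If the values come out of a calculation, exact equality can fail, so a tolerance-based comparison is safer. A minimal sketch of the same lapply approach; the tolerance 1e-8 is an assumption and should be chosen to match the precision of the real data:

```r
df <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  value1 = c(0.10, 0.00, 0.01, 0.12, 0.10),
  value2 = c(0.90, 0.30, 0.10, 0.10, 0.001),
  value3 = c(0.10, 0.67, 0.10, 0.20, 0.10),
  stringsAsFactors = FALSE
)

# Computed doubles may not be exactly 0.1: 0.3 - 0.2 is stored as 0.09999999999999998
print(0.3 - 0.2 == 0.1)  # FALSE

tol <- 1e-8  # assumed tolerance
cols <- grep("^value", names(df))
df[cols] <- lapply(df[cols], function(x) ifelse(abs(x - 0.1) < tol, df$name, NA))
df
```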
Answer 2 (score: 2)
Here is an alternative using a for loop:
X <- data.frame(
name = letters[1:5],
value1 = c(0.10, 0.00, 0.01, 0.12, 0.10),
value2 = c(0.90, 0.30, 0.10, 0.10, 0.001),
value3 = c(0.10, 0.67, 0.10, 0.20, 0.10),
stringsAsFactors = FALSE
)
Sample data:
X
name value1 value2 value3
1 a 0.10 0.900 0.10
2 b 0.00 0.300 0.67
3 c 0.01 0.100 0.10
4 d 0.12 0.100 0.20
5 e 0.10 0.001 0.10
for (j in grep("value", names(X))) {
X[, j] <- ifelse(X[, j] == 0.10, X[, "name"], NA)
}
Result:
X
name value1 value2 value3
1 a a <NA> a
2 b <NA> <NA> <NA>
3 c <NA> c c
4 d <NA> d <NA>
5 e e <NA> e
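Since the question keeps the names as row names and asks specifically about loops, here is a minimal sketch of the same loop adapted to that layout. It assumes numeric columns and an assumed tolerance of 1e-8 for the comparison:

```r
# Question layout: names are row names, values are numeric
mydf <- data.frame(
  value1 = c(0.10, 0.00, 0.01, 0.12, 0.10),
  value2 = c(0.90, 0.30, 0.10, 0.10, 0.001),
  value3 = c(0.10, 0.67, 0.10, 0.20, 0.10),
  row.names = c("a", "b", "c", "d", "e")
)
val <- 0.10
tol <- 1e-8  # assumed tolerance for comparing doubles

res <- mydf  # result keeps the same row names and column names
for (j in seq_along(mydf)) {
  out <- rep(NA_character_, nrow(mydf))  # start the column as all NA
  hit <- abs(mydf[[j]] - val) < tol      # rows of column j equal to val
  out[hit] <- rownames(mydf)[hit]        # fill in the matching row names
  res[[j]] <- out
}
res
```

seq_along(mydf) iterates over the columns, so the same loop covers all 45 columns of the real data without changes.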
Answer 3 (score: 0)
Here is an option using base R:
df[-1] <- df$name[NA^(df[-1] != 0.1) * seq_len(nrow(df))]
df
# name value1 value2 value3
#1 a a <NA> a
#2 b <NA> <NA> <NA>
#3 c <NA> c c
#4 d <NA> d <NA>
#5 e e <NA> e
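The one-liner relies on NA^FALSE being 1 and NA^TRUE being NA. Multiplying that matrix by the row numbers (recycled down each column) gives the row index where the value is 0.1 and NA elsewhere, and indexing name with it returns either the name or NA. A step-by-step sketch of the same expression:

```r
df <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  value1 = c(0.10, 0.00, 0.01, 0.12, 0.10),
  value2 = c(0.90, 0.30, 0.10, 0.10, 0.001),
  value3 = c(0.10, 0.67, 0.10, 0.20, 0.10),
  stringsAsFactors = FALSE
)

mismatch <- df[-1] != 0.1               # logical matrix: TRUE where the value is not 0.1
idx <- NA^mismatch * seq_len(nrow(df))  # NA^TRUE is NA, NA^FALSE is 1; times the row number
idx                                     # row index where the value is 0.1, NA elsewhere

df[-1] <- df$name[idx]                  # an NA index returns NA
df
```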
Benchmark (copy() comes from the data.table package):
library(data.table)
df1 <- df[rep(seq_len(nrow(df)), 1e7), ]
df2 <- copy(df1)
system.time({
cols <- grep("^value", names(df1))
df1[cols] <- lapply(df1[cols], function(x) ifelse(x == 0.1, df1$name, NA))
})
# user system elapsed
# 35.700 4.587 40.615
system.time({
df2[-1] <- df2$name[NA^(df2[-1] != 0.1) * seq_len(nrow(df2))]
})
# user system elapsed
# 21.709 3.886 26.026
Data:
df <- structure(list(name = c("a", "b", "c", "d", "e"), value1 = c(0.1,
0, 0.01, 0.12, 0.1), value2 = c(0.9, 0.3, 0.1, 0.1, 0.001), value3 = c(0.1,
0.67, 0.1, 0.2, 0.1)), class = "data.frame", row.names = c(NA,
-5L))
Answer 4 (score: 0)
You can simply use the data.table package:
library(data.table)
cols <- setdiff(colnames(dt), "name")
setDT(dt)[, (cols) := lapply(.SD, function(x) ifelse(x == 0.10, as.character(name), NA)), .SDcols = cols]
dt
name value1 value2 value3
1: a a <NA> a
2: b <NA> <NA> <NA>
3: c <NA> c c
4: d <NA> d <NA>
5: e e <NA> e