I'm trying to get all the indices that meet a condition in a column. I've already done this for the case of a single column:
# Get 10% of the samples labeled with a 1
indexPositive = sample(which(datafsign$result == 1), nrow(datafsign) * .1)
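The size passed to sample() in that line is 10% of all rows (nrow(datafsign) * .1), not 10% of the rows labeled 1. The function later in the post uses the second definition. Here is a small sketch of the difference. The toy result column (100 rows, 40 of them labeled 1) is purely illustrative:

```r
# Illustrative data only: 100 rows, the first 40 labeled 1
datafsign <- data.frame(result = rep(c(1, -1), times = c(40, 60)))
positives <- which(datafsign$result == 1)

nAll <- nrow(datafsign) * .1     # 10% of all rows -> 10
nPos <- length(positives) * .1   # 10% of the rows labeled 1 -> 4

set.seed(1000000007)
fromAll <- sample(positives, nAll)
fromPos <- sample(positives, nPos)
c(length(fromAll), length(fromPos))
```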
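Is it possible to do the same for any number of columns at once? I guess in that case indexPositive would be a list or array containing the indices for each column.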
The data frame looks like this:
          x         y f1 f2 f3 f4
1  76.71655  60.74299  1  1 -1 -1
2 -85.73743 -19.67202  1  1  1 -1
3  75.95698 -27.20154  1  1  1 -1
4 -82.57193  39.30717  1  1  1 -1
5 -45.32161  39.44898  1  1 -1 -1
6 -46.76636 -35.30635  1  1  1 -1
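The sketch below assumes this is the datafunctions data frame used further down, and only the six rows shown are known. With that assumption, the per-column version of which() can be written with lapply(). It returns a named list that holds one vector of row indices per label column, and sample() can then be applied to each element:

```r
# The six rows shown above, typed in by hand (the real data set is larger)
datafunctions <- data.frame(
  x  = c(76.71655, -85.73743, 75.95698, -82.57193, -45.32161, -46.76636),
  y  = c(60.74299, -19.67202, -27.20154, 39.30717, 39.44898, -35.30635),
  f1 = c(1, 1, 1, 1, 1, 1),
  f2 = c(1, 1, 1, 1, 1, 1),
  f3 = c(-1, 1, 1, 1, -1, 1),
  f4 = c(-1, -1, -1, -1, -1, -1)
)

# which() applied to each label column; lapply() keeps the column names,
# so the result is a list with elements f1, f2, f3 and f4
onesPerColumn <- lapply(datafunctions[3:6], function(col) which(col == 1))
str(onesPerColumn)
```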
The seed I'm using is set.seed(1000000007).
What I want is the set of indices whose value is 1. With only one column, the result is:
head(indexPositive)
[1] 1398 873 3777 2140 133 3515
Thanks in advance.
Thanks to @David Arenburg, I finally managed it. Based on his comment, I wrote this function:
getPercentageOfData <- function(x, condition = 1, percentage = .1){
  # Get the percentage of samples that meet condition
  #
  # Args:
  #   x: A vector containing the data
  #   condition: Value the selected elements must be equal to
  #   percentage: Fraction (between 0 and 1) of the matching samples to get
  #
  # Returns:
  #   Indices of a random subset, of that fraction, of the samples that meet the condition
  meetCondition = which(x == condition)
  sample(meetCondition, length(meetCondition) * percentage)
}
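One caveat with sample(meetCondition, ...): when its first argument is a single number n, sample() draws from 1:n instead of returning that number. This matters when exactly one element meets the condition, for example with percentage = 1. A minimal sketch that avoids it by drawing positions with sample.int() and then indexing into meetCondition. The name getPercentageOfDataSafe is hypothetical:

```r
getPercentageOfDataSafe <- function(x, condition = 1, percentage = .1) {
  # Positions of the elements equal to condition
  meetCondition <- which(x == condition)
  # Number of positions to draw, rounded down explicitly
  n <- floor(length(meetCondition) * percentage)
  # Draw positions *within* meetCondition, so a length-1 meetCondition
  # is never reinterpreted as 1:meetCondition
  meetCondition[sample.int(length(meetCondition), n)]
}

x <- c(-1, -1, -1, -1, 1)                    # only position 5 equals 1
getPercentageOfDataSafe(x, percentage = 1)   # returns 5
# whereas sample(which(x == 1), 1) is sample(5, 1), a draw from 1:5
```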
Then I use it like this:
# Get 10% of the samples labeled with a 1 in each of the 4 functions
indexPositive = lapply(datafunctions[3:6], getPercentageOfData)
# Change the selected 1s to -1
datafunctions$f1[indexPositive$f1] = -1
datafunctions$f2[indexPositive$f2] = -1
datafunctions$f3[indexPositive$f3] = -1
datafunctions$f4[indexPositive$f4] = -1
It would be great to assign -1 to every column at once instead of writing 4 lines, but I don't know how.
Answer (score: 2)
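If you would rather keep indexPositive, for instance to know which rows were flipped, Map() can pair each column with its own index vector so that all four columns are updated in one assignment. This is a sketch that uses only the label columns of the six rows shown in the question. It uses percentage = .5 so that something is actually selected in so few rows:

```r
getPercentageOfData <- function(x, condition = 1, percentage = .1) {
  meetCondition <- which(x == condition)
  sample(meetCondition, length(meetCondition) * percentage)
}

# Label columns of the six rows shown in the question
datafunctions <- data.frame(
  f1 = c(1, 1, 1, 1, 1, 1),
  f2 = c(1, 1, 1, 1, 1, 1),
  f3 = c(-1, 1, 1, 1, -1, 1),
  f4 = c(-1, -1, -1, -1, -1, -1)
)

set.seed(1000000007)
indexPositive <- lapply(datafunctions, getPercentageOfData, percentage = .5)

# Map() walks the columns and the index vectors in parallel; each call
# returns the column with its selected positions set to -1
cols <- names(indexPositive)
datafunctions[cols] <- Map(function(col, idx) { col[idx] <- -1; col },
                           datafunctions[cols], indexPositive)
```

A plain loop, for (col in names(indexPositive)) datafunctions[[col]][indexPositive[[col]]] <- -1, does the same thing. Map() only keeps it to one statement.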
You can define your function as follows (you can also add replacement as an argument):
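getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1){
  # Positions of the elements equal to condition
  meetCondition <- which(x == condition)
  # Return x with a random fraction of those positions set to replacement
  replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}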
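Here is a quick check of what this version returns for a single vector. The split of 20 ones and 10 minus ones is an arbitrary example. The vector comes back with the same length, and 10% of its 1s (here 2) are turned into -1:

```r
getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1) {
  meetCondition <- which(x == condition)
  replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}

x <- rep(c(1, -1), times = c(20, 10))   # 20 ones followed by 10 minus ones
set.seed(1000000007)
y <- getPercentageOfData(x)

# Cross-tabulate old vs new values: only the 1 -> -1 cell should be new
table(x, y)
```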
Then select the columns you want to work on and update datafunctions directly (instead of creating indexPositive and then updating each column by hand):
cols <- 3:6
datafunctions[cols] <- lapply(datafunctions[cols], getPercentageOfData)
You can of course also pass the function's other arguments through lapply, for example:
datafunctions[cols] <- lapply(datafunctions[cols],
                              getPercentageOfData, percentage = .8, replacement = -100)
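Extra arguments given to lapply() are the same for every column. If each column should get its own percentage, mapply() can iterate over that argument as well, while MoreArgs holds the arguments shared by all calls. This is a sketch on the label columns of the six rows from the question. The per-column percentages are arbitrary:

```r
getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1) {
  meetCondition <- which(x == condition)
  replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}

# Label columns of the six rows shown in the question
datafunctions <- data.frame(
  f1 = c(1, 1, 1, 1, 1, 1),
  f2 = c(1, 1, 1, 1, 1, 1),
  f3 = c(-1, 1, 1, 1, -1, 1),
  f4 = c(-1, -1, -1, -1, -1, -1)
)
cols <- c("f1", "f2", "f3", "f4")

set.seed(1000000007)
# Column i is processed with percentage[i]; replacement is the same for all.
# SIMPLIFY = FALSE keeps the result a list, ready to assign back.
datafunctions[cols] <- mapply(getPercentageOfData, datafunctions[cols],
                              percentage = c(.5, 1, .5, .5),
                              MoreArgs = list(replacement = -1),
                              SIMPLIFY = FALSE)
```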