Vectorizing access to columns in R

Date: 2016-03-10 11:20:23

Tags: r vectorization

I am trying to get all the indices that satisfy a condition in a column. For a single column, I have already done it like this:

# Get 10% of the samples labeled with a 1
indexPositive = sample(which(datafsign$result == 1), nrow(datafsign) * .1)

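Is it possible to do the same for an arbitrary number of columns in one line? I guess that in that case indexPositive would be a list or array containing the indices for each column.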

Data

The data frame looks like this:

          x         y f1 f2 f3 f4
1  76.71655  60.74299  1  1 -1 -1
2 -85.73743 -19.67202  1  1  1 -1
3  75.95698 -27.20154  1  1  1 -1
4 -82.57193  39.30717  1  1  1 -1
5 -45.32161  39.44898  1  1 -1 -1
6 -46.76636 -35.30635  1  1  1 -1

The seed I used is set.seed(1000000007)

What I want is the set of indices whose value is 1. In the single-column case, the result is:

head(indexPositive)
[1] 1398  873 3777 2140  133 3515

Thanks in advance.

Answer

Thanks to @David Arenburg, I finally got it working. Based on his comment, I created this function:

getPercentageOfData <- function(x, condition = 1, percentage = .1){
  # Get the percentage of samples that meet condition
  #
  # Args:
  #   x: A vector containing the data
  #   condition: Condition that the data need to satisfy
  #   percentage: What percentage of samples to get
  #
  # Returns:
  #   Indexes of the percentage of the samples that meet the condition
  meetCondition = which(x == condition)
  sample(meetCondition, length(meetCondition) * percentage)
}

Then I use it like this:

# Get 10% of the samples labeled with a 1 in each of the 4 functions
indexPositive = lapply(datafunctions[3:6], getPercentageOfData)
# Replace 1 with -1
datafunctions$f1[indexPositive$f1] = -1
datafunctions$f2[indexPositive$f2] = -1
datafunctions$f3[indexPositive$f3] = -1
datafunctions$f4[indexPositive$f4] = -1

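It would be great to assign -1 to all the columns at once instead of writing 4 lines, but I don't know how.

1 answer:

Answer 0 (score: 2)

You can define your function as follows (you could also add replacement as a parameter)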

getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1){
  meetCondition <- which(x == condition)
  replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}
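
One caveat worth keeping in mind with the function above: when sample() is given a single number x >= 1 as its first argument, it samples from 1:x rather than from that one value. So if a column contains exactly one match, the sampled index may not be the matching row. The sketch below is a hypothetical variant (the name getPercentageOfDataSafe, the floor() rounding and the early return are my own assumptions, not part of the original answer) that samples positions inside meetCondition instead:

```r
# Hypothetical variant of getPercentageOfData; name, floor() and the
# zero-size guard are assumptions added for illustration.
getPercentageOfDataSafe <- function(x, condition = 1, percentage = .1, replacement = -1) {
  meetCondition <- which(x == condition)
  # Round the sample size down explicitly
  n <- floor(length(meetCondition) * percentage)
  if (n == 0) return(x)  # nothing to replace
  # Sample positions within meetCondition rather than its values,
  # so a single match is never expanded to 1:x by sample()
  picked <- meetCondition[sample.int(length(meetCondition), n)]
  replace(x, picked, replacement)
}

# Only position 3 matches; with percentage = 1 it must be the one replaced
getPercentageOfDataSafe(c(-1, -1, 1, -1), percentage = 1)
```

For columns with many matches both versions behave the same way, so this only matters for sparse columns.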

Then select the columns you want to operate on and update datafunctions directly (without creating indexPositive and then updating it manually)

cols <- 3:6
datafunctions[cols] <- lapply(datafunctions[cols], getPercentageOfData)

You can of course pass the function's arguments through lapply, for example

datafunctions[cols] <- lapply(datafunctions[cols], 
                              getPercentageOfData, percentage = .8, replacement = -100)
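
To check that the column-wise update does what is expected, here is a small self-contained sketch. The data frame toy, its contents and the seed are made up for illustration (they are not the asker's data); the function is copied from the answer above. Counting the 1s per column before and after shows how many values were replaced:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Made-up data with the same column layout as in the question
toy <- data.frame(
  x  = runif(100),
  y  = runif(100),
  f1 = rep(1, 100),                      # 100 ones
  f2 = rep(c(1, -1), each = 50),         # 50 ones
  f3 = rep(c(1, -1), times = 50),        # 50 ones
  f4 = rep(c(1, -1, -1, -1, -1), 20)     # 20 ones
)

# Function as defined in the answer
getPercentageOfData <- function(x, condition = 1, percentage = .1, replacement = -1){
  meetCondition <- which(x == condition)
  replace(x, sample(meetCondition, length(meetCondition) * percentage), replacement)
}

cols <- 3:6
before <- colSums(toy[cols] == 1)
toy[cols] <- lapply(toy[cols], getPercentageOfData)
after <- colSums(toy[cols] == 1)

# Number of 1s replaced by -1 in each column: 10, 5, 5, 2
before - after
```

Because lapply returns a list with one element per column, assigning it back to toy[cols] replaces all selected columns in one step.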