Question

I have a data frame with mixed data, ranging from variables (columns) with numeric values to variables (columns) with factors.

In R, I want to replace all negative values with NA and then drop an entire variable if more than 99% of its observations are NA.

For the first part, I want to make sure nothing breaks when string columns are encountered. Is it possible to simply start from:

mydata$v1[mydata$v1<0] <- NA

but without it being specific to v1, and only for observations that are not strings?

Follow-up: this is how far I got with the explanation provided by @stas g. However, no variables seem to be dropped from df.

#mixed data
df <- data.frame(WVS_Longitudinal_1981_2014_R_v2015_04_18)
dat <- df[, sapply(df, function(x) {class(x) == "numeric" | class(x) == "integer"})]

foo <- function(dat, p){ 
  ind <- colSums(is.na(dat))/nrow(dat)
  dat[dat < 0] <- NA
  dat[, ind < p]
}

#process numeric part of the data separately
ii <- sapply(df, class) == "numeric" | sapply(df, class) == "integer"
dat.num <- foo(as.matrix(df[, ii]), 0.99)
#then stick the two parts back together again
WVS <- data.frame(df[, !ii], dat.num)
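A likely reason no variables are dropped, worth checking against the real data: inside foo, ind is computed before the negatives are replaced with NA, so the NA share only counts values that were already missing. The toy sketch below uses made-up data and the hypothetical names foo_before and foo_after to compare the two orderings.

```r
# Toy mixed data (hypothetical): one factor column and two numeric columns.
df <- data.frame(
  id = factor(c("a", "b", "c", "d")),
  x  = c(-1, -2, -3, -4),   # all negative: 100% NA after replacement
  y  = c(1, 2, -3, 4)       # one negative: 25% NA after replacement
)

# Ordering used in the follow-up: NA share measured BEFORE negatives become NA.
foo_before <- function(dat, p) {
  ind <- colSums(is.na(dat)) / nrow(dat)
  dat[dat < 0] <- NA
  dat[, ind < p, drop = FALSE]
}

# Reordered: replace negatives first, then measure the NA share.
foo_after <- function(dat, p) {
  dat[dat < 0] <- NA
  ind <- colSums(is.na(dat)) / nrow(dat)
  dat[, ind < p, drop = FALSE]
}

ii <- sapply(df, is.numeric)                       # numeric columns only
print(colnames(foo_before(as.matrix(df[, ii]), 0.99)))  # "x" "y": nothing dropped
print(colnames(foo_after(as.matrix(df[, ii]), 0.99)))   # "y": x is dropped
```

Using drop = FALSE also keeps the result a matrix when only one column survives, so the later data.frame() call keeps a sensible column name.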

Answer 1

Without a minimal reproducible example it is hard to know exactly how to help you, but suppose you have the following sample data:

#matrix of random normal observations, 20 samples, 5 variables
dat <- matrix(rnorm(100), nrow = 20)
#if entry is negative, replace with 'NA'
dat[dat < 0] <- NA

#threshold for dropping a variable
p <- 0.99
#check how many NAs in each column (proportionally)
ind <- colSums(is.na(dat))/nrow(dat)
#only keep columns where the threshold is not exceeded
dat <- dat[, ind < p]
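Since ind can only take the values k/nrow(dat), with p = 0.99 and fewer than 100 rows (20 in the example above) a column is dropped only when every one of its entries is NA. A tiny deterministic matrix, with values chosen purely for illustration, makes the threshold rule easy to check:

```r
# Hand-made 4 x 3 matrix; values are illustrative, not from the question.
dat <- cbind(a = c(1, 2, 3, 4),
             b = c(-1, 2, -3, 4),
             c = c(-1, -2, -3, -4))

dat[dat < 0] <- NA                      # negatives become NA
ind <- colSums(is.na(dat)) / nrow(dat)  # NA share per column
print(ind)                              # a = 0, b = 0.5, c = 1

# With 4 rows the possible shares are 0, 0.25, 0.5, 0.75 and 1,
# so only the all-NA column c reaches the 0.99 threshold.
kept <- dat[, ind < 0.99, drop = FALSE]
print(colnames(kept))                   # "a" "b"
```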

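If you have non-numeric variables and you are working with a data.frame, you can do something like this (assuming you don't care about the column order):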

#generate mixed data
dat <- matrix(rnorm(100), nrow = 20) #20 x 5 matrix of numeric values
df <- data.frame(letters[1 : 20], dat) #combined with one character column 


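foo <- function(dat, p){ 
  dat[dat < 0] <- NA                    #replace negatives with NA first
  ind <- colSums(is.na(dat))/nrow(dat)  #then compute the NA share per column
  dat[, ind < p, drop = FALSE]          #keep columns below the threshold
}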

#process numeric part of the data separately
ii <- sapply(df, class) == "numeric" #ind of numeric columns
dat.num <- foo(as.matrix(df[, ii]), 0.99) #feed numeric part of data to foo
#then stick the two parts back together again
data.frame(df[, !ii, drop = FALSE], dat.num)
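If column order does matter, one alternative (a sketch, not part of the original answer) is to modify the numeric columns in place with lapply() and then drop columns by their NA share. The function name clean_negatives is hypothetical, and, like the answer above, it only drops numeric columns; removing `num &` would apply the threshold to every column.

```r
clean_negatives <- function(df, p = 0.99) {
  num <- vapply(df, is.numeric, logical(1))  # which columns are numeric
  # Replace negatives with NA in numeric columns only; others are untouched.
  df[num] <- lapply(df[num], function(x) replace(x, x < 0, NA))
  na_share <- colMeans(is.na(df))            # NA proportion per column
  # Drop numeric columns at or above the threshold; original order is kept.
  df[, !(num & na_share >= p), drop = FALSE]
}

# Small mixed example (hypothetical values): one character, two numeric columns.
df <- data.frame(id = c("a", "b", "c"),
                 x  = c(-1, -2, -3),   # every value negative
                 y  = c(1, -2, 3),     # one negative value
                 stringsAsFactors = FALSE)
out <- clean_negatives(df)
print(out)  # keeps id and y (y becomes 1, NA, 3); x is dropped
```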

Answer 2

This approach, the solution proposed by @YOLO, finally solved the problem.


Replace all negative values in a dataset