How to replace a string in only a certain number of non-consecutive rows

Time: 2018-03-09 11:55:34

Tags: r string replace

I have a data frame with a single column, as follows:

mydf<-data.frame(c("Normal study","Normal study","Normal study","Odd things","Strange stuff","Normal study","Normal study","Normal study"))

I want to replace "Normal study" with "Bizarre", but only in a random third of the rows that contain "Normal study", so that the output is, for example:

"Normal study"
"Normal study"
"Bizarre"
"Odd things"
"Strange stuff"
"Bizarre"
"Normal study"
"Normal study"

I tried something like this:

library(dplyr)

replaceWithBarr<-filter(grepl("Normal",out))
sample_n(replaceWithBarr,nrow(replaceWithBarr)/3)

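to subset the data first, but that does not preserve the row numbers I would need to merge the subset back in... and I haven't even got to the replacement part yet.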

3 Answers:

Answer 0: (score: 3)

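Do you want this done on exactly 1/3 of the matching rows, or should each matching row have a random 33% chance of being replaced?

In the first case: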

mydf[sample(which(mydf[,1]=="Normal study"), sum(mydf[,1]=="Normal study")/3), 1] <- "Bizarre"

In the second case:

mydf[mydf[,1]=="Normal study" & runif(nrow(mydf), 0, 3) > 2, 1] <- "Bizarre"
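
To make the difference between the two cases concrete, here is a minimal sketch. The column name `x`, the seed, and the variable names `exact`, `chance`, `hits` and `flip` are assumptions for illustration, not from the question. It also builds the data frame with `stringsAsFactors = FALSE`, because in R versions before 4.0 the column would otherwise be a factor and assigning the new value "Bizarre" would give NA with a warning.

```r
# Sample data kept as character, not factor
mydf <- data.frame(
  x = c("Normal study", "Normal study", "Normal study", "Odd things",
        "Strange stuff", "Normal study", "Normal study", "Normal study"),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, only so the run is reproducible

# Case 1: replace exactly one third of the matching rows
exact <- mydf
hits <- which(exact$x == "Normal study")
# Sample positions within `hits` rather than calling sample(hits, ...):
# sample() treats a single number n as 1:n, which would misbehave
# if only one row matched.
picked <- hits[sample.int(length(hits), length(hits) %/% 3)]
exact$x[picked] <- "Bizarre"

# Case 2: each matching row is replaced independently with probability 1/3
chance <- mydf
flip <- chance$x == "Normal study" & runif(nrow(chance)) < 1/3
chance$x[flip] <- "Bizarre"

table(exact$x)   # always 2 "Bizarre" out of the 6 matching rows
table(chance$x)  # the number of "Bizarre" rows varies with the seed
```

With case 1 the number of replacements is fixed and only their positions are random; with case 2 both the number and the positions are random, so on a small data frame you may well get zero replacements.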

Answer 1: (score: 1)

You can do something like this in base R. Here I assume you want to replace roughly a third of your "Normal study" entries with "Bizarre".

# Your sample data
mydf<-data.frame(c("Normal study","Normal study","Normal study","Odd things","Strange stuff","Normal study","Normal study","Normal study"));

# Convert factors to characters
mydf[] <- lapply(mydf, as.character);

# Replace a third of all "Normal study" entries with "Bizarre"
# (sum() counts the matching rows; length() would count every row)
set.seed(2017);
mydf[
    sample(which(mydf[, 1] == "Normal study"), floor(sum(mydf[, 1] == "Normal study") / 3)),
    1] <- "Bizarre";
mydf;
#  c..Normal.study....Normal.study....Normal.study....Odd.things...
#1                                                     Normal study
#2                                                     Normal study
#3                                                          Bizarre
#4                                                       Odd things
#5                                                    Strange stuff
#6                                                     Normal study
#7                                                     Normal study
#8                                                          Bizarre   

Answer 2: (score: 1)

You can do it like this:

mydf<-data.frame(x=c("Normal study","Normal study","Normal study","Odd things","Strange stuff","Normal study","Normal study","Normal study"), stringsAsFactors = FALSE)
i <- which(mydf$x=="Normal study")
j <- sample(i, length(i)/3)
mydf[j, "x"] <- "Bizarre"
mydf
# > mydf
#               x
# 1  Normal study
# 2  Normal study
# 3       Bizarre
# 4    Odd things
# 5 Strange stuff
# 6  Normal study
# 7  Normal study
# 8       Bizarre
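
If you need this more than once, the same which/sample idea can be wrapped in a small function. This is only a sketch: `replace_share` and its `one_in` argument are hypothetical names made up for illustration, not part of base R or any package, and the seed is arbitrary.

```r
# Hypothetical helper: replace one in every `one_in` occurrences of
# `from` with `to`, choosing the positions at random.
replace_share <- function(x, from, to, one_in = 3) {
  x <- as.character(x)                 # also accepts factor input
  hits <- which(x == from)             # row positions of all matches
  n <- length(hits) %/% one_in         # integer division, rounds down
  # Index into `hits` so that a single match is not expanded to 1:n
  picked <- hits[sample.int(length(hits), n)]
  x[picked] <- to
  x
}

mydf <- data.frame(
  x = c("Normal study", "Normal study", "Normal study", "Odd things",
        "Strange stuff", "Normal study", "Normal study", "Normal study"),
  stringsAsFactors = FALSE
)

set.seed(123)  # arbitrary seed for reproducibility
mydf$x <- replace_share(mydf$x, "Normal study", "Bizarre")
mydf
```

Because the replacement is done by position on the original vector, no subsetting or re-merging is needed and the row order is untouched. For the sample data, two of the six "Normal study" rows become "Bizarre"; which two depends on the seed.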