I have to code a lot of data.frames. For example:
tt <- data.frame(V1=c("test1", "test3", "test1", "test4", "wins", "loses"),
V2=c("someannotation", "othertext", "loads of text including the word winning for the winner and the word losing for the loser", "blablabla", "blablabla", "blablabla"), stringsAsFactors=FALSE)
tt
V1 V2
test1 someannotation
test3 othertext
test1 loads of text including the word winning for the winner and the word losing for the loser
test4 blablabla
wins blablabla
loses blablabla
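The coding has to go into a new data.frame: I have to code whether the runner won or lost. If V1 says "wins", he won (if he lost, this is indicated by "loses"). However, the runner can also win or lose part of the race, which is indicated by "test1" in V1 and specified in V2. If the word "winning" appears in V2 before the term "losing", the runner won part of the race (and vice versa).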
I tried to use elements of the answer here to determine which word/string appears at which position:
find location of character in string
My implementation looks like this:
result <- data.frame()
for(i in 1:length(tt[,1])){
if(grepl("wins", tt[i,1])) result[i,1] <- "wins"
if(grepl("loses", tt[i,1])) result[i,1] <- "loses"
if(grepl("test1", tt[i,1])&(which(strsplit(tt[i,2], " ")[[1]]=="winning")>which(strsplit(tt[i,2], " ")[[1]]=="losing"))) result[i,1] <- "loses"
if(grepl("test1", tt[i,1])&(which(strsplit(tt[i,2], " ")[[1]]=="winning")<which(strsplit(tt[i,2], " ")[[1]]=="losing"))) result[i,1] <- "wins"
}
But for cells of column V2 that contain neither "winning" nor "losing", I get this error message:
Error in if (grepl("test1", tt[i, 1]) & (which(strsplit(tt[i, 2], " ")[[1]] == : argument is of length zero
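The error comes from `which()`: when a word is absent it returns `integer(0)`, the comparison then yields `logical(0)`, and `if()` has no value to test. A minimal sketch of a guarded loop, assuming word positions are taken with `match()` on the split words; the helper `word_pos` is hypothetical, and `==` replaces `grepl()` for exact matches on V1:

```r
# Same example data as in the question; stringsAsFactors = FALSE keeps V2
# as character so strsplit() also works on R versions before 4.0
tt <- data.frame(V1 = c("test1", "test3", "test1", "test4", "wins", "loses"),
                 V2 = c("someannotation", "othertext",
                        "loads of text including the word winning for the winner and the word losing for the loser",
                        "blablabla", "blablabla", "blablabla"),
                 stringsAsFactors = FALSE)

# which() on an absent word returns integer(0); the comparison is then logical(0)
w <- strsplit(tt[1, 2], " ")[[1]]
length(which(w == "winning") > which(w == "losing"))  # 0, which is what makes if() fail

# Hypothetical helper: position of the first occurrence of a word, NA if absent
word_pos <- function(text, word) match(word, strsplit(text, " ")[[1]])

result <- data.frame(V1 = rep(NA_character_, nrow(tt)), stringsAsFactors = FALSE)
for (i in seq_len(nrow(tt))) {
  if (tt[i, 1] == "wins")  result[i, 1] <- "wins"
  if (tt[i, 1] == "loses") result[i, 1] <- "loses"
  if (tt[i, 1] == "test1") {
    p_win  <- word_pos(tt[i, 2], "winning")
    p_lose <- word_pos(tt[i, 2], "losing")
    # compare only when both words are present; otherwise the row stays NA
    if (!is.na(p_win) && !is.na(p_lose)) {
      result[i, 1] <- if (p_win < p_lose) "wins" else "loses"
    }
  }
}
result
```

Rows where the comparison cannot be made simply keep their initial `NA`, so the loop never reaches `if()` with an empty condition.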
Does anyone have a fix for this, or even a more sophisticated solution? Any help is appreciated, thanks!
EDIT
As @grrgrrbla kindly clarified, there are two ways the runner can win: either V1 == "wins", or V2 contains "winning" before "losing". Likewise, there are two ways he can lose: either V1 == "loses", or V2 contains "losing" before "winning".
My output should look like this:
result
V1
NA
NA
wins
NA
wins
loses
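The two rules in the EDIT can also be applied to all rows at once instead of looping. A minimal vectorized sketch, assuming `regexpr()` with `fixed = TRUE` is an acceptable way to find the first position of each word; `classify_runner` is a hypothetical name, and the V2 rule is applied only when both words occur:

```r
tt <- data.frame(V1 = c("test1", "test3", "test1", "test4", "wins", "loses"),
                 V2 = c("someannotation", "othertext",
                        "loads of text including the word winning for the winner and the word losing for the loser",
                        "blablabla", "blablabla", "blablabla"),
                 stringsAsFactors = FALSE)

# Hypothetical helper implementing both rules for every row at once
classify_runner <- function(v1, v2) {
  # regexpr() gives the start of the first match, or -1 when the word is absent
  pos_win  <- as.integer(regexpr("winning", v2, fixed = TRUE))
  pos_lose <- as.integer(regexpr("losing", v2, fixed = TRUE))
  both <- pos_win > 0 & pos_lose > 0

  out <- rep(NA_character_, length(v1))
  # rule 1: V1 states the result directly
  out[v1 == "wins"]  <- "wins"
  out[v1 == "loses"] <- "loses"
  # rule 2: otherwise, whichever word comes first in V2 decides
  out[is.na(out) & both & pos_win < pos_lose] <- "wins"
  out[is.na(out) & both & pos_lose < pos_win] <- "loses"
  out
}

result <- data.frame(V1 = classify_runner(tt$V1, tt$V2), stringsAsFactors = FALSE)
result
```

Because `regexpr()` returns -1 instead of an empty vector, every comparison has a value and no row can trigger the "argument is of length zero" error.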
Answer (score: 0)
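You could try (probably not the simplest solution...) writing a function that returns "wins" if your "winning" conditions are met, "loses" if your "losing" conditions are met, and NA in all other cases: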
wilo <- function(vec) {
  if (grepl("wins|loses", vec[1])) { # if the first variable is "wins" or "loses", return the value of the first variable
    return(vec[1])
  } else {
    if (grepl("winning|losing", vec[2])) { # if the second variable contains "winning" or "losing" (both are supposed to be in the sentence, so checking one word, e.g. grepl("winning", vec[2]), would also do)
      # gregexpr() returns every match position; [1] keeps the first one so if() gets a single value
      if (gregexpr("winning", vec[2])[[1]][1] < gregexpr("losing", vec[2])[[1]][1]) { # if "winning" is placed before "losing"
        return("wins")  # return "wins"
      } else {
        return("loses") # else return "loses"
      }
    } else {
      return(NA) # if none of the conditions are fulfilled, return NA
    }
  }
}
Then you can apply the function to each row of the data.frame:
apply(tt,1,wilo)
#[1] NA NA "wins" NA "wins" "loses"
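Since the question asks for the coding in a new data.frame, the vector returned by `apply()` can be wrapped in `data.frame()` to get the `V1` column shown in the expected output. A minimal self-contained sketch; it carries a condensed copy of `wilo()` using `if`/`else`, and like the answer it assumes that when V2 contains one of the words it contains both:

```r
tt <- data.frame(V1 = c("test1", "test3", "test1", "test4", "wins", "loses"),
                 V2 = c("someannotation", "othertext",
                        "loads of text including the word winning for the winner and the word losing for the loser",
                        "blablabla", "blablabla", "blablabla"),
                 stringsAsFactors = FALSE)

# Condensed copy of wilo() from the answer
wilo <- function(vec) {
  if (grepl("wins|loses", vec[1])) return(unname(vec[1]))
  if (grepl("winning|losing", vec[2])) {
    # first match position of each word; -1 if a word is absent (assumed not to happen here)
    p_win  <- gregexpr("winning", vec[2])[[1]][1]
    p_lose <- gregexpr("losing", vec[2])[[1]][1]
    return(if (p_win < p_lose) "wins" else "loses")
  }
  NA
}

# apply() passes each row as a character vector and returns a plain vector;
# wrapping it gives the new data.frame with a single column V1
result <- data.frame(V1 = apply(tt, 1, wilo), stringsAsFactors = FALSE)
result
```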
Note: as @grrgrrbla suggested, an alternative to the function gregexpr is the function str_locate from the stringr package.