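I am trying to assign names to the blank levels of the factor lepsp, based on matching conditions among the values within a subset. An example of the data: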
df<-
plantfam       lepfam       lepsp        lepcn
Asteraceae     Geometridae  Eois sp      green/spikes
Asteraceae     Erebidae     Anoba sp     green/nospikes
Asteraceae     Erebidae                  green/nospikes
Melastomaceae  Noctuidae    Balsinae sp
Poaceae        Erebidae     Deinopa sp   black/orangespots
Poaceae        Erebidae                  black/orangespots
Poaceae        Erebidae     Cocytia sp   black/yellowspots
Poaceae                                  black/yellowspots
Here is the code for the data frame:
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
             "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp = c("Eois sp", "Anoba sp", "", "Balsinae sp",
            "Deinopa sp", "", "Cocytia sp", ""),
  lepcn = c("green/spikes", "green/nospikes", "green/nospikes", "",
            "black/orangespots", "black/orangespots", "black/yellowspots",
            "black/yellowspots"))
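If lepsp is blank but the row has a lepcn, and that lepcn matches the lepcn of another row that does have a lepsp, then the blank lepsp should be given the lepsp name of the matching row within that plantfam. As a result, every subset of rows sharing the same lepcn, within the same plantfam and the same lepfam, will be assigned the same name.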
output<-
plantfam       lepfam       lepsp        lepcn
Asteraceae     Geometridae  Eois sp      green/spikes
Asteraceae     Erebidae     Anoba sp     green/nospikes
Asteraceae     Erebidae     Anoba sp     green/nospikes
Melastomaceae  Noctuidae    Balsinae sp
Poaceae        Erebidae     Deinopa sp   black/orangespots
Poaceae        Erebidae     Deinopa sp   black/orangespots
Poaceae        Erebidae     Cocytia sp   black/yellowspots
Poaceae                     Cocytia sp   black/yellowspots
I have tried several variations of the following without success: https://stackoverflow.com/a/44479195/8061255
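Answer 0 (score: 0)
Straightforward base R, which makes it easy to inspect the combinations that will be renamed. Essentially, you build a unique list of plantfam / lepfam / lepcn combinations and merge it back into the original dataset:
Read in the data and make sure the format is as expected: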
df <- read.csv(text =
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Geometridae,Eois sp,green/spikes
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,NA,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,NA
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,NA,black/orangespots
Poaceae,Erebidae,NA,black/yellowspots',
  # assumes blanks are NA; empty strings "" are also read as NA
  na.strings = c("NA", ""),
  # make sure everything is a character, not a factor
  stringsAsFactors = FALSE)
# guard against any column that was still read as a non-character type
df[] <- lapply(df, as.character)
Solution:
# get a unique list of all combinations that don't have missing data
dflookup <- unique(na.omit(df))
# inspect the combinations to be renamed; there should be no duplicate plantfam/lepfam/lepcn combinations
dflookup
# use the lookup to merge in all known names (note that merge() may reorder the rows)
newdf <- merge(df, dflookup, by = c('plantfam','lepfam','lepcn'), all.x = TRUE, suffixes = c('old','new'))
# use the original lepsp when the new lepsp is NA
newdf$lepsp <- ifelse(is.na(newdf$lepspnew), newdf$lepspold, newdf$lepspnew)
# remove unneeded columns
newdf$lepspold <- newdf$lepspnew <- NULL
# turn back into factors if desired; lapply() keeps the data frame, whereas apply() would return a character matrix
newdf[] <- lapply(newdf, factor)
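The merge above keys on plantfam, lepfam and lepcn, so a row whose lepfam is itself blank (the last row of the question's data) would not find a match. The question's expected output does fill that row, which suggests matching on plantfam and lepcn only. The following is a minimal sketch under that assumption, not a definitive fix: it works on the question's original data frame with blanks stored as empty strings, and uses match() instead of merge() so the row order is preserved. The names make_key, known, blank, hit and filled are hypothetical and chosen only for illustration.

```r
# The question's data, with blanks stored as empty strings
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Asteraceae", "Melastomaceae",
               "Poaceae", "Poaceae", "Poaceae", "Poaceae"),
  lepfam   = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae",
               "Erebidae", "Erebidae", "Erebidae", ""),
  lepsp    = c("Eois sp", "Anoba sp", "", "Balsinae sp",
               "Deinopa sp", "", "Cocytia sp", ""),
  lepcn    = c("green/spikes", "green/nospikes", "green/nospikes", "",
               "black/orangespots", "black/orangespots",
               "black/yellowspots", "black/yellowspots"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one key string per row, built from plantfam and lepcn
# (assumption: lepfam is deliberately left out of the key)
make_key <- function(d) paste(d$plantfam, d$lepcn, sep = " | ")

# Lookup table: rows that have both a species name and a common name
known <- unique(df[df$lepsp != "" & df$lepcn != "",
                   c("plantfam", "lepsp", "lepcn")])

# Each plantfam/lepcn key must map to exactly one lepsp, otherwise stop
stopifnot(!anyDuplicated(make_key(known)))

# Rows to fill: lepsp is blank but lepcn is present
blank <- df$lepsp == "" & df$lepcn != ""
hit   <- match(make_key(df)[blank], make_key(known))

# Fill matched rows; rows without a match keep their blank lepsp
filled <- df
filled$lepsp[blank] <- ifelse(is.na(hit), "", known$lepsp[hit])
filled
```

If lepfam should be part of the match after all, adding it to the paste() call in make_key would reproduce the three-column key used in the merge, at the cost of leaving the blank-lepfam row unfilled.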