I have a data frame like this:
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = "
plantfam,lepfam,lepsp\n
Asteraceae,Geometridae,Eois sp\n
Asteraceae,Erebidae,\n
Poaceae,Erebidae,\n
Poaceae,Noctuidae,\n
Asteraceae,Saturnidae,Polyphemous sp\n
Melastomaceae,Noctuidae,\n
Asteraceae,,\n
Melastomaceae,,\n
,Noctuidae,\n
,Erebidae,\n
Poaceae, Erebidae,\n")
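I want to create unique lepsp names conditional on the unique combinations of lepfam and plantfam. Each lepfam must first be subset. Within that lepfam subset, each unique plantfam/lepfam combination is assigned a morphospecies name. Rows where plantfam or lepfam is blank are not assigned a morphospecies. Duplicate plantfam/lepfam combinations should be given the same morphospecies name. The output should look like this: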
plantfam       lepfam       lepsp
Asteraceae     Geometridae  Eois sp
Asteraceae     Erebidae     Erebidae_morphosp1
Poaceae        Erebidae     Erebidae_morphosp2
Poaceae        Noctuidae    Noctuidae_morphosp1
Asteraceae     Saturnidae   Polyphemous sp
Melastomaceae  Noctuidae    Noctuidae_morphosp2
Asteraceae
Melastomaceae
               Noctuidae
               Erebidae
Poaceae        Erebidae     Erebidae_morphosp2
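The numbering rule described above (within one lepfam, each distinct plantfam gets the next number, and a repeated plantfam reuses its number) is what base R's `match(x, unique(x))` computes. A minimal base R sketch on the three Erebidae rows that have a plant family; the vector `erebidae_plants` is typed in by hand for illustration and is not part of the original code:

```r
# plantfam values of the Erebidae rows with a non-blank plantfam,
# in the order they appear in df (typed in by hand for illustration)
erebidae_plants <- c("Asteraceae", "Poaceae", "Poaceae")

# match() returns the position of each value's first occurrence in unique(),
# so a repeated plant family gets the same number
morpho_index <- match(erebidae_plants, unique(erebidae_plants))

# Build "<lepfam>_morphosp<k>" names from those numbers
morpho_names <- paste0("Erebidae_morphosp", morpho_index)
print(morpho_names)
```

Both Poaceae rows end up as `Erebidae_morphosp2`, which is the behaviour the desired output asks for.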
I tried:
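library(dplyr)
# Rows that still need a morphospecies name (a quoted expression, injected with !!)
condition <- quote(lepsp == "" & plantfam != "" & lepfam != "")
# Number the distinct plantfam values within each lepfam
subset1 <- df %>% filter(!!condition) %>% group_by(lepfam) %>%
  mutate(lepsp = paste0(lepfam, "_morphosp", match(plantfam, unique(plantfam))))
# All remaining rows, left unchanged
subset2 <- df %>% filter(!!condition) %>% setdiff(df, .)
union(subset1, subset2) %>% arrange(lepsp)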
However, the two Poaceae/Erebidae rows get different morphosp numbers, Erebidae_morphosp1 and Erebidae_morphosp2, when they should get the same one:
Source: local data frame [11 x 3]
Groups: lepfam [6]
   plantfam       lepfam       lepsp
   <chr>          <chr>        <chr>
1  Melastomaceae
2  Asteraceae
3  Poaceae        Erebidae     Erebidae_morphosp1
4  Asteraceae     Geometridae  Eois sp
5  Asteraceae     Erebidae     Erebidae_morphosp1
6  Poaceae        Erebidae     Erebidae_morphosp2
7                 Erebidae     Erebidae_morphosp3
8  Poaceae        Noctuidae    Noctuidae_morphosp1
9  Melastomaceae  Noctuidae    Noctuidae_morphosp2
10                Noctuidae    Noctuidae_morphosp3
11 Asteraceae     Saturnidae   Polyphemous sp
Answer (score: 0)
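I think the problem may simply be that in your df the last row has a space before Erebidae, which makes R treat it as a different value from the other Erebidae entries.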
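A small base R sketch of why that space matters: `unique()` (and therefore grouping) compares strings exactly, so `" Erebidae"` and `"Erebidae"` stay separate values until the whitespace is removed with `trimws()`. The vector below is a hand-made stand-in for the lepfam column, not the original data frame:

```r
# Hand-made stand-in for the lepfam column; the last value has a leading
# space, as in the question's "Poaceae, Erebidae," row
lepfam <- c("Erebidae", "Erebidae", " Erebidae")

# Exact string comparison sees two different values, i.e. two groups
print(unique(lepfam))

# trimws() strips leading and trailing whitespace, leaving a single value
lepfam_clean <- trimws(lepfam)
print(unique(lepfam_clean))
```

Alternatively, `read.table()` has a `strip.white = TRUE` argument that strips leading and trailing spaces from unquoted character fields when `sep` is given, so the data could be cleaned at read time.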
I noticed this as I was finishing my answer. Here's how I would do what you want: I introduce a group number, lepfam_number, in mutate before pasting.
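library(dplyr)

df %>%
  group_by(lepfam) %>%
  mutate(
    # Position of each plantfam among the distinct plantfam values of this lepfam
    lepfam_number = match(plantfam, unique(plantfam)),
    # Build a morphospecies name only where lepsp is empty and both families are given
    lepsp = ifelse(lepsp == "" & lepfam != "" & trimws(plantfam) != "",
                   paste0(lepfam, "_morphosp", lepfam_number),
                   lepsp)
  )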
   plantfam       lepfam       lepsp                lepfam_number
   <chr>          <chr>        <chr>                        <int>
1  Asteraceae     Geometridae  Eois sp                          1
2  Asteraceae     Erebidae     Erebidae_morphosp1               1
3  Poaceae        Erebidae     Erebidae_morphosp2               2
4  Poaceae        Noctuidae    Noctuidae_morphosp1              1
5  Asteraceae     Saturnidae   Polyphemous sp                   1
6  Melastomaceae  Noctuidae    Noctuidae_morphosp2              2
7  Asteraceae                                                   1
8  Melastomaceae                                                2
9                 Noctuidae                                     3
10                Erebidae                                      3
11 Poaceae        Erebidae     Erebidae_morphosp2               2
Data
df <- read.table(sep = ",", header = TRUE, stringsAsFactors = FALSE,
text = "
plantfam,lepfam,lepsp\n
Asteraceae,Geometridae,Eois sp\n
Asteraceae,Erebidae,\n
Poaceae,Erebidae,\n
Poaceae,Noctuidae,\n
Asteraceae,Saturnidae,Polyphemous sp\n
Melastomaceae,Noctuidae,\n
Asteraceae,,\n
Melastomaceae,,\n
,Noctuidae,\n
,Erebidae,\n
Poaceae,Erebidae,\n")
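For comparison, here is a base R sketch of the same idea that also trims whitespace in every column first and counts only the rows that actually receive a name, so a blank plantfam or an already named lepsp never uses up a number. Counting that way is an assumption about what the numbering should mean; on this example data it gives the same lepsp values as the desired output. `assign_morphosp()` is a hypothetical helper name, and the data frame below deliberately keeps the question's `" Erebidae"` typo to show the trimming:

```r
# Data from the question, including the " Erebidae" typo in the last row
df <- data.frame(
  plantfam = c("Asteraceae", "Asteraceae", "Poaceae", "Poaceae", "Asteraceae",
               "Melastomaceae", "Asteraceae", "Melastomaceae", "", "", "Poaceae"),
  lepfam   = c("Geometridae", "Erebidae", "Erebidae", "Noctuidae", "Saturnidae",
               "Noctuidae", "", "", "Noctuidae", "Erebidae", " Erebidae"),
  lepsp    = c("Eois sp", "", "", "", "Polyphemous sp", "", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill empty lepsp values with "<lepfam>_morphosp<k>"
assign_morphosp <- function(df) {
  # Remove leading/trailing spaces from every (character) column
  df[] <- lapply(df, trimws)

  # Rows that get a morphospecies name: lepsp empty, both families present
  idx <- which(df$lepsp == "" & df$plantfam != "" & df$lepfam != "")

  # Within each lepfam, number the distinct plantfam values among those rows
  # in order of first appearance; a repeated plantfam reuses its number.
  # ave() keeps the type of its first argument, so k is character ("1", "2", ...)
  k <- ave(df$plantfam[idx], df$lepfam[idx],
           FUN = function(p) match(p, unique(p)))

  df$lepsp[idx] <- paste0(df$lepfam[idx], "_morphosp", k)
  df
}

out <- assign_morphosp(df)
print(out)
```

Unlike the grouped `mutate()` version, this does not leave a helper column or a grouping behind; the trade-off is that the numbering logic is less visible than an explicit `lepfam_number` column.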