Create a new column based on the factor levels of another column

Posted: 2018-03-05 13:49:34

Tags: r dplyr grouping levels mutate

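I'm trying to create a column in my data frame, let's call it "Status", that gives the status of each fish, i.e. protected or unprotected.

What I'm looking for: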

  ID                   Species      Status
1  1 Epinephelus polyphekadion   Protected
2  2        Epinephelus tukula   Protected
3  3         Thunnus albacares   Protected
4  4       Sphyraena barracuda Unprotected
5  5        Lutjanus rivulatus Unprotected
6  6         Lethrinus lentjan Unprotected
7  7 Plectropomus pessuliferus   Protected

My data:

fishydata <- structure(list(ID = 1:7,
                            Species = structure(c(1L, 2L, 7L, 6L, 4L, 3L, 5L),
                                                .Label = c("Epinephelus polyphekadion", "Epinephelus tukula",
                                                           "Lethrinus lentjan", "Lutjanus rivulatus",
                                                           "Plectropomus pessuliferus", "Sphyraena barracuda",
                                                           "Thunnus albacares"),
                                                class = "factor")),
                       .Names = c("ID", "Species"),
                       row.names = c(NA, 7L),
                       class = "data.frame")

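The dataset contains over 1000 observations. Is there a line of code that can link specific species to a status in a new column?

I have more than 40 species, 7 of which are protected. I'd like to supply only the 7 species with "Protected" status and have every other species treated as "Unprotected", rather than typing out every species name and classifying it as "Unprotected".

Any pointers or suggestions would be greatly appreciated. My skills are basic; I'm trying to get back into R. I've been using mutate and filter from dplyr, but I've hit a brick wall.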
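What the question describes, listing only the protected species and defaulting everything else to "Unprotected", maps onto a single base R ifelse() call combined with %in%. A minimal sketch, assuming the sample data above; protected_species is a hypothetical variable name, and it holds only the four protected species that actually appear in the sample (the real data would list all 7).

```r
# Sample data from the question (Species is a factor)
fishydata <- data.frame(
  ID = 1:7,
  Species = factor(c("Epinephelus polyphekadion", "Epinephelus tukula",
                     "Thunnus albacares", "Sphyraena barracuda",
                     "Lutjanus rivulatus", "Lethrinus lentjan",
                     "Plectropomus pessuliferus"))
)

# Hypothetical list: only the protected species present in this sample;
# in the full dataset you would list all 7 protected species here.
protected_species <- c("Epinephelus polyphekadion", "Epinephelus tukula",
                       "Thunnus albacares", "Plectropomus pessuliferus")

# "Protected" if the species is in the list, otherwise "Unprotected"
fishydata$Status <- ifelse(fishydata$Species %in% protected_species,
                           "Protected", "Unprotected")
```

Because %in% returns FALSE for anything not in the list, every species you don't mention (all 30+ of them) falls through to "Unprotected" automatically.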

2 answers:

Answer 0 (score: 3)

Data without the Status column:

fishydata2 <- structure(list(ID = 1:7, 
                            Species = structure(c(1L, 2L, 7L, 6L,4L, 3L, 5L), 
                            .Label = c("Epinephelus polyphekadion", "Epinephelus tukula","Lethrinus lentjan", "Lutjanus rivulatus", "Plectropomus pessuliferus","Sphyraena barracuda", "Thunnus albacares"), class = "factor")
                        ),
                   .Names = c("ID", "Species"), 
                   row.names = c(NA, 7L), 
                   class = "data.frame")

#   ID                   Species
#1  1 Epinephelus polyphekadion
#2  2        Epinephelus tukula
#3  3         Thunnus albacares
#4  4       Sphyraena barracuda
#5  5        Lutjanus rivulatus
#6  6         Lethrinus lentjan
#7  7 Plectropomus pessuliferus

First, create the new column with every row set to "Unprotected" by default:

fishydata2$Status <- "Unprotected"

Then overwrite it for your protected species (only four of the seven appear in this sample data):

fishydata2$Status[fishydata2$Species %in% c('Epinephelus polyphekadion',
                                            'Epinephelus tukula', 'Thunnus albacares',
                                            'Plectropomus pessuliferus')] <- "Protected"

Result:

fishydata2
#   ID                   Species      Status
#1  1 Epinephelus polyphekadion   Protected
#2  2        Epinephelus tukula   Protected
#3  3         Thunnus albacares   Protected
#4  4       Sphyraena barracuda Unprotected
#5  5        Lutjanus rivulatus Unprotected
#6  6         Lethrinus lentjan Unprotected
#7  7 Plectropomus pessuliferus   Protected
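Since the question mentions dplyr's mutate, the same result can be produced inside a pipeline with mutate() and dplyr's if_else(). A minimal sketch, assuming dplyr is installed; protected_species is again a hypothetical vector holding only the protected species present in the sample.

```r
library(dplyr)

# Sample data from the question (Species is a factor)
fishydata <- data.frame(
  ID = 1:7,
  Species = factor(c("Epinephelus polyphekadion", "Epinephelus tukula",
                     "Thunnus albacares", "Sphyraena barracuda",
                     "Lutjanus rivulatus", "Lethrinus lentjan",
                     "Plectropomus pessuliferus"))
)

# Hypothetical list of protected species (extend to all 7 in the real data)
protected_species <- c("Epinephelus polyphekadion", "Epinephelus tukula",
                       "Thunnus albacares", "Plectropomus pessuliferus")

# Add Status in one step: listed species are "Protected", the rest "Unprotected"
fishydata <- fishydata %>%
  mutate(Status = if_else(Species %in% protected_species,
                          "Protected", "Unprotected"))
```

if_else() behaves like base ifelse() here but is stricter about the true and false values having the same type, which is why both are given as character strings.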

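Answer 1 (score: 0)

If you just want to subset the data frame so that you only have the rows whose Status is "Protected", there are two options:

Using dplyr:

library(dplyr)
filter(fishydata, Status == "Protected")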
#   ID                   Species    Status
# 1  1 Epinephelus polyphekadion Protected
# 2  2        Epinephelus tukula Protected
# 3  3         Thunnus albacares Protected
# 4  7 Plectropomus pessuliferus Protected

Using base R:

fishydata[fishydata$Status == "Protected", ]
#   ID                   Species    Status
# 1  1 Epinephelus polyphekadion Protected
# 2  2        Epinephelus tukula Protected
# 3  3         Thunnus albacares Protected
# 7  7 Plectropomus pessuliferus Protected

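Both options return a data frame containing only the rows for protected species. If you want to use it later, assign it to a variable, e.g. protected_fish <- filter(fishydata, Status == "Protected"). I'd advise against adding a new column to fishydata that holds only the species with Protected status; all that information is already in the data frame. If you only want the Species names as a vector, use protected_fish$Species, or the pipe filter(fishydata, Status == "Protected") %>% pull(Species). (Using select(Species) instead would return a one-column data frame, not a vector.)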
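To make the select() versus pull() distinction concrete, here is a small sketch, assuming dplyr is installed and rebuilding the sample data with the Status column; protected_species, species_df and species_vec are hypothetical names used only for illustration.

```r
library(dplyr)

# Sample data from the question, with the Status column added
fishydata <- data.frame(
  ID = 1:7,
  Species = factor(c("Epinephelus polyphekadion", "Epinephelus tukula",
                     "Thunnus albacares", "Sphyraena barracuda",
                     "Lutjanus rivulatus", "Lethrinus lentjan",
                     "Plectropomus pessuliferus"))
)
protected_species <- c("Epinephelus polyphekadion", "Epinephelus tukula",
                       "Thunnus albacares", "Plectropomus pessuliferus")
fishydata$Status <- ifelse(fishydata$Species %in% protected_species,
                           "Protected", "Unprotected")

# Keep only the protected rows
protected_fish <- filter(fishydata, Status == "Protected")

# select() keeps a data frame that has a single column
species_df <- protected_fish %>% select(Species)

# pull() extracts the column itself (a factor here), same as protected_fish$Species
species_vec <- protected_fish %>% pull(Species)
```

Because Species is a factor in the original data, pull() returns a factor; wrap it in as.character() if you need plain strings.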