Question

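I have a goal for my analysis that I am struggling to reach; as far as I know, there is no similar question. I have a very long data frame in Excel, reproduced here in simplified form in R as df.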

To make the context clearer: 'sp' stands for species, and the A* columns are the sites where I detected a given species.
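
For a runnable starting point, here is a hypothetical df reconstructed from the output shown in the answers below. The layout (one column per site, one species per cell, shorter columns padded with NA) and the order of the cells are assumptions, not the asker's actual data.

```r
# Hypothetical stand-in for the asker's data frame, rebuilt from the
# presence/absence output in the answers. Cell order and NA padding are assumed.
df <- data.frame(
  A1 = c("sp1", "sp2", "sp3", "sp4", "sp7", "sp8"),
  A2 = c("sp1", "sp3", "sp4", "sp7", "sp9", NA),
  A3 = c("sp5", "sp6", "sp7", "sp10", NA, NA),
  A4 = c("sp1", "sp2", "sp7", "sp9", "sp10", NA),
  A5 = c("sp3", "sp4", NA, NA, NA, NA),
  stringsAsFactors = FALSE  # keep species names as character, not factor
)
```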

I would like to convert this data frame into another one, like this:

The dataframe I want to obtain in an automated way

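The first column contains the site names, followed by one column per species (each species appearing only once, obviously). Then I need to assign '1' for presence and '0' for absence at a given site.

I have spent a lot of time trying to get there, but this is too complex for my R skills.

Can anyone help me?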

Answer 1

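You can gather the data into long format, which makes it easy to process and to add a column marking a species as present at a site. Then use reshape2::dcast to spread the data back into wide format: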

library(tidyverse)
library(reshape2)

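df %>%
  gather(Site, Species) %>%      # long format: one row per site/species cell
  filter(!is.na(Species)) %>%    # drop the empty (NA) cells
  mutate(value = 1) %>%          # mark each remaining species as present
  dcast(Site ~ Species, value.var = "value", fill = 0)  # absent pairs become 0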

#   Site sp1 sp10 sp2 sp3 sp4 sp5 sp6 sp7 sp8 sp9
# 1   A1   1    0   1   1   1   0   0   1   1   0
# 2   A2   1    0   0   1   1   0   0   1   0   1
# 3   A3   0    1   0   0   0   1   1   1   0   0
# 4   A4   1    1   1   0   0   0   0   1   0   1
# 5   A5   0    0   0   1   1   0   0   0   0   0
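
If you would rather not load any packages, the same presence/absence table can be sketched in base R with stack() and table(). This is an alternative, not part of the answer above, and the df defined here is a hypothetical stand-in reconstructed from the output shown, with character (not factor) columns assumed.

```r
# Hypothetical input: one column per site, NA-padded (an assumption).
df <- data.frame(
  A1 = c("sp1", "sp2", "sp3", "sp4", "sp7", "sp8"),
  A2 = c("sp1", "sp3", "sp4", "sp7", "sp9", NA),
  A3 = c("sp5", "sp6", "sp7", "sp10", NA, NA),
  A4 = c("sp1", "sp2", "sp7", "sp9", "sp10", NA),
  A5 = c("sp3", "sp4", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# stack() gives a long data frame: 'values' = species, 'ind' = site.
long <- stack(df)
long <- long[!is.na(long$values), ]

# Cross-tabulate site by species; cells hold counts.
presence <- table(Site = long$ind, Species = long$values)

# Cap counts at 1 so repeated records still read as plain presence.
presence[presence > 1] <- 1L

# Convert to a data frame; the site names end up as row names.
presence_df <- as.data.frame.matrix(presence)
presence_df
```

Note that stack() silently drops factor columns, so convert them with as.character() first if your data was read in with factors.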

Answer 2

You can use gather and spread from the tidyverse:

library(tidyverse)

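df %>%
  gather(A, sp) %>%            # long format: A = site, sp = species
  filter(!is.na(sp)) %>%       # drop the empty (NA) cells
  group_by(A, sp) %>%
  count() %>%                  # n = number of records per site/species pair
  spread(sp, n) %>%            # wide format: one column per species
  replace(., is.na(.), 0)      # absent site/species pairs become 0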

# A tibble: 5 x 11
# Groups:   A [5]
  A       sp1  sp10   sp2   sp3   sp4   sp5   sp6   sp7   sp8   sp9
* <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1       1.    0.    1.    1.    1.    0.    0.    1.    1.    0.
2 A2       1.    0.    0.    1.    1.    0.    0.    1.    0.    1.
3 A3       0.    1.    0.    0.    0.    1.    1.    1.    0.    0.
4 A4       1.    1.    1.    0.    0.    0.    0.    1.    0.    1.
5 A5       0.    0.    0.    1.    1.    0.    0.    0.    0.    0.
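
In current versions of tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same idea with the newer functions follows; it is not from the original answer, and the df used here is a hypothetical stand-in reconstructed from the output above.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one column per site, NA-padded (an assumption).
df <- data.frame(
  A1 = c("sp1", "sp2", "sp3", "sp4", "sp7", "sp8"),
  A2 = c("sp1", "sp3", "sp4", "sp7", "sp9", NA),
  A3 = c("sp5", "sp6", "sp7", "sp10", NA, NA),
  A4 = c("sp1", "sp2", "sp7", "sp9", "sp10", NA),
  A5 = c("sp3", "sp4", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Long format: one row per non-missing site/species cell.
  pivot_longer(everything(), names_to = "Site", values_to = "Species",
               values_drop_na = TRUE) %>%
  # Keep each site/species pair once, so the result is 0/1 rather than a count.
  distinct(Site, Species) %>%
  mutate(present = 1L) %>%
  # Wide format: one column per species, 0 where a pair never occurred.
  pivot_wider(names_from = Species, values_from = present,
              values_fill = list(present = 0L)) %>%
  arrange(Site)

result
```

Using distinct() before widening means a species recorded twice at the same site still yields 1, whereas count() would yield 2. The species columns appear in order of first occurrence rather than alphabetically.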

Manipulating a data frame (using R)
