Create new columns from factor and numeric data

Time: 2020-10-07 11:10:29

Tags: r dplyr

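I have a data frame, and I want to turn the values in one column into new columns, using the data in the adjacent column. Each level of df$species should become a new column, and the data in each new column should be the corresponding value from df$fish_num. However, I'm really confused about how to do this and don't know where to start!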

This is my current df:

     site treatment section species fish_num
 1 Site 1   Control       A    parr        7
 2 Site 1   Control       A  salmon        6
 3 Site 1   Control       B   trout        4
 4 Site 1   Control       B  salmon       12
 5 Site 1 Treatment       A    parr        8
 6 Site 1 Treatment       A  salmon        5
 7 Site 1 Treatment       B   trout       15
 8 Site 1 Treatment       B  salmon        9

df <- structure(list(site = c("Site 1", "Site 1", "Site 1", "Site 1", 
"Site 1", "Site 1", "Site 1", "Site 1"), treatment = c("Control", 
"Control", "Control", "Control", "Treatment", "Treatment", "Treatment",
"Treatment"), section = c("A", "A", "B", "B", "A", "A", "B",
"B"), species = c("parr", "salmon", "trout", "salmon", "parr",
"salmon", "trout", "salmon"), fish_num = c(7L, 6L, 4L, 12L, 8L,
5L, 15L, 9L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))

I would like to be able to produce the following:

   site treatment section fish_num parr salmon trout
1 Site 1   Control       A        7    7      0     0
2 Site 1   Control       A        6    0      6     0
3 Site 1   Control       B        4    0      0     4
4 Site 1   Control       B       12    0     12     0
5 Site 1 Treatment       A        8    8      0     0
6 Site 1 Treatment       A        5    0      5     0
7 Site 1 Treatment       B       15    0      0    15
8 Site 1 Treatment       B        9    0      9     0

I'm not sure of the best way to do this!

4 answers:

Answer 0: (score: 2)

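One approach, if you want to stay in the tidyverse, is to use pivot_wider() from tidyr. I also added calls to mutate_at() and mutate() to replace the missing values with zeros and to compute the fish_num column. Note that this gives one row per site, treatment, and section combination rather than one row per original observation.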

library(tidyverse)

df %>%
   pivot_wider(names_from = species,
             values_from = fish_num) %>%
   mutate_at(c("parr", "salmon", "trout"), ~replace(., is.na(.), 0)) %>%
   mutate(fish_num = parr+salmon+trout)

# A tibble: 4 x 7
site   treatment section  parr salmon trout fish_num
<chr>  <chr>     <chr>   <dbl>  <dbl> <dbl>    <dbl>
1 Site 1 Control   A           7      6     0       13
2 Site 1 Control   B           0     12     4       16
3 Site 1 Treatment A           8      5     0       13
4 Site 1 Treatment B           0      9    15       24

Answer 1: (score: 1)

Use pivot_wider from the tidyverse.
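As a minimal sketch, pivot_wider() can also keep one row per original observation, as in the desired output in the question. The helper columns row_id and value are hypothetical names introduced here for illustration, and using 0L as the fill value assumes that a missing species should count as zero fish.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  site      = "Site 1",
  treatment = rep(c("Control", "Treatment"), each = 4),
  section   = rep(c("A", "A", "B", "B"), times = 2),
  species   = c("parr", "salmon", "trout", "salmon",
                "parr", "salmon", "trout", "salmon"),
  fish_num  = c(7L, 6L, 4L, 12L, 8L, 5L, 15L, 9L),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # row_id (hypothetical helper) keeps every original row distinct;
  # value (hypothetical helper) is a copy of fish_num to spread, so fish_num itself survives
  mutate(row_id = row_number(), value = fish_num) %>%
  pivot_wider(names_from = species,
              values_from = value,
              values_fill = 0L) %>%   # assumption: absent species means zero fish
  select(-row_id)

wide
```

Because row_id makes every row unique, pivot_wider() has nothing to collapse, so the result keeps all eight rows and fish_num stays alongside the new species columns.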

Answer 2: (score: 1)

You can use the following trick: in arithmetic, R treats TRUE as one and FALSE as zero:

df$salmon <- df$fish_num * (df$species == "salmon")
df$trout <- df$fish_num * (df$species == "trout")

To do this for every species that occurs, put the assignment in a loop over the species values.
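The loop described above could be sketched as follows. The use of unique() to collect the species and the loop variable sp are assumptions made for illustration; the answer's own loop expression is not preserved.

```r
# Example data from the question
df <- data.frame(
  site      = "Site 1",
  treatment = rep(c("Control", "Treatment"), each = 4),
  section   = rep(c("A", "A", "B", "B"), times = 2),
  species   = c("parr", "salmon", "trout", "salmon",
                "parr", "salmon", "trout", "salmon"),
  fish_num  = c(7L, 6L, 4L, 12L, 8L, 5L, 15L, 9L),
  stringsAsFactors = FALSE
)

# One new column per distinct species (assumption: unique() supplies the loop values).
# as.character() keeps the column names correct even if species is a factor.
for (sp in unique(as.character(df$species))) {
  # The comparison is TRUE/FALSE, which acts as 1/0 when multiplied
  df[[sp]] <- df$fish_num * (df$species == sp)
}

df
```

Each new column holds fish_num on the rows of its own species and zero everywhere else, so the original rows and the species column are left untouched.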

Answer 3: (score: 1)

A simple base R option is to use xtabs:

cbind(df, unclass(t(xtabs(fish_num ~ species + q, cbind(df, q = 1:nrow(df))))))

which gives

    site treatment section species fish_num parr salmon trout
1 Site 1   Control       A    parr        7    7      0     0
2 Site 1   Control       A  salmon        6    0      6     0
3 Site 1   Control       B   trout        4    0      0     4
4 Site 1   Control       B  salmon       12    0     12     0
5 Site 1 Treatment       A    parr        8    8      0     0
6 Site 1 Treatment       A  salmon        5    0      5     0
7 Site 1 Treatment       B   trout       15    0      0    15
8 Site 1 Treatment       B  salmon        9    0      9     0
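To make the one-liner easier to follow, here is a sketch that unpacks it into separate steps. The intermediate names df_q, tab, and wide are hypothetical and only serve the illustration.

```r
# Example data from the question
df <- data.frame(
  site      = "Site 1",
  treatment = rep(c("Control", "Treatment"), each = 4),
  section   = rep(c("A", "A", "B", "B"), times = 2),
  species   = c("parr", "salmon", "trout", "salmon",
                "parr", "salmon", "trout", "salmon"),
  fish_num  = c(7L, 6L, 4L, 12L, 8L, 5L, 15L, 9L),
  stringsAsFactors = FALSE
)

# Step 1: add a row index q so that every observation gets its own cell
df_q <- cbind(df, q = seq_len(nrow(df)))

# Step 2: cross-tabulate fish_num by species (rows) and row index (columns)
tab <- xtabs(fish_num ~ species + q, data = df_q)

# Step 3: transpose to one row per observation, and drop the table class
# so that cbind() treats the result as a plain matrix
wide <- unclass(t(tab))

# Step 4: bind the species columns onto the original data frame
res <- cbind(df, wide)
res
```

Cells with no matching observation come out as zero, which is why no separate step for replacing missing values is needed here.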

Data

> dput(df)
structure(list(site = c("Site 1", "Site 1", "Site 1", "Site 1", 
"Site 1", "Site 1", "Site 1", "Site 1"), treatment = c("Control", 
"Control", "Control", "Control", "Treatment", "Treatment", "Treatment",
"Treatment"), section = c("A", "A", "B", "B", "A", "A", "B",
"B"), species = c("parr", "salmon", "trout", "salmon", "parr",
"salmon", "trout", "salmon"), fish_num = c(7L, 6L, 4L, 12L, 8L,
5L, 15L, 9L)), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))