Table transformation with dplyr

Time: 2019-05-08 07:44:34

Tags: r dataframe dplyr subset

I have a table called df. It contains three columns (Code, Description and Rate).


#CODE

df<-data.frame(
  Code=c("01","0101","0101 21 00 00","0101 29","0101 29 10 00","0101 29 90 00",
         "0101 30 00 00","0101 90 00 00","NA","0102 21","0102 21 10 00",
         "0102 21 30 00","0102 21 90 00"),
  Description=c("LIVE ANIMALS","Live horses, asses, mules",
                "Live horses, asses, mules and hinnies","Pure-bred breeding horses",
                "Live horses (excl. pure-bred for breeding)","Horses for slaughter",
                "Live horses (excl. for slaughter, pure-bred for breeding)",
                "Live asses","Live mules and hinnies","Live bovine animals",
                "Pure-bred cattle for breeding",
                "Pure-bred breeding heifers female bovines that have never calved",
                "Pure-bred breeding cows (excl. heifers)"),
  Rate=c("NA","NA","5","NA","5","10","15","7","NA","NA","10","15","20"),
  # keep text columns as character (before R 4.0.0 the default was factors)
  stringsAsFactors=FALSE)

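So my goal is to take a subset of the table above that contains only the rows whose Code field has 10 digits, and then compute the mean. In other words, only the rows with 10-digit codes should be extracted (0101 21 00 00, 0101 29 10 00, 0101 29 90 00, 0101 30 00 00, 0101 90 00 00, 0102 21 10 00, 0102 21 30 00 and 0102 21 90 00), as shown in the table below. The mean of the Rate column is 2.75.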

[Image: the expected subset table with the eight 10-digit codes]

Can anyone help me with how to transform this table?

1 answer:

Answer 0 (score: 1)

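We can remove the spaces from Code and then count the characters, which lets us filter down to just the 10-digit codes. If we then want the mean, we can add a summarise (note that it is not 2.75: the eight matching rates 5, 5, 10, 15, 7, 10, 15 and 20 sum to 87, and 87 / 8 = 10.875).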
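For readers who do not use the tidyverse, the same strip-whitespace-then-count idea can be sketched in base R with gsub() and nchar(). This is a minimal sketch, not the answerer's code; it assumes only the Code and Rate columns from the question are needed, and the names codes, code_len and ten_digit are illustrative.

```r
# Base-R sketch of the same approach (illustrative names, not from the answer).
# Only the Code and Rate columns from the question are reproduced here.
codes <- data.frame(
  Code = c("01", "0101", "0101 21 00 00", "0101 29", "0101 29 10 00",
           "0101 29 90 00", "0101 30 00 00", "0101 90 00 00", "NA",
           "0102 21", "0102 21 10 00", "0102 21 30 00", "0102 21 90 00"),
  Rate = c("NA", "NA", "5", "NA", "5", "10", "15", "7", "NA", "NA",
           "10", "15", "20"),
  stringsAsFactors = FALSE
)

# Remove all whitespace from each code, then count the remaining characters
code_len <- nchar(gsub("[[:space:]]", "", codes$Code))

# Keep only rows whose code has exactly 10 characters once spaces are gone
ten_digit <- codes[code_len == 10, ]

# Rate is stored as text, so convert it to numbers before averaging
mean_rate <- mean(as.numeric(ten_digit$Rate))
print(mean_rate)
# [1] 10.875
```

Here `ten_digit` holds the same eight rows as the dplyr filter.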

library(tidyverse)
df <- tibble(
  Code = c("01", "0101", "0101 21 00 00", "0101 29", "0101 29 10 00",
           "0101 29 90 00", "0101 30 00 00", "0101 90 00 00", "NA", "0102 21",
           "0102 21 10 00", "0102 21 30 00", "0102 21 90 00"),
  Description = c("LIVE ANIMALS", "Live horses, asses, mules",
                  "Live horses,  asses, mules and hinnies",
                  "Pure-bred breeding horses",
                  "Live horses (excl. pure-bred for breeding)",
                  "Horses for slaughter",
                  "Live horses (excl. for slaughter, pure-bred for breeding)",
                  "Live asses", "Live mules and hinnies", "Live  bovine animals",
                  "Pure-bred cattle for breeding",
                  "Pure-bred breeding heifers female bovines that have never calved",
                  "Pure-bred breeding cows (excl. heifers)"),
  Rate = c("NA", "NA", "5", "NA", "5", "10", "15", "7", "NA", "NA", "10", "15", "20"))
# keep only rows whose Code has exactly 10 characters after removing all whitespace
df %>%
  filter(Code %>% str_remove_all("\\s") %>% str_length %>% `==`(10))
#> # A tibble: 8 x 3
#>   Code         Description                                            Rate 
#>   <chr>        <chr>                                                  <chr>
#> 1 0101 21 00 … Live horses,  asses, mules and hinnies                 5    
#> 2 0101 29 10 … Live horses (excl. pure-bred for breeding)             5    
#> 3 0101 29 90 … Horses for slaughter                                   10   
#> 4 0101 30 00 … Live horses (excl. for slaughter, pure-bred for breed… 15   
#> 5 0101 90 00 … Live asses                                             7    
#> 6 0102 21 10 … Pure-bred cattle for breeding                          10   
#> 7 0102 21 30 … Pure-bred breeding heifers female bovines that have n… 15   
#> 8 0102 21 90 … Pure-bred breeding cows (excl. heifers)                20
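The filter above counts any non-space characters, so a malformed code such as letters in place of digits would also pass. If that matters, one option (an assumption on my part, not part of the answer) is to match the exact "dddd dd dd dd" layout with a regular expression. The sketch below uses base grepl(); the extra code "ABCD EF GH IJ" is hypothetical and only there for contrast.

```r
# Compare the answer's length rule with a stricter pattern rule.
codes <- c("01", "0101", "0101 21 00 00", "0101 29", "0101 29 10 00",
           "0101 29 90 00", "0101 30 00 00", "0101 90 00 00", "NA",
           "0102 21", "0102 21 10 00", "0102 21 30 00", "0102 21 90 00",
           "ABCD EF GH IJ")  # hypothetical malformed code, for contrast

# Length rule: exactly 10 characters once whitespace is removed
by_length <- nchar(gsub("[[:space:]]", "", codes)) == 10

# Pattern rule: four digits, then three space-separated pairs of digits
by_pattern <- grepl("^[0-9]{4} [0-9]{2} [0-9]{2} [0-9]{2}$", codes)

# Codes accepted by the length rule but rejected by the pattern rule
print(codes[by_length & !by_pattern])
# [1] "ABCD EF GH IJ"
```

In a dplyr pipeline the same pattern could be used as filter(str_detect(Code, "^[0-9]{4} [0-9]{2} [0-9]{2} [0-9]{2}$")). The trade-off is that the pattern also rejects codes with irregular spacing, which the length rule would accept.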

# same filter, then average Rate (stored as character, so convert it first)
df %>%
  filter(Code %>% str_remove_all("\\s") %>% str_length %>% `==`(10)) %>%
  summarise(mean_rate = mean(as.integer(Rate)))
#> # A tibble: 1 x 1
#>   mean_rate
#>       <dbl>
#> 1      10.9

Created on 2019-05-08 by the reprex package (v0.2.1)
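One caveat when applying this to the question's data.frame rather than the tibble: in R versions before 4.0.0, data.frame() converted text columns to factors by default, and as.integer() on a factor returns the internal level codes rather than the printed values. A minimal sketch of the difference, assuming a small made-up Rate vector with one real missing value:

```r
# Hypothetical Rate values stored as a factor, as older data.frame() would do
rate <- factor(c("5", "10", "15", NA))

# as.integer() on a factor returns level codes; levels sort as "10", "15", "5"
codes_only <- as.integer(rate)

# Convert through the labels instead to get the actual numbers
rate_num <- as.numeric(as.character(rate))

# na.rm = TRUE drops the missing value before averaging
print(mean(rate_num, na.rm = TRUE))
# [1] 10
```

Here `codes_only` is 3, 1, 2, NA rather than 5, 10, 15, NA, so converting via as.character() (or creating the data.frame with stringsAsFactors = FALSE) avoids silently averaging the wrong numbers.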