I have a table named df. The table contains three columns (Code, Description, and Rate).
#CODE
df <- data.frame(
  Code = c("01", "0101", "0101 21 00 00", "0101 29", "0101 29 10 00",
           "0101 29 90 00", "0101 30 00 00", "0101 90 00 00", "NA",
           "0102 21", "0102 21 10 00", "0102 21 30 00", "0102 21 90 00"),
  Description = c("LIVE ANIMALS", "Live horses, asses, mules",
                  "Live horses, asses, mules and hinnies",
                  "Pure-bred breeding horses",
                  "Live horses (excl. pure-bred for breeding)",
                  "Horses for slaughter",
                  "Live horses (excl. for slaughter, pure-bred for breeding)",
                  "Live asses", "Live mules and hinnies",
                  "Live bovine animals", "Pure-bred cattle for breeding",
                  "Pure-bred breeding heifers female bovines that have never calved",
                  "Pure-bred breeding cows (excl. heifers)"),
  Rate = c("NA", "NA", "5", "NA", "5", "10", "15", "7", "NA", "NA", "10", "15", "20"))
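My goal is to make a subset of the table above that contains only the rows whose Code field has 10 digits, and then to calculate the mean of Rate. This means the code should extract only the rows with 10-digit codes (0101 21 00 00, 0101 29 10 00, 0101 29 90 00, 0101 30 00 00, 0101 90 00 00, 0102 21 10 00, 0102 21 30 00 and 0102 21 90 00), as shown in the table below. The mean of the Rate column is 2,75.
So can anyone help me with how to transform this table?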
Answer 0: (score: 1)
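We can remove the spaces from the codes and then count the number of characters. That lets us use filter to keep only the 10-digit codes. Then, if we want the mean, we can add a summarise (note that the result is not 2.75).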
library(tidyverse)
df <- tibble(Code = c("01", "0101", "0101 21 00 00", "0101 29", "0101 29 10 00", "0101 29 90 00", "0101 30 00 00", "0101 90 00 00", "NA", "0102 21", "0102 21 10 00", "0102 21 30 00", "0102 21 90 00"), Description = c("LIVE ANIMALS", "Live horses, asses, mules", "Live horses, asses, mules and hinnies", "Pure-bred breeding horses", "Live horses (excl. pure-bred for breeding)", "Horses for slaughter", "Live horses (excl. for slaughter, pure-bred for breeding)", "Live asses", "Live mules and hinnies", "Live bovine animals", "Pure-bred cattle for breeding", "Pure-bred breeding heifers female bovines that have never calved", "Pure-bred breeding cows (excl. heifers)"), Rate = c("NA", "NA", "5", "NA", "5", "10", "15", "7", "NA", "NA", "10", "15", "20"))
df %>%
  filter(Code %>% str_remove_all("\\s") %>% str_length %>% `==`(10))
#> # A tibble: 8 x 3
#>   Code         Description                                            Rate
#>   <chr>        <chr>                                                  <chr>
#> 1 0101 21 00 … Live horses, asses, mules and hinnies                  5
#> 2 0101 29 10 … Live horses (excl. pure-bred for breeding)             5
#> 3 0101 29 90 … Horses for slaughter                                   10
#> 4 0101 30 00 … Live horses (excl. for slaughter, pure-bred for breed… 15
#> 5 0101 90 00 … Live asses                                             7
#> 6 0102 21 10 … Pure-bred cattle for breeding                          10
#> 7 0102 21 30 … Pure-bred breeding heifers female bovines that have n… 15
#> 8 0102 21 90 … Pure-bred breeding cows (excl. heifers)                20
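To make the filter condition easier to follow, here is a minimal sketch of its intermediate steps on a handful of codes taken from the question. The variable names (codes, stripped, lens, keep) are made up for illustration and are not part of the answer's pipeline.

```r
library(stringr)

# A few codes copied from the question's data
codes <- c("01", "0101 29", "0101 21 00 00", "NA")

# Step 1: drop all whitespace characters
stripped <- str_remove_all(codes, "\\s")

# Step 2: count the characters that remain
lens <- str_length(stripped)

# Step 3: flag the codes that have exactly 10 characters
keep <- lens == 10

data.frame(codes, stripped, lens, keep)
```

Note that "NA" here is an ordinary two-character string, not a missing value, so it is simply dropped by the length test.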
df %>%
  filter(Code %>% str_remove_all("\\s") %>% str_length %>% `==`(10)) %>%
  summarise(mean_rate = mean(as.integer(Rate)))
#> # A tibble: 1 x 1
#>   mean_rate
#>       <dbl>
#> 1      10.9
Created on 2019-05-08 by the reprex package (v0.2.1)
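For comparison, the same idea can probably be written without the tidyverse. The following is a sketch in base R, not part of the original answer; it swaps str_remove_all and str_length for gsub and nchar, omits the Description column for brevity, and uses the illustrative names is_ten, sub_df and mean_rate.

```r
# Data from the question (Code and Rate only)
df <- data.frame(
  Code = c("01", "0101", "0101 21 00 00", "0101 29", "0101 29 10 00",
           "0101 29 90 00", "0101 30 00 00", "0101 90 00 00", "NA",
           "0102 21", "0102 21 10 00", "0102 21 30 00", "0102 21 90 00"),
  Rate = c("NA", "NA", "5", "NA", "5", "10", "15", "7", "NA", "NA",
           "10", "15", "20"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose code has exactly 10 characters once whitespace is removed
is_ten <- nchar(gsub("[[:space:]]", "", df$Code)) == 10
sub_df <- df[is_ten, ]

# Rate is stored as text, so convert it to numbers before averaging
mean_rate <- mean(as.numeric(sub_df$Rate))
mean_rate  # (5 + 5 + 10 + 15 + 7 + 10 + 15 + 20) / 8 = 10.875
```

The tibble output above prints this value rounded to three significant digits, which is why it appears as 10.9.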