R bind_rows behaves inconsistently when used with %>%

Date: 2018-02-01 23:59:00

Tags: r tidyverse

I have the following tibble:

tst <- tibble(
  x = 'actual data',
  age_1 = 5.3,
  age_2 = 6.6,
  age_3 = 8.3,
  age_4 = 20.3,
  age_5 = 25.3,
  age_6 = 30.8,
  age_7 = 31.3,
  age_8 = 22.3,
  age_9 = 18.3,
  age_10 = 14.3
)

I can create a new row containing the rounded values of the first row using:

demo <- tst %>% 
  c(x='round',round(.[nrow(.),2:(ncol(.))])) %>% 
  bind_rows(tst,.)

demo
# A tibble: 2 x 11
  x           age_1 age_2 age_3 age_4 age_5 age_6 age_7 age_8 age_9 age_10
  <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 actual data  5.30  6.60  8.30  20.3  25.3  30.8  31.3  22.3  18.3   14.3
2 round        5.00  7.00  8.00  20.0  25.0  31.0  31.0  22.0  18.0   14.0

Now, when I try to use the same code to create another row of floor values, I get an error:

demo %>%
  c(x='floor',round(demo[1,2:(ncol(demo))])) %>% 
  bind_rows(demo,.)
Error in bind_rows_(x, .id) : Argument 12 must be length 2, not 1

However, if I do what I think is the same thing in a more roundabout way, it works:

i <- c(x='floor',round(demo[1,2:(ncol(demo))]))
bind_rows(demo,i)
# A tibble: 3 x 11
  x           age_1 age_2 age_3 age_4 age_5 age_6 age_7 age_8 age_9 age_10
  <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 actual data  5.30  6.60  8.30  20.3  25.3  30.8  31.3  22.3  18.3   14.3
2 round        5.00  7.00  8.00  20.0  25.0  31.0  31.0  22.0  18.0   14.0
3 floor        5.00  7.00  8.00  20.0  25.0  31.0  31.0  22.0  18.0   14.0

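I don't understand why I get an error when running essentially the same code I used to create the rounded row. Any ideas? If you have suggestions for a better way to add a row based on the row above, I would be happy to hear them.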

1 Answer:

Answer 0 (score: 1)
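
On the error itself, a likely explanation (this is my reading of the magrittr documentation, not something stated in the question): when the `.` placeholder appears only inside a nested call, `%>%` still inserts the left-hand side as the first argument. The piped call is then effectively `c(demo, x = 'floor', ...)`: the 11 columns of `demo`, each of length 2, followed by `x = 'floor'` of length 1 as element 12, which would match "Argument 12 must be length 2, not 1". With the one-row `tst`, every element had length 1, so nothing clashed. The sketch below uses a small made-up tibble `d` as a stand-in for `demo` and shows that wrapping the right-hand side in braces stops the insertion.

```r
library(dplyr)  # re-exports %>% and tibble()

# Hypothetical two-row stand-in for `demo` (not the data from the question)
d <- tibble(x = c("actual data", "round"), age_1 = c(5.3, 5))

# The dot is used only inside a nested call, so `d` is also inserted
# as the first argument of c()
piped <- d %>% c(x = "floor", floor(.[1, 2]))

# Element lengths: the two columns of `d` (length 2 each),
# then the two new elements (length 1 each)
lengths(piped)

# Braces suppress the first-argument insertion, leaving only the new row
braced <- d %>% { c(x = "floor", floor(.[1, 2])) }

# The braced list has one value per column, so it binds as a single row
result <- bind_rows(d, braced)
```

Here `braced` holds just `x` and `age_1`, so `result` should have three rows. Building the row first and calling `bind_rows(demo, i)`, as in the question's working version, avoids the issue for the same reason.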

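Working with wide datasets in R can be quite unintuitive. It is almost always preferable to work with data in long format, at least until the point where you want to display it, for example in a wide table.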

I would reshape your data:

library(tidyr)
library(dplyr)

tst <- tibble(
  x = 'actual data',
  age_1 = 5.3,
  age_2 = 6.6,
  age_3 = 8.3,
  age_4 = 20.3,
  age_5 = 25.3,
  age_6 = 30.8,
  age_7 = 31.3,
  age_8 = 22.3,
  age_9 = 18.3,
  age_10 = 14.3
)

df <- tst %>%
  select(-x) %>%
  gather(var, actual) %>%
  mutate(
    var = "age",
    round = round(actual),
    floor = floor(actual)
  )

df
# # A tibble: 10 x 4
#  var   actual round floor
#  <chr>  <dbl> <dbl> <dbl>
#  1 age   5.30  5.00  5.00
#  2 age   6.60  7.00  6.00
#  3 age   8.30  8.00  8.00
#  4 age  20.3  20.0  20.0 
#  5 age  25.3  25.0  25.0 
#  6 age  30.8  31.0  30.0 
#  7 age  31.3  31.0  31.0 
#  8 age  22.3  22.0  22.0 
#  9 age  18.3  18.0  18.0 
# 10 age  14.3  14.0  14.0 

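You can now see how easy it is to add round and floor. For a quick wide-format display at the console, you can transpose the result to get the representation from the original question.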

df %>% select(-var) %>% t

#        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# actual  5.3  6.6  8.3 20.3 25.3 30.8 31.3 22.3 18.3  14.3
# round   5.0  7.0  8.0 20.0 25.0 31.0 31.0 22.0 18.0  14.0
# floor   5.0  6.0  8.0 20.0 25.0 30.0 31.0 22.0 18.0  14.0

For a tidy approach, rather than a quick look:

df %>%
  mutate(cols = paste(var, sprintf("%02d", seq_len(nrow(.))), sep = "_")) %>%
  gather(var, value, -cols) %>%
  filter(var != "var") %>%
  spread(cols, value) %>%
  mutate_at(vars(2:length(.)), as.numeric)

# # A tibble: 3 x 11
#   var    age_01 age_02 age_03 age_04 age_05 age_06 age_07 age_08 age_09 age_10
#   <chr>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 actual   5.30   6.60   8.30   20.3   25.3   30.8   31.3   22.3   18.3   14.3
# 2 floor    5.00   6.00   8.00   20.0   25.0   30.0   31.0   22.0   18.0   14.0
# 3 round    5.00   7.00   8.00   20.0   25.0   31.0   31.0   22.0   18.0   14.0
  

N.B. In this variant I added mutate_at because gather coerces the numeric values to character when the gathered columns are of mixed types.
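
Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Below is a rough sketch of the same long-then-wide round trip with those functions, assuming tidyr >= 1.0.0 and using a shortened three-column copy of tst. The column names `col`, `var` and `value` are my own choices for illustration. Because only numeric columns are pivoted here, no conversion back to numeric should be needed.

```r
library(dplyr)
library(tidyr)

# Shortened copy of the question's data (first three age columns only)
tst <- tibble(x = "actual data", age_1 = 5.3, age_2 = 6.6, age_3 = 8.3)

# Long format: one row per original column, plus the derived values
long <- tst %>%
  select(-x) %>%
  pivot_longer(everything(), names_to = "col", values_to = "actual") %>%
  mutate(round = round(actual), floor = floor(actual))

# Back to wide: one row per statistic, one column per original column
wide <- long %>%
  pivot_longer(-col, names_to = "var", values_to = "value") %>%
  pivot_wider(names_from = col, values_from = value)

wide
```

The `var` column of `wide` identifies the actual, round and floor rows, much like the `x` column in the question.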