Dplyr mutate_each for paired sets of columns

Date: 2016-05-13 18:46:36

Tags: r dplyr

Is there a way to accomplish the following transformation with dplyr::mutate_each?

data.frame(x1 = 1:5, x2 = 6:10, y1 = rnorm(5), y2 = rnorm(5)) %>%
  mutate(diff1 = x1 - y1, diff2 = x2 - y2) 

##   x1 x2          y1         y2       diff1     diff2
## 1  1  6  1.03645018 -0.8602099 -0.03645018  6.860210
## 2  2  7 -1.10790835  1.6912875  3.10790835  5.308712
## 3  3  8  0.95452119  2.7232657  2.04547881  5.276734
## 4  4  9  0.01370762  1.6385765  3.98629238  7.361424
## 5  5 10  0.19354354 -1.0464360  4.80645646 11.046436

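I realize this is a simple example that is easy to do the way I have written it, but I am trying to do something similar with a much larger set of columns.

Thanks

2 answers:

Answer 0 (score: 5)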

As @Gregor mentioned in the comments, if you want to use dplyr, it is better to get your data into a tidy format first. Here is one idea:

library(dplyr)
library(tidyr)

df %>%
  add_rownames() %>%   # deprecated in newer dplyr; tibble::rownames_to_column() is the replacement
  gather(key, val, -rowname) %>%
  separate(key, c("var", "num"), "(?<=[a-z]) ?(?=[0-9])") %>%
  spread(var, val) %>%
  mutate(diff = x - y)

Which gives:

#Source: local data frame [10 x 5]
#
#   rowname   num     x           y        diff
#     (chr) (chr) (dbl)       (dbl)       (dbl)
#1        1     1     1  1.03645018 -0.03645018
#2        1     2     6 -0.86020990  6.86020990
#3        2     1     2 -1.10790835  3.10790835
#4        2     2     7  1.69128750  5.30871250
#5        3     1     3  0.95452119  2.04547881
#6        3     2     8  2.72326570  5.27673430
#7        4     1     4  0.01370762  3.98629238
#8        4     2     9  1.63857650  7.36142350
#9        5     1     5  0.19354354  4.80645646
#10       5     2    10 -1.04643600 11.04643600

If for some reason you still want the data in wide format after performing the operation, you can add this to the pipe:

  gather(key, value, -(rowname:num)) %>%
  unite(key_num, key, num, sep = "") %>%
  spread(key_num, value)

Which would give:

#Source: local data frame [5 x 7]
#
#  rowname       diff1     diff2    x1    x2          y1         y2
#    (chr)       (dbl)     (dbl) (dbl) (dbl)       (dbl)      (dbl)
#1       1 -0.03645018  6.860210     1     6  1.03645018 -0.8602099
#2       2  3.10790835  5.308713     2     7 -1.10790835  1.6912875
#3       3  2.04547881  5.276734     3     8  0.95452119  2.7232657
#4       4  3.98629238  7.361423     4     9  0.01370762  1.6385765
#5       5  4.80645646 11.046436     5    10  0.19354354 -1.0464360

Data

Here df is the data frame from the question.
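The reshape, compute, reshape-back idea in this answer can also be sketched with the newer tidyr verbs. This is only an illustration, assuming tidyr 1.0.0 or later (for pivot_longer and pivot_wider) and column names of the form letters followed by digits, as in the question; the seed and the helper column name row are assumptions made for the example.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(x1 = 1:5, x2 = 6:10, y1 = rnorm(5), y2 = rnorm(5))

wide <- df %>%
  mutate(row = row_number()) %>%   # hypothetical id column to keep rows apart
  # split "x1" into prefix "x" and suffix "1"; ".value" creates one column per prefix
  pivot_longer(-row,
               names_to = c(".value", "num"),
               names_pattern = "([a-z]+)([0-9]+)") %>%
  mutate(diff = x - y) %>%
  # back to wide; names_sep = "" glues prefix and suffix back together (x1, diff1, ...)
  pivot_wider(names_from = num,
              values_from = c(x, y, diff),
              names_sep = "")
```

Because the pairing is driven by the numeric suffix, the same pipeline should carry over to more pairs (x3/y3 and so on) without changes, as long as the naming pattern holds.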

Answer 1 (score: 1)

This does not use mutate_each, it is not very pretty, and I doubt it is very fast, but:

# create data set
p <- data.frame(x1 = 1:5, x2 = 6:10,
                y1 = rnorm(5), y2 = rnorm(5),
                z1 = 11:15, z2 = rnorm(5),
                w1 = rchisq(5, 2), w2 = rgamma(5, .2))

# subset the columns by their column number and subtract them
p[, ncol(p) + seq(1, ncol(p) / 2, by = 1)] <-
  p[, seq(1, ncol(p), by = 2)] - p[, seq(2, ncol(p), by = 2)]

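The data.frame p should now have half as many new columns as it had original columns, with each new column holding the difference of one pair of original columns (1-2, 3-4, 5-6).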
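Pairing by position depends on the column order, and it subtracts neighbouring columns rather than x1 - y1 as in the question. A rough base R sketch that pairs columns by name instead might look like this; it assumes the x<n>/y<n> naming from the question, and the names nums, x_cols, y_cols, diffs and result are made up for the example.

```r
set.seed(42)  # assumption: fixed seed only so the example is reproducible
df <- data.frame(x1 = 1:5, x2 = 6:10, y1 = rnorm(5), y2 = rnorm(5))

# numeric suffixes of the x columns, e.g. "1" and "2"
nums <- sub("^x", "", grep("^x[0-9]+$", names(df), value = TRUE))
x_cols <- paste0("x", nums)
y_cols <- paste0("y", nums)
stopifnot(all(y_cols %in% names(df)))  # every x<n> needs a matching y<n>

# data frames of equal shape subtract element-wise, column by column
diffs <- df[x_cols] - df[y_cols]
names(diffs) <- paste0("diff", nums)

result <- cbind(df, diffs)
```

The subtraction is vectorised over all pairs at once, so the number of pairs does not change the code.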