How to gather, then mutate a new column, then spread back to wide format

Date: 2016-07-08 07:02:10

Tags: r dplyr tidyr

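Using tidyr/dplyr, I have several factor columns that I want to convert to Z-scores, and then mutate an average Z-score, while keeping the original data for reference.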

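I want to avoid a for loop in tidyr/dplyr, so I gather the data and perform the calculation (Z-score) on a single column. However, I'm struggling to get back to wide format.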

Here is an MWE:

library(dplyr)
library(tidyr)

# Original Data
dfData <- data.frame(
  Name = c("Steve","Jwan","Ashley"),
  A = c(10,20,12),
  B = c(0.2,0.3,0.5)
) %>% tbl_df() 

# Gather to Z-score
dfLong <- dfData %>% gather("Factor","Value",A:B) %>% 
  mutate(FactorZ = paste0("Z_",Factor)) %>% 
  group_by(Factor) %>% 
  mutate(ValueZ = (Value - mean(Value,na.rm = TRUE))/sd(Value,na.rm = TRUE))

# Now go wide to do some mutations (e.g. Z_Avg = (Z_A + Z_B)/2)

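# This does not work: after the first spread(), FactorZ and ValueZ still identify rows,
# so each Name stays on two NA-padded rows and Z_Avg comes out NA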
dfWide <- dfLong %>% 
  spread(Factor,Value) %>%
  spread(FactorZ,ValueZ)%>% 
  mutate(Z_Avg = (Z_A+Z_B)/2)


# This is the desired result
dfDesired <- dfData %>%
  mutate(Z_A = (A - mean(A, na.rm = TRUE)) / sd(A, na.rm = TRUE)) %>%
  mutate(Z_B = (B - mean(B, na.rm = TRUE)) / sd(B, na.rm = TRUE)) %>%
  mutate(Z_Avg = (Z_A + Z_B) / 2)

Thanks for any help or input!

4 answers:

Answer 0 (score: 3)

Another approach, using dplyr (version 0.5.0):

library(dplyr)

dfData  %>% 
   mutate_each(funs(Z = scale(.)), -Name) %>% 
   mutate(Z_Avg = (A_Z+B_Z)/2)
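mutate_each() and funs() have since been deprecated in dplyr. Assuming dplyr 1.0 or later, a sketch of the same idea with across() could look like this; `.names = "{.col}_Z"` reproduces the A_Z/B_Z naming above, and as.numeric() turns the one-column matrix returned by scale() into a plain vector. The data is rebuilt with tibble() only so the snippet runs on its own.

```r
library(dplyr)

dfData <- tibble(
  Name = c("Steve", "Jwan", "Ashley"),
  A = c(10, 20, 12),
  B = c(0.2, 0.3, 0.5)
)

dfZ <- dfData %>%
  # z-score every column except Name; new columns are named A_Z, B_Z
  mutate(across(-Name, ~ as.numeric(scale(.x)), .names = "{.col}_Z")) %>%
  # average of the two z-scores
  mutate(Z_Avg = (A_Z + B_Z) / 2)

dfZ
```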

Answer 1 (score: 2)

# Collapse the NA-padded rows of the question's dfWide to one row per Name
means <- function(x) mean(x, na.rm = TRUE)
dfWide %>%
  group_by(Name) %>%
  summarise_each(funs(means)) %>%
  mutate(Z_Avg = (Z_A + Z_B) / 2)

# A tibble: 3 x 6
    Name     A     B        Z_A        Z_B      Z_Avg
   <chr> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
1 Ashley    12   0.5 -0.3779645  1.0910895  0.3565625
2   Jwan    20   0.3  1.1338934 -0.2182179  0.4578378
3  Steve    10   0.2 -0.7559289 -0.8728716 -0.8144003
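This appears to work because, in the question's pipeline, each spread() leaves the other key/value pair behind as identifying columns, so every Name ends up on two rows with NA in the columns that belong to the other factor; averaging per Name with na.rm = TRUE collapses each pair into one row. A sketch of the same collapse with current verbs, assuming tidyr 1.0+ (pivot_longer()/pivot_wider() in place of gather()/spread()) and dplyr 1.0+ (summarise(across()) in place of summarise_each()):

```r
library(dplyr)
library(tidyr)

dfData <- tibble(
  Name = c("Steve", "Jwan", "Ashley"),
  A = c(10, 20, 12),
  B = c(0.2, 0.3, 0.5)
)

# Long format with raw and z-scored values, as in the question
dfLong <- dfData %>%
  pivot_longer(A:B, names_to = "Factor", values_to = "Value") %>%
  mutate(FactorZ = paste0("Z_", Factor)) %>%
  group_by(Factor) %>%
  mutate(ValueZ = (Value - mean(Value, na.rm = TRUE)) / sd(Value, na.rm = TRUE)) %>%
  ungroup()

# Two successive widenings: each Name still spans two NA-padded rows
dfPadded <- dfLong %>%
  pivot_wider(names_from = Factor, values_from = Value) %>%
  pivot_wider(names_from = FactorZ, values_from = ValueZ)

# Each column has exactly one non-NA value per Name, so the mean recovers it
dfWide <- dfPadded %>%
  group_by(Name) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE))) %>%
  mutate(Z_Avg = (Z_A + Z_B) / 2)

dfWide
```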

Answer 2 (score: 1)

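Here is an approach that goes to long format and back to wide. For the z-transformation you can use the base function scale(). The approach also uses a join to combine the original data frame with the data frame holding the new values.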

dfLong <- dfData %>%
  gather(Factor, Value, A:B) %>%
  group_by(Factor) %>%
  mutate(ValueZ = scale(Value))

#     Name Factor Value     ValueZ
#   <fctr>  <chr> <dbl>      <dbl>
# 1  Steve      A  10.0 -0.7559289
# 2   Jwan      A  20.0  1.1338934
# 3 Ashley      A  12.0 -0.3779645
# 4  Steve      B   0.2 -0.8728716
# 5   Jwan      B   0.3 -0.2182179
# 6 Ashley      B   0.5  1.0910895   
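With its default arguments, scale() centres by the mean and divides by the standard deviation, so it gives the same values as the manual formula in the question; note that it returns a one-column matrix carrying "scaled:center" and "scaled:scale" attributes rather than a plain vector. A quick check on column A:

```r
x <- c(10, 20, 12)

z_manual <- (x - mean(x)) / sd(x)
z_scale  <- scale(x)  # one-column matrix with attributes

is.matrix(z_scale)
all.equal(as.numeric(z_scale), z_manual)
attr(z_scale, "scaled:center")
```

Wrapping the call, e.g. mutate(ValueZ = as.numeric(scale(Value))), is a cautious way to keep a plain numeric column inside a data frame.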


dfWide <- dfData %>% inner_join(dfLong %>%
                                  ungroup %>%
                                  select(-Value) %>%
                                  mutate(Factor = paste0("Z_", Factor)) %>%
                                  spread(Factor, ValueZ) %>%
                                  mutate(Z_Avg = (Z_A + Z_B) / 2),
                                by = "Name")

#     Name     A     B        Z_A        Z_B      Z_Avg
#   <fctr> <dbl> <dbl>      <dbl>      <dbl>      <dbl>
# 1  Steve    10   0.2 -0.7559289 -0.8728716 -0.8144003
# 2   Jwan    20   0.3  1.1338934 -0.2182179  0.4578378
# 3 Ashley    12   0.5 -0.3779645  1.0910895  0.3565625
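With tidyr 1.0 or later, gather()/spread() are superseded by pivot_longer()/pivot_wider(), and pivot_wider() can widen several value columns in one call, which removes the need for the join. A sketch under that assumption; the column names follow the question, and the rename() step is just one way to get back to A, B, Z_A, Z_B:

```r
library(dplyr)
library(tidyr)

dfData <- tibble(
  Name = c("Steve", "Jwan", "Ashley"),
  A = c(10, 20, 12),
  B = c(0.2, 0.3, 0.5)
)

dfLong <- dfData %>%
  pivot_longer(A:B, names_to = "Factor", values_to = "Value") %>%
  group_by(Factor) %>%
  mutate(ValueZ = as.numeric(scale(Value))) %>%
  ungroup()

# Widen raw and z-scored values together; default names are Value_A, ValueZ_A, ...
dfWide <- dfLong %>%
  pivot_wider(id_cols = Name, names_from = Factor,
              values_from = c(Value, ValueZ)) %>%
  rename(A = Value_A, B = Value_B, Z_A = ValueZ_A, Z_B = ValueZ_B) %>%
  mutate(Z_Avg = (Z_A + Z_B) / 2)

dfWide
```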

Answer 3 (score: 0)

I would do everything in wide format. There is no need to switch between long and wide formats.

dfData %>%
  mutate(Z_A = (A - mean(A)) / sd(A),
         Z_B = (B - mean(B)) / sd(B)) %>%
  mutate(Z_AVG = (Z_A + Z_B) / 2)
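If there are more than two factor columns, a small helper plus rowMeans() avoids writing each Z column by hand. The zscore() helper below is hypothetical (it is not part of dplyr), and the sketch assumes dplyr 1.0 or later for across():

```r
library(dplyr)

# Hypothetical helper: z-score that ignores missing values
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

dfData <- tibble(
  Name = c("Steve", "Jwan", "Ashley"),
  A = c(10, 20, 12),
  B = c(0.2, 0.3, 0.5)
)

dfZ <- dfData %>%
  # one Z_ column per numeric factor column
  mutate(across(c(A, B), zscore, .names = "Z_{.col}")) %>%
  # row-wise mean over all Z_ columns, however many there are
  mutate(Z_Avg = rowMeans(select(., starts_with("Z_"))))

dfZ
```

Note that rowMeans() does not skip NA by default; pass na.rm = TRUE if a row may be missing some Z values.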