Calculations between groups within one column of tidy data

Time: 2017-11-26 20:13:37

Tags: r tidyr

I have data like this:

library(dplyr)  # tibble(), transmute(), %>%
library(tidyr)  # spread()

df <- tibble(
  ID = rep(1:2, 4),
  Group = c("A", "B", "A", "B", "A", "B", "A", "B"),
  Parameter = c("Blood", "Blood", "Height", "Height", "Waist", "Waist", "Hip", "Hip"),
  Value = c(6.3, 6.0, 180, 170, 90, 102, 60, 65)
)

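I want to calculate the ratios between "Height" and "Waist" and between "Waist" and "Hip".

I have the following solution, but it requires spread() and only yields the "Waist-to-Hip" calculation.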

     df <- rbind(df,
        spread(df, Parameter, Value)
        %>% transmute(ID = ID,
                      Group = Group,
                      Parameter = "Ratio.Height-to-Hip",
                      Value = Height / Hip,
                      Parameter = "Ratio.Waist-to-Hip",
                      Value = Waist / Hip))

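Is it possible to stay in the tidy data format and avoid switching to long format? And why is the "Height-to-Hip" calculation missing?

3 Answers:

Answer 0: (score: 0)

Here is a possible solution: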

# Calculate ratios "Height" vs "Waist" and "Waist" vs "Hip"

# 1. Load packages
library(tidyr)
library(dplyr)

# 2. Data set
df <- tibble(
   id = rep(1:2, 4),
   group = c("A", "B", "A", "B","A", "B", "A", "B"),
   parameter = c("Blood", "Blood", "Height", "Height", "Waist", "Waist", "Hip", "Hip"),
   value = c(6.3, 6.0, 180, 170, 90, 102, 60, 65))

# 3. Filter and transform data set
df <- df %>% 
  filter(parameter %in% c("Height", "Waist", "Hip")) %>% 
  spread(parameter, value)

# 4. Convert column names to lower case
colnames(df) <- tolower(colnames(df))

# 5. Calculate ratios
df <- df %>% 
  mutate(
    ratio_height_vs_waist = round(height / waist, 2),
    ratio_waist_vs_hip = round(waist / hip, 2))
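
The question also asks whether the reshaping step can be avoided altogether. One possible way, sketched here under the assumption that every ID/Group pair has exactly one row per parameter, is to index `Value` by `Parameter` inside `summarise()`. The name `ratios` is only illustrative, the column names follow the question's original capitalised data, and the `.groups` argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# The question's original data (capitalised column names)
df <- tibble(
  ID = rep(1:2, 4),
  Group = rep(c("A", "B"), 4),
  Parameter = rep(c("Blood", "Height", "Waist", "Hip"), each = 2),
  Value = c(6.3, 6.0, 180, 170, 90, 102, 60, 65)
)

# Compute both ratios per ID/Group without spreading to wide format.
# Assumes exactly one row per Parameter within each ID/Group pair.
ratios <- df %>%
  group_by(ID, Group) %>%
  summarise(
    ratio_height_vs_waist = Value[Parameter == "Height"] / Value[Parameter == "Waist"],
    ratio_waist_vs_hip    = Value[Parameter == "Waist"] / Value[Parameter == "Hip"],
    .groups = "drop"
  )

ratios
```

The result has one row per ID/Group pair. If a parameter were missing or duplicated within a pair, the logical indexing would return zero or several values, so this sketch is only suitable when the one-row-per-parameter assumption holds.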

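Answer 1: (score: 0)

The main problem is that the data are not in a tidy format.

Two key features of tidy format are (Wickham, 2013):

  1. Each variable forms a column;
  2. Each observation forms a row.

Data in the original format violate both rules. For example, the Parameter column contains four variables (Blood, Height, Waist and Hip). A knock-on effect of grouping several variables in Parameter is that each observation has to be repeated over multiple rows. In general, repeated identifier rows (ID in this case) in the absence of repeated measurements indicate that two or more variables have been grouped under a single column.

Anyway, here is my attempt at tidying the data (I used mutate rather than transmute for illustrative purposes).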


    # Load packages
    library(dplyr)
    library(tidyr)
    library(magrittr) # For the %<>% function, which I love
    
    # Make data frame, df
    df <- tibble(
            ID = rep(1:2, 4),
            Group = c("A", "B", "A", "B","A", "B", "A", "B"),
            Parameter = c("Blood", "Blood", "Height", "Height", "Waist", "Waist", "Hip", "Hip"),
            Value = c(6.3, 6.0, 180, 170, 90, 102, 60, 65))
    
    # Wrangle df
    df %<>% 
        # ID and Group appear to be repeated, so use them to group_by
        group_by(ID, Group) %>%
        # Spread the Value column by the Parameter column
        spread(key = Parameter,
               value = Value) %>%
        # Ungroup, just because it's a good habit
        ungroup() %>%
        # Generate new columns.
        mutate(Ratio_height_to_hip = Height / Hip,
               Ratio_waist_to_hip = Waist / Hip)
    
    # Print df
    df
    #> # A tibble: 2 x 8
    #>      ID Group Blood Height   Hip Waist Ratio_height_to_hip
    #>   <int> <chr> <dbl>  <dbl> <dbl> <dbl>               <dbl>
    #> 1     1     A   6.3    180    60    90            3.000000
    #> 2     2     B   6.0    170    65   102            2.615385
    #> # ... with 1 more variables: Ratio_waist_to_hip <dbl>
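
This answer gets both ratios because each one is written to its own column name. As for why "Height-to-Hip" went missing in the question's code: the `transmute()` call there assigns `Parameter` and `Value` twice, and dplyr evaluates its arguments in order, so the second pair overwrites the first. A small sketch of that behaviour, using a hypothetical, already-widened table `wide` (the names `wide`, `res` and `fixed` are illustrative only):

```r
library(dplyr)

# Hypothetical wide table with the question's measurements
wide <- tibble(
  ID = 1:2,
  Height = c(180, 170),
  Waist = c(90, 102),
  Hip = c(60, 65)
)

# Same pattern as in the question: Parameter and Value are each named twice,
# so the later assignments replace the earlier ones.
res <- wide %>%
  transmute(
    ID = ID,
    Parameter = "Ratio.Height-to-Hip",
    Value = Height / Hip,
    Parameter = "Ratio.Waist-to-Hip",  # overwrites the first Parameter
    Value = Waist / Hip                # overwrites the first Value
  )

# Giving each ratio its own column name keeps both results.
fixed <- wide %>%
  transmute(
    ID = ID,
    `Ratio.Height-to-Hip` = Height / Hip,
    `Ratio.Waist-to-Hip` = Waist / Hip
  )
```

Here `res` ends up with only the waist-to-hip values, while `fixed` carries both ratios side by side.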
    

Answer 2: (score: 0)

library(dplyr)
library(tidyr)

df <- df %>%
  spread(Parameter, Value) %>%
  mutate(`Ratio.Height-to-Hip` = Height / Hip,
         `Ratio.Waist-to-Hip` = Waist / Hip) %>%  # waist divided by hip, not the reverse
  gather("Parameter", "Value", -ID, -Group)

Your data are not in a tidy format ;) If you want tidy data, drop the last step.
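
In current tidyr, `spread()` and `gather()` are superseded by `pivot_wider()` and `pivot_longer()`. A rough equivalent of the pipeline above, assuming tidyr 1.0.0 or later and the question's original `df`, might look like the following; the name `df_ratios` is illustrative.

```r
library(dplyr)
library(tidyr)

# The question's original data
df <- tibble(
  ID = rep(1:2, 4),
  Group = rep(c("A", "B"), 4),
  Parameter = rep(c("Blood", "Height", "Waist", "Hip"), each = 2),
  Value = c(6.3, 6.0, 180, 170, 90, 102, 60, 65)
)

df_ratios <- df %>%
  # One column per parameter, one row per ID/Group
  pivot_wider(names_from = Parameter, values_from = Value) %>%
  mutate(
    `Ratio.Height-to-Hip` = Height / Hip,
    `Ratio.Waist-to-Hip` = Waist / Hip
  ) %>%
  # Back to the Parameter/Value layout used in the question
  pivot_longer(-c(ID, Group), names_to = "Parameter", values_to = "Value")

df_ratios
```

As with the `gather()` version, leaving off the final `pivot_longer()` call keeps one row per ID/Group pair with each parameter and ratio in its own column.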