Consider the following sample data:
Event       Ethnicity  Score
50 yd dash  Asian      7
50 yd dash  Afr. Am    8
50 yd dash  White      5
Hurdle      Asian      6
Hurdle      Afr. Am    8
Hurdle      White      9
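I'm trying to determine the difference between certain ethnicities within each event, preferably with dplyr or something else from the tidyverse, but I'll take any answer/help. For example, the difference between the Asian and White groups in each event,
e.g. Asian (7) - White (5) = Difference (2),
producing output similar to the following: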
Event       Difference
50 yd dash   2
Hurdle      -3
Answer 0 (score: 4)
The following should help:
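library(tidyverse)
df %>%
spread(Ethnicity, Score) %>%   # long -> wide: one column per ethnicity (pivot_wider() supersedes spread() in tidyr >= 1.0.0)
mutate(Difference = Asian - White) %>%   # the difference of interest
select(-Asian, -White, -`Afr. Am`)   # keep only Event and Difference
#        Event Difference
# 1 50 yd dash          2
# 2     Hurdle         -3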
Data:
df <-
structure(list(Event = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("50 yd dash",
"Hurdle"), class = "factor"), Ethnicity = structure(c(2L, 1L,
3L, 2L, 1L, 3L), .Label = c("Afr. Am", "Asian", "White"), class = "factor"),
Score = c(7L, 8L, 5L, 6L, 8L, 9L)), class = "data.frame", row.names = c(NA,
-6L))
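Since `spread()` here only reshapes the data from long to wide, the same result can also be sketched in base R with `xtabs()`, without loading the tidyverse. This assumes exactly one Score per Event/Ethnicity pair: `xtabs()` sums the scores in each cell, so duplicates would be added together and a missing pair would show up as 0 rather than NA.

```r
# Sample data from the question (character columns, not factors)
df <- data.frame(
  Event     = rep(c("50 yd dash", "Hurdle"), each = 3),
  Ethnicity = rep(c("Asian", "Afr. Am", "White"), times = 2),
  Score     = c(7, 8, 5, 6, 8, 9),
  stringsAsFactors = FALSE
)

# Long -> wide: rows are events, columns are ethnicities, cells hold the summed Score
wide <- xtabs(Score ~ Event + Ethnicity, data = df)

# Subtract the two columns of interest, row by row
res <- data.frame(
  Event      = rownames(wide),
  Difference = as.numeric(wide[, "Asian"] - wide[, "White"]),
  stringsAsFactors = FALSE
)
res
```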
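@AntoniosK has already posted a way to read the OP's data with read.table, but my approach is a little different. Instead of removing the spaces from the column values, I put those values between single quotes. (They have to be single quotes, because the instruction puts the value of the argument text between double quotes.)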
df <- read.table(text = "
Event Ethnicity Score
'50 yd dash' Asian 7
'50 yd dash' 'Afr. Am' 8
'50 yd dash' White 5
Hurdle Asian 6
Hurdle 'Afr. Am' 8
Hurdle White 9
", header = TRUE)
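The single quotes are only needed because the `text` argument is itself delimited by double quotes. Assuming R 4.0.0 or later, a raw string such as `r"(...)"` lifts that restriction, so the values can be quoted with ordinary double quotes inside it. This is a sketch of that alternative; `df2` is just an illustrative name.

```r
# Raw string (R >= 4.0.0): everything between r"( and )" is taken literally,
# so double quotes inside it do not end the string
df2 <- read.table(text = r"(
Event Ethnicity Score
"50 yd dash" Asian 7
"50 yd dash" "Afr. Am" 8
"50 yd dash" White 5
Hurdle Asian 6
Hurdle "Afr. Am" 8
Hurdle White 9
)", header = TRUE, stringsAsFactors = FALSE)

str(df2)
```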
Answer 1 (score: 3)
Data:
df = read.table(text = "
Event Ethnicity Score
50yddash Asian 7
50yddash Afr.Am 8
50yddash White 5
Hurdle Asian 6
Hurdle Afr.Am 8
Hurdle White 9
", header=T, stringsAsFactors=F)
First approach, where you specify the ethnicities of interest manually:
library(dplyr)
df %>%
group_by(Event) %>%
summarise(Diff = Score[Ethnicity=="Asian"] - Score[Ethnicity=="White"])
# # A tibble: 2 x 2
# Event Diff
# <chr> <int>
# 1 50yddash 2
# 2 Hurdle -3
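You can turn this snippet into a function that takes the two ethnicities of interest as inputs.
Second approach, which computes the differences for every unique combination of ethnicities and events: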
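Wrapped into a function, the first approach could look roughly like the sketch below. It is written in base R rather than dplyr so that it runs without extra packages, and `event_diff` is a hypothetical name. It assumes exactly one Score per Event/Ethnicity pair.

```r
df <- read.table(text = "
Event Ethnicity Score
50yddash Asian 7
50yddash Afr.Am 8
50yddash White 5
Hurdle Asian 6
Hurdle Afr.Am 8
Hurdle White 9
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: Score[eth1] - Score[eth2] within each Event.
# vapply() stops with an error if a pair is missing or duplicated,
# because the per-event result is then not of length 1.
event_diff <- function(data, eth1, eth2) {
  events <- unique(data$Event)
  diffs <- vapply(events, function(ev) {
    rows <- data[data$Event == ev, ]
    rows$Score[rows$Ethnicity == eth1] - rows$Score[rows$Ethnicity == eth2]
  }, numeric(1))
  data.frame(Event = events, Diff = unname(diffs), stringsAsFactors = FALSE)
}

event_diff(df, "Asian", "White")
event_diff(df, "Afr.Am", "Asian")
```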
library(tidyverse)
# create vectorised function that calculates the difference
# based on a given event and ethnicities
f = function(event, eth1, eth2) {
df$Score[df$Event==event & df$Ethnicity==eth1] -
df$Score[df$Event==event & df$Ethnicity==eth2] }
f = Vectorize(f)
data.frame(t(combn(unique(df$Ethnicity), 2)), stringsAsFactors = F) %>% # create combinations of ethnicities
mutate(Event = list(unique(df$Event))) %>% # create combinations with events
unnest(cols = Event) %>% # one row per ethnicity pair and event (tidyr >= 1.0.0 expects cols)
mutate(Diff = f(Event, X1, X2)) # apply the function
# X1 X2 Event Diff
# 1 Asian Afr.Am 50yddash -1
# 2 Asian Afr.Am Hurdle -2
# 3 Asian White 50yddash 2
# 4 Asian White Hurdle -3
# 5 Afr.Am White 50yddash 3
# 6 Afr.Am White Hurdle -1
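This creates each difference only once per pair, taking the ethnicities in the order they first appear in the data (Asian, Afr.Am, White). If you want all of them (i.e. both Asian - White and White - Asian), you can use this: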
expand.grid(Event = unique(df$Event),
X1 = unique(df$Ethnicity),
X2 = unique(df$Ethnicity)) %>%
filter(X1 != X2) %>%
mutate(Diff = f(Event, X1, X2))
# Event X1 X2 Diff
# 1 50yddash Afr.Am Asian 1
# 2 Hurdle Afr.Am Asian 2
# 3 50yddash White Asian -2
# 4 Hurdle White Asian 3
# 5 50yddash Asian Afr.Am -1
# 6 Hurdle Asian Afr.Am -2
# 7 50yddash White Afr.Am -3
# 8 Hurdle White Afr.Am 1
# 9 50yddash Asian White 2
# 10 Hurdle Asian White -3
# 11 50yddash Afr.Am White 3
# 12 Hurdle Afr.Am White -1
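The same ordered pairs can also be sketched without a helper function by joining the data with itself on Event (base R `merge()`), dropping the pairs of an ethnicity with itself, and subtracting the scores. The suffixes `.1`/`.2` and the names `pairs` and `unique_pairs` are choices made for this sketch.

```r
df <- read.table(text = "
Event Ethnicity Score
50yddash Asian 7
50yddash Afr.Am 8
50yddash White 5
Hurdle Asian 6
Hurdle Afr.Am 8
Hurdle White 9
", header = TRUE, stringsAsFactors = FALSE)

# Self-join on Event: every ethnicity paired with every ethnicity of the same event
pairs <- merge(df, df, by = "Event", suffixes = c(".1", ".2"))
# Drop the pairs of an ethnicity with itself
pairs <- pairs[pairs$Ethnicity.1 != pairs$Ethnicity.2, ]
# Ordered difference: first ethnicity minus second
pairs$Diff <- pairs$Score.1 - pairs$Score.2
pairs[, c("Event", "Ethnicity.1", "Ethnicity.2", "Diff")]

# Keep one direction per pair (here: alphabetical order of the names)
unique_pairs <- pairs[pairs$Ethnicity.1 < pairs$Ethnicity.2, ]
```

Because every pair appears in both directions, the differences cancel out in total, which is a quick sanity check on the join.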
Answer 2 (score: 2)
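library(tidyverse)
df %>%
mutate(rn = row_number()) %>%   # unique row id, so spread() never sees duplicate keys
spread(Ethnicity, Score) %>%    # one row per original row, NA in the other ethnicity columns
group_by(Event) %>%
summarise(Difference = max(Asian, na.rm = TRUE) - max(White, na.rm = TRUE))   # collapse back to one row per event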
# # A tibble: 2 x 2
# Event Difference
# <chr> <dbl>
# 1 50 yd dash 2
# 2 Hurdle -3
Data:
df <-
structure(list(Event = c("50 yd dash", "50 yd dash", "50 yd dash",
"Hurdle", "Hurdle", "Hurdle"), Ethnicity = c("Asian", "Afr. Am",
"White", "Asian", "Afr. Am", "White"), Score = c(7, 8, 5, 6,
8, 9)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))