I have a dataset similar to this:
library(tidyverse)
df <- tibble(
subjid = 1:5,
event_1 = c("Watery eyes", # Event number 1
"Sore throat",
"Vomiting",
"Gastroenteritis viral",
"Dry Mouth"),
start_date_1 = as.Date("2017-01-02") + 0:4,
stop_date_1 = as.Date("2017-01-03") + 0:4,
severity_1 = 1,
related_to_drug_1 = 0,
event_2 = c("Nausea", # Event number 2
"Dizziness",
"Cough",
"Disorientation",
"Diarrhea"),
start_date_2 = as.Date("2017-02-02") + 0:4,
stop_date_2 = as.Date("2017-02-03") + 0:4,
severity_2 = 2,
related_to_drug_2 = 1,
event_3 = c("Eczema", # Event number 3
"Sinusitis",
"Abdominal discomfort",
"Muscle spasms",
"Nasopharyngitis"),
start_date_3 = as.Date("2017-03-02") + 0:4,
stop_date_3 = as.Date("2017-03-03") + 0:4,
severity_3 = 2,
related_to_drug_3 = 1
)
df
# A tibble: 5 × 16
subjid event_1 start_date_1 stop_date_1 severity_1 related_to_drug_1 event_2 start_date_2 stop_date_2 severity_2 related_to_drug_2 event_3
<int> <chr> <date> <date> <dbl> <dbl> <chr> <date> <date> <dbl> <dbl> <chr>
1 1 Watery eyes 2017-01-02 2017-01-03 1 0 Nausea 2017-02-02 2017-02-03 2 1 Eczema
2 2 Sore throat 2017-01-03 2017-01-04 1 0 Dizziness 2017-02-03 2017-02-04 2 1 Sinusitis
3 3 Vomiting 2017-01-04 2017-01-05 1 0 Cough 2017-02-04 2017-02-05 2 1 Abdominal discomfort
4 4 Gastroenteritis viral 2017-01-05 2017-01-06 1 0 Disorientation 2017-02-05 2017-02-06 2 1 Muscle spasms
5 5 Dry Mouth 2017-01-06 2017-01-07 1 0 Diarrhea 2017-02-06 2017-02-07 2 1 Nasopharyngitis
# ... with 4 more variables: start_date_3 <date>, stop_date_3 <date>, severity_3 <dbl>, related_to_drug_3 <dbl>
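The real data, however, has many more rows and over 100 "event" column series. The data frame has one row per subject, and the adverse events and their attributes sit in columns whose names end in an underscore and a number indicating which event they belong to. I want to use tidyr to gather these events into a long format like this: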
# A tibble: 15 × 7
subjid event_number event start_date stop_date severity related_to_drug
<int> <int> <chr> <date> <date> <int> <int>
1 1 1 Watery eyes 2017-01-02 2017-01-03 1 0
2 2 1 Sore throat 2017-01-03 2017-01-04 1 0
3 3 1 Vomiting 2017-01-04 2017-01-05 1 0
4 4 1 Gastroenteritis viral 2017-01-05 2017-01-06 1 0
5 5 1 Dry Mouth 2017-01-06 2017-01-07 1 0
6 1 2 Nausea 2017-02-02 2017-02-03 2 1
7 2 2 Dizziness 2017-02-03 2017-02-04 2 1
8 3 2 Cough 2017-02-04 2017-02-05 2 1
9 4 2 Disorientation 2017-02-05 2017-02-06 2 1
10 5 2 Diarrhea 2017-02-06 2017-02-07 2 1
11 1 3 Eczema 2017-03-02 2017-03-03 2 1
12 2 3 Sinusitis 2017-03-03 2017-03-04 2 1
13 3 3 Abdominal discomfort 2017-03-04 2017-03-05 2 1
14 4 3 Muscle spasms 2017-03-05 2017-03-06 2 1
15 5 3 Nasopharyngitis 2017-03-06 2017-03-07 2 1
That is, one row per adverse event, with columns holding the attributes of that particular event.
Answer 0 (score: 1)
You can do this with the following code:
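df %>%
  # Stack every column except the first (subjid) into key/value pairs
  gather(Var, Val, -1) %>%
  # Mark the underscore before the event number with '!!' ...
  mutate(Var = gsub('_(\\d+)', '!!\\1', Var)) %>%
  # ... so the name can be split into attribute name and event number
  separate(Var, c('Var', 'Event'), sep = '!!') %>%
  # Spread the attributes back out, one column per attribute
  spread(Var, Val)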
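Unfortunately, this destroys the column classes, because every value is forced into the single Val column. They need to be restored afterwards, which you can do with a call to mutate.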
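(Also note that the mutate line after the gather is only there because your column names contain '_' in more than one place and I wanted to split off just the event number.)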
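The answer does not show the fix-up itself, so here is a minimal sketch of what it might look like. It assumes a cut-down stand-in for the question's data (df_small is a hypothetical name), and it adds one step that is not in the answer: converting every attribute column to character before gathering, so that the text stored in Val is predictable and easy to convert back.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down version of the question's data: two subjects, two events.
df_small <- tibble(
  subjid = 1:2,
  event_1 = c("Watery eyes", "Sore throat"),
  start_date_1 = as.Date("2017-01-02") + 0:1,
  severity_1 = 1,
  event_2 = c("Nausea", "Dizziness"),
  start_date_2 = as.Date("2017-02-02") + 0:1,
  severity_2 = 2
)

long_fixed <- df_small %>%
  # Assumption (not in the answer): convert to character up front so that
  # dates are stored as "2017-01-02" rather than as bare day counts.
  mutate(across(-subjid, as.character)) %>%
  gather(Var, Val, -subjid) %>%
  mutate(Var = gsub('_(\\d+)', '!!\\1', Var)) %>%
  separate(Var, c('Var', 'Event'), sep = '!!') %>%
  spread(Var, Val) %>%
  # Restore the classes by hand, one column at a time.
  mutate(
    Event = as.integer(Event),
    start_date = as.Date(start_date),
    severity = as.numeric(severity)
  )

long_fixed
```

With over 100 event series the conversions stay the same, since there is still only one column per attribute after the spread.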
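Answer 1 (score: 1)
This is a more complicated way but, importantly, it preserves the classes.
Start with the column names, split them by event number, create one data frame per event, and finally stack those data frames vertically: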
names(df) %>%
setdiff("subjid") %>%
split(sub(".*_(\\d+)$", "\\1", x = .)) %>%
map(~ select(df, all_of(c("subjid", .x)))) %>%
map(~ setNames(.x, nm = sub("(.*)_\\d+$", "\\1", x = names(.x)))) %>%
map2(names(.), ~ mutate(.x, event_number = .y)) %>%
bind_rows() %>%
select(subjid, event_number, everything())
# # A tibble: 15 × 7
# subjid event_number event start_date stop_date severity related_to_drug
# <int> <chr> <chr> <date> <date> <dbl> <dbl>
# 1 1 1 Watery eyes 2017-01-02 2017-01-03 1 0
# 2 2 1 Sore throat 2017-01-03 2017-01-04 1 0
# 3 3 1 Vomiting 2017-01-04 2017-01-05 1 0
# 4 4 1 Gastroenteritis viral 2017-01-05 2017-01-06 1 0
# 5 5 1 Dry Mouth 2017-01-06 2017-01-07 1 0
# 6 1 2 Nausea 2017-02-02 2017-02-03 2 1
# 7 2 2 Dizziness 2017-02-03 2017-02-04 2 1
# 8 3 2 Cough 2017-02-04 2017-02-05 2 1
# 9 4 2 Disorientation 2017-02-05 2017-02-06 2 1
# 10 5 2 Diarrhea 2017-02-06 2017-02-07 2 1
# 11 1 3 Eczema 2017-03-02 2017-03-03 2 1
# 12 2 3 Sinusitis 2017-03-03 2017-03-04 2 1
# 13 3 3 Abdominal discomfort 2017-03-04 2017-03-05 2 1
# 14 4 3 Muscle spasms 2017-03-05 2017-03-06 2 1
# 15 5 3 Nasopharyngitis 2017-03-06 2017-03-07 2 1
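For readers on a newer tidyr (1.1.0 or later, as far as I know), the same reshaping can probably be done in one step with pivot_longer(). This is not what either answer above uses; it is a sketch under the assumption that every attribute column ends in an underscore followed by the event number, shown on a hypothetical cut-down data frame (df_small). The special name ".value" tells pivot_longer() to keep that part of the column name as an output column, which is why the classes survive.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down version of the question's data: two subjects, two events.
df_small <- tibble(
  subjid = 1:2,
  event_1 = c("Watery eyes", "Sore throat"),
  start_date_1 = as.Date("2017-01-02") + 0:1,
  related_to_drug_1 = 0,
  event_2 = c("Nausea", "Dizziness"),
  start_date_2 = as.Date("2017-02-02") + 0:1,
  related_to_drug_2 = 1
)

long <- df_small %>%
  pivot_longer(
    cols = -subjid,
    # First capture group becomes the output column name,
    # second capture group becomes the event number.
    names_to = c(".value", "event_number"),
    names_pattern = "(.*)_(\\d+)$",
    # Store the event number as an integer rather than as text.
    names_transform = list(event_number = as.integer)
  ) %>%
  # Match the layout asked for in the question: events stacked in order.
  arrange(event_number, subjid) %>%
  select(subjid, event_number, everything())

long
```

Converting event_number to integer matters with 100+ events, because text values would sort as "1", "10", "100", "11" rather than in numeric order.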