I am trying to reshape long data so that the repeated row values become column headers. The data looks like this:
# A tibble: 12 x 2
x1 x2
<chr> <chr>
1 Position 1
2 Name Jon Ellis
3 Sex m
4 Year 2017
5 Category Open
6 Time 06:37:27
7 Position 2
8 Name Craig Holgate
9 Sex m
10 Year 2015
11 Category Open
12 Time 06:43:45
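I would like the repeated row values ("Position", "Name", "Sex", "Year", "Category", "Time") to become the column headers, but despite several attempts I have not worked out how to spread/pivot the data to get there. Any pointers would be appreciated, thanks.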
structure(list(x1 = c("Position", "Name", "Sex", "Year", "Category",
"Time", "Position", "Name", "Sex", "Year", "Category", "Time",
"Position", "Name", "Sex", "Year", "Category", "Time", "Position",
"Name", "Sex", "Year", "Category", "Time"), x2 = c("1", "Jon Ellis",
"m", "2017", "Open", "06:37:27", "2", "Craig Holgate", "m", "2015",
"Open", "06:43:45", "3", "Stuart Leaney", "m", "2018", "Open",
"06:46:03", "4", "Craig Holgate", "m", "2013", "Open", "06:47:19"
)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
))
Answer 0: (score: 5)
1) dplyr/tidyr Add a grouping column named row, convert from long to wide format, drop row, and convert the column types.
library(dplyr)
library(tidyr)
DF %>%
mutate(row = cumsum(x1 == "Position")) %>%
pivot_wider(names_from = x1, values_from = x2) %>%
select(-row) %>%
type.convert(as.is = TRUE)
giving:
# A tibble: 2 x 6
Position Name Sex Year Category Time
<int> <chr> <chr> <int> <chr> <chr>
1 1 Jon Ellis m 2017 Open 06:37:27
2 2 Craig Holgate m 2015 Open 06:43:45
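The grouping step in 1) relies on every record starting with a "Position" line. As a minimal base R sketch of just that step (the vector names starts and row are chosen here for illustration), cumsum over the logical test numbers the records:

```r
# Field names for two records, in the same order as the x1 column above.
x1 <- rep(c("Position", "Name", "Sex", "Year", "Category", "Time"), times = 2)

# TRUE marks the first line of each record; cumsum() turns those marks
# into a running record number that stays constant within a record.
starts <- x1 == "Position"
row <- cumsum(starts)

print(data.frame(x1, row))
```

Because the record number only increases when a new "Position" line appears, this grouping does not depend on every record having the same number of fields.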
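2) Base R Use string manipulation to convert the data to Debian control file (DCF) format, read it with read.dcf, which produces a character matrix, then convert that to a data frame and fix the column types.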
txt <- with(DF, sub("Position", "\nPosition", sprintf("%s: %s", x1, x2)))
type.convert(as.data.frame(read.dcf(textConnection(txt))), as.is = TRUE)
giving:
Position Name Sex Year Category Time
1 1 Jon Ellis m 2017 Open 06:37:27
2 2 Craig Holgate m 2015 Open 06:43:45
Or, expressed with the Bizarro pipe, which needs only base R:
DF ->.;
with(., sub("Position", "\nPosition", sprintf("%s: %s", x1, x2))) ->.;
textConnection(.) ->.;
read.dcf(.) ->.;
as.data.frame(.) ->.;
type.convert(., as.is = TRUE)
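To see why the read.dcf approach in 2) works, it may help to print the intermediate text. The sketch below rebuilds a two-record DF from the values in the question and shows the "field: value" lines, with the blank line in front of each "Position" acting as the record separator; the name m for the resulting matrix is just illustrative.

```r
# Two records in the same shape as DF (values copied from the question).
DF <- data.frame(
  x1 = rep(c("Position", "Name", "Sex", "Year", "Category", "Time"), times = 2),
  x2 = c("1", "Jon Ellis", "m", "2017", "Open", "06:37:27",
         "2", "Craig Holgate", "m", "2015", "Open", "06:43:45")
)

# Build "field: value" lines and put a newline in front of each
# "Position" line so that records are separated by a blank line.
txt <- with(DF, sub("Position", "\nPosition", sprintf("%s: %s", x1, x2)))
writeLines(txt)

# read.dcf() returns a character matrix with one row per record
# and one column per field name.
m <- read.dcf(textConnection(txt))
print(dim(m))
print(colnames(m))
```

Everything comes back as character, which is why the answer finishes with type.convert(..., as.is = TRUE).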
DF <- structure(list(x1 = c("Position", "Name", "Sex", "Year", "Category",
"Time", "Position", "Name", "Sex", "Year", "Category", "Time"
), x2 = c("1", "Jon Ellis", "m", "2017", "Open", "06:37:27",
"2", "Craig Holgate", "m", "2015", "Open", "06:43:45")), class = "data.frame", row.names = c(NA,
-12L))
Answer 1: (score: 4)
This works provided no attribute is missing from any record, i.e. every record contains all six fields:
library(tidyverse)
df %>% pivot_wider(names_from = x1, values_from = x2, values_fn = list) %>%
unnest(cols = everything())
# A tibble: 4 x 6
Position Name Sex Year Category Time
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 Jon Ellis m 2017 Open 06:37:27
2 2 Craig Holgate m 2015 Open 06:43:45
3 3 Stuart Leaney m 2018 Open 06:46:03
4 4 Craig Holgate m 2013 Open 06:47:19
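Since this approach depends on no attribute being missing, a quick sanity check before pivoting may be worthwhile. The following is a minimal base R sketch; the helper all_fields_complete is hypothetical (not part of any package) and only compares how often each field name occurs, so it catches a dropped line but does not prove the lines are in the right order.

```r
fields <- c("Position", "Name", "Sex", "Year", "Category", "Time")

# Two complete records, and a copy where the second record's "Sex"
# line (element 9) has been removed to simulate a gap.
complete <- rep(fields, times = 2)
incomplete <- complete[-9]

# Hypothetical helper: TRUE when every field name occurs equally often.
all_fields_complete <- function(x1) {
  counts <- as.vector(table(x1))
  all(counts == counts[1])
}

print(all_fields_complete(complete))
print(all_fields_complete(incomplete))
```

When the check fails, the cumsum-based grouping from Answer 0 is likely the safer route, because it keys on the "Position" line rather than on equal counts.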
Answer 2: (score: 0)
Mine is much like dear AnilGoyal's solution, with two slight modifications:
library(dplyr)
library(tidyr)
df %>%
group_by(x1) %>%
mutate(id = row_number()) %>%
pivot_wider(id_cols = id, names_from = x1, values_from = x2)
# A tibble: 4 x 7
id Position Name Sex Year Category Time
<int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 Jon Ellis m 2017 Open 06:37:27
2 2 2 Craig Holgate m 2015 Open 06:43:45
3 3 3 Stuart Leaney m 2018 Open 06:46:03
4 4 4 Craig Holgate m 2013 Open 06:47:19
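The id column above is the occurrence count of each field name: the first "Position" gets 1, the second gets 2, and so on. A rough base R equivalent of group_by(x1) followed by row_number(), shown only to illustrate the idea, is ave with seq_along:

```r
# Field names for two records, in the same order as the x1 column.
x1 <- rep(c("Position", "Name", "Sex", "Year", "Category", "Time"), times = 2)

# Within each group of identical field names, number the occurrences.
id <- ave(seq_along(x1), x1, FUN = seq_along)

print(data.frame(x1, id))
```

Like Answer 1, this numbering lines up with the real records only when every record contains every field.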
Data:
df <- structure(list(x1 = c("Position", "Name", "Sex", "Year", "Category",
"Time", "Position", "Name", "Sex", "Year", "Category", "Time",
"Position", "Name", "Sex", "Year", "Category", "Time", "Position",
"Name", "Sex", "Year", "Category", "Time"), x2 = c("1", "Jon Ellis",
"m", "2017", "Open", "06:37:27", "2", "Craig Holgate", "m", "2015",
"Open", "06:43:45", "3", "Stuart Leaney", "m", "2018", "Open",
"06:46:03", "4", "Craig Holgate", "m", "2013", "Open", "06:47:19"
)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"
))