I'm trying to modify the following R data frame:
Column1 Column2 Value1 Value2
Parent1 Child1 3 12
Parent1 Child2 4 12
Parent1 Child3 5 12
Parent2 Child4 2 9
Parent2 Child5 6 9
Parent2 Child6 1 9
I want to place each "Parent" entry above its "Child" entries and move the value from "Value2" into "Value1". The new data frame should look like this:
Column2 Value1
Parent1 12
Child1 3
Child2 4
Child3 5
Parent2 9
Child4 2
Child5 6
Child6 1
Can this be done with dplyr? Also, is it possible to add some extra whitespace (indentation) to the "Child" entries?
Thanks for any insight.
Answer 0 (score: 4)
Prepare the data
library(tidyverse)
data <- read_delim(
"Column1 Column2 Value1 Value2
Parent1 Child1 3 12
Parent1 Child2 4 12
Parent1 Child3 5 12
Parent2 Child4 2 9
Parent2 Child5 6 9
Parent2 Child6 1 9",delim = " "
) %>%
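mutate_all(~str_remove_all(.x, " "))  # strip stray spaces (this also turns every column into character)
colnames(data) <- str_remove_all(colnames(data), " ")  # same clean-up for the column names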
Use tidyr::nest() to "nest" the data so that we can iterate over it row by row.
nested_data <- data %>%
group_by(Column1,Value2) %>%
nest()
> nested_data
# A tibble: 2 x 3
Column1 Value2 data
<chr> <chr> <list>
1 Parent1 12 <tibble [3 x 2]>
2 Parent2 9 <tibble [3 x 2]>
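To see what `nest()` actually produced, here is a minimal sketch that pulls out the first element of the `data` list-column. It assumes a small integer-valued copy of the data rather than the all-character version read above, and calls `ungroup()` only to make the result easier to inspect:

```r
library(dplyr)
library(tidyr)

# Stand-in for the data above (values kept as integers here)
data <- tibble(
  Column1 = rep(c("Parent1", "Parent2"), each = 3),
  Column2 = paste0("Child", 1:6),
  Value1  = c(3L, 4L, 5L, 2L, 6L, 1L),
  Value2  = rep(c(12L, 9L), each = 3)
)

# One row per (Column1, Value2) pair; the other columns go into `data`
nested_data <- data %>%
  group_by(Column1, Value2) %>%
  nest() %>%
  ungroup()

# Each list-column element holds the child rows (Column2, Value1) of one parent
first_group <- nested_data$data[[1]]
print(first_group)
```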
Then use pmap_df() to build the desired output.
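pmap_df(nested_data, function(...) {
  values <- list(...)  # one row of nested_data: Column1, Value2, data
  bind_rows(
    # parent row: Column1 becomes the label, Value2 the value
    tibble(
      Column2 = values$Column1,
      Value1 = values$Value2
    ),
    # child rows, indented with a leading space
    values$data %>%
      mutate(Column2 = paste0(" ", Column2))
  )
})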
# A tibble: 8 x 2
Column2 Value1
<chr> <chr>
1 Parent1 12
2 " Child1" 3
3 " Child2" 4
4 " Child3" 5
5 Parent2 9
6 " Child4" 2
7 " Child5" 6
8 " Child6" 1
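The same result can also be sketched without the explicit `nest()`/`pmap_df()` step by using `dplyr::group_modify()`, which calls a function once per group with that group's rows (`.x`) and its keys (`.y`). This is a swapped-in alternative, not the answer's approach; it assumes integer values and uses a two-space indent purely for illustration:

```r
library(dplyr)

data <- tibble(
  Column1 = rep(c("Parent1", "Parent2"), each = 3),
  Column2 = paste0("Child", 1:6),
  Value1  = c(3L, 4L, 5L, 2L, 6L, 1L),
  Value2  = rep(c(12L, 9L), each = 3)
)

result <- data %>%
  group_by(Column1, Value2) %>%
  group_modify(~ bind_rows(
    # .y holds the group keys: one row with Column1 and Value2
    tibble(Column2 = .y$Column1, Value1 = .y$Value2),
    # .x holds the non-grouping columns (Column2, Value1) of this parent
    mutate(.x, Column2 = paste0("  ", Column2))
  )) %>%
  ungroup() %>%
  select(Column2, Value1)  # drop the group keys that group_modify() adds back

print(result)
```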
Answer 1 (score: 2)
Here is another way in dplyr. You can drop the group column if you don't need it, and the arrange logic could be made more robust.
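df %>%
  # dense_rank() numbers the parents alphabetically, like the old
  # group_indices(., Column1), which is deprecated in current dplyr
  mutate(group = dense_rank(Column1)) %>%
  {bind_rows(
    distinct(., Column = Column1, Value = Value2, group),  # one parent row per group
    select(., Column = Column2, Value = Value1, group) %>%
      mutate(Column = paste0(" ", Column))                 # indented child rows
  )} %>%
  # desc() puts "Parent..." above " Child..." but also reverses the children
  arrange(group, desc(Column))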
# A tibble: 8 x 3
Column Value group
<chr> <int> <int>
1 Parent1 12 1
2 " Child3" 5 1
3 " Child2" 4 1
4 " Child1" 3 1
5 Parent2 9 2
6 " Child6" 1 2
7 " Child5" 6 2
8 " Child4" 2 2
Data:
df <- structure(list(Column1 = c("Parent1", "Parent1", "Parent1", "Parent2",
"Parent2", "Parent2"), Column2 = c("Child1", "Child2", "Child3",
"Child4", "Child5", "Child6"), Value1 = c(3L, 4L, 5L, 2L, 6L,
1L), Value2 = c(12L, 12L, 12L, 9L, 9L, 9L)), .Names = c("Column1",
"Column2", "Value1", "Value2"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
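One way to make the `arrange` logic more robust, as suggested above, is shown in the minimal sketch below. It records each child's original row number and gives parent rows `row = 0`, so the sort no longer depends on how `" Child"` compares with `"Parent"`, and the children keep their original order. The `row` column and the two-space indent are illustrative choices, not part of the answer:

```r
library(dplyr)

df <- tibble(
  Column1 = rep(c("Parent1", "Parent2"), each = 3),
  Column2 = paste0("Child", 1:6),
  Value1  = c(3L, 4L, 5L, 2L, 6L, 1L),
  Value2  = rep(c(12L, 9L), each = 3)
)

# group = parent number, row = original position of each child
tagged <- df %>%
  mutate(group = dense_rank(Column1), row = row_number())

# Parent rows get row = 0 so they always sort first within their group
parents <- tagged %>%
  distinct(Column = Column1, Value = Value2, group) %>%
  mutate(row = 0L)

children <- tagged %>%
  transmute(Column = paste0("  ", Column2), Value = Value1, group, row)

result <- bind_rows(parents, children) %>%
  arrange(group, row) %>%
  select(Column, Value)

print(result)
```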
Answer 2 (score: 1)
Here is a data.table solution:
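library(data.table)
DT <- as.data.table(df)           # df as defined in the previous answer
DT[, GRP := .GRP, by = Column1]   # group number per parent
DT[, ID := .I]                    # original row order
DT_bind <- rbindlist(list(
  # one parent row per group; ID = NA so it sorts first within its group
  DT[, .(Value1 = first(Value2), GRP = .GRP, ID = NA_integer_), by = .(Column2 = Column1)],
  DT[, .(Column2, Value1, GRP, ID)]
))
setorder(DT_bind, GRP, ID)        # setorder() places NAs first by default
DT_bind[, .(Column2, Value1)]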
Column2 Value1
1: Parent1 12
2: Child1 3
3: Child2 4
4: Child3 5
5: Parent2 9
6: Child4 2
7: Child5 6
8: Child6 1
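To also cover the whitespace part of the question with the data.table approach, here is a sketch that indents the child rows (those with a non-missing `ID`) after binding. It repeats the steps above so it runs on its own; the two-space indent and the `result` name are illustrative choices:

```r
library(data.table)

DT <- data.table(
  Column1 = rep(c("Parent1", "Parent2"), each = 3),
  Column2 = paste0("Child", 1:6),
  Value1  = c(3L, 4L, 5L, 2L, 6L, 1L),
  Value2  = rep(c(12L, 9L), each = 3)
)

DT[, GRP := .GRP, by = Column1]
DT[, ID := .I]

DT_bind <- rbindlist(list(
  DT[, .(Value1 = first(Value2), GRP = .GRP, ID = NA_integer_), by = .(Column2 = Column1)],
  DT[, .(Column2, Value1, GRP, ID)]
), use.names = TRUE)
setorder(DT_bind, GRP, ID)

# Child rows are the ones that kept an ID; prefix them with two spaces
DT_bind[!is.na(ID), Column2 := paste0("  ", Column2)]
result <- DT_bind[, .(Column2, Value1)]
print(result)
```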