I have a data frame, say:
data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
Now I want to duplicate it, so that the same data.frame contains a copy of itself. I would end up with something like this:
data.frame(x = c(1, 3, 1, 3), y = c(5, 0, 5, 0), id = c("A", "B", "A", "B"))
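This is very close to what I want, but I also want to append a suffix to the id column so that every row is unique, based on the number of replicates I want (only one in this case, but I will want many).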
data.frame(x = c(1, 3, 1, 3), y = c(5, 0, 5, 0), id = c("A-1", "B-1", "A-2", "B-2"))
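So, as you can see, I can wrap my head around building the object, but I would like to move away from writing "hacky" base R code and replicate this functionality with dplyr.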
Answer 0 (score: 2)
So I noticed that you would like to do this with the dplyr package. I think a combination of the group_by(), mutate(), and row_number() functions from dplyr will do the job nicely.
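Keep in mind that the result (df2 in the code below) is a "tibble" / grouped data.frame rather than a base data.frame.
If you like, you can easily convert it back to a plain data.frame, as the last lines of the code do.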
library(dplyr)
# so you start with this data.frame:
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
# to attach an exact duplication of this df to itself:
df <- rbind(df, df)
# group by id, add a second id to increment within each id group ("A", "B", etc.)
df2 <- group_by(df, id) %>%
  mutate(id2 = row_number())
# paste the id and id2 together for the desired result
df2$id_combined <- paste0(df2$id, '-', df2$id2)
# inspect results
df2
#       x     y     id   id2 id_combined
#   <dbl> <dbl> <fctr> <int>       <chr>
# 1     1     5      A     1         A-1
# 2     3     0      B     1         B-1
# 3     1     5      A     2         A-2
# 4     3     0      B     2         B-2
# convert the grouped tibble back to a plain data.frame
df2 <- data.frame(df2, stringsAsFactors = FALSE)
# now to remove the additional columns that were added in this process:
df2$id2 <- NULL
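For what it's worth, the steps above can probably be condensed into a single pipeline that also handles more than one copy. This is only a sketch, not the one true way: `n`, `copy`, and `result` are names made up for illustration, and it relies on bind_rows() numbering the elements of an unnamed list through its .id argument.

```r
library(dplyr)

df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"),
                 stringsAsFactors = FALSE)
n <- 2  # number of copies wanted (assumed value for this example)

result <- replicate(n, df, simplify = FALSE) %>%  # list of n identical data frames
  bind_rows(.id = "copy") %>%                     # stack them; "copy" holds "1", "2", ...
  mutate(id = paste0(id, "-", copy)) %>%          # build the unique key, e.g. "A-1"
  select(-copy)                                   # drop the helper column

result
```

With n <- 2 the id column should come out as "A-1", "B-1", "A-2", "B-2", which matches the result asked for in the question.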
Another option for appending the data.frame to itself n times:
# Not dplyr, but this is how I would normally handle this type of task:
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"))
# set n equal to the number of times you want to replicate the data.frame
n <- 13
# initialize space to hold the data frames
list_dfs <- list()
# loop through, adding individual data frames to the list
for (i in seq_len(n)) {
  list_dfs[[i]] <- df
}
# combine them all with do.call
my_big_df <- do.call(rbind, list_dfs)
Then you can use the group_by(), mutate(), and row_number() functions shown above to create new unique keys for the data.frame.
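If you would rather skip the loop, a different base R technique (row indexing with rep(), not the list approach above) seems to give the same stacked result and the unique keys in one go. Treat it as a sketch: `idx` and `copy` are helper names invented here, and n is just the example value from above.

```r
df <- data.frame(x = c(1, 3), y = c(5, 0), id = c("A", "B"),
                 stringsAsFactors = FALSE)
n <- 13  # number of copies wanted (same example value as above)

# repeat the row indices 1..nrow(df) n times: 1, 2, 1, 2, ...
idx <- rep(seq_len(nrow(df)), times = n)
my_big_df <- df[idx, ]

# copy number for each row: 1, 1, 2, 2, ...
copy <- rep(seq_len(n), each = nrow(df))
my_big_df$id <- paste0(my_big_df$id, "-", copy)

# reset the row names, which otherwise look like "1.1", "2.1", ...
rownames(my_big_df) <- NULL

head(my_big_df, 4)
```

The trade-off is that this assumes every copy is an exact, complete duplicate of df, so the copy number can be derived from position alone instead of from group_by() and row_number().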