Replicate a data frame and replace values

Time: 2018-03-13 16:48:55

Tags: r dataframe dplyr plyr gsub

Suppose I have two data frames:

Df1:
Name1 Name2 Destination1
  A     I       London
  B     J       Paris
  C     K       New York
  D     L       Bangkok
  E     M       Singapore

Df2:
Theme      Pattern
Luxury      luxury hotels in {d} 
City        city hotels {d}
Break        breaks in {d} 
Package      {d} packages

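Basically, I want a new data frame in which every Destination1 in Df1 is paired with every pattern from Df2, keeping the Theme column from Df2 and the Name1 and Name2 columns from Df1.

For example, the desired output: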

Df3:
Name1  Name2  Destination1  Theme    Pattern
A      I      London        Luxury   luxury hotels in {London}
A      I      London        City     city hotels {London}
A      I      London        Break    breaks in {London}
A      I      London        Package  {London} packages
B      J      Paris         Luxury   luxury hotels in {Paris}
B      J      Paris         City     city hotels {Paris}
B      J      Paris         Break    breaks in {Paris}
B      J      Paris         Package  {Paris} packages
C etc....

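3 Answers:

Answer 0 (score: 1)

You can use a dplyr and tidyr solution: first, reshape Df2 to wide format and cbind it with Df1; then gather it back to the original long format. Finally, use gsub with a regular expression to replace {d} with the destination.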

library(dplyr)
library(tidyr)

Df1 <- data.frame(name1 = LETTERS[1:5],
                  name2 = LETTERS[9:13],
                  Destination1 = c("London", "Paris", "New York", "Bangkok", "Singapore")
                  )

Df2 <- data.frame(Theme = c("Luxury", "City", "Break", "Package"),
                  Pattern = c("Luxury hotels in {d}",
                          "City hotels in {d}",
                          "Breaks in {d}",
                          "{d} packages")
                 )

Df3 <- Df1 %>% 
  # reshape Df2 to wide format (one row) and combine it with Df1
  cbind(spread(data = Df2, key = Theme, value = Pattern)) %>%
  # convert back to long format
  gather(key = Theme, value = Pattern, Break:Package) %>%
  # replace {d} with Destination1 row by row; gsub() only uses the first
  # element of `replacement`, so it is applied per row with mapply()
  mutate(Pattern = mapply(gsub,
                          pattern = "\\{d\\}",
                          replacement = as.character(Destination1),
                          x = Pattern,
                          USE.NAMES = FALSE))
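
The reshape-and-gather steps above amount to pairing every row of Df1 with every row of Df2. A minimal base R sketch of the same idea follows, assuming that merge() with by = NULL is an acceptable substitute for the spread/cbind/gather chain (it returns the Cartesian product of the two data frames). The shortened sample data and the anonymous helper function are illustrative, not part of the original answer.

```r
# Shortened sample data (an assumption for illustration)
Df1 <- data.frame(Name1 = c("A", "B"),
                  Name2 = c("I", "J"),
                  Destination1 = c("London", "Paris"),
                  stringsAsFactors = FALSE)
Df2 <- data.frame(Theme = c("Luxury", "Package"),
                  Pattern = c("luxury hotels in {d}", "{d} packages"),
                  stringsAsFactors = FALSE)

# by = NULL asks merge() for every combination of rows (a cross join)
Df3 <- merge(Df1, Df2, by = NULL)

# Substitute the destination into each pattern, one row at a time.
# fixed = TRUE treats "{d}" as a literal string, so no escaping is needed.
Df3$Pattern <- mapply(function(p, d) gsub("{d}", d, p, fixed = TRUE),
                      Df3$Pattern, Df3$Destination1, USE.NAMES = FALSE)

# Group the rows for each name together
Df3 <- Df3[order(Df3$Name1), ]
Df3
```

The per-row mapply() call matters because gsub() takes a single replacement string; passing the whole Destination1 column would apply only its first element to every row.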

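Answer 1 (score: 0)

You can add a column with the same constant value to each data frame, join on it, and then drop it. Because every row shares that key, the join returns every combination of rows from the two data frames: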

library(dplyr)
Df1$new <- "lol"

Df2$new <- "lol"

Df3 <- full_join(Df1,Df2) %>% select(-new)


Example:

df1 <- data.frame(a=c(1:5),b=c(7:11))

df2 <- data.frame(c=c(12:16),d=c(17:21))

df1$new <- "lol"
df2$new <- "lol"
library(dplyr)

full_join(df1,df2) %>% select(-new)
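
The join above produces every row combination but leaves the {d} placeholder in place. A possible way to finish the job is sketched below, assuming dplyr's rowwise() is used so that gsub() sees one destination per row. The shortened sample data is illustrative, and recent dplyr versions may warn about a many-to-many relationship in the join, which is expected here.

```r
library(dplyr)

# Shortened sample data (an assumption for illustration)
Df1 <- data.frame(Name1 = c("A", "B"),
                  Name2 = c("I", "J"),
                  Destination1 = c("London", "Paris"),
                  stringsAsFactors = FALSE)
Df2 <- data.frame(Theme = c("Luxury", "Package"),
                  Pattern = c("luxury hotels in {d}", "{d} packages"),
                  stringsAsFactors = FALSE)

Df3 <- Df1 %>%
  mutate(new = 1) %>%                            # constant join key
  full_join(mutate(Df2, new = 1), by = "new") %>%
  select(-new) %>%                               # drop the helper key
  rowwise() %>%                                  # evaluate the next step per row
  mutate(Pattern = gsub("{d}", Destination1, Pattern, fixed = TRUE)) %>%
  ungroup()

Df3
```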

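Answer 2 (score: 0)

Not exactly the same data (you should provide code that generates your data), but this does what you need! It is not very elegant, I must admit...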

A=data.frame(c1=c("A", "B", "C"), c2=c("london", "paris", "berlin"))
B=data.frame(c3=c("a", "b", "c"), c4=c("la{d}", "{d}lala", "lala{d}la"))
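# build every combination of rows: A is repeated as a whole block,
# while each row of B is repeated nrow(A) times
AB <- data.frame(c1=rep(A$c1, nrow(B)), c2=rep(A$c2, nrow(B)), 
                 c3=rep(B$c3, each=nrow(A)), c4=rep(B$c4, each=nrow(A)))
# replace {d} with the city name, row by row (paste() pads the name with spaces)
AB$c4 <- sapply(1:nrow(AB), function(x) gsub("\\{d\\}", 
                                        paste(" ", AB[x,"c2"], " "), AB[x,"c4"])) 
# sort so that the rows for each city sit together
AB <- AB[order(AB$c2),] 
AB # print the result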
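
The pairing in this answer relies on the difference between rep() repeating a whole vector and rep(each = ) repeating each element. A small sketch of just that mechanism, using row indices instead of the data itself (the names n_a, n_b, a_idx and b_idx are illustrative):

```r
n_a <- 3  # stands in for nrow(A)
n_b <- 2  # stands in for nrow(B)

# Repeat the whole sequence of A's rows once per row of B
a_idx <- rep(seq_len(n_a), times = n_b)  # 1 2 3 1 2 3

# Repeat each row of B once per row of A
b_idx <- rep(seq_len(n_b), each = n_a)   # 1 1 1 2 2 2

# Read side by side, the two vectors list every (A row, B row) pair once
pairs <- data.frame(a = a_idx, b = b_idx)
pairs
```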