Create new rows in a data frame based on multiple values in a column

Posted: 2016-11-10 17:00:14

Tags: r dataframe

I have edited my question to make it more specific.

I have searched for a specific answer to my problem, without success.

First, I have a data frame with 48 variables that looks like this:

> df

    Text                                               Screen_Name   ...  
1   a text where @Sam and @Su and @Jim are addressed   Peter
2   a text where @Eric is addressed                    Margret
3   a text where @Sarah and @Adam are addressed        John

Now I extract all strings matching the pattern "@\S+" and store them in a new column:

library(stringr)
df$Addressees <- str_extract_all(df$Text, "@\\S+")

This gives me:

    ...   Screen_Name   Addressees               ...  
1         Peter         c("@Sam", "@Su", "@Jim")
2         Margret       @Eric
3         John          c("@Sarah", "@Adam")
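A minimal reproducible version of the extraction step above, assuming the stringr package is installed and keeping only the `Text` and `Screen_Name` columns from the sample (the other 46 variables are omitted). Because `str_extract_all()` returns a list, `Addressees` becomes a list column holding one character vector per row:

```r
library(stringr)

# Sample data mirroring the three rows shown above (other columns omitted)
df <- data.frame(
  Text = c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed",
           "a text where @Sarah and @Adam are addressed"),
  Screen_Name = c("Peter", "Margret", "John"),
  stringsAsFactors = FALSE
)

# str_extract_all() returns a list: one character vector of matches per row
df$Addressees <- str_extract_all(df$Text, "@\\S+")

# Number of addressees found in each row
lengths(df$Addressees)
```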

Now I want to create a new data frame with two columns, in which each value in "Addressees" gets its own row, created by repeating the corresponding value of "Screen_Name":
> df

    Screen_Name  Addressees
 1  Peter        Sam
 2  Peter        Su
 3  Peter        Jim
 4  Margret      Eric
 5  John         Sarah
 6  John         Adam

I have tried solutions to similar questions, but none of them seem to work.

Thanks a lot for your help!
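One base R way to get from the list column to the desired long format is to repeat each `Screen_Name` once per extracted match with `rep()` and `lengths()`, flatten the matches with `unlist()`, and strip the leading `@` with `sub()`. This is a sketch assuming `Addressees` is the list column produced by `str_extract_all()`; the sample vectors below are built by hand to stand in for it:

```r
# Hand-built stand-in for the list column produced by str_extract_all()
Screen_Name <- c("Peter", "Margret", "John")
Addressees  <- list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))

# Repeat each screen name once per addressee, and flatten the addressees
long <- data.frame(
  Screen_Name = rep(Screen_Name, lengths(Addressees)),
  Addressees  = sub("^@", "", unlist(Addressees)),  # drop the leading "@"
  stringsAsFactors = FALSE
)
long
```

Rows without any match contribute no output rows, since `lengths()` is 0 for `character(0)` and `unlist()` adds nothing for them, so the two columns stay aligned.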

3 Answers:

Answer 0 (score: 3)

OK, here is a reproducible example:

# create df
ego <- c("peter","margaret","john")
friends <- list(c("sam","su","jim"),c("eric"),c("sarah","adam"))
df <- data.frame(ego, friends = I(friends), stringsAsFactors = FALSE)

# use repeat function to repeat rows
times <- sapply(df$friends,length)
df <- df[rep(seq_len(nrow(df)), times),]
# assign back unlisted friends
df$friends <- unlist(friends)
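Because the rows above are selected with duplicated indices, the result carries generated row names such as `1.1` and `1.2`. A small follow-up sketch, reusing the same sample data, resets them; it also swaps in `lengths()` (available since R 3.2.0) as a shorter equivalent of `sapply(df$friends, length)`:

```r
ego <- c("peter", "margaret", "john")
friends <- list(c("sam", "su", "jim"), c("eric"), c("sarah", "adam"))
df <- data.frame(ego, friends = I(friends), stringsAsFactors = FALSE)

# lengths() returns the length of each list element, like sapply(x, length)
df <- df[rep(seq_len(nrow(df)), lengths(df$friends)), ]
df$friends <- unlist(friends)

# Duplicated row indices leave row names like "1.1"; reset them to 1..n
rownames(df) <- NULL
df
```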

Answer 1 (score: 3)

You can also try data.table on the df created by @raistlin:

library(data.table)
setDT(df)[, .(friends = unlist(friends)), by = "ego"]

        ego friends
1:    peter     sam
2:    peter      su
3:    peter     jim
4: margaret    eric
5:     john   sarah
6:     john    adam

Edit

Now that the OP has provided additional context, the data.table solution can be simplified to solve the underlying problem in a single line.

To remove the leading @ from the Addressees column, as the OP requested, the regex needs to be modified to use a positive lookbehind:

library(data.table)

# read data (to make it a reproducible example)
dt <- fread("Text;                                  Screen_Name 
a text where @Sam and @Su and @Jim are addressed;   Peter
a text where @Eric is addressed;                    Margret
a text where @Sarah and @Adam are addressed;        John")

# use str_extract_all with modified regex
dt[, .(Addressees = unlist(stringr::str_extract_all(Text, "(?<=@)\\S+"))), 
   by = .(Screen_Name)]

#   Screen_Name Addressees
#1:       Peter        Sam
#2:       Peter         Su
#3:       Peter        Jim
#4:     Margret       Eric
#5:        John      Sarah
#6:        John       Adam
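If you would rather avoid the stringr dependency, the same positive lookbehind also works in base R, provided the pattern is evaluated with the PCRE engine (`perl = TRUE`). A sketch using `gregexpr()` and `regmatches()` on the three example texts, with hand-typed vectors standing in for the data:

```r
texts <- c("a text where @Sam and @Su and @Jim are addressed",
           "a text where @Eric is addressed",
           "a text where @Sarah and @Adam are addressed")
screen_names <- c("Peter", "Margret", "John")

# "(?<=@)" is a lookbehind: it requires "@" before the match but does not
# include it, so only the names are returned. Lookbehinds need perl = TRUE.
matches <- regmatches(texts, gregexpr("(?<=@)\\S+", texts, perl = TRUE))

result <- data.frame(
  Screen_Name = rep(screen_names, lengths(matches)),
  Addressees  = unlist(matches),
  stringsAsFactors = FALSE
)
result
```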

Answer 2 (score: 0)

Does this help?

Input:

Screen_Name <- c("Peter", "Margaret", "John")
Addressees <- c(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))

The tidyverse way:

library(tidyr)  # tidyr re-exports the %>% pipe
df <- data.frame(Screen_Name, Addressees) %>% expand(Screen_Name, Addressees)
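Note that `tidyr::expand()` returns every combination of the columns it is given, so the pipeline above yields all screen-name/addressee pairs (3 × 6 = 18 rows) rather than only the pairs that occur together. A sketch of a tidyverse alternative, assuming tidyr (>= 1.0.0, for `unnest_longer()`) and tibble are installed, keeps the addressees as a list column so the pairing is preserved, then unnests it:

```r
library(tibble)
library(tidyr)

# One list element per screen name, so each name stays paired with its addressees
df <- tibble(
  Screen_Name = c("Peter", "Margaret", "John"),
  Addressees  = list(c("@Sam", "@Su", "@Jim"), "@Eric", c("@Sarah", "@Adam"))
)

# unnest_longer() gives each element of the list column its own row
long <- unnest_longer(df, Addressees)
long$Addressees <- sub("^@", "", long$Addressees)  # drop the leading "@"
long
```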