Concatenating rows in a data frame

Date: 2017-07-25 04:23:15

Tags: python r concatenation

My data frame is structured as follows:

Column A  Column B

1          A  
1          B  
1          C  
1          D  
2          B  
2          C  
2          D  
2          E 

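For each distinct value in Column A, I want to concatenate the Column B values of all rows that share it.

I would like the final output to look like this: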

Column A Column B Column C  
1        A        ABCD    
1        B        ABCD  
1        C        ABCD  
1        D        ABCD  
2        B        BCDE  
2        C        BCDE  
2        D        BCDE  
2        E        BCDE   

How can I do this in R / Python?

Thanks

2 Answers:

Answer 0 (score: 2)

In R, we can use dplyr. After grouping by 'ColumnA', paste the contents of 'ColumnB' together and use mutate to create a new column:

library(dplyr)
df1 %>%
     group_by(ColumnA) %>% 
     mutate(ColumnC = paste(ColumnB, collapse=""))
# A tibble: 8 x 3
# Groups:   ColumnA [2]
#  ColumnA ColumnB ColumnC
#    <int>   <chr>   <chr>
#1       1       A    ABCD
#2       1       B    ABCD
#3       1       C    ABCD
#4       1       D    ABCD
#5       2       B    BCDE
#6       2       C    BCDE
#7       2       D    BCDE
#8       2       E    BCDE

Another option is data.table, which adds the column by reference:

library(data.table)
setDT(df1)[,  ColumnC := paste(ColumnB, collapse=""), by = ColumnA]

Data

df1 <- structure(list(ColumnA = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), ColumnB = c("A", 
 "B", "C", "D", "B", "C", "D", "E")), .Names = c("ColumnA", "ColumnB"
 ), class = "data.frame", row.names = c(NA, -8L))

If we need Python, then pandas can do the same with groupby and transform:

>>> import pandas as pd
>>> df1 = pd.read_clipboard()
>>> df1
#   ColumnA ColumnB
#1        1       A
#2        1       B
#3        1       C
#4        1       D
#5        2       B
#6        2       C
#7        2       D
#8        2       E
>>> df1['ColumnC'] = df1.groupby('ColumnA')['ColumnB'].transform(lambda x: ''.join(x))
>>> df1
#   ColumnA ColumnB ColumnC
#1        1       A    ABCD
#2        1       B    ABCD
#3        1       C    ABCD
#4        1       D    ABCD
#5        2       B    BCDE
#6        2       C    BCDE
#7        2       D    BCDE
#8        2       E    BCDE
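
Because `pd.read_clipboard()` depends on whatever is currently on the clipboard, the session above is hard to reproduce. Here is a minimal self-contained sketch of the same idea. It assumes the frame is built by hand with the column names `ColumnA` and `ColumnB` used in this answer:

```python
import pandas as pd

# Rebuild the example frame explicitly instead of reading the clipboard.
df1 = pd.DataFrame({
    "ColumnA": [1, 1, 1, 1, 2, 2, 2, 2],
    "ColumnB": ["A", "B", "C", "D", "B", "C", "D", "E"],
})

# transform() returns one value per original row, so the joined string
# is repeated on every row of the group it came from.
df1["ColumnC"] = df1.groupby("ColumnA")["ColumnB"].transform(lambda s: "".join(s))

print(df1)
```

`transform` is used rather than `agg` because the result has to keep the original number of rows. `agg` would collapse each group to a single row.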

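Answer 1 (score: 1)

A one-liner in base R, as suggested by @Sotos in the comments. For this solution, make sure ColumnB in df is character rather than factor (with a factor, the pasted strings are not valid levels and the result comes back as NA).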

with(df, ave(ColumnB, ColumnA, FUN = function(i) paste(i, collapse = '')))

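Another base R solution:

# Note: each = 4 assumes df is sorted by ColumnA and every group has exactly 4 rows
df$ColumnC <- rep(unlist(by(df, INDICES = df$ColumnA,
  function(t) paste(t$ColumnB, collapse = ""), simplify = FALSE)), each = 4)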

>df
#ColumnA ColumnB ColumnC
#1       1       a    abcd
#2       1       b    abcd
#3       1       c    abcd
#4       1       d    abcd
#5       2       b    bcde
#6       2       c    bcde
#7       2       d    bcde
#8       2       e    bcde
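
The by/rep approach works in two steps: build one joined string per group, then repeat it back onto the rows. With the `each` argument fixed at 4, the second step only lines up when every group has four rows and the rows are sorted by group. As a rough library-free sketch of the same two-step idea without that assumption, the Python below looks each row's key up again instead of repeating a fixed number of times. The `rows` data is hypothetical, chosen to be unsorted with uneven group sizes:

```python
from collections import defaultdict

# Hypothetical (ColumnA, ColumnB) pairs: unsorted, with groups of different sizes.
rows = [(1, "A"), (2, "B"), (1, "B"), (2, "C"), (1, "C")]

# Step 1: aggregate. Build one joined string per ColumnA key,
# keeping the order in which the rows appear.
joined = defaultdict(str)
for key, value in rows:
    joined[key] += value

# Step 2: broadcast. Look each row's key up again, so group size
# and row order no longer matter.
result = [(key, value, joined[key]) for key, value in rows]

for row in result:
    print(row)
```

Looking the key up for each row is what makes the row order and group sizes irrelevant.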