My data frame is structured as follows:
Column A Column B
1 A
1 B
1 C
1 D
2 B
2 C
2 D
2 E
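I want to concatenate the Column B values of all rows that share the same value in Column A.
I would like the final output to look like this: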
Column A Column B Column C
1 A ABCD
1 B ABCD
1 C ABCD
1 D ABCD
2 B BCDE
2 C BCDE
2 D BCDE
2 E BCDE
How can I do this in R / Python?
Thanks
Answer 0 (score: 2)
In R, we can use dplyr. After grouping by 'ColumnA', use mutate to paste together the contents of 'ColumnB':
library(dplyr)
df1 %>%
group_by(ColumnA) %>%
mutate(ColumnC = paste(ColumnB, collapse=""))
# A tibble: 8 x 3
# Groups: ColumnA [2]
# ColumnA ColumnB ColumnC
# <int> <chr> <chr>
#1 1 A ABCD
#2 1 B ABCD
#3 1 C ABCD
#4 1 D ABCD
#5 2 B BCDE
#6 2 C BCDE
#7 2 D BCDE
#8 2 E BCDE
Another option is data.table:
library(data.table)
setDT(df1)[, ColumnC := paste(ColumnB, collapse=""), by = ColumnA]
# Example data used in the R code above
df1 <- structure(list(ColumnA = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), ColumnB = c("A",
"B", "C", "D", "B", "C", "D", "E")), .Names = c("ColumnA", "ColumnB"
), class = "data.frame", row.names = c(NA, -8L))
If we need Python, then:
>>> import pandas as pd
>>> df1 = pd.read_clipboard()
>>> df1
# ColumnA ColumnB
#1 1 A
#2 1 B
#3 1 C
#4 1 D
#5 2 B
#6 2 C
#7 2 D
#8 2 E
>>> df1['ColumnC'] = df1.groupby('ColumnA')['ColumnB'].transform(lambda x: ''.join(x))
>>> df1
# ColumnA ColumnB ColumnC
#1 1 A ABCD
#2 1 B ABCD
#3 1 C ABCD
#4 1 D ABCD
#5 2 B BCDE
#6 2 C BCDE
#7 2 D BCDE
#8 2 E BCDE
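Since pd.read_clipboard() depends on whatever happens to be on the clipboard, here is a minimal self-contained sketch of the same groupby/transform idea. It assumes the column names ColumnA and ColumnB from the question and builds the frame by hand; passing "".join directly in place of the lambda is just a shorthand, not something the answer above requires.

```python
import pandas as pd

# Rebuild the question's example data explicitly (assumed column names).
df1 = pd.DataFrame({
    "ColumnA": [1, 1, 1, 1, 2, 2, 2, 2],
    "ColumnB": ["A", "B", "C", "D", "B", "C", "D", "E"],
})

# transform() returns one value per original row, so the joined string
# is repeated for every row of its group.
df1["ColumnC"] = df1.groupby("ColumnA")["ColumnB"].transform("".join)

print(df1)
```

Because transform() keeps the original row index, the result can be assigned straight back as a new column without any merge step.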
Answer 1 (score: 1)
A one-liner in base R, suggested by @Sotos in the comments. For this solution, make sure that ColumnB of df is character rather than factor.
with(df, ave(ColumnB, ColumnA, FUN = function(i) paste(i, collapse = '')))
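A similar type caveat applies on the pandas side: str.join only accepts strings, so a non-string column has to be converted first. The sketch below is only an illustration with made-up numeric values in ColumnB (not the data from the question); it casts the column to strings before grouping.

```python
import pandas as pd

# Hypothetical data: ColumnB holds integers instead of letters.
df = pd.DataFrame({
    "ColumnA": [1, 1, 2, 2],
    "ColumnB": [10, 20, 30, 40],
})

# Convert to strings first, then group the string Series by ColumnA
# and join each group's values.
as_text = df["ColumnB"].astype(str)
df["ColumnC"] = as_text.groupby(df["ColumnA"]).transform("".join)

print(df)
```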
Another base R solution:
# Note: each = 4 assumes every group has exactly 4 rows and df is sorted by ColumnA
df$ColumnC <- rep(unlist(by(df, INDICES = df$ColumnA,
function(t) paste(t$ColumnB, collapse = ""), simplify = FALSE)), each = 4)
> df
#ColumnA ColumnB ColumnC
#1 1 a abcd
#2 1 b abcd
#3 1 c abcd
#4 1 d abcd
#5 2 b bcde
#6 2 c bcde
#7 2 d bcde
#8 2 e bcde
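The by()/rep() approach above computes one string per group and then repeats it, with the repeat count hard-coded as each = 4. For comparison, here is a hedged pandas sketch of the same aggregate-then-broadcast idea that does not depend on the group sizes. The data is a made-up example with unequal groups, and the names joined and df are purely illustrative.

```python
import pandas as pd

# Hypothetical data with unequal group sizes (3 rows and 2 rows).
df = pd.DataFrame({
    "ColumnA": [1, 1, 1, 2, 2],
    "ColumnB": ["a", "b", "c", "d", "e"],
})

# Step 1: one joined string per group, indexed by ColumnA.
joined = df.groupby("ColumnA")["ColumnB"].agg("".join)

# Step 2: look up each row's group string by its ColumnA value.
df["ColumnC"] = df["ColumnA"].map(joined)

print(df)
```

Mapping by key rather than repeating by position also means the rows do not need to be sorted by ColumnA.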