The title may seem a bit confusing, so let me try to clarify with a small example.
I have a data frame with 3 columns:
col1 col2 col3
1 A,D,C sd,dg,ds 5,26,1
2 D,F fh,we 85,41
3 H hr 27
4 C,A,D ds,sd,dg 235,65,3
5 Q,G,J rt,gh,we 34,98,65
I want to sort the elements of each col1 entry alphabetically, and then reorder the elements of col2 and col3 to follow the new order of col1, to get:
col1 col2 col3
1 A,C,D sd,ds,dg 5,1,26
2 D,F fh,we 85,41
3 H hr 27
4 A,C,D sd,ds,dg 65,235,3
5 G,J,Q gh,we,rt 98,65,34
Afterwards I want to aggregate by col1, so in the example I need rows 1 and 4 to end up equal (A,C,D).
So far, I'm stuck here:
MWE
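my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65'))
my.df
# sort the elements within each col1 entry; lapply (rather than sapply) always returns a list,
# even when every row happens to have the same number of elements
my.df$col1 <- sapply(lapply(strsplit(as.character(my.df$col1), ','), sort), paste, collapse=',')
my.df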
Any help is appreciated! Thanks!
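A minimal base-R sketch of one way to extend the MWE (not taken from any of the answers below): instead of sorting col1 directly, compute the permutation that sorts each col1 entry with order(), then apply that same permutation to the split elements of every column. The names parts, ord and sorted.df are illustrative only.

```r
# the example data from the question, with character (not factor) columns
my.df <- data.frame(col1 = c('A,D,C', 'D,F', 'H', 'C,A,D', 'Q,G,J'),
                    col2 = c('sd,dg,ds', 'fh,we', 'hr', 'ds,sd,dg', 'rt,gh,we'),
                    col3 = c('5,26,1', '85,41', '27', '235,65,3', '34,98,65'),
                    stringsAsFactors = FALSE)

# split every column into a list with one character vector per row
parts <- lapply(my.df, function(x) strsplit(as.character(x), ","))

# for each row, the permutation that sorts its col1 elements
ord <- lapply(parts$col1, order)

# apply that permutation to the elements of every column and collapse back to strings
sorted.df <- as.data.frame(
  lapply(parts, function(col)
    mapply(function(v, o) paste(v[o], collapse = ","), col, ord)),
  stringsAsFactors = FALSE)
sorted.df
```

Because the permutation is computed once per row and reused for every column, col2 and col3 stay aligned with col1 without building a temporary data frame per row.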
Answer 0 (score: 1)
Here you go:
my.df <- data.frame(col1 = c('A,D,C','D,F','H','C,A,D','Q,G,J'),
                    col2 = c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'),
                    col3 = c('5,26,1','85,41','27','235,65,3','34,98,65'),
                    stringsAsFactors = FALSE)
for (k in seq_len(nrow(my.df))) {
  # split each cell of row k on commas; each cell becomes one column of a temporary data frame
  tempdf <- data.frame(strsplit(my.df[k, 1], ","), strsplit(my.df[k, 2], ","),
                       strsplit(my.df[k, 3], ","), stringsAsFactors = FALSE)
  tempdf <- tempdf[order(tempdf[, 1]), ]                 # reorder all columns by the first one
  my.df[k, ] <- sapply(tempdf, paste, collapse = ",")  # collapse each column back into a string
}
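As you can see, I turn each row into a temporary data frame by splitting its strings on commas. Then you only need to sort the temporary data frame by its first column. Finally, collapse each column of tempdf back into a single string and write the result back into the original my.df.
Result: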
> my.df
col1 col2 col3
1 A,C,D sd,ds,dg 5,1,26
2 D,F fh,we 85,41
3 H hr 27
4 A,C,D sd,ds,dg 65,235,3
5 G,J,Q gh,we,rt 98,65,34
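Since the question's end goal is to aggregate by col1, here is a minimal base-R sketch of what the normalized column makes possible. The question does not say which aggregation is intended, so this assumes a simple count per key and a lookup of the original rows per key; sorted.df just re-creates the result above so the snippet runs on its own.

```r
# the sorted result shown above, re-created so this sketch runs on its own
sorted.df <- data.frame(col1 = c('A,C,D', 'D,F', 'H', 'A,C,D', 'G,J,Q'),
                        col2 = c('sd,ds,dg', 'fh,we', 'hr', 'sd,ds,dg', 'gh,we,rt'),
                        col3 = c('5,1,26', '85,41', '27', '65,235,3', '98,65,34'),
                        stringsAsFactors = FALSE)

# rows 1 and 4 now carry the same key "A,C,D", so grouping on col1 puts them together
counts <- table(sorted.df$col1)
counts

# which original rows fall under each key
rows.by.key <- split(seq_len(nrow(sorted.df)), sorted.df$col1)
rows.by.key[["A,C,D"]]
```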
Answer 1 (score: 1)
You can convert each row into a data frame, reorder each data frame by column 1, and then paste everything back together:
# split the entries by commas and
# turn each row of my.df into a data frame
# storing each data frame in a list element
dfList <- lapply(
apply(my.df, 1, strsplit, ","),
function(x) data.frame(x))
# sort each data frame by col1
dfSortedList <- lapply(dfList, function(x) x[with(x, order(col1)), ])
# paste columns back together and arrange as desired
t(sapply(dfSortedList, function(x) apply(x, 2, paste, collapse = ",")))
# col1 col2 col3
#[1,] "A,C,D" "sd,ds,dg" "5,1,26"
#[2,] "D,F" "fh,we" "85,41"
#[3,] "H" "hr" "27"
#[4,] "A,C,D" "sd,ds,dg" "65,235,3"
#[5,] "G,J,Q" "gh,we,rt" "98,65,34"
If necessary, you can convert the result back into a data frame.
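A sketch of that conversion, assuming character columns are wanted: the answer's steps are repeated (with stringsAsFactors = FALSE added, an assumption so the columns stay character on R versions before 4.0), and the resulting character matrix is passed to as.data.frame(). The names m and result.df are illustrative.

```r
my.df <- data.frame(col1 = c('A,D,C', 'D,F', 'H', 'C,A,D', 'Q,G,J'),
                    col2 = c('sd,dg,ds', 'fh,we', 'hr', 'ds,sd,dg', 'rt,gh,we'),
                    col3 = c('5,26,1', '85,41', '27', '235,65,3', '34,98,65'),
                    stringsAsFactors = FALSE)

# the steps from the answer above: one data frame per row, sorted by col1, pasted back
dfList <- lapply(apply(my.df, 1, strsplit, ","),
                 function(x) data.frame(x, stringsAsFactors = FALSE))
dfSortedList <- lapply(dfList, function(x) x[order(x$col1), ])
m <- t(sapply(dfSortedList, function(x) apply(x, 2, paste, collapse = ",")))

# convert the character matrix back into a data frame with character columns
result.df <- as.data.frame(m, stringsAsFactors = FALSE)
str(result.df)
```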
Answer 2 (score: 1)
We can do this with cSplit from splitstackshape together with data.table.
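library(splitstackshape)
library(data.table)  # setDT() comes from data.table
# setDT() converts my.df to a data.table in place, keeping the row names as column 'rn';
# cSplit() then splits col1:col3 into long format (one row per element)
na.omit(cSplit(setDT(my.df, keep.rownames = TRUE), 2:4, ",", "long"))[
  , {i1 <- order(col1)  # sort order of col1 within each original row
     lapply(.SD, function(x) paste(x[i1], collapse = ","))
  }, rn][, rn := NULL][]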
# col1 col2 col3
#1: A,C,D sd,ds,dg 5,1,26
#2: D,F fh,we 85,41
#3: H hr 27
#4: A,C,D sd,ds,dg 65,235,3
#5: G,J,Q gh,we,rt 98,65,34
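For readers without splitstackshape, here is a base-R sketch of the same long-format idea. It swaps cSplit for strsplit() plus rep() and the data.table grouping for split(), so it is an equivalent technique rather than the answer's code; the names parts, long, grp and wide are illustrative.

```r
my.df <- data.frame(col1 = c('A,D,C', 'D,F', 'H', 'C,A,D', 'Q,G,J'),
                    col2 = c('sd,dg,ds', 'fh,we', 'hr', 'ds,sd,dg', 'rt,gh,we'),
                    col3 = c('5,26,1', '85,41', '27', '235,65,3', '34,98,65'),
                    stringsAsFactors = FALSE)

# long format: one row per element, with 'rn' recording the original row
parts <- lapply(my.df, function(x) strsplit(as.character(x), ","))
long <- data.frame(rn = rep(seq_len(nrow(my.df)), lengths(parts$col1)),
                   col1 = unlist(parts$col1),
                   col2 = unlist(parts$col2),
                   col3 = unlist(parts$col3),
                   stringsAsFactors = FALSE)

# sort by original row, then by col1 within each row
long <- long[order(long$rn, long$col1), ]

# paste each column back together within each original row
grp <- factor(long$rn, levels = seq_len(nrow(my.df)))
wide <- as.data.frame(lapply(long[c("col1", "col2", "col3")], function(v)
  vapply(split(v, grp), paste, character(1), collapse = ",")),
  stringsAsFactors = FALSE)
wide
```

split() keeps the elements of each group in their current order, which is why sorting the long table first is enough.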
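Or, a slightly longer option: split 'col1' into 'long' format with cSplit, then, grouping by 'col2' and 'col3', create a column 'i1' holding the order of the 'col1' elements, together with the sorted 'col1'. Next, specify 'col2' and 'col3' as .SDcols and loop over them with lapply: split each column on ",", reorder the pieces by 'i1' with Map, paste them back together, and assign (:=) the output to the original columns. Finally, if needed, set 'i1' to NULL.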
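# split col1 into long format; within each (col2, col3) group, store the sort order (i1) and the sorted col1
d1 <- cSplit(my.df, "col1", ",", "long")[,
  .(i1 = list(order(col1)), col1 = paste(sort(col1), collapse = ",")), .(col2, col3)]
# reorder the elements of col2 and col3 by i1 with Map and paste them back into strings
d1[, c('col2', 'col3') := lapply(.SD, function(x)
  unlist(Map(function(v, idx) paste(v[idx], collapse = ","),
             strsplit(as.character(x), ","), d1$i1))), .SDcols = col2:col3]
d1[, i1 := NULL]
# select the columns in the original order
d1[, c("col1", "col2", "col3"), with = FALSE]
# col1 col2 col3
#1: A,C,D sd,ds,dg 5,1,26
#2: D,F fh,we 85,41
#3: H hr 27
#4: A,C,D sd,ds,dg 65,235,3
#5: G,J,Q gh,we,rt 98,65,34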
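The Map step in this second option is the part doing the real work, so here it is in isolation on the first three rows of the example, as a small base-R sketch. i1 is computed with order() on the split col1 values, mirroring the list column built in the grouped step; the variable names are illustrative.

```r
# first three rows of the example; i1 holds the order() of each row's col1 elements
col1 <- c("A,D,C", "D,F", "H")
col2 <- c("sd,dg,ds", "fh,we", "hr")
i1 <- lapply(strsplit(col1, ","), order)

# Map walks the split col2 strings and the permutations in parallel, reordering each vector
reordered <- Map(function(v, idx) v[idx], strsplit(col2, ","), i1)

# paste each reordered vector back into a single comma-separated string
vapply(reordered, paste, character(1), collapse = ",")
```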