R: Split and order the elements of data frame columns based on another column

Time: 2016-05-26 04:36:21

Tags: r sorting dataframe split

The title may look a bit confusing, so let me try to clarify with a small example:

I have a data frame with 3 columns:

   col1     col2     col3
1 A,D,C sd,dg,ds   5,26,1
2   D,F    fh,we    85,41
3     H       hr       27
4 C,A,D ds,sd,dg 235,65,3
5 Q,G,J rt,gh,we 34,98,65

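I want to sort the comma-separated elements of each col1 entry alphabetically, and then reorder the elements of col2 and col3 to follow the new order of col1, to get: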

   col1     col2     col3
1 A,C,D sd,ds,dg   5,1,26
2   D,F    fh,we    85,41
3     H       hr       27
4 A,C,D sd,ds,dg 65,235,3
5 G,J,Q gh,we,rt 98,65,34

Later I would like to aggregate by col1, so I need rows 1 and 4 of the example to have the same col1 value (A,C,D).

So far, this is where I am stuck:

MWE

my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65'))
my.df
my.df$col1 <- sapply(sapply(strsplit(as.character(my.df$col1), ','), sort), paste, collapse=',')
my.df

Any help is appreciated! Thanks!

3 Answers:

Answer 0: (score: 1)

Here you go:

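my.df <- data.frame(col1=c('A,D,C','D,F','H','C,A,D','Q,G,J'), col2=c('sd,dg,ds','fh,we','hr','ds,sd,dg','rt,gh,we'), col3=c('5,26,1','85,41','27','235,65,3','34,98,65'),stringsAsFactors = FALSE)

for (k in seq_len(nrow(my.df))){
    # split row k of every column on commas and collect the pieces in a temporary data frame
    tempdf <- data.frame(strsplit(my.df[k,1],","),strsplit(my.df[k,2],","),strsplit(my.df[k,3],","),stringsAsFactors = FALSE)
    # sort the temporary data frame by its first column (the pieces of col1)
    tempdf <- tempdf[order(tempdf[,1]),]
    # collapse each column back into one string and write it into row k
    my.df[k,] <- sapply(tempdf,paste,collapse=",")
}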

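As you can see, I split the strings on commas, which turns each row into a temporary data frame. Then you only need to sort that temporary data frame by its first column. From there, each column of tempdf is collapsed into a single string and written back into the original my.df.

Result: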

> my.df
   col1     col2     col3
1 A,C,D sd,ds,dg   5,1,26
2   D,F    fh,we    85,41
3     H       hr       27
4 A,C,D sd,ds,dg 65,235,3
5 G,J,Q gh,we,rt 98,65,34
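
The same idea can also be written without the temporary data frame. The following is only a minimal sketch: the helper name sort_row_by_key and its arguments are made up for illustration, and it assumes that every column holds the same number of comma-separated pieces in a given row. It computes one ordering per row from col1 and applies that ordering to every column.

```r
# Example data from the question, kept as character (not factor) columns.
my.df <- data.frame(
  col1 = c("A,D,C", "D,F", "H", "C,A,D", "Q,G,J"),
  col2 = c("sd,dg,ds", "fh,we", "hr", "ds,sd,dg", "rt,gh,we"),
  col3 = c("5,26,1", "85,41", "27", "235,65,3", "34,98,65"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of any package): reorder the comma-separated
# pieces of every column according to the alphabetical order of the pieces
# in the `key` column. Assumes equal piece counts across columns in each row.
sort_row_by_key <- function(df, key = "col1", sep = ",") {
  # One permutation per row, derived from the key column only.
  perms <- lapply(strsplit(df[[key]], sep, fixed = TRUE), order)
  # Apply the same per-row permutation to every column, then re-collapse.
  df[] <- lapply(df, function(column) {
    pieces <- strsplit(column, sep, fixed = TRUE)
    mapply(function(p, i) paste(p[i], collapse = sep), pieces, perms)
  })
  df
}

sorted.df <- sort_row_by_key(my.df)
sorted.df
```

Because the ordering is computed once from col1 and reused, col2 and col3 cannot drift out of step with it.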

Answer 1: (score: 1)

You can turn each row into a data frame, reorder each data.frame by column 1, and then paste everything back together:

# split the entries by commas and
# turn each row of my.df into a data frame
# storing each data frame in a list element
dfList <- lapply(
  apply(my.df, 1, strsplit, ","),
  function(x) data.frame(x))

# sort each data frame by col1
dfSortedList <- lapply(dfList, function(x) x[with(x, order(col1)), ])

# paste columns back together and arrange as desired
t(sapply(dfSortedList, function(x) apply(x, 2, paste, collapse = ",")))

#     col1    col2       col3      
#[1,] "A,C,D" "sd,ds,dg" "5,1,26"  
#[2,] "D,F"   "fh,we"    "85,41"   
#[3,] "H"     "hr"       "27"      
#[4,] "A,C,D" "sd,ds,dg" "65,235,3"
#[5,] "G,J,Q" "gh,we,rt" "98,65,34"

If necessary, you can convert the result back to a data frame.
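
As a small sketch of that conversion (the object names sorted.mat and sorted.df are placeholders, and the matrix below copies only the first two rows of the output above), as.data.frame turns the character matrix into a data frame with one character column per matrix column:

```r
# A character matrix shaped like the result above; values copied from its
# first two rows, object names chosen only for this illustration.
sorted.mat <- matrix(
  c("A,C,D", "sd,ds,dg", "5,1,26",
    "D,F",   "fh,we",    "85,41"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("col1", "col2", "col3"))
)

# stringsAsFactors = FALSE keeps the columns as character vectors.
sorted.df <- as.data.frame(sorted.mat, stringsAsFactors = FALSE)
str(sorted.df)
```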

Answer 2: (score: 1)

We can do this using cSplit from splitstackshape together with data.table.

library(splitstackshape)
na.omit(cSplit(setDT(my.df, keep.rownames = TRUE), 2:4, ",", "long"))[
  , {
    i1 <- order(col1)
    lapply(.SD, function(x) paste(x[i1], collapse = ","))
  }, rn][, rn := NULL][]
#   col1     col2     col3
#1: A,C,D sd,ds,dg   5,1,26
#2:   D,F    fh,we    85,41
#3:     H       hr       27
#4: A,C,D sd,ds,dg 65,235,3
#5: G,J,Q gh,we,rt 98,65,34

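A slightly longer option is to split 'col1' and convert the dataset to 'long' format with cSplit. Then, grouped by 'col2' and 'col3', we create the order column ('i1') and the sorted 'col1'. Next, we specify 'col2' and 'col3' in .SDcols, loop over them with lapply, split each column on ',', reorder the pieces according to 'i1' using Map, paste them back together, and assign (:=) the output back to the original columns. If needed, assign 'i1' to NULL.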

d1 <- cSplit(my.df, "col1", ",", "long")[, 
 .(i1 = list(order(col1)), col1 = toString(sort(col1))) ,.(col2, col3)]
d1[,  c('col2', 'col3') := lapply(.SD, function(x) 
  Map(function(x, y) x[y], strsplit(as.character(x), ","), d1$i1)), .SDcols = col2:col3]
d1[, i1:= NULL]
d1[, names(my.df), with = FALSE]
#     col1     col2     col3
#1: A, C, D sd,ds,dg   5,1,26
#2:    D, F    fh,we    85,41
#3:       H       hr       27
#4: A, C, D sd,ds,dg 65,235,3
#5: G, J, Q gh,we,rt 98,65,34
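
The Map step described above can be tried on its own in base R. This is only an illustrative sketch with made-up object names (pieces, reordered, collapsed) and two rows of the example data. It shows how Map walks the split pieces and the stored order indices in parallel, and how the reordered pieces can be collapsed back into strings.

```r
# Pieces of two col2 entries, and the order indices that 'i1' would hold
# for the matching col1 entries ("A,D,C" and "Q,G,J").
pieces <- strsplit(c("sd,dg,ds", "rt,gh,we"), ",", fixed = TRUE)
i1 <- lapply(strsplit(c("A,D,C", "Q,G,J"), ",", fixed = TRUE), order)

# Map pairs the k-th set of pieces with the k-th index vector and reorders it.
reordered <- Map(function(x, y) x[y], pieces, i1)

# Collapse each reordered vector into a single comma-separated string.
collapsed <- vapply(reordered, paste, character(1), collapse = ",")
collapsed
```

Keeping the indices in a list and applying them with Map means the ordering is computed only once per row, from col1, and then reused for every other column.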