How do I split each column of a data frame into two columns?

Time: 2015-05-05 13:20:23

Tags: r loops rbind strsplit

I have a data frame like this (4 rows and 5 columns):

Marker  ind1  ind2  ind3  ind4
mark1   CT    TT    CT    TT
mark2   AG    AA    AG    AA
mark3   AC    AA    AC    AA
mark4   CT    TT    CT    TT

What I want to do is split every column (except the first) into two columns, so the output should look like this (4 rows and 9 columns):

Marker  ind1  ind1  ind2  ind2  ind3  ind3  ind4  ind4
mark1   C     T     T     T     C     T     T     T
mark2   A     G     A     A     A     G     A     A
mark3   A     C     A     A     A     C     A     A
mark4   C     T     T     T     C     T     T     T

I know how to split a single column:

do.call(rbind,strsplit(test$JRP4RA6119.039, ""))

which gives this:

      [,1] [,2]
 [1,] "C"  "T" 
 [2,] "A"  "G" 
 [3,] "A"  "C" 
 [4,] "C"  "T" 

What I want is a way to loop this over all the columns of a data frame.

Thanks in advance.

3 answers:

Answer 0 (score: 5)

It feels a bit convoluted to me, but:

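# Split each non-marker column into characters, rbind each into a 2-column
# matrix, then cbind the matrices next to the Marker column
test_split <- data.frame(Marker=test$Marker, 
                         do.call("cbind", lapply(apply(test[, -1], 2, strsplit, ""), 
                                                 function(x) do.call("rbind", x))), 
                         stringsAsFactors=FALSE)
# Name the new columns <original>_1 and <original>_2
colnames(test_split)[-1] <- paste(rep(colnames(test)[-1], each=2), 1:2, sep="_")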

test_split
#      Marker JRP4RA6119.039_1 JRP4RA6119.039_2 JRP4RA6124.029_1 JRP4RA6124.029_2 JRP4RA6133.051_1 JRP4RA6133.051_2 JRP4RA6125.009_1 JRP4RA6125.009_2
#1 s7e4419xxx                C                T                T                T                C                T                T                T
#2 s7e7001s01                A                G                A                A                A                G                A                A
#3 s7e3049xxx                A                C                A                A                A                C                A                A
#4 s7e4727xxx                C                T                T                T                C                T                T                T
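
The code above depends on the asker's `test` object, which is not shown in full. As a minimal reproducible sketch of the same idea, the following rebuilds the sample data from the question under the assumed name `df1` and uses `lapply` directly on the data frame columns. It assumes every cell holds exactly two characters; with cells of differing lengths, `rbind` would recycle values with a warning.

```r
# Sample data reconstructed from the question (the name `df1` is an assumption)
df1 <- data.frame(
  Marker = c("mark1", "mark2", "mark3", "mark4"),
  ind1 = c("CT", "AG", "AC", "CT"),
  ind2 = c("TT", "AA", "AA", "TT"),
  ind3 = c("CT", "AG", "AC", "CT"),
  ind4 = c("TT", "AA", "AA", "TT"),
  stringsAsFactors = FALSE
)

# Split every column except the first into a two-column character matrix
halves <- lapply(df1[-1], function(x) do.call(rbind, strsplit(x, "")))

# Bind the matrices side by side and attach the marker column
df_split <- data.frame(Marker = df1$Marker, do.call(cbind, halves),
                       stringsAsFactors = FALSE)

# Name the new columns <original>_1 and <original>_2
names(df_split)[-1] <- paste(rep(names(df1)[-1], each = 2), 1:2, sep = "_")
df_split
```

Unlike the output requested in the question, this keeps the column names unique (`ind1_1`, `ind1_2`, ...), which makes the columns easier to select by name later.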

Answer 1 (score: 5)

You could also try cSplit_f from splitstackshape:

library(splitstackshape)
# Insert a comma between adjacent word characters, so "CT" becomes "C,T"
df1[-1] <- lapply(df1[-1] , function(x) gsub('(?<=\\w)(?=\\w)', ',', x, perl=TRUE))
cSplit_f(df1, 2:ncol(df1), sep=',')
#   Marker ind1_1 ind1_2 ind2_1 ind2_2 ind3_1 ind3_2 ind4_1 ind4_2
#1:  mark1      C      T      T      T      C      T      T      T
#2:  mark2      A      G      A      A      A      G      A      A
#3:  mark3      A      C      A      A      A      C      A      A
#4:  mark4      C      T      T      T      C      T      T      T

Or, as @Ananda Mahto suggested, cSplit may be more efficient on large datasets, and it can be used directly without changing the delimiter:

cSplit(df1, names(df1)[-1], sep="", stripWhite = FALSE)
#   Marker ind1_1 ind1_2 ind2_1 ind2_2 ind3_1 ind3_2 ind4_1 ind4_2
#1:  mark1      C      T      T      T      C      T      T      T
#2:  mark2      A      G      A      A      A      G      A      A
#3:  mark3      A      C      A      A      A      C      A      A
#4:  mark4      C      T      T      T      C      T      T      T

Or use tstrsplit from data.table:

library(data.table)#v1.9.5+
setDT(df1)
cbind(Marker=df1$Marker,df1[, unlist(lapply(.SD, function(x)
        tstrsplit(x, '')), recursive=FALSE), .SDcols=-1])
#   Marker ind11 ind12 ind21 ind22 ind31 ind32 ind41 ind42
#1:  mark1     C     T     T     T     C     T     T     T
#2:  mark2     A     G     A     A     A     G     A     A
#3:  mark3     A     C     A     A     A     C     A     A
#4:  mark4     C     T     T     T     C     T     T     T

Answer 2 (score: 0)

> b <- as.data.frame(a[, 1])
> b[, 2] <- substr(a[, 2], 1, 1)
> b[, 3] <- substr(a[, 2], 2, 2)
> b[, 4] <- substr(a[, 3], 1, 1)
> b[, 5] <- substr(a[, 3], 2, 2)
> b[, 6] <- substr(a[, 4], 1, 1)
> b[, 7] <- substr(a[, 4], 2, 2)
> b[, 8] <- substr(a[, 5], 1, 1)
> b[, 9] <- substr(a[, 5], 2, 2)
> head(b)
  a[, 1] V2 V3 V4 V5 V6 V7 V8 V9
1  mark1  C  T  T  T  C  T  T  T
2  mark2  A  G  A  A  A  G  A  A
3  mark3  A  C  A  A  A  C  A  A
4  mark4  C  T  T  T  C  T  T  T
> dim(b)
[1] 4 9
> names(b) <- c("Marker", "ind1", "ind1","ind2", "ind2", "ind3", "ind3", "ind4", "ind4")
> head(b)
  Marker ind1 ind1 ind2 ind2 ind3 ind3 ind4
1  mark1    C    T    T    T    C    T    T
2  mark2    A    G    A    A    A    G    A
3  mark3    A    C    A    A    A    C    A
4  mark4    C    T    T    T    C    T    T
  ind4
1    T
2    A
3    A
4    T
> 

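You could easily turn this into a loop, but I didn't think it was necessary for a relatively small number of columns.

To make it a loop, just set it up as

for(i in 2:ncol(a)){
}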
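
The loop body is left empty above. A minimal sketch of one way it might be filled in follows, assuming `a` holds the sample data from the question and that every cell contains exactly two characters. The `_1`/`_2` column naming is a hypothetical choice made here to keep names unique; it is not part of the original answer.

```r
# Sample data reconstructed from the question; `a` mirrors the name used above
a <- data.frame(
  Marker = c("mark1", "mark2", "mark3", "mark4"),
  ind1 = c("CT", "AG", "AC", "CT"),
  ind2 = c("TT", "AA", "AA", "TT"),
  ind3 = c("CT", "AG", "AC", "CT"),
  ind4 = c("TT", "AA", "AA", "TT"),
  stringsAsFactors = FALSE
)

# Start with the marker column, then append two columns per original column
b <- data.frame(Marker = a[, 1], stringsAsFactors = FALSE)
for (i in 2:ncol(a)) {
  first  <- paste0(names(a)[i], "_1")  # hypothetical naming scheme
  second <- paste0(names(a)[i], "_2")
  b[[first]]  <- substr(a[, i], 1, 1)  # first character of each cell
  b[[second]] <- substr(a[, i], 2, 2)  # second character of each cell
}
b
```

Because the loop runs over `2:ncol(a)`, it works for any number of individual columns without further changes.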