tidyr separate: column reference

Time: 2017-08-10 11:40:17

Tags: r tidyr

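I have a problem with the separate function and column references. I am cleaning some data, and depending on the source, the columns and the number of words in them differ. Here is an example: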

library(tidyr)
# First find the maximum number of whitespace-separated words in each column
col.words <- apply(example, 2, function(x) max(sapply(strsplit(x, "\\s+"), length)))
# Columns to separate (those that have more than 1 word)
cols <- col.words[col.words > 1]
# Use separate() to split the column into multiple columns
example %>% separate(col = X1, into = paste0("N", 1:cols[1]), sep = "\\s+")

   N1 N2       N3       N4       N5  X2            X3    X4     X5
5   1 82     DOLL Benedikt     <NA> GER 0 0 0 23:27.4   0.0 60 160
6   2 96      BOE Johannes Thingnes NOR 0 0 0 23:28.1  +0.7 54 154
7   3  4 FOURCADE   Martin     <NA> FRA 1 1 2 23:50.5 +23.1 48 148
8   4 77   BAILEY   Lowell     <NA> USA 0 0 0 23:56.9 +29.5 43 143
9   5 81  MORAVEC   Ondrej     <NA> CZE 0 1 1 23:58.1 +30.7 40 140
10  6 40     ANEV Krasimir     <NA> BUL 0 0 0 24:00.9 +33.5 38 138

The problem is that the column to separate changes depending on the data source. I would like to use something like:

example %>% separate(col= names(cols)[1], into = paste0("N", 1:cols[1]), sep = "\\s+")

so that I can loop, since the column names or word counts may change. Example data below.

#DATA
> dput(example)
structure(list(X1 = c("1 82 DOLL Benedikt", "2 96 BOE Johannes Thingnes", 
"3 4 FOURCADE Martin", "4 77 BAILEY Lowell", "5 81 MORAVEC Ondrej", 
"6 40 ANEV Krasimir"), X2 = c("GER", "NOR", "FRA", "USA", "CZE", 
"BUL"), X3 = c("0 0 0 23:27.4", "0 0 0 23:28.1", "1 1 2 23:50.5", 
"0 0 0 23:56.9", "0 1 1 23:58.1", "0 0 0 24:00.9"), X4 = c("0.0", 
"+0.7", "+23.1", "+29.5", "+30.7", "+33.5"), X5 = c("60 160", 
"54 154", "48 148", "43 143", "40 140", "38 138")), .Names = c("X1", 
"X2", "X3", "X4", "X5"), row.names = 5:10, class = "data.frame")
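
One possible way to loop over the columns by name, offered as a minimal sketch rather than a confirmed answer: assuming a tidyr version with tidy evaluation support, a column name held in a string can be unquoted with `!!` inside separate(). The toy data frame is a two-row subset of the data above, and the `<column>_<index>` naming for the new columns is an assumption made here so that names from different source columns do not collide; `fill = "right"` pads shorter rows with NA.

```r
library(tidyr)

# Toy subset of the question's data (two rows, three columns)
example <- data.frame(
  X1 = c("1 82 DOLL Benedikt", "2 96 BOE Johannes Thingnes"),
  X2 = c("GER", "NOR"),
  X5 = c("60 160", "54 154"),
  stringsAsFactors = FALSE
)

# Maximum number of whitespace-separated words per column
col.words <- sapply(example, function(x) max(lengths(strsplit(x, "\\s+"))))
# Keep only the columns that contain more than one word
cols <- col.words[col.words > 1]

# Split every multi-word column, whatever it happens to be called
result <- example
for (nm in names(cols)) {
  result <- separate(
    result,
    col = !!nm,                                   # unquote the column name string
    into = paste0(nm, "_", seq_len(cols[[nm]])),  # hypothetical naming scheme
    sep = "\\s+",
    fill = "right"                                # pad short rows with NA
  )
}
```

Because the loop only relies on names(cols) and the counts stored in cols, it should not need changes when the source produces different column names or word counts.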

0 answers:

No answers