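I have a problem with the `separate` function and column references. I am cleaning some data, and depending on the source, the number of columns and the number of words per column differ. Here is an example: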
library(tidyr)  # provides separate() and re-exports %>%
# First find the maximum number of whitespace-separated words in each column
col.words <- apply(example, 2, function(x) max(sapply(strsplit(x, "\\s+"), length)))
# Columns to separate (those that contain more than one word)
cols <- col.words[col.words > 1]
# Use separate() to split the first such column into multiple columns
example %>% separate(col=X1, into = paste0("N", 1:cols[1]), sep = "\\s+")
   N1 N2       N3       N4       N5  X2            X3    X4     X5
5   1 82     DOLL Benedikt     <NA> GER 0 0 0 23:27.4   0.0 60 160
6   2 96      BOE Johannes Thingnes NOR 0 0 0 23:28.1  +0.7 54 154
7   3  4 FOURCADE   Martin     <NA> FRA 1 1 2 23:50.5 +23.1 48 148
8   4 77   BAILEY   Lowell     <NA> USA 0 0 0 23:56.9 +29.5 43 143
9   5 81  MORAVEC   Ondrej     <NA> CZE 0 1 1 23:58.1 +30.7 40 140
10  6 40     ANEV Krasimir     <NA> BUL 0 0 0 24:00.9 +33.5 38 138
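The problem is that which columns need to be separated changes depending on the data source, so I cannot hard-code `X1`. I would like to use something like: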
example %>% separate(col= names(cols)[1], into = paste0("N", 1:cols[1]), sep = "\\s+")
That way I could loop over the columns, since the column names or word counts may change. Example data is below.
#DATA
> dput(example)
structure(list(X1 = c("1 82 DOLL Benedikt", "2 96 BOE Johannes Thingnes",
"3 4 FOURCADE Martin", "4 77 BAILEY Lowell", "5 81 MORAVEC Ondrej",
"6 40 ANEV Krasimir"), X2 = c("GER", "NOR", "FRA", "USA", "CZE",
"BUL"), X3 = c("0 0 0 23:27.4", "0 0 0 23:28.1", "1 1 2 23:50.5",
"0 0 0 23:56.9", "0 1 1 23:58.1", "0 0 0 24:00.9"), X4 = c("0.0",
"+0.7", "+23.1", "+29.5", "+30.7", "+33.5"), X5 = c("60 160",
"54 154", "48 148", "43 143", "40 140", "38 138")), .Names = c("X1",
"X2", "X3", "X4", "X5"), row.names = 5:10, class = "data.frame")
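
For illustration, here is a minimal sketch of the kind of loop described above. It assumes a tidyr version in which `separate()` supports tidy evaluation, so that a column name held in a string can be injected with `!!`. The trimmed three-row copy of the data, the `result` variable, and the `X1_1`, `X1_2`, ... naming scheme are only placeholders for this example, not part of the original data.

```r
library(tidyr)

# Trimmed copy of the example data (assumption: columns are character vectors)
example <- data.frame(
  X1 = c("1 82 DOLL Benedikt", "2 96 BOE Johannes Thingnes", "3 4 FOURCADE Martin"),
  X2 = c("GER", "NOR", "FRA"),
  X5 = c("60 160", "54 154", "48 148"),
  stringsAsFactors = FALSE
)

# Maximum number of whitespace-separated words in each column
col.words <- sapply(example, function(x) max(lengths(strsplit(x, "\\s+"))))

# Keep only the columns that hold more than one word
cols <- col.words[col.words > 1]

# Split each multi-word column in turn; `!!nm` injects the column name string
result <- example
for (nm in names(cols)) {
  result <- separate(
    result,
    col  = !!nm,
    into = paste0(nm, "_", seq_len(cols[[nm]])),  # hypothetical naming scheme
    sep  = "\\s+",
    fill = "right"  # pad short rows with NA on the right, without a warning
  )
}

print(result)
```

Prefixing the new names with the source column name keeps them unique when more than one column is split, which a fixed `N1`, `N2`, ... prefix would not.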