我有一个数据帧(df),要分离十个变量(var1,var2,var3,...,var10)。变量的值具有以下形式:value_text。我想应用tidyr :: separate,实际上我做了十次,每个变量一次。
tidyr::separate(col=var1,into=c("value1","text1"),extra="merge")%>%
tidyr::separate(col=var2,into=c("value2","text2"),extra="merge")%>%
...
你知道吗?更优雅"使用tidyr的方法::一次分开(不分离10次)?
答案 0 :(得分:1)
目前我能想到的最好的事情是这样的:
library(tidyverse)
mydf %>%
gather(var, val, everything()) %>% # Gather all the columns into a key-value pair
separate(val, into = c("value", "text")) %>% # Separate the gathered columns
gather(key, val, value, text) %>% # Gather again so you have key-key-value
unite(cn, var, key) %>% # Unite your keys to become the column names
group_by(cn) %>% # Keys are duplicated; we need to know their origins
mutate(rn = sequence(n())) %>% # .. so we add row numbers
spread(cn, val) # Then we convert to the wide form
# # A tibble: 5 × 21
# rn var1_text var1_value var10_text var10_value var2_text var2_value var3_text var3_value var4_text var4_value
# * <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 POIL 235 ZHKV 555 QVWK 479 SKCY 454 YCTY 704
# 2 2 NENB 928 CJLE 956 JXQT 379 HPCV 186 VIQY 764
# 3 3 HFHX 966 PZZE 622 PXHE 261 IUGF 717 YIGE 842
# 4 4 LUWK 021 OJRH 741 XNWE 230 NGEN 486 INYN 003
# 5 5 BISI 637 MEJS 718 TSYO 383 ODGS 755 GBKW 564
# # ... with 10 more variables: var5_text <chr>, var5_value <chr>, var6_text <chr>, var6_value <chr>,
# # var7_text <chr>, var7_value <chr>, var8_text <chr>, var8_value <chr>, var9_text <chr>, var9_value <chr>
# Warning message:
# attributes are not identical across measure variables; they will be dropped
以下是我使用的示例数据:
library(stringi)
set.seed(1)
mydf <- data.frame(matrix(sprintf("%s_%s", stri_rand_strings(50, 3, "[0-9]"),
stri_rand_strings(50, 4, "[A-Z]")), ncol = 10,
dimnames = list(NULL, paste0("var", 1:10))))
mydf
# var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
# 1 235_POIL 479_QVWK 454_SKCY 704_YCTY 924_JDJQ 883_TYMP 206_BCJE 214_FDZI 944_DFVS 555_ZHKV
# 2 928_NENB 379_JXQT 186_HPCV 764_VIQY 362_KRRO 794_MCGM 877_HEGE 959_NRCD 174_GMCJ 956_CJLE
# 3 966_HFHX 261_PXHE 717_IUGF 842_YIGE 470_LLHP 733_JYNI 448_MUAN 734_BYRC 522_ZQRI 622_PZZE
# 4 021_LUWK 230_XNWE 486_NGEN 003_INYN 838_XDKF 727_HUSE 663_WJBD 107_MMJZ 550_KZWY 741_OJRH
# 5 637_BISI 383_TSYO 755_ODGS 564_GBKW 334_LDLY 121_BTQE 296_IEDF 146_EVBK 069_VUGT 718_MEJS
注意:这似乎与“tidyverse”套装的作者推荐的完全相反的数据格式。
如上所述,另一种方法是使用我的“splitstackshape”包中的cSplit
。有了这个,问题很容易解决如下:
library(splitstackshape)
cSplit(mydf, names(mydf), "_", type.convert = FALSE)
## var1_1 var1_2 var2_1 var2_2 var3_1 var3_2 var4_1 var4_2 var5_1 var5_2 var6_1 var6_2 var7_1 var7_2 var8_1 var8_2
## 1: 235 POIL 479 QVWK 454 SKCY 704 YCTY 924 JDJQ 883 TYMP 206 BCJE 214 FDZI
## 2: 928 NENB 379 JXQT 186 HPCV 764 VIQY 362 KRRO 794 MCGM 877 HEGE 959 NRCD
## 3: 966 HFHX 261 PXHE 717 IUGF 842 YIGE 470 LLHP 733 JYNI 448 MUAN 734 BYRC
## 4: 021 LUWK 230 XNWE 486 NGEN 003 INYN 838 XDKF 727 HUSE 663 WJBD 107 MMJZ
## 5: 637 BISI 383 TSYO 755 ODGS 564 GBKW 334 LDLY 121 BTQE 296 IEDF 146 EVBK
## var9_1 var9_2 var10_1 var10_2
## 1: 944 DFVS 555 ZHKV
## 2: 174 GMCJ 956 CJLE
## 3: 522 ZQRI 622 PZZE
## 4: 550 KZWY 741 OJRH
## 5: 069 VUGT 718 MEJS
如果你想坚持使用tidyverse,另一个选择是使用for
循环。
for (i in names(mydf)) mydf <- separate_(mydf, i, paste0(i, c("_text", "_value")))