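I have data pulled into R from a Google Drive file, and it contains many columns that I need to split into multiple columns each. The data frame holding the data is named Data.
For now, let's start with the following column from that data, shown with its header and first 4 values: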
Data$Param1
-----------
Private Bus, Private Car, Public Bus
Private Car, Private Van, Public Bus
Private Car
Private Bus, Private Car
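In the column above there are four (4) distinct values: Private Bus, Private Car, Private Van and Public Bus.
How can I convert the column Data$Param1 into one column for each of the 4 values mentioned above, where each new column holds "1" if that value is present in Data$Param1 for the row and "0" if it is not?
Like this: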
Data$Param1 | Data$Param1_PrivateBus | Data$Param1_PrivateCar | Data$Param1_PrivateVan | Data$Param1_PublicBus |
Private Bus, Private Car, Public Bus | 1 | 1 | 0 | 1 |
Private Car, Private Van, Public Bus | 0 | 1 | 1 | 1 |
Private Car | 0 | 1 | 0 | 0 |
Private Car, Private Bus | 1 | 1 | 0 | 0 |
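I need to convert exactly 187 similar columns, each with a different set of values. Some columns have a set of 5 values, while others have sets of 6, 7 and 9 values.
I am using R version 3.4.1.
Answer 0 (score: 0)
dplyr, stringi and reshape2 will do everything you need: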
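# Install once, then load the packages
install.packages("dplyr")
install.packages("stringi")
install.packages("reshape2")
library(dplyr)
library(stringi)
library(reshape2)

# Sample data; note the separator here is "," with no space after it
xx_df <- data.frame(Param1 = c("Private Bus,Private Car,Public Bus", "Private Car,Private Van,Public Bus", "Private Car", "Private Bus,Private Car"),
                    stringsAsFactors = FALSE)

# Split each cell into a padded matrix, one value per column
cbind(xx_df, stringi::stri_split_fixed(xx_df$Param1, ",", simplify = TRUE)) %>%
  data.frame(stringsAsFactors = FALSE) %>%
  # Long format: one row per (Param1, value) pair
  reshape2::melt(id.vars = "Param1", na.rm = TRUE) %>%
  # Mark each pair as present and drop the empty padding cells
  mutate(variable = 1) %>% filter(value != '') %>%
  # Wide format: one 0/1 column per distinct value
  reshape2::dcast(Param1 ~ value, value.var = "variable", fill = 0) %>%
  data.frame()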
The result is (rows come back sorted by Param1):
                              Param1 Private.Bus Private.Car Private.Van Public.Bus
1            Private Bus,Private Car           1           1           0          0
2 Private Bus,Private Car,Public Bus           1           1           0          1
3                        Private Car           0           1           0          0
4 Private Car,Private Van,Public Bus           0           1           1          1
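
The question asks for this on 187 columns, with ", " (comma plus space) as the separator, the original row order kept, and column names such as Param1_PrivateBus. The following is only a rough base-R sketch of one way that might be done, not part of the answer above. The helper name split_to_indicators and the cols vector are hypothetical, and the sample Data frame is rebuilt here from the four values shown in the question.

```r
# Hypothetical helper: build one 0/1 column per distinct value found in x.
split_to_indicators <- function(x, prefix, sep = ",") {
  # Split each cell on the separator and trim the surrounding whitespace
  parts <- lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
  # Drop NA and empty pieces so they do not become columns
  parts <- lapply(parts, function(p) p[!is.na(p) & nzchar(p)])
  levels_found <- sort(unique(unlist(parts)))

  # One row per input row, one column per distinct value, filled with 0
  out <- matrix(0L, nrow = length(parts), ncol = length(levels_found))
  for (j in seq_along(levels_found)) {
    present <- vapply(parts, function(p) levels_found[j] %in% p, logical(1))
    out[, j] <- as.integer(present)
  }
  # Name columns like Param1_PrivateBus (spaces removed from the value)
  colnames(out) <- paste0(prefix, "_", gsub(" ", "", levels_found, fixed = TRUE))
  as.data.frame(out)
}

# Sample data taken from the question (separator is ", ")
Data <- data.frame(
  Param1 = c("Private Bus, Private Car, Public Bus",
             "Private Car, Private Van, Public Bus",
             "Private Car",
             "Private Bus, Private Car"),
  stringsAsFactors = FALSE
)

# Assumption: cols would list the names of all columns to convert
cols <- c("Param1")
indicators <- lapply(cols, function(nm) split_to_indicators(Data[[nm]], nm))
result <- cbind(Data, do.call(cbind, indicators))
print(result)
```

Because each column is handled independently, the number of distinct values (5, 6, 7 or 9) does not need to be known in advance; it is taken from whatever the column contains.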