I have the following comma-separated data in a column named services of a data.frame.
> dput(structure(df$services[1:5]))
list("Global Expense Management, Company Privacy Policy", "Removal Services, Global Expense Management",
"Removal Services, Exception & Cost Admin, Global Cost Estimate, Company Privacy Policy",
"Removal Services, Exception & Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy",
"Global Expense Management, Company Privacy Policy")
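I want to convert this data into separate columns in the data frame, one per service, set to TRUE if the row contains that service and FALSE otherwise.
For example, I would like my data frame to look like this: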
GlobalExpenseManagement | CompanyPrivacyPolicy | etc...
TRUE TRUE
TRUE FALSE
FALSE TRUE
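I assume I have to split the comma-separated values, group them to remove duplicates, and then add them to my data frame as names(df). However, I don't know how to iterate over the dataset and set TRUE/FALSE depending on whether a row contains the service.
Does anyone have a good idea of how to do this?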
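I am now trying to merge the new matrix with the existing data frame, so that the services column is replaced by the new per-service columns. Based on @plafort's great answer, I tried this: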
names(df) <- headnames
rbind(mat, df)
However, I get this error:
Error in names(df) <- headnames : 'names' attribute [178] must be the same length as the vector [7]
I also tried this:
final <- data.frame(cbind(mat, df))
However, the columns of df seem to be missing. How can I merge the columns of mat into df?
Answer 0 (score: 3)
Try:
splitup <- sapply(unlist(lst), strsplit, ', ')
headnames <- unique(unlist(splitup))
(mat <- t(unname(sapply(splitup, function(x) headnames %in% x))))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[2,] TRUE FALSE TRUE FALSE FALSE FALSE FALSE
[3,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE
[4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[5,] TRUE TRUE FALSE FALSE FALSE FALSE FALSE
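We first split the data on the commas, using unlist to access the elements directly. As you mentioned, headnames collects the unique category headings. The last line first matches the heading categories against each list item, then uses unname to remove the automatic naming, and finally transposes the result with t so that each row of the matrix corresponds to one row of the data.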
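The steps above can be reproduced end to end with the question's sample data. This is a minimal sketch rather than the answer's exact code: it assumes the data is held in a list called `services` (a hypothetical name), calls `strsplit` directly on the unlisted vector, and uses `vapply` in place of `sapply` so the result type is fixed.

```r
# Sample data copied from the question; `services` is a hypothetical name.
services <- list(
  "Global Expense Management, Company Privacy Policy",
  "Removal Services, Global Expense Management",
  "Removal Services, Exception & Cost Admin, Global Cost Estimate, Company Privacy Policy",
  "Removal Services, Exception & Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy",
  "Global Expense Management, Company Privacy Policy"
)

# Step 1: split each row on ", " to get one character vector per row.
splitup <- strsplit(unlist(services), ", ", fixed = TRUE)

# Step 2: collect the distinct service names in order of first appearance.
headnames <- unique(unlist(splitup))

# Step 3: for each row, test which of the distinct names it contains.
# vapply() returns a names-by-rows matrix, so t() flips it to rows-by-names.
mat <- t(vapply(splitup, function(x) headnames %in% x,
                logical(length(headnames))))

dim(mat)
```

Because `headnames` drives both the membership test and the later column names, the column order of `mat` always matches the order of `headnames`.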
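To add names on top, we use the colnames function to assign the unique names defined earlier as the column headers. The order is correct because this is the same headnames vector that was used to build the row observations.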
colnames(mat) <- headnames
Global Expense Management Company Privacy Policy
[1,] TRUE TRUE
[2,] TRUE FALSE
[3,] FALSE TRUE
[4,] TRUE TRUE
[5,] TRUE TRUE...
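On the follow-up in the question: `names(df) <- headnames` fails because it tries to rename all 178 existing columns of df with only 7 names. One possible way to attach the named matrix instead is to drop the original column and `cbind` the rest. The sketch below uses a small made-up data frame (the `id` column and the services "A", "B", "C" are hypothetical) so that it runs on its own.

```r
# Hypothetical stand-in for the real data frame: an `id` column plus `services`.
df <- data.frame(
  id = 1:3,
  services = c("A, B", "B, C", "A"),
  stringsAsFactors = FALSE
)

# Build the logical matrix as in the answer, then name its columns.
splitup <- strsplit(df$services, ", ", fixed = TRUE)
headnames <- unique(unlist(splitup))
mat <- t(vapply(splitup, function(x) headnames %in% x,
                logical(length(headnames))))
colnames(mat) <- headnames

# Drop the original column, then bind the logical columns next to the rest.
# cbind() on a data frame and a matrix adds one column per matrix column.
final <- cbind(df[setdiff(names(df), "services")], mat)

str(final)
```

All other columns of df are kept, and the rows stay aligned because `mat` was built from df$services in row order.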
Answer 1 (score: 2)
I would consider cSplit_e from my "splitstackshape" package. The result is binary "1" and "0" rather than TRUE and FALSE, but that should be easy to convert.
Sample data:
df <- data.frame(services = I(
list("Global Expense Management, Company Privacy Policy", "Removal Services, Global Expense Management",
"Removal Services, Exception & Cost Admin, Global Cost Estimate, Company Privacy Policy",
"Removal Services, Exception & Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy",
"Global Expense Management, Company Privacy Policy")))
Convert the "services" column to a vector instead of a list:
df$services <- unlist(df$services)
Now split it:
library(splitstackshape)
cSplit_e(df, "services", ",", type = "character", fill = 0)
## services
## 1 Global Expense Management, Company Privacy Policy
## 2 Removal Services, Global Expense Management
## 3 Removal Services, Exception & Cost Admin, Global Cost Estimate, Company Privacy Policy
## 4 Removal Services, Exception & Cost Admin, Ancillary Services, Global Cost Estimate, Global Expense Management, Perm Storage, Company Privacy Policy
## 5 Global Expense Management, Company Privacy Policy
## services_Ancillary Services services_Company Privacy Policy services_Exception & Cost Admin
## 1 0 1 0
## 2 0 0 0
## 3 0 1 1
## 4 1 1 1
## 5 0 1 0
## services_Global Cost Estimate services_Global Expense Management services_Perm Storage
## 1 0 1 0
## 2 0 1 0
## 3 1 0 0
## 4 1 1 1
## 5 0 1 0
## services_Removal Services
## 1 0
## 2 1
## 3 1
## 4 1
## 5 0
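Converting the 1/0 indicators to TRUE/FALSE could look like the sketch below. To keep it runnable without the package, it uses a small hand-made data frame `out` (hypothetical, with made-up services "A" and "B") that stands in for the cSplit_e result; the only thing it relies on is the "services_" column prefix visible in the output above.

```r
# Hypothetical stand-in for a cSplit_e() result: the original column plus
# one "services_"-prefixed indicator column per distinct service.
out <- data.frame(
  services = c("A, B", "B"),
  services_A = c(1, 0),
  services_B = c(1, 1),
  stringsAsFactors = FALSE
)

# Select the indicator columns by prefix and turn 1/0 into TRUE/FALSE.
flag_cols <- grep("^services_", names(out), value = TRUE)
out[flag_cols] <- lapply(out[flag_cols], function(x) x == 1)

str(out)
```

The original services column is left untouched because its name does not match the prefix pattern.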