Question

What is the best way to create a table whose column values are 1s and 0s from a table like this:

id,string
1,"x,y,z"
2,"x,z"
3,"x"

I need a table that looks like this:

id,x,y,z,a,b,c
1,1,1,1,0,0,0
2,1,0,1,0,0,0
3,1,0,0,0,0,0

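Also, the complete list of all possible unique values that can appear in the strings is predefined. I have a CSV with that list, which looks like this: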

col
x
y
z
a
b
c
.
.
.

Answer 1

In SQL, you can use CASE expressions:

select id,
       (case when string like '%x%' then 1 else 0 end) as x,
       (case when string like '%y%' then 1 else 0 end) as y,
       . . .
from t;

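Depending on the database, there may be simpler formulations. Also, this assumes the values do not overlap, as is the case in your question. For example, "apple" and "pineapple" would cause problems, because like '%apple%' also matches "pineapple".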
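To make the overlap problem concrete, here is a small illustration in R (the language the other answers use). The strings are made-up examples. grepl(..., fixed = TRUE) does plain substring matching, which behaves like LIKE '%apple%', while splitting on the delimiter and testing whole tokens with %in% does not have the problem.

```r
# Made-up example strings: "pineapple" contains "apple" as a substring.
s <- c("apple", "pineapple", "apple,pineapple")

# Substring matching, the analogue of SQL's LIKE '%apple%': every element matches.
substring_hit <- grepl("apple", s, fixed = TRUE)

# Exact token matching: split on "," and compare whole values with %in%.
token_hit <- vapply(strsplit(s, ",", fixed = TRUE),
                    function(tok) "apple" %in% tok,
                    logical(1))

print(substring_hit)  # TRUE TRUE TRUE
print(token_hit)      # TRUE FALSE TRUE
```

In SQL the usual workaround is to compare against the delimited tokens rather than raw substrings; the exact syntax depends on the database.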

Answer 2

You can split the strings and then use %in% to match the values in your predefined list of possible values against the split values.

Example:

mydf <- read.csv(text = 'id,string\n1,"x,y,z"\n2,"x,z"\n3,"x"')

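# predefined list of possible values (also sets the column order of the result)
matches <- c("x", "y", "z", "a", "b", "c")

# split each string on "," (fixed = TRUE), flag which of `matches` occur in it,
# then transpose so that each input row becomes one row of 1/0 indicators
cbind(mydf[1], 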
      `colnames<-`(t(vapply(strsplit(as.character(mydf$string), ",", TRUE), 
                            function(x) {
                              matches %in% x
                            }, 
                            numeric(length(matches)))), 
                   matches))
#   id x y z a b c
# 1  1 1 1 1 0 0 0
# 2  2 1 0 1 0 0 0
# 3  3 1 0 0 0 0 0
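Since the question says the full list of possible values lives in a CSV with a single column named col, here is a minimal base-R sketch of the same split-and-%in% approach that takes matches from that CSV instead of hardcoding it. read.csv(text = ...) stands in for the real file, and the file name values.csv in the comment is hypothetical. as.integer() makes the indicators integer 0/1 rather than logical.

```r
# Stand-in for the CSV of predefined values from the question;
# with a real file this would be read.csv("values.csv") (hypothetical file name).
values_df <- read.csv(text = "col\nx\ny\nz\na\nb\nc", stringsAsFactors = FALSE)
matches <- values_df$col

mydf <- read.csv(text = 'id,string\n1,"x,y,z"\n2,"x,z"\n3,"x"',
                 stringsAsFactors = FALSE)

# Split each string on "," (no regex), then test every predefined value for
# exact membership. vapply returns one column per input row, so transpose.
tokens <- strsplit(mydf$string, ",", fixed = TRUE)
indicator <- t(vapply(tokens,
                      function(tok) as.integer(matches %in% tok),
                      integer(length(matches))))
colnames(indicator) <- matches

result <- cbind(mydf["id"], indicator)
print(result)
#   id x y z a b c
# 1  1 1 1 1 0 0 0
# 2  2 1 0 1 0 0 0
# 3  3 1 0 0 0 0 0
```

Because the column order comes from the CSV, adding a new value to that file adds a new indicator column without touching the code.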

Answer 3

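Here is another option in R using melt (from reshape2) and table. We split the 'string' column on "," into a list, set the names of the list elements to the 'id' values, melt the list into a data.frame, convert the 'value' column to a factor whose levels are the full list of possible values (so that 'a', 'b' and 'c' are included), build the table, and cbind it with the 'id' column.

library(reshape2)

df1 <- read.csv(text = 'id,string\n1,"x,y,z"\n2,"x,z"\n3,"x"', stringsAsFactors = FALSE)

# melt() on a named list gives the columns "value" (the split tokens) and "L1" (the ids);
# levels are listed explicitly rather than via levels(value), because melt() returns
# value as character in R >= 4.0; unused levels become the all-zero columns a, b, c
tbl <- table(transform(melt(setNames(strsplit(df1$string, ','), df1$id)),
                       value = factor(value, levels = c("x", "y", "z", "a", "b", "c")))[2:1])
cbind(df1['id'], as.data.frame.matrix(tbl))
#   id x y z a b c
# 1  1 1 1 1 0 0 0
# 2  2 1 0 1 0 0 0
# 3  3 1 0 0 0 0 0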


Answer 4

You can do this by reshaping.

library(dplyr)
library(stringi)
library(tidyr)

'id,string
1,"x,y,z"
2,"x,z"
3,"x"' %>% 
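  read.csv(text = ., stringsAsFactors = FALSE) %>%
  # one list element of tokens per row, e.g. c("x", "y", "z") for id 1
  mutate(string_split = 
           string %>% 
           stri_split_fixed(",") ) %>%
  # one row per (id, token) pair; the original string column is no longer needed
  unnest(string_split) %>%
  select(-string) %>%
  # a factor over the full predefined list lets spread(drop = FALSE)
  # also create the columns a, b, c, which never occur in the data
  mutate(string_split = factor(string_split, levels = c("x", "y", "z", "a", "b", "c")),
         value = 1) %>%
  spread(string_split, value, fill = 0, drop = FALSE)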
