Question

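I have a dataset with 100 columns (named Col_1, Col_2 ... Col_100) whose values look like "A", "B", "C", ... I don't know how many distinct characters occur across the whole dataset. I am trying to turn each value into a column, so that I get a matrix like this: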

A   B   C   D
0   1   0   1
1   1   0   1

I am trying this:

library(reshape2)
train <- read.csv("train.csv",head=TRUE,sep=",")
train

recast(train, id ~ value, id.var = 1, fun.aggregate = function(x) (length(x) > 0) + 0L)

But I get the following error:

Error in eval(substitute(expr), envir, enclos) : 
  n must be a positive integer
In addition: Warning messages:
1: attributes are not identical across measure variables; they will be dropped 
2: In split_indices(.group, .n) :
  NAs introduced by coercion to integer range

What can I do to get the table I want?

Answer 1

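Maybe this is what you are looking for. The first step collects all possible values. The second step makes every variable aware of all potential values by using them as its factor levels. This allows table to produce a 0 count when a particular value is missing from a variable, so that rbind can construct the correct output.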

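# collect all possible values across every column
# (as.character makes this work whether the columns are factors or character vectors)
allLevels <- sort(unique(unlist(lapply(df, as.character))))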
# provide all levels to each variable in the data.frame
dfNew <- data.frame(lapply(df, function(i) factor(i, levels=allLevels)))

# produce the count for each variable
do.call(rbind, lapply(dfNew, table))
  a b c d e g i j
x 3 2 8 2 0 0 0 0
y 0 0 2 4 4 1 3 1
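
To see why the shared levels matter, here is a minimal illustration on a made-up vector (the names `x`, `plain` and `padded` are hypothetical and not part of the answer). Without explicit levels, table only reports the values it actually sees; with them, absent values show up with a count of 0.

```r
# toy vector containing only "a" and "b"
x <- c("a", "a", "b")

# table() on the raw vector only knows about observed values
plain <- table(x)
names(plain)       # "a" "b"

# declaring "c" as a level makes table() report it with a zero count
padded <- table(factor(x, levels = c("a", "b", "c")))
as.vector(padded)  # 2 1 0
```

Because every variable then yields a table of the same length and in the same order, the per-variable results line up column by column when passed to rbind.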

Data

set.seed(1234)
df <- data.frame(x=sample(letters[1:4], 15, replace=TRUE), y=sample(letters[3:10], 15, replace=TRUE))
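
The table above has one row per input column and counts how often each value occurs. If the goal is instead the per-row 0/1 matrix sketched in the question, something along the following lines may work. This is only a sketch: the toy `train` data frame and the names `m` and `indicator` are made up for illustration, and it assumes every column holds values that can be compared as character strings.

```r
# hypothetical toy data standing in for train.csv (3 rows, 3 columns)
train <- data.frame(Col_1 = c("B", "A", "C"),
                    Col_2 = c("D", "B", "C"),
                    Col_3 = c("B", "D", "A"),
                    stringsAsFactors = FALSE)

# character matrix of all cells; works for factor or character columns
m <- as.matrix(train)

# every distinct value found anywhere in the data (sort() drops NAs)
allLevels <- sort(unique(as.vector(m)))

# one column per distinct value, one row per row of the input
indicator <- matrix(0L, nrow = nrow(m), ncol = length(allLevels),
                    dimnames = list(NULL, allLevels))

# mark a 1 wherever the value appears at least once in that row
for (v in allLevels) {
  indicator[, v] <- as.integer(rowSums(m == v, na.rm = TRUE) > 0)
}

indicator
# row 1 (B, D, B): A = 0, B = 1, C = 0, D = 1
# row 2 (A, B, D): A = 1, B = 1, C = 0, D = 1
# row 3 (C, C, A): A = 1, B = 0, C = 1, D = 0
```

The loop runs once per distinct value rather than once per cell, which should stay manageable when there are many columns but comparatively few distinct values.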
