Question

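My dataset contains problematic character variables: each one actually holds a list of values that I would like to turn into rows of a data frame. The original data frame has thousands of rows.

I want to split these list-like strings so that I can convert them into a data frame (long format), but I lack some know-how with list objects and with splitting strings.

Reproducible example: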

id <- c("112")
name <- c( "{\"dog\", \"cat\",\"attashee\"}")
value <- c("{\"21000\", \"23400\", \"26800\"}")
test <- data.frame(id, name, value)
test

I would like to get this result:

id <- c("112","112","112")
name <- c( "dog", "cat","attashee")
value <- c("21000", "23400", "26800")
test1 <- data.frame(id, name, value)
test1

I suppose I first need to remove the first and last characters, { and }:

test$name <- gsub("{", "", test$name, fixed=TRUE)
test$name <- gsub("}", "", test$name, fixed=TRUE)

I then tried the approaches from string-split-into-list-r, convert-a-list-formatted-as-string-in-a-list and convert-a-character-variable-to-a-list-of-list:

test$name <- strsplit(test$name, ',')[[1]]

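But I got an error message (when I tried it on the first row of my real data): "replacement has 91 rows, data has 1".

To be honest, I'm rather stuck, because I need to convert both the name and value columns at the same time (and I don't know how to convert even one column).

All help and suggestions are much appreciated.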

Answer 1

I would parse and evaluate:

id <- c("112", "113")
name <- c( "{\"dog\", \"cat\",\"attashee\"}", "{\"dog\", \"cat\",\"attashee\"}")
value <- c("{\"21000\", \"23400\", \"26800\"}", "{\"21001\", \"23401\", \"26801\"}")
test <- data.frame(id, name, value)

clean_parse_eval <- function(x) {
  eval(parse(text = gsub("\\}", ")", gsub("\\{", "c\\(", x))))
}
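
To see what the function does, it can help to look at the intermediate string it builds. The following is only a minimal sketch on a single value; it uses fixed = TRUE instead of the escaped patterns above (a simplification assumed here), and the variable names x, as_call and parsed are made up for illustration. Because eval() runs whatever code the string contains, this approach is only reasonable for input you trust.

```r
# One raw value as it appears in the name column
x <- "{\"dog\", \"cat\",\"attashee\"}"

# Turn the braces into a call to c(): {...} becomes c(...)
as_call <- gsub("}", ")", gsub("{", "c(", x, fixed = TRUE), fixed = TRUE)
as_call

# Parse the text as R code and evaluate it, giving a character vector
parsed <- eval(parse(text = as_call))
parsed
```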

Then we need a split-apply-combine approach to do this for each row. This is, of course, not very fast.

library(data.table)
setDT(test)

test[, lapply(.SD, clean_parse_eval), by = id]
#    id     name value
#1: 112      dog 21000
#2: 112      cat 23400
#3: 112 attashee 26800
#4: 113      dog 21001
#5: 113      cat 23401
#6: 113 attashee 26801

Obviously, it would be better to avoid producing such malformed data in the first place.
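
If you would rather not evaluate text as code, the same per-row idea can be sketched in base R with strsplit. This is only a sketch under a few assumptions: the helper name split_braced is made up here, the values themselves contain no commas, and stripping all whitespace is acceptable (it would also remove spaces inside a value).

```r
# Hypothetical helper: drop braces, quotes and whitespace, then split on commas.
# Returns a list with one character vector per input string.
split_braced <- function(x) {
  cleaned <- gsub("[{}\"[:space:]]", "", x)
  strsplit(cleaned, ",", fixed = TRUE)
}

test <- data.frame(
  id    = c("112", "113"),
  name  = c("{\"dog\", \"cat\",\"attashee\"}", "{\"dog\", \"cat\",\"attashee\"}"),
  value = c("{\"21000\", \"23400\", \"26800\"}", "{\"21001\", \"23401\", \"26801\"}"),
  stringsAsFactors = FALSE
)

names_list  <- split_braced(test$name)
values_list <- split_braced(test$value)

# Both columns must yield the same number of elements in every row
stopifnot(identical(lengths(names_list), lengths(values_list)))

# Repeat each id once per element of its row, then flatten the lists
long <- data.frame(
  id    = rep(test$id, lengths(names_list)),
  name  = unlist(names_list),
  value = unlist(values_list),
  stringsAsFactors = FALSE
)
long
```

Because strsplit is vectorised over the whole column, this avoids the per-group evaluation, at the cost of the assumptions listed above.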

Answer 2

This should work:

id <- c("112")
name <- c( "{\"dog\", \"cat\",\"attashee\"}")
value <- c("{\"21000\", \"23400\", \"26800\"}")
test <- data.frame(id, name, value)
test

# Strip braces, quotes and whitespace, then split on commas
name <- gsub("\\{", '', name)
name <- gsub("\\}", '', name)
name <- gsub('"', '', name)
name <- gsub('\\s+', '', name)
name <- strsplit(name, ',')[[1]]
value <- gsub("\\{", '', value)
value <- gsub("\\}", '', value)
value <- gsub('"', '', value)
value <- gsub('\\s+', '', value)
value <- strsplit(value, ',')[[1]]
# Repeat the id once per extracted element (this must come after the split)
id <- rep(test$id, length(name))
test1 <- data.frame(id, name, value)
test1
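
Note that the [[1]] above only picks the result for the first string, so this code handles a single row. The small sketch below, with made-up input strings, shows what strsplit returns for several rows; it is probably also the reason for the "replacement has 91 rows, data has 1" error in the question, where the pieces of one string were assigned to a one-row column.

```r
# Two already-cleaned strings (hypothetical example data)
name <- c("dog,cat,attashee", "fish,bird")

# strsplit returns a list with one character vector per input string
parts <- strsplit(name, ",", fixed = TRUE)

length(parts)    # number of input strings, not number of pieces
parts[[1]]       # pieces of the first string only
lengths(parts)   # number of pieces per input string
```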

Answer 3

This does the job for you without any libraries; some explanations are inline.

id <- c("112")
name <- c( "{\"dog\", \"cat\",\"attashee\"}")
value <- c("{\"21000\", \"23400\", \"26800\"}")

convert <- function(col, isFloat = FALSE) {
  # remove the { and } characters
  col <- gsub("{", "", gsub("}", "", col, fixed=TRUE), fixed=TRUE)
  # build a vector by evaluating the text as c(...); only use this on trusted input
  ans <- eval(parse(text = paste0('c(', col, ')')))
  # optionally convert the resulting strings to numbers
  if (isFloat)
    as.numeric(ans)
  else
    ans
}

test <- data.frame(convert(id), convert(name), convert(value, TRUE))
# optionally you can fix the names here
names(test) <- c('id', 'name', 'value')
# final result
test

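Note that you can use the convert function to add more columns to the data frame if needed.

If your name variable has, say, 1000 entries, first put all entries into a single string with name <- do.call(paste0, as.list(name)), then replace every }{ with a comma using name <- gsub("}{", ",", name, fixed=TRUE). You are then back in the original situation, and the same logic applies.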
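
The collapsing step described above can be sketched as follows. This is only an illustration with made-up data; it joins the entries with paste0(..., collapse = ""), and the variable names joined and all_names are hypothetical. Keep in mind that after collapsing you no longer know which row each element came from, and that eval() should only see trusted input.

```r
# Several entries of the name variable (hypothetical example data)
name <- c("{\"dog\", \"cat\"}", "{\"attashee\"}")

# Join all entries into one string; the pieces now meet as "}{"
joined <- paste0(name, collapse = "")

# Replace each "}{" boundary with a comma, then drop the outer braces
joined <- gsub("}{", ",", joined, fixed = TRUE)
joined <- gsub("{", "", gsub("}", "", joined, fixed = TRUE), fixed = TRUE)

# Same logic as convert(): evaluate the text as c(...)
all_names <- eval(parse(text = paste0("c(", joined, ")")))
all_names
```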

Answer 4

This works for one column. I tried it on your name object and it did the job.

Single<MyData>
