R: converting raw data to character

Asked: 2016-03-08 15:35:17

Tags: python r mongodb lapply data-cleaning

I am trying to load data from MongoDB into R using the "mongolite" package, with the following code:

df <- db$find('{}', '{"CurrentId":1,"_id":0}')

I want to extract "CurrentId" from the collection. The field "CurrentId" is an ObjectId in MongoDB, and a single document may hold several ObjectIds.

df looks like this:

[[1]]
list()

[[2]]
list()

[[3]]
list()

[[4]]
list()

[[5]]
list()

[[6]]
[[6]][[1]]
[1] 56 cd 5f 02 b8 9b 5b d0 26 cb 39 c9

[[6]][[2]]
[1] 56 cd 6c 13 b8 9b 5b d0 26 cb 39 d5

[[6]][[3]]
[1] 56 cd 6f c6 b8 9b 5b d0 26 cb 39 de

df[[6]][[1]] is:

 [1] 56 cd 5f 02 b8 9b 5b d0 26 cb 39 c9

typeof(df[[6]][[1]]) returns:

 [1] "raw"

I used paste(df[[6]][[1]],collapse = '') to convert the raw vector to a string in the same format as a MongoDB ObjectId:

 [1] "56cd5f02b89b5bd026cb39c9"

Then I tried to convert all of the raw data in df to strings in the same way, so I used sapply:

sapply(df, function(x) paste(as.character(x),collapse = ''))

and got this:

[1] ""                                                                                                                                                                                                                                                   
[2] ""                                                                                                                                                                                                                                                   
[3] ""                                                                                                                                                                                                                                                   
[4] ""                                                                                                                                                                                                                                                   
[5] ""                                                                                                                                                                                                                                                   
[6] "as.raw(c(0x56, 0xcd, 0x5f, 0x02, 0xb8, 0x9b, 0x5b, 0xd0, 0x26, 0xcb, 0x39, 0xc9))as.raw(c(0x56, 0xcd, 0x6c, 0x13, 0xb8, 0x9b, 0x5b, 0xd0, 0x26, 0xcb, 0x39, 0xd5))as.raw(c(0x56, 0xcd, 0x6f, 0xc6, 0xb8, 0x9b, 0x5b, 0xd0, 0x26, 0xcb, 0x39, 0xde))"

But I want to get something like this:

[[1]]
list()

[[2]]
list()

[[3]]
list()

[[4]]
list()

[[5]]
list()

[[6]]
[[6]][[1]]
[1] "56cd5f02b89b5bd026cb39c9"

[[6]][[2]]
[1] "56cd6c13b89b5bd026cb39d5"

[[6]][[3]]
[1] "56cd6fc6b89b5bd026cb39de"

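Does anyone know how to handle this? And is there a more efficient way to do the whole job?

Update

I should provide some code to reproduce my original dataset: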

test = as.raw(as.hexmode(x = c("56","cd","5f","02","b8","9b","5b","d0","26","cb","39","c9")))
df = lapply(1:10,function(x) test)

However, this code produces the following:

[[1]]
list()

[[2]]
[[2]][[1]]
[1] 5f

[[2]][[2]]
[1] d0


[[3]]
[[3]][[1]]
[1] 26

[[3]][[2]]
[1] 56


[[4]]
list()

[[5]]
[[5]][[1]]
[1] cb


[[6]]
list()

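It is not like the original df, but I really don't know how to paste raw data inside a nested list. I hope this helps!

The result of sapply(df, function(x) paste(x,collapse = '')) looks like this: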

[1] ""                                                                                                                                                                                                                                                   
[2] ""                                                                                                                                                                                                                                                   
[3] ""                                                                                                                                                                                                                                                   
[4] ""                                                                                                                                                                                                                                                   
[5] ""                                                                                                                                                                                                                                                   
[6] "as.raw(c(0x56, 0xcd, 0x5f, 0x02, 0xb8, 0x9b, 0x5b, 0xd0, 0x26, 0xcb, 0x39, 0xc9))as.raw(c(0x56, 0xcd, 0x6c, 0x13, 0xb8, 0x9b, 0x5b, 0xd0, 0x26, 0xcb, 0x39, 0xd5))as.raw(c(0x56, 0xcd, 0x6f, 0xc6, 0xb8, 0x9b, 0x5b, 0xd0, 0x26, 0xcb, 0x39, 0xde))"
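For anyone who wants to reproduce the shape of the original df (five empty lists followed by one list of three raw ObjectIds), the following is a minimal sketch. It assumes the three ids printed at the top of the question; hex_to_raw is a hypothetical helper introduced only for this illustration and is not part of mongolite.

```r
# Hypothetical helper: turn a hex string such as "56cd5f02..." into a raw vector.
hex_to_raw <- function(s) {
  starts <- seq(1, nchar(s), by = 2)          # start position of each byte pair
  as.raw(strtoi(substring(s, starts, starts + 1), base = 16L))
}

# The three ObjectIds shown in the printed df above.
ids <- c("56cd5f02b89b5bd026cb39c9",
         "56cd6c13b89b5bd026cb39d5",
         "56cd6fc6b89b5bd026cb39de")

# Five empty lists, then one list holding three raw vectors.
df <- c(replicate(5, list(), simplify = FALSE),
        list(lapply(ids, hex_to_raw)))

str(df, max.level = 1)
```

With this df, typeof(df[[6]][[1]]) should be "raw", as in the original data.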

1 Answer:

Answer 0 (score: 0)

Simply use paste() without the as.character() call inside your sapply(). A short example:

convertRaw = function(x) paste(x,collapse = '') # works identical in sapply
test = as.raw(as.hexmode(x = c("56","cd","5f","02","b8","9b","5b","d0","26","cb","39","c9"))) # line copied from your sample
convertRaw(test)
[1] "56cd5f02b89b5bd026cb39c9"
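That works when the function receives a single raw vector. As a rough illustration of what seems to go wrong in the question, the sketch below hands the same function a list of raw vectors instead. The variable names are made up for this example. When paste() (or as.character()) is given a list, each element is converted to its deparsed text rather than to its hex bytes, which matches the "as.raw(c(...))" strings in the question's output.

```r
convertRaw <- function(x) paste(x, collapse = "")

one_id <- as.raw(c(0x56, 0xcd, 0x5f, 0x02))   # shortened id, for illustration only

# A raw vector: every byte becomes its two-digit hex form.
on_vector <- convertRaw(one_id)

# A list of raw vectors: every element is deparsed as a whole.
on_list <- convertRaw(list(one_id, one_id))

# An empty list has nothing to paste, so the result is an empty string.
on_empty <- convertRaw(list())

print(on_vector)
print(on_list)
print(on_empty)
```

So the conversion has to be applied to each raw vector individually, not to the list that contains them.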

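Update: Indeed, using a nested list raises another problem. Because you are dealing with a nested list, your sapply call needs to be nested as well. You can do that by wrapping it in a call to lapply(). Here is a short example that hopefully solves your problem: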

test = as.raw(as.hexmode(x = c("56","cd","5f","02","b8","9b","5b","d0","26","cb","39","c9")))
testList = list(list(),list(test,test)) # here I create a short nested list
res = lapply(testList,function(y) sapply(y,function(x) paste(x,collapse = '')))
print(res) 

The result is:

[[1]]
list()

[[2]]
[1] "56cd5f02b89b5bd026cb39c9" "56cd5f02b89b5bd026cb39c9"

If you would rather have this:

[[1]]
list()

[[2]]
[[2]][[1]]
[1] "56cd5f02b89b5bd026cb39c9"

[[2]][[2]]
[1] "56cd5f02b89b5bd026cb39c9"

just nest the lapply() calls:

lapply(testList,function(y) lapply(y,function(x) paste(x,collapse = '')))
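The nested lapply() above assumes exactly two levels of nesting. If the depth can vary, one possible alternative is a small recursive function. This is only a sketch: raw_to_hex is a hypothetical name, not something provided by mongolite or base R.

```r
# Hypothetical helper: walk a (possibly nested) list and replace every
# raw vector with its hex string, leaving everything else untouched.
raw_to_hex <- function(x) {
  if (is.raw(x)) {
    paste(as.character(x), collapse = "")   # e.g. 56 cd 5f ... -> "56cd5f..."
  } else if (is.list(x)) {
    lapply(x, raw_to_hex)                   # recurse; empty lists stay list()
  } else {
    x                                       # any other type is returned as is
  }
}

test <- as.raw(as.hexmode(c("56","cd","5f","02","b8","9b","5b","d0","26","cb","39","c9")))
testList <- list(list(), list(test, test))

res <- raw_to_hex(testList)
print(res)
```

For the two-level testList this should give the same list structure as the nested lapply() call, with empty elements kept as list().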