Binary columns from text values

Date: 2017-12-30 20:53:05

Tags: r data.table

Example data:

df_stock2 <- data.frame(
  url = c("https://www.example.com/test",
          "https://www.example2.com/test",
          "https://www.example3.com/test"),
  stock_yes_01 = c("Google", "Microsoft", "Yahoo"),
  stock_yes_02 = c("Yahoo", "Google", NA)
)

I tried to reproduce the code from another post:

library(data.table)
setDT(df_stock2)
df_stock3 <- dcast(melt(df_stock2, url = 'url')[value != 'NA'],
                   url ~ value, fun.aggregate = length)

However, it does not work as expected.

Any idea why this does not work, or what I have to change?

The error I get:

> setDT(df_stock2)
    Warning message:
    In melt.data.table(df_stock2, url = "url") :
      To be consistent with reshape2's melt, id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are conisdered id.vars, which in this case are columns [url, stock_yes_01, stock_yes_02, stock_yes_03, ...]. Consider providing at least one of 'id' or 'measure' vars in future.
df_stock3 <- dcast(melt(df_stock2, url = 'url')[value != 'NA'],
    +       url() ~ value, fun.aggregate = length)
    Error in url() : argument "description" is missing, with no default
    In addition: Warning message:
    In melt.data.table(df_stock2, url = "url") :
      To be consistent with reshape2's melt, id.vars and measure.vars are internally guessed when both are 'NULL'. All non-numeric/integer/logical type columns are conisdered id.vars, which in this case are columns [url, stock_yes_01, stock_yes_02, stock_yes_03, ...]. Consider providing at least one of 'id' or 'measure' vars in future.

1 answer:

Answer 0: (score: 2)

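The problem is your call to melt. You have renamed the id argument to url, which of course does not work: melt has no url argument, so it never receives an id. The id argument tells melt which variables identify an observation. If you do not specify it, melt tries to guess and treats every column that is not numeric, integer, or logical as an id variable. That is what the warning is about. Because every column here holds text, they all end up as id variables, and the melted data does not have the shape the rest of the call expects, which leads to the error.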
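To see what melt returns once the identifier column is named, here is a minimal sketch on the sample data from the question. The names dt and long are made up for illustration, and id.vars is the full name of the argument that this answer abbreviates as id.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt <- data.table(
  url = c("https://www.example.com/test",
          "https://www.example2.com/test",
          "https://www.example3.com/test"),
  stock_yes_01 = c("Google", "Microsoft", "Yahoo"),
  stock_yes_02 = c("Yahoo", "Google", NA)
)

# Name the identifier column explicitly; every other column
# becomes a measure variable and is stacked into "value"
long <- melt(dt, id.vars = "url")

names(long)  # "url" "variable" "value"
nrow(long)   # 6: three urls times two measure columns
```

With the id column given, melt no longer has to guess, so the warning should not appear and the value column needed by the later filter and cast exists.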

So just keep the id argument named correctly, and it works:

df_stock3 <- dcast(melt(df_stock2 , id = 'url')[value != 'NA'],
               url ~ value, fun.aggregate = length)
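For reference, the same pipeline can be written end to end as a self-contained sketch. Two things here are assumptions rather than part of the answer: na.rm = TRUE is swapped in for the value != 'NA' filter, and the final step that caps counts at 1 is only needed if a name could appear more than once for the same url. The names dt, long, wide, and stock_cols are illustrative.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  url = c("https://www.example.com/test",
          "https://www.example2.com/test",
          "https://www.example3.com/test"),
  stock_yes_01 = c("Google", "Microsoft", "Yahoo"),
  stock_yes_02 = c("Yahoo", "Google", NA)
)

# Long format; na.rm = TRUE drops rows whose value is missing
long <- melt(dt, id.vars = "url", na.rm = TRUE)

# One column per distinct value, counting occurrences per url
wide <- dcast(long, url ~ value, fun.aggregate = length)

# Optional: turn counts into strict 0/1 indicators
stock_cols <- setdiff(names(wide), "url")
wide[, (stock_cols) := lapply(.SD, function(x) as.integer(x > 0)),
     .SDcols = stock_cols]

print(wide)
```

On this sample each name occurs at most once per url, so the capping step changes nothing; it is there only to keep the columns binary on messier data.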