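How do I convert searchTwitter results (from library(twitteR)) into a data.frame?

Asked: 2010-06-16 18:34:36

Tags: r twitter rodbc

I'm working on saving Twitter search results into a database (SQL Server), and I get an error when I try to convert the search results from twitteR into a data.frame.

If I run: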

library(twitteR)
puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))

I get the error:

Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class structure("status", package = "twitteR") into a data.frame

This matters because sqlSave from RODBC only accepts a data.frame when adding rows to a table. At least, that's the error message I got:

Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging",  : 
  should be a data frame

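So does anyone have suggestions on how to coerce the list into a data.frame, or how to load the list through RODBC?

My end goal is a table that mirrors the structure of the values returned by searchTwitter. Here is an example of what I'm trying to retrieve and load: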

library(twitteR)
puppy <- searchTwitter("puppy", session=getCurlHandle(),num=2)
str(puppy)

List of 2
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "beautifull and  kc reg Beagle Mix for rehomes: This little puppy is looking for a new loving family wh... http://bit.ly/9stN7V "| __truncated__
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:03 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://twitterfeed.com&quot; rel=&quot;nofollow&quot;&gt;twitterfeed&lt;/a&gt;"
  .. ..@ screenName  : chr "puppy_ads"
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "the cutest puppy followed me on my walk, my grandma won't let me keep it. taking it to the pound sadface"
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:01 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://blackberry.com/twitter&quot; rel=&quot;nofollow&quot;&gt;Twitter for BlackBerry®&lt;/a&gt;"
  .. ..@ screenName  : chr "iamsweaters"

So I think the data.frame for puppy should have column names like:

- text
- favorited
- replytoSN
- created
- truncated
- replytoSID
- id
- replytoUID
- statusSource
- screenName

6 answers:

Answer 0 (score: 17)

I use this code, which I found a while back at http://blog.ouseful.info/2011/11/09/getting-started-with-twitter-analysis-in-r/:

#get data
tws<-searchTwitter('#keyword',n=10)

#make data frame
df <- do.call("rbind", lapply(tws, as.data.frame))

#write to csv file (or your RODBC code)
write.csv(df,file="twitterList.csv")

Answer 1 (score: 7)

I know this is an old question, but here is what I think is a "modern" way to solve it. Just use the function twListToDF:

gvegayon <- getUser("gvegayon")
timeline <- userTimeline(gvegayon,n=400)
tl <- twListToDF(timeline)

Hope it helps

Answer 2 (score: 3)

Try this:

library(plyr)  # provides ldply()
ldply(searchTwitter("#rstats", n=100), text)

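twitteR returns an S4 class, so you need to either use one of its helper functions or work with its slots directly. You can see the slots with unclass(), for example: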

unclass(searchTwitter("#rstats", n=100)[[1]])
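
To look at slots without calling the Twitter API, here is a minimal sketch that uses a stand-in S4 class. The class name DemoStatus and its values are made up for illustration and are not part of twitteR; slotNames(), slot(), and @ are standard R from the methods package.

```r
library(methods)

# Hypothetical stand-in for twitteR's status class (illustration only).
setClass("DemoStatus",
         slots = c(text = "character",
                   favorited = "logical",
                   screenName = "character"))

tweet <- new("DemoStatus",
             text = "the cutest puppy followed me on my walk",
             favorited = FALSE,
             screenName = "iamsweaters")

slotNames(tweet)          # names of all slots on the object
tweet@screenName          # direct slot access with @
slot(tweet, "favorited")  # same access, with the slot name as a string
```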

These slots can be accessed directly with the corresponding accessor functions (from the twitteR help: ?statusSource):

 text Returns the text of the status
 favorited Returns the favorited information for the status
 replyToSN Returns the replyToSN slot for this status
 created Retrieves the creation time of this status
 truncated Returns the truncated information for this status
 replyToSID Returns the replyToSID slot for this status
 id Returns the id of this status
 replyToUID Returns the replyToUID slot for this status
 statusSource Returns the status source for this status

As I mentioned, my understanding is that you have to specify each field yourself in the output. Here is an example using two fields:

> head(ldply(searchTwitter("#rstats", n=100), 
        function(x) data.frame(text=text(x), favorited=favorited(x))))
                                                                                                                                          text
1                                                     @statalgo how does that actually work? does it share mem between #rstats and postgresql?
2                                   @jaredlander Have you looked at PL/R? You can call #rstats from PostgreSQL: http://www.joeconway.com/plr/.
3   @CMastication I was hoping for a cool way to keep data in a DB and run the normal #rstats off that. Maybe a translator from R to SQL code.
4                     The distribution of online data usage: AT&amp;T has recently announced it will no longer http://goo.gl/fb/eTywd #rstat
5 @jaredlander not that I know of. Closest is sqldf package which allows #rstats and sqlite to share mem so transferring from DB to df is fast
6 @CMastication Can #rstats run on data in a DB?Not loading it in2 a dataframe or running SQL cmds but treating the DB as if it wr a dataframe
  favorited
1     FALSE
2     FALSE
3     FALSE
4     FALSE
5     FALSE
6     FALSE

If you plan to do this often, you can turn it into a function.
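
One way such a function could look is sketched here. It is not part of twitteR: the helper name s4_list_to_df and the stand-in class DemoStatus are hypothetical, and the helper reads slots generically rather than through twitteR's accessors. It assumes every slot holds at most one value. Empty slots (such as replyToSN in the str() output in the question) become NA so that every row has the same columns.

```r
library(methods)

# Hypothetical stand-in for twitteR's status class (illustration only).
setClass("DemoStatus",
         slots = c(text = "character",
                   favorited = "logical",
                   replyToSN = "character",
                   screenName = "character"))

# Hypothetical helper: turn a list of S4 objects into a data.frame,
# one row per object and one column per slot.
s4_list_to_df <- function(objects) {
  rows <- lapply(objects, function(obj) {
    values <- lapply(slotNames(obj), function(name) {
      value <- slot(obj, name)
      # Indexing an empty vector at position 1 yields an NA of the same type,
      # so zero-length slots (e.g. chr(0)) still fill their column.
      if (length(value) == 0) value[1] else value
    })
    names(values) <- slotNames(obj)
    as.data.frame(values, stringsAsFactors = FALSE)
  })
  do.call("rbind", rows)
}

tweets <- list(
  new("DemoStatus", text = "first puppy tweet", favorited = FALSE,
      screenName = "puppy_ads"),
  new("DemoStatus", text = "second puppy tweet", favorited = TRUE,
      replyToSN = "puppy_ads", screenName = "iamsweaters")
)

df <- s4_list_to_df(tweets)
str(df)
```

Setting stringsAsFactors = FALSE keeps the tweet text as character rather than factor, which is usually what you want before writing to a database.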

Answer 3 (score: 1)

For anyone who runs into the same problem I did, I got an error saying:

Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' 

I simply changed text in

ldply(searchTwitter("#rstats", n=100), text)

to statusText, like this:

ldply(searchTwitter("#rstats", n=100), statusText)

Just a friendly heads-up :P

Answer 4 (score: 0)

Here is a handy function that converts the results to a data frame.

TweetFrame<-function(searchTerm, maxTweets)
{
  tweetList<-searchTwitter(searchTerm,n=maxTweets)
  return(do.call("rbind",lapply(tweetList,as.data.frame)))
}

Use it as:

tweets <- TweetFrame(" ", n)

Answer 5 (score: 0)

The twitteR package now includes a function, twListToDF, that does this for you.

puppy_table <- twListToDF(puppy)
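
To close the loop with the original question, the resulting data.frame can then be passed to sqlSave. The sketch below wraps that step in a function. The helper name save_tweets and the DSN "localSQLServer" are assumptions for illustration; the table name puppy_staging comes from the question. Calling it requires the RODBC package and a configured ODBC data source.

```r
# Hypothetical helper: append a data.frame of tweets to a database table.
# Assumes RODBC is installed and `dsn` names a configured ODBC data source.
save_tweets <- function(df, dsn, table) {
  stopifnot(is.data.frame(df))  # sqlSave only accepts data frames
  channel <- RODBC::odbcConnect(dsn)
  if (!inherits(channel, "RODBC")) {
    stop("could not open ODBC connection to ", dsn)
  }
  on.exit(RODBC::odbcClose(channel))  # close the connection even on error
  RODBC::sqlSave(channel, df, tablename = table,
                 append = TRUE, rownames = FALSE)
}

# Example call (not run here; needs a live data source):
# save_tweets(puppy_table, dsn = "localSQLServer", table = "puppy_staging")
```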