Sorting Twitter data in Excel based on index number

Time: 2015-05-15 11:23:07

Tags: r excel twitter

Question:

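I'm working on my thesis, and I must admit that I'm quite new to the more advanced features of Excel and have never used R before. Here is what I did: I used R to connect to Twitter, searched for tweets matching a certain keyword, and saved them. Now I want to make sure my data is arranged correctly so that I can analyze it. However, I can't seem to fix my data properly, either in R (it doesn't read the data) or in Excel. Currently my data looks like this: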

Data sample:

,"text","favorited","favoriteCount","replyToSN","created","truncated","replyToSID","id","replyToUID","statusSource","screenName","retweetCount","isRetweet","retweeted","longitude","latitude"                                      
1,"RT @cdavandaag: De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt, fantastisch. Op naar een mooi uitslag. #CDA #PS15 http://t.…",FALSE,0,NA,2015-03-17 23:58:23,FALSE,NA,"577982342775615488",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","Cecile2511",25,TRUE,FALSE,NA,NA                                      
2,"RT @Matthijs85: Ligt het trouwens aan mij                                        
of wordt verschil CDA/VVD nu heel groot uitgelicht, terwijl ze feitelijk 92% hetzelfde stemmen?                                     
    #…",FALSE,0,NA,2015-03-17 23:58:04,FALSE,NA,"577982262282698752",NA,"<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>","meneerharmsen",3,TRUE,FALSE,NA,NA                                     
3,"@PuckPetrus bang makerij bemoei je niet met je buurman les 1                                     
wil jij de les gelezen worden ?                                     
    #vvd #pvda #d66 #cda",FALSE,0,"PuckPetrus",2015-03-17 23:57:39,FALSE,"577980323885105152","577982156426899458","1378104055","<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>","pufpufpafpaf",0,FALSE,FALSE,NA,NA                                     
4,"RT @FrankScholman: Het #CDA kiest #LagereLasten! Hier hebben we 7 goede redenen voor: http://t.co/utQt0LfEzl. #NOSdebat #PS15 #MeerBanen ht…",FALSE,0,NA,2015-03-17 23:57:36,FALSE,NA,"577982146582806528",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","gijsdupont",4,TRUE,FALSE,NA,NA                                       
5,"RT @Jan_Slagter: In Hilversum werden de Buma awards uitgereikt, en  Buma wint het #nosdebat #cda",FALSE,0,NA,2015-03-17 23:56:36,FALSE,NA,"577981895570546688",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","Ztrmarco",38,TRUE,FALSE,NA,NA                                        
6,"RT @StSteenbakkers: Peiling Maurice de Hond: tweestrijd VVD en CDA! Stem CDA!!! #Lagerelasten #CDA #100pBrabant",FALSE,0,NA,2015-03-17 23:56:31,FALSE,NA,"577981871168090113",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","gijsdupont",5,TRUE,FALSE,NA,NA    

And so on. When I use Text to Columns in Excel, the output is:

    text    favorited   created id  statusSource    screenName  retweetCount    isRetweet   retweeted                                                                       
1   RT @cdavandaag: De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt, fantastisch. Op naar een mooi uitslag. #CDA #PS15 http://t.…    FALSE   17-3-2015 23:58 5,77982E+17 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>  Cecile2511  25  TRUE    FALSE                                                                       
2   RT @Matthijs85: Ligt het trouwens aan mij                                                                                                       
    #…" FALSE   0   FALSE   NA  meneerharmsen   3   TRUE    FALSE   NA                                                                      
    #vvd #pvda #d66 #cda"   FALSE   0   FALSE   1378104055  pufpufpafpaf    0   FALSE   FALSE   NA                                                                      
4   RT @FrankScholman: Het #CDA kiest #LagereLasten! Hier hebben we 7 goede redenen voor: http://t.co/utQt0LfEzl. #NOSdebat #PS15 #MeerBanen ht…    FALSE   17-3-2015 23:57 5,77982E+17 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>  gijsdupont  4   TRUE    FALSE

Conclusion:

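The program does not read the tweets correctly. Since I have a large number of tweets, cleaning them up by hand is not an option. I think it should be possible to arrange the tweets based on the index number that already exists in the first column. Is there a way to do this (in Excel)? Basically, it would jump to the next row whenever it finds the next number. Any help is much appreciated!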
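The "jump to the next row when the next number appears" idea can be sketched in R as a line-joining pass over the raw file. This is a minimal sketch under stated assumptions: `join_by_index` is a hypothetical helper (not from this thread), the header line has already been removed, and a new record is assumed to start only when a line begins with the next expected index followed by `,"`. It is a heuristic, not a CSV parser: a tweet whose continuation line happens to start with that exact pattern would be split wrongly.

```r
# Hypothetical helper (not from the thread): rebuilds one record per tweet
# from the physical lines of the CSV. Expects the header line removed.
join_by_index <- function(lines) {
  records <- character(0)
  next_id <- 1L
  for (ln in lines) {
    if (startsWith(ln, paste0(next_id, ',"'))) {
      # Line begins with the next expected index: start a new record.
      records <- c(records, ln)
      next_id <- next_id + 1L
    } else if (length(records) > 0) {
      # Otherwise it continues a tweet whose text contained a line break.
      n <- length(records)
      records[n] <- paste(records[n], ln, sep = "\n")
    }
    # Lines that appear before the first record are ignored.
  }
  records
}

# Usage with a hypothetical three-tweet sample; tweet 2 spans two lines.
lines <- c('1,"first tweet",FALSE',
           '2,"second tweet, line one',
           'line two",TRUE',
           '3,"third tweet",FALSE')
rec <- join_by_index(lines)
```

With the real file, the input would be something like `readLines("text.csv")[-1]`, dropping the header line. Matching on the *next* index rather than on any leading number makes it less likely that digits inside a tweet start a spurious record.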

2 answers:

Answer 0 (score: 1)

I can import your data using
x <- read.table("text.csv", header = TRUE, comment.char = "Ł", sep = ",")

The trick is to specify a non-default comment character, because the default one, #, conflicts with Twitter hashtags.

> str(x)
'data.frame':   6 obs. of  17 variables:
 $ X            : int  1 2 3 4 5 6
 $ text         : Factor w/ 6 levels "@PuckPetrus bang makerij bemoei je niet met je buurman les 1                                     \nwil jij de les gelezen worde"| __truncated__,..: 2 5 1 3 4 6
 $ favorited    : logi  FALSE FALSE FALSE FALSE FALSE FALSE
 $ favoriteCount: int  0 0 0 0 0 0
 $ replyToSN    : Factor w/ 1 level "PuckPetrus": NA NA 1 NA NA NA
 $ created      : Factor w/ 6 levels "2015-03-17 23:56:31",..: 6 5 4 3 2 1
 $ truncated    : logi  FALSE FALSE FALSE FALSE FALSE FALSE
 $ replyToSID   : num  NA NA 5.78e+17 NA NA ...
 $ id           : num  5.78e+17 5.78e+17 5.78e+17 5.78e+17 5.78e+17 ...
 $ replyToUID   : int  NA NA 1378104055 NA NA NA
 $ statusSource : Factor w/ 2 levels "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",..: 2 1 1 2 2 2
 $ screenName   : Factor w/ 5 levels "Cecile2511","gijsdupont",..: 1 3 4 2 5 2
 $ retweetCount : int  25 3 0 4 38 5
 $ isRetweet    : logi  TRUE TRUE FALSE TRUE TRUE TRUE
 $ retweeted    : logi  FALSE FALSE FALSE FALSE FALSE FALSE
 $ longitude    : logi  NA NA NA NA NA NA
 $ latitude     : Factor w/ 5 levels "NA    ","NA                                     ",..: 3 2 2 4 5 1
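The `str()` output above shows `id` and `replyToSID` read as `num` (printed as 5.78e+17). Tweet ids are 18 digits long, which is more than a double can store exactly, so distinct ids can become equal after import. The sketch below illustrates this and one way to avoid it, assuming the id column is declared as character via `colClasses`. The second id and the two-column sample are hypothetical.

```r
# The first id is taken from the question's data; the second is a
# hypothetical neighbour that differs only in the last digit.
ids <- c("577982156426899458", "577982156426899459")

# Doubles represent every integer exactly only up to 2^53 (about 9.0e15).
# Near 5.78e17 adjacent doubles are 128 apart, so both ids map to the
# same number once converted.
collide <- as.numeric(ids[1]) == as.numeric(ids[2])

# Declaring the id column as character keeps all 18 digits intact.
# Hypothetical two-column sample in the question's CSV layout:
csv <- '"screenName","id"
"pufpufpafpaf","577982156426899458"'
x <- read.csv(text = csv, colClasses = c("character", "character"))
```

In the full file, the same idea would mean passing a `colClasses` vector that marks `id`, `replyToSID`, and similar id columns as `"character"`.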

Answer 1 (score: 0)

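I managed to do it! Thanks, everyone, for the help. Copying the first column of the CSV data into Notepad++ did the trick. From there I was able to import it!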

For some reason, R kept reading "Ł" as "L", so it cut off the data at that point. Using comment.char = "" in the code solved the problem. Thanks, everyone!
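The fix reported here, `comment.char = ""`, turns comment handling off entirely, so `#` inside a field is never treated as a comment. A minimal end-to-end sketch, assuming the file follows the layout shown in the question: read the CSV with comments disabled, replace the line breaks inside tweets with spaces so each tweet occupies one physical line (the line breaks are what split the rows in the Excel output above), sort by the index column `X`, and write a cleaned copy. The sample data and the temporary output path are hypothetical.

```r
# Hypothetical two-tweet sample in the question's layout: empty first
# header field, quoted text with '#' hashtags, one tweet spanning two lines.
csv <- ',"text","retweetCount"
1,"RT @a: first line
second line #CDA",3
2,"plain tweet #vvd #pvda",0'

# comment.char = "" disables comment handling; quote = "\"" lets the
# quoted text field span several lines. The empty header becomes "X".
tweets <- read.table(text = csv, header = TRUE, sep = ",",
                     quote = "\"", comment.char = "",
                     stringsAsFactors = FALSE)

# Replace each run of embedded line breaks with a single space.
tweets$text <- gsub("[\r\n]+", " ", tweets$text)

# Sort by the original index column before writing.
tweets <- tweets[order(tweets$X), ]

out <- tempfile(fileext = ".csv")  # hypothetical output path
write.csv(tweets, out, row.names = FALSE)
```

Flattening the line breaks changes the tweet text slightly, so keeping the original data frame for analysis and using the flattened copy only for viewing in Excel may be preferable.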