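Question:
I am working on my thesis, and I should say up front that I am new to the more advanced features of Excel and have never used R before. This is what I did: I used R to connect to Twitter, searched for a keyword, and saved the resulting tweets. Now I want to make sure my data is arranged properly so that I can analyze it. However, I can't get the data into shape, neither with R (because it doesn't read the data) nor with Excel. My data currently looks like this:
Data example: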
,"text","favorited","favoriteCount","replyToSN","created","truncated","replyToSID","id","replyToUID","statusSource","screenName","retweetCount","isRetweet","retweeted","longitude","latitude"
1,"RT @cdavandaag: De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt, fantastisch. Op naar een mooi uitslag. #CDA #PS15 http://t.…",FALSE,0,NA,2015-03-17 23:58:23,FALSE,NA,"577982342775615488",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","Cecile2511",25,TRUE,FALSE,NA,NA
2,"RT @Matthijs85: Ligt het trouwens aan mij
of wordt verschil CDA/VVD nu heel groot uitgelicht, terwijl ze feitelijk 92% hetzelfde stemmen?
#…",FALSE,0,NA,2015-03-17 23:58:04,FALSE,NA,"577982262282698752",NA,"<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>","meneerharmsen",3,TRUE,FALSE,NA,NA
3,"@PuckPetrus bang makerij bemoei je niet met je buurman les 1
wil jij de les gelezen worden ?
#vvd #pvda #d66 #cda",FALSE,0,"PuckPetrus",2015-03-17 23:57:39,FALSE,"577980323885105152","577982156426899458","1378104055","<a href=""http://twitter.com"" rel=""nofollow"">Twitter Web Client</a>","pufpufpafpaf",0,FALSE,FALSE,NA,NA
4,"RT @FrankScholman: Het #CDA kiest #LagereLasten! Hier hebben we 7 goede redenen voor: http://t.co/utQt0LfEzl. #NOSdebat #PS15 #MeerBanen ht…",FALSE,0,NA,2015-03-17 23:57:36,FALSE,NA,"577982146582806528",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","gijsdupont",4,TRUE,FALSE,NA,NA
5,"RT @Jan_Slagter: In Hilversum werden de Buma awards uitgereikt, en Buma wint het #nosdebat #cda",FALSE,0,NA,2015-03-17 23:56:36,FALSE,NA,"577981895570546688",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","Ztrmarco",38,TRUE,FALSE,NA,NA
6,"RT @StSteenbakkers: Peiling Maurice de Hond: tweestrijd VVD en CDA! Stem CDA!!! #Lagerelasten #CDA #100pBrabant",FALSE,0,NA,2015-03-17 23:56:31,FALSE,NA,"577981871168090113",NA,"<a href=""http://twitter.com/download/iphone"" rel=""nofollow"">Twitter for iPhone</a>","gijsdupont",5,TRUE,FALSE,NA,NA
And so on. When I run Text to Columns in Excel, the output is:
text favorited created id statusSource screenName retweetCount isRetweet retweeted
1 RT @cdavandaag: De hashtag #ikstemCDA is deze maand al 7.500 (!) keer gebruikt, fantastisch. Op naar een mooi uitslag. #CDA #PS15 http://t.… FALSE 17-3-2015 23:58 5,77982E+17 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> Cecile2511 25 TRUE FALSE
2 RT @Matthijs85: Ligt het trouwens aan mij
#…" FALSE 0 FALSE NA meneerharmsen 3 TRUE FALSE NA
#vvd #pvda #d66 #cda" FALSE 0 FALSE 1378104055 pufpufpafpaf 0 FALSE FALSE NA
4 RT @FrankScholman: Het #CDA kiest #LagereLasten! Hier hebben we 7 goede redenen voor: http://t.co/utQt0LfEzl. #NOSdebat #PS15 #MeerBanen ht… FALSE 17-3-2015 23:57 5,77982E+17 <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a> gijsdupont 4 TRUE FALSE
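Conclusion:
The program does not read the tweets correctly. Because I have a large number of tweets, cleaning them up by hand is not an option. I think the tweets could be arranged using the index number that already exists in the first column. Is there a way to do this (in Excel)? Basically, it would jump to the next row whenever it finds the next number. Any help is much appreciated!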
Answer 0 (score: 1)
I was able to import your data using
x <- read.table("text.csv", header = TRUE, comment.char = "Ł", sep = ",")
The trick is to specify a non-default comment character, because the default, #, conflicts with Twitter hashtags.
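To see what the comment character does in read.table, here is a minimal sketch on made-up data. The three-column layout and the values are assumptions for illustration only, and the text field is deliberately left unquoted so the effect is easy to see (the real file quotes its text column).

```r
# Toy CSV with an UNQUOTED text field containing a hashtag (illustrative data).
csv_text <- "id,text,retweets
1,Stem #CDA vandaag,25
2,Geen hashtag hier,3"

# Default comment.char = "#": everything from '#' to the end of the line is
# dropped. fill = TRUE pads the shortened row with NA so the call completes.
broken <- read.table(text = csv_text, header = TRUE, sep = ",",
                     fill = TRUE, stringsAsFactors = FALSE)

# comment.char = "" turns comment handling off, so the hashtag survives.
fixed <- read.table(text = csv_text, header = TRUE, sep = ",",
                    comment.char = "", stringsAsFactors = FALSE)

print(broken)
print(fixed)
```

In the first data frame the retweet count of row 1 is missing because the rest of that line was treated as a comment; in the second it is read as intended.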
> str(x)
'data.frame': 6 obs. of 17 variables:
$ X : int 1 2 3 4 5 6
$ text : Factor w/ 6 levels "@PuckPetrus bang makerij bemoei je niet met je buurman les 1 \nwil jij de les gelezen worde"| __truncated__,..: 2 5 1 3 4 6
$ favorited : logi FALSE FALSE FALSE FALSE FALSE FALSE
$ favoriteCount: int 0 0 0 0 0 0
$ replyToSN : Factor w/ 1 level "PuckPetrus": NA NA 1 NA NA NA
$ created : Factor w/ 6 levels "2015-03-17 23:56:31",..: 6 5 4 3 2 1
$ truncated : logi FALSE FALSE FALSE FALSE FALSE FALSE
$ replyToSID : num NA NA 5.78e+17 NA NA ...
$ id : num 5.78e+17 5.78e+17 5.78e+17 5.78e+17 5.78e+17 ...
$ replyToUID : int NA NA 1378104055 NA NA NA
$ statusSource : Factor w/ 2 levels "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",..: 2 1 1 2 2 2
$ screenName : Factor w/ 5 levels "Cecile2511","gijsdupont",..: 1 3 4 2 5 2
$ retweetCount : int 25 3 0 4 38 5
$ isRetweet : logi TRUE TRUE FALSE TRUE TRUE TRUE
$ retweeted : logi FALSE FALSE FALSE FALSE FALSE FALSE
$ longitude : logi NA NA NA NA NA NA
$ latitude : Factor w/ 5 levels "NA ","NA ",..: 3 2 2 4 5 1
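One detail in the structure above may be worth a second look: id and replyToSID were read as num. Tweet IDs of this size are larger than 2^53, so a double cannot store them exactly. A possible workaround, sketched here on two IDs taken from the sample data, is to read the column as character via colClasses. The two-column layout is an assumption for illustration.

```r
# Two IDs from the sample data, reduced to two columns for illustration.
csv_text <- 'id,screenName
"577982342775615488","Cecile2511"
"577981871168090113","gijsdupont"'

# With default settings the id column is converted to a double, so it can
# only hold the nearest representable value.
as_number <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Forcing the column to character keeps every digit.
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                    colClasses = c(id = "character"))

print(sprintf("%.0f", as_number$id))  # digits after conversion to double
print(as_text$id)                     # digits as stored in the file
```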
Answer 1 (score: 0)
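I managed to do it! Thanks, everyone, for the help. Copying the first column of the CSV data into Notepad++ did the trick. From there I was able to import it!
For some reason, R kept reading "Ł" as "L", so it cut the data off there. Using comment.char = "" in the code solved the problem. Thank you all!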
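For reference, the fix described here can be sketched as follows. The file contents, the temporary path, and the reduced set of columns are made up for illustration; only the layout (an unnamed index column, a quoted text field that spans lines and contains hashtags) mirrors the data in the question. Note that read.csv already uses comment.char = "" by default; it is spelled out below to make the intent explicit.

```r
# Small stand-in for the exported file: the second tweet spans two lines and
# its continuation line starts with '#'.
sample_lines <- c(
  ',"text","created","screenName","retweetCount"',
  '1,"RT @cdavandaag: De hashtag #ikstemCDA #CDA #PS15",2015-03-17 23:58:23,"Cecile2511",25',
  '2,"RT @Matthijs85: Ligt het trouwens aan mij',
  '#cda",2015-03-17 23:58:04,"meneerharmsen",3'
)
tmp <- tempfile(fileext = ".csv")  # hypothetical path, not the real file
writeLines(sample_lines, tmp)

# comment.char = "" disables comment handling; quoted fields may contain
# line breaks, so the second tweet stays in one row.
tweets <- read.csv(tmp, comment.char = "", stringsAsFactors = FALSE)

str(tweets)
```

The unnamed first column comes back as X, as in the str() output shown in the other answer.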