Error in data.frame: arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13, 12, 26, 27, 30, 31, 39, 35

Asked: 2016-02-25 18:04:50

Tags: r

I am working on an R project. While trying to analyze sentiment, I had to create a data frame (in my case, "sentiment.df").

sentiment.df <- data.frame(text, emotion=emotion, polarity=polarity, stringsAsFactors=FALSE) 

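Here, text is a list of processed (cleaned) tweets that have been split into keywords; emotion holds the emotion label for each tweet; polarity holds the positive (+ve) or negative (-ve) label. When I run the line of code above, RStudio throws the following error: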

Error in data.frame(c("httpstcoux1aacnxbk", "endalz"), c("i", "have",  : 
  arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13, 17, 8, 10, 24, 21, 15, 12, 25, 16, 20, 23, 18, 28, 14, 22, 26, 27, 30, 31, 29, 35

All three variables (text, emotion and polarity) have the same length: 2621.

This is what my data looks like:

> str(text)
List of 2621
 $ : chr [1:2] "httpstcoux1aacnxbk" "endalz"
 $ : chr [1:5] "i" "have" "the" "best" ...
 $ : chr [1:19] "kenny" "easley" "seahawks" "captain" ...
 $ : chr [1:2] "good" "defense"
 $ : chr [1:7] "superbowlxlix" "party" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ "" ...
 $ : chr "ihatetombrady"
 $ : chr [1:11] "coachbourbonusa" "understood" "still" "dont" ...
 $ : chr [1:19] "tiwaworks" "whitney" "houston" "sings" ...
 $ : chr [1:4] "thats" "still" "bae" "<U+2764><U+FE0F>""| __truncated__
 $ : chr [1:6] "were" "a" "thousand" "miles" ...
 $ : chr [1:7] "dredoo24" "what" "i" "like" ...
 $ : chr [1:2] "bww" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
 $ : chr [1:9] "i" "seriously" "cant" "wait" ...
 $ : chr [1:3] "flyysociety" "photoshoot<U+2716><U+FE0F>""| __truncated__ "httptcoxkywsj5i2x"
 $ : chr [1:5] "lienne11" "wait" "whos" "performing" ...
 $ : chr [1:13] "game" "on" "go" "wildcats<U+FFFD><U+FFFD>\u2b07<U+FE0F>""| __truncated__ ...
 $ : chr [1:2] "good" "defense"
 $ : chr [1:11] "seattle" "seahawks" "fan" "" ...
 $ : chr [1:9] "realprestonj" "congratulations" "preston" "the" ...
 $ : chr [1:5] "tsu19" "so" "funny" "bruh" ...
 $ : chr [1:4] "drunk" "tweets" "coming" "soon"
 $ : chr "tb12"
 $ : chr [1:13] "hicksville" "schools" "will" "be" ...
 $ : chr [1:5] "but" "momma" "said" "superbowl" ...
 $ : chr [1:4] "raggedy" "ass" "bitch" ""
 $ : chr [1:5] "arbyscares" "arbys" "prairie" "village" ...
 $ : chr [1:17] "lovetruth79" "ltltltloves" "to" "send" ...
 $ : chr [1:8] "“boynamedhxlz""| __truncated__ "quote" "this" "tweet" ...
 $ : chr [1:13] "stretching" "for" "ballet" "now" ...
 $ : chr [1:7] "jerrodflusche" "janabewley" "narnia" "for" ...
 $ : chr [1:8] "here" "goes" "my" "whole" ...
 $ : chr [1:10] "who" "you" "going" "for" ...
 $ : chr [1:3] "good" "stop" "hawks"
 $ : chr [1:5] "brady" "be" "smokin" "blounts" ...
 $ : chr [1:8] "me" "decepcioné" "perdoné" "hice" ...
 $ : chr [1:7] "happy21stbirthdayharry" "" "its" "also" ...
 $ : chr [1:24] "teammic3rd" "sounds" "amazing" "" ...
 $ : chr [1:21] "millions" "of" "people" "packed" ...
 $ : chr [1:8] "missed" "idina" "singing" "by" ...
 $ : chr [1:2] "your" "stupid"
 $ : chr [1:5] "seahawks" "all" "the" "way" ...
 $ : chr [1:4] "takeathillpill" "you" "are" "vile"
 $ : chr [1:3] "lets" "goo" "superbowlixlix"
 $ : chr [1:4] "snow" "day" "nigga" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
 $ : chr [1:6] "ill" "just" "watch" "total" ...
 $ : chr [1:9] "liveextra" "site" "down" "its" ...
 $ : chr [1:3] "time" "to" "punt"
 $ : chr [1:5] "zachdettloff516" "groans" "at" "terrible" ...
 $ : chr [1:3] "go" "seahawks" "<U+FFFD><U+FFFD>""| __truncated__
 $ : chr [1:7] "pizza" "friends" "super" "bowl" ...
 $ : chr [1:9] "hold" "onto" "me" "cause" ...
 $ : chr [1:6] "tom" "gonna" "get" "his" ...
 $ : chr [1:6] "lets" "goooooo" "nice" "3rd" ...
 $ : chr [1:15] "2" "fatal" "crashes" "reported" ...
 $ : chr [1:12] "supra" "dope" "atx" "sundayfunday" ...
 $ : chr [1:19] "all" "these" "students" "from" ...
 $ : chr [1:3] "danstricko" "not" "happening"
 $ : chr [1:17] "tom" "brady" "may" "wear" ...
 $ : chr "httptconqabzdezwf"
 $ : chr [1:4] "i" "miss" "you" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
 $ : chr [1:25] "john" "legend" "and" "idina" ...
 $ : chr [1:13] "snowed" "in" "with" "kadybuchler" ...
 $ : chr [1:6] "that" "bright" "green" "and" ...
 $ : chr [1:9] "ive" "got" "the" "seahawks" ...
 $ : chr [1:9] "sds" "by" "mac" "miller" ...
 $ : chr [1:5] "jakeski52" "rotowire" "or" "roger" ...
 $ : chr "damnit"
 $ : chr "hawks"
 $ : chr [1:7] "my" "nephews" "and" "niece" ...
 $ : chr [1:16] "liking" "your" "own" "posts" ...
 $ : chr [1:2] "bailaconbruce" "fb"
 $ : chr [1:4] "djones7" "hell" "no" "<U+FFFD><U+FFFD>""| __truncated__
 $ : chr [1:7] "best" "part" "of" "the" ...
 $ : chr [1:13] "holls016" "f" "u" "i" ...
 $ : chr [1:6] "mikebarnicle" "nice" "to" "meet" ...
 $ : chr [1:5] "u" "played" "me" "dirty" ...
 $ : chr [1:13] "my" "bac" "is" "looking" ...
 $ : chr [1:2] "est" "2008"
 $ : chr [1:12] "vacation" "time" "" "thats" ...
 $ : chr [1:3] "<U+FFFD><U+FFFD>""| __truncated__ "ok" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD"| __truncated__
 $ : chr [1:2] "common" "seattle"
 $ : chr [1:3] "no" "cacc" "talc"
 $ : chr "lob"
 $ : chr [1:3] "cut" "the" "crap"
 $ : chr [1:11] "im" "at" "las" "alitas" ...
 $ : chr [1:3] "backstreets" "back" "alrighttttt"
 $ : chr [1:6] "the" "seahawks" "are" "going" ...
 $ : chr [1:13] "baby" "its" "cold" "outside" ...
 $ : chr [1:15] "i" "have" "sooo" "much" ...
 $ : chr [1:10] "so" "whos" "gonna" "pull" ...
 $ : chr [1:5] "my" "driveway" "tonight" "nwiweather" ...
 $ : chr "fuck"
 $ : chr [1:21] "now" "that" "its" "actually" ...
 $ : chr [1:7] "green" "goats" "<U+FFFD><U+FFFD>""| __truncated__ "" ...
 $ : chr [1:15] "i" "guess" "its" "time" ...
 $ : chr [1:3] "lets" "go" "seattle"
 $ : chr [1:20] "jozybrambila7" "do" "you" "ever" ...
 $ : chr [1:4] "reggiewo" "nice" "choice" "cheers"
 $ : chr [1:20] "i" "enjoy" "super" "bowl" ...
  [list output truncated]

> str(emotion)
 chr [1:2621] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "joy" ...
> str(polarity)
 chr [1:2621] "positive" "positive" "positive" "positive" "positive" "positive" "positive" ...

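When I searched for this error online, programmers said it means the number of rows and the number of columns are not the same, i.e. the data is not a square matrix, and a data frame does not work with rectangular matrices.

I would appreciate it if someone could help me get rid of this error.

Thanks in advance!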

1 Answer:

Answer 0: (score: 1)

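"text" is a list of 2621 elements, but the elements do not all have the same number of entries: each one can contain a different number of words. data.frame() treats every element of a list argument as a separate column, which is why it reports so many differing row counts. Even unlist() will not help you, because the total number of words is greater than the number of entries in the "emotion" and "polarity" vectors.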
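The answer stops at the diagnosis, so here is a minimal sketch of two possible ways around it, not necessarily what the asker ended up doing. The toy values below are hypothetical stand-ins with the same shapes as the question's objects, and the names text_chr and sentiment.tokens are made up for illustration. It assumes one row per tweet is the goal.

```r
# Hypothetical toy data with the same shapes as in the question:
# a list of token vectors plus two character vectors of equal length.
text <- list(c("good", "defense"),
             c("i", "have", "the", "best", "team"),
             "tb12")
emotion  <- c("unknown", "joy", "unknown")
polarity <- c("positive", "positive", "negative")

# The list has as many elements as the vectors have entries,
# but each element has its own length.
length(text)   # 3
lengths(text)  # 2 5 1

# Option 1: collapse each tweet's tokens into a single string,
# giving one character value per tweet.
text_chr <- vapply(text, paste, character(1), collapse = " ")
sentiment.df <- data.frame(text = text_chr,
                           emotion = emotion,
                           polarity = polarity,
                           stringsAsFactors = FALSE)

# Option 2: keep the tokens as a list column, wrapped in I()
# so that data.frame() does not split the list into columns.
sentiment.tokens <- data.frame(text = I(text),
                               emotion = emotion,
                               polarity = polarity,
                               stringsAsFactors = FALSE)
```

Option 1 is the simpler choice when the tokens are only needed as text; option 2 keeps each tweet's keywords separate at the cost of a list column, which some functions handle less gracefully.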