How to access the data nested inside a data frame column

Time: 2016-09-09 18:19:59

Tags: r sentiment-analysis text-analysis

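I don't know whether I can explain this properly, but here goes. I have a data frame called ZCP that contains tokenized tweets (to be used for sentiment analysis) along with the associated metadata. The structure looks like this: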

head(ZAD)
num_tokens unique_tokens
1         12            12
2         11            10
3         11            10
4         12            12
5         22            20
6         11            10
text
1 rt, caradelevingne, fam, a, lam, glastonbury, https, t, co, h, ew, oux
2 rt, caradelevingne, home, sweet, home, glastonbury, https, t, co, zolld, ltvt
3 rt, caradelevingne, home, sweet, home, glastonbury, https, t, co, zolld, ltvt
4 rt, caradelevingne, fam, a, lam, glastonbury, https, t, co, h, ew, oux
5 rt, yahoocelebuk, adele, set, to, dominate, the, uk, albums, chart, as, heads, back, to, number, post, glastonbury, https, t, co, cndkufsgo, https
6 rt, caradelevingne, home, sweet, home, glastonbury, https, t, co, zolld, ltvt

favoriteCount                 id retweetCount isRetweet
1             0 747942553010397184          593      TRUE
2             0 747942530340118529          729      TRUE
3             0 747941795988905986          729      TRUE
4             0 747941781820542976          593      TRUE
5             0 747940287847161856            3      TRUE
6             0 747940084603838464          729      TRUE

Basically, for now I'm only interested in the data in the text column. That data looks like this:

head(ZCP$text) 
$to_return
[1] "saw" "viola" "beach" "support" "courteeners" "in"
[7] "w" "ton" "to" "see" "coldplay" "do"
[13] "the" "tribute" "to" "them" "at" "glastonbury"
[19] "was" "amazing" "so" "well" "thought" "out"

[[2]]
[1] "glastonbury" "coldplay" "elo" "break" "viewing" "records"
[7] "muse" "s" "audience" "doubles" "https" "t"
[13] "co" "eocvqnoeen" "coldplay" "muse" "https" "t"
[19] "co" "yd" "ie" "xr" "n"

[[3]]
[1] "another" "cheeky" "glastonbury" "pic" "coldplay" "pyramidstage"
[7] "https" "t" "co" "qttz" "xgjpx" "https"
[13] "t" "co" "rm" "y" "pbvml"

[[4]]
[1] "i" "m" "having" "my" "very" "own"
[7] "glastonbury" "tonight" "coldplay" "adele"

[[5]]
[1] "that" "was" "awesome" "coldplay" "glastonbury" "glasto"
[7] "https" "t" "co" "fz" "ly" "cvx"

[[6]]
[1] "beegees" "barry" "gibb" "stayin" "alive" "and"
[7] "coldplay" "en" "glastonbury" "https" "t" "co"
[13] "hoj"
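The `$to_return` and `[[2]]`, `[[3]]`, ... headers in this printout are how R prints a list, so `ZCP$text` appears to be a list column (one character vector of tokens per tweet) rather than an ordinary character column. A minimal sketch of that shape, assuming a hypothetical two-row stand-in called `zcp_demo` (not the real data):

```r
# Hypothetical stand-in for ZCP: a data frame whose `text` column is a list
# of character vectors, one vector of tokens per tweet.
zcp_demo <- data.frame(num_tokens = c(4L, 3L))
zcp_demo$text <- list(
  c("saw", "viola", "beach", "support"),
  c("glastonbury", "coldplay", "elo")
)

class(zcp_demo$text)    # "list": each element holds one tweet's tokens
length(zcp_demo$text)   # 2: one element per row of the data frame
lengths(zcp_demo$text)  # 4 3: number of tokens in each tweet
```

Running `class(ZCP$text)` on the real object should show whether it has the same list-column shape.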

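Which operator should I use to get at the individual tokens? I plan to write a for loop, but I can't find the right operator to reach the data inside the data. ZCP$text[1] gives the following result: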

ZCP$text[1]
$to_return
[1] "saw" "viola" "beach" "support" "courteeners" "in"
[7] "w" "ton" "to" "see" "coldplay" "do"
[13] "the" "tribute" "to" "them" "at" "glastonbury"
[19] "was" "amazing" "so" "well" "thought" "out"

How do I get the first element of this object? For some reason I can't find the right operator. Any help is appreciated. Thanks.
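On a list, single brackets `[` return a shorter list, which is why `ZCP$text[1]` still prints with the `$to_return` header: it is a list of length one wrapping the tokens. Double brackets `[[` drop that outer level and return the character vector itself, and an ordinary `[1]` on that vector then picks out a single token. A minimal sketch, again using a hypothetical stand-in `zcp_demo` and assuming `text` is a list column as the printed output suggests:

```r
# Hypothetical stand-in for ZCP with a list column of tokens.
zcp_demo <- data.frame(num_tokens = c(4L, 3L))
zcp_demo$text <- list(
  c("saw", "viola", "beach", "support"),
  c("glastonbury", "coldplay", "elo")
)

# `[` keeps the container: the result is still a list (of length 1)
first_row <- zcp_demo$text[1]
class(first_row)              # "list"

# `[[` removes one level of nesting: the result is the character vector
first_tokens <- zcp_demo$text[[1]]
class(first_tokens)           # "character"

# Index that vector to reach an individual token
first_token <- zcp_demo$text[[1]][1]
first_token                   # "saw"

# `[[` with a vector index on a list recurses: same as text[[1]][[1]]
zcp_demo$text[[c(1, 1)]]      # "saw"
```

In the real ZCP the first element is also named `to_return`, so `ZCP$text[["to_return"]]` should return the same vector as `ZCP$text[[1]]`; the remaining elements are unnamed, so positional `[[i]]` is the form that works for every row.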

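Edit: @Sotos asked for a dput of this. I'm not sure whether this is what he wanted (I'm an R noob and have never used dput before), but here is the dput output of head(ZCP):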

structure(list(num_tokens = structure(list(to_return = 24L, 23L, 
17L, 10L, 12L, 16L), .Names = c("to_return", "", "", "", 
"", "")), unique_tokens = structure(list(to_return = 23L, 18L, 
14L, 10L, 12L, 16L), .Names = c("to_return", "", "", "", 
"", "")), text = structure(list(to_return = c("saw", "viola", 
"beach", "support", "courteeners", "in", "w", "ton", "to", "see", 
"coldplay", "do", "the", "tribute", "to", "them", "at", "glastonbury", 
"was", "amazing", "so", "well", "thought", "out"), c("glastonbury", 
"coldplay", "elo", "break", "viewing", "records", "muse", "s", 
"audience", "doubles", "https", "t", "co", "eocvqnoeen", "coldplay", 
"muse", "https", "t", "co", "yd", "ie", "xr", "n"), c("another", 
"cheeky", "glastonbury", "pic", "coldplay", "pyramidstage", "https", 
"t", "co", "qttz", "xgjpx", "https", "t", "co", "rm", "y", "pbvml"
), c("i", "m", "having", "my", "very", "own", "glastonbury", 
"tonight", "coldplay", "adele"), c("that", "was", "awesome", 
"coldplay", "glastonbury", "glasto", "https", "t", "co", "fz", 
"ly", "cvx"), c("beegees", "barry", "gibb", "stayin", "alive", 
"and", "coldplay", "en", "glastonbury", "https", "t", "co", "hoj", 
"u", "j", "yz")), .Names = c("to_return", "", "", "", "", "")),
favoriteCount = structure(list(to_return = 2, 1, 0, 0, 0, 
1), .Names = c("to_return", "", "", "", "", "")), id = structure(list(
to_return = "747938975621521408", "747938533290049537", 
"747934687696420864", "747934531756384256", "747931753373892608", 
"747928260835696640"), .Names = c("to_return", "", "", 
"", "", "")), retweetCount = structure(list(to_return = 1, 
0, 0, 0, 0, 0), .Names = c("to_return", "", "", "", "", 
"")), isRetweet = structure(list(to_return = FALSE, FALSE, 
FALSE, FALSE, FALSE, FALSE), .Names = c("to_return", 
"", "", "", "", ""))), .Names = c("num_tokens", "unique_tokens", 
"text", "favoriteCount", "id", "retweetCount", "isRetweet"), row.names = c(NA, 
6L), class = "data.frame")
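The dput output shows that every column of ZCP, including `text`, is a list whose first element is named `to_return` and whose other elements are unnamed. With that shape, a for loop can take each tweet's tokens with `[[i]]` and then iterate over them. A sketch under those assumptions; `zcp_demo`, `n_tokens_seen` and the other names below are hypothetical, chosen only for illustration:

```r
# Hypothetical two-tweet stand-in shaped like the dput output:
# `text` is a list column and only its first element is named.
zcp_demo <- data.frame(num_tokens = c(4L, 3L))
zcp_demo$text <- list(
  to_return = c("saw", "viola", "beach", "support"),
  c("glastonbury", "coldplay", "elo")
)

# Nested for loop: outer over tweets (list elements), inner over tokens.
n_tokens_seen <- 0L
for (i in seq_along(zcp_demo$text)) {
  tokens <- zcp_demo$text[[i]]   # `[[` gives the character vector
  for (tok in tokens) {
    n_tokens_seen <- n_tokens_seen + 1L
    cat("tweet", i, "token:", tok, "\n")
  }
}
n_tokens_seen                    # 7

# Loop-free alternatives: first token of each tweet, and all tokens flattened
first_tokens <- vapply(zcp_demo$text, function(tk) tk[1], character(1))
all_tokens <- unlist(zcp_demo$text, use.names = FALSE)
```

`vapply()` keeps the element names, so `first_tokens` carries the `to_return` name on its first entry; `unname()` strips it if it gets in the way.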

0 answers