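I'm not sure I can explain this properly, but here goes. I have a data frame named ZCP that contains tokenized tweets (to be used for sentiment analysis) and the associated metadata. The structure looks like this: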
head(ZAD)
num_tokens unique_tokens
1 12 12
2 11 10
3 11 10
4 12 12
5 22 20
6 11 10
text
1 rt, caradelevingne, fam, a, lam, glastonbury, https, t, co, h, ew, oux
2 rt, caradelevingne, home, sweet, home, glastonbury, https, t, co, zolld, ltvt
3 rt, caradelevingne, home, sweet, home, glastonbury, https, t, co, zolld, ltvt
4 rt, caradelevingne, fam, a, lam, glastonbury, https, t, co, h, ew, oux
5 rt, yahoocelebuk, adele, set, to, dominate, the, uk, albums, chart, as, heads, back, to, number, post, glastonbury, https, t, co, cndkufsgo, https
6 rt, caradelevingne, home, sweet, home, glastonbury, https, t, co, zolld, ltvt
favoriteCount id retweetCount isRetweet
1 0 747942553010397184 593 TRUE
2 0 747942530340118529 729 TRUE
3 0 747941795988905986 729 TRUE
4 0 747941781820542976 593 TRUE
5 0 747940287847161856 3 TRUE
6 0 747940084603838464 729 TRUE
Basically, for now I'm only interested in the data in the text column. That data looks like this:
head(ZCP$text)
$to_return
[1] "saw" "viola" "beach" "support" "courteeners" "in"
[7] "w" "ton" "to" "see" "coldplay" "do"
[13] "the" "tribute" "to" "them" "at" "glastonbury"
[19] "was" "amazing" "so" "well" "thought" "out"

[[2]]
[1] "glastonbury" "coldplay" "elo" "break" "viewing" "records"
[7] "muse" "s" "audience" "doubles" "https" "t"
[13] "co" "eocvqnoeen" "coldplay" "muse" "https" "t"
[19] "co" "yd" "ie" "xr" "n"

[[3]]
[1] "another" "cheeky" "glastonbury" "pic" "coldplay" "pyramidstage"
[7] "https" "t" "co" "qttz" "xgjpx" "https"
[13] "t" "co" "rm" "y" "pbvml"

[[4]]
[1] "i" "m" "having" "my" "very" "own"
[7] "glastonbury" "tonight" "coldplay" "adele"

[[5]]
[1] "that" "was" "awesome" "coldplay" "glastonbury" "glasto"
[7] "https" "t" "co" "fz" "ly" "cvx"

[[6]]
[1] "beegees" "barry" "gibb" "stayin" "alive" "and"
[7] "coldplay" "en" "glastonbury" "https" "t" "co"
[13] "hoj"
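Which operator should I use to get at the individual tokens? I was planning to write a for loop, but I can't find the right operator to reach the data nested inside the column. For example, ZCP$text[1] gives the following result: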
ZCP$text[1]
$to_return
[1] "saw" "viola" "beach" "support" "courteeners" "in"
[7] "w" "ton" "to" "see" "coldplay" "do"
[13] "the" "tribute" "to" "them" "at" "glastonbury"
[19] "was" "amazing" "so" "well" "thought" "out"
How do I get the first element of this object? For some reason I can't find the right operator. Any help is appreciated. Thanks.
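For context, the printed output suggests that text is a list column, so each entry is a character vector wrapped in a list. A minimal sketch of the usual list indexing follows, using a made-up stand-in for ZCP$text (the name `toks` and its contents are assumptions for illustration, not the real data):

```r
# Hypothetical stand-in for ZCP$text: a list of character vectors in which
# only the first element is named, as in the printed output above.
toks <- list(to_return = c("saw", "viola", "beach"),
             c("glastonbury", "coldplay", "elo"))

toks[1]        # single bracket: still a list, of length 1
toks[[1]]      # double bracket: the character vector of tokens itself
toks[[1]][1]   # first token of the first tweet
toks[[2]][3]   # third token of the second tweet
```

If the real column has the same shape, ZCP$text[[1]] would return the tokens of the first tweet and ZCP$text[[1]][1] the first token.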
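Edit: @Sotos asked for a dput of this. I'm not sure this is what he wanted (I'm an R noob and have never used dput before), but here is the dput output for head(ZCP):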
structure(list(num_tokens = structure(list(to_return = 24L, 23L,
17L, 10L, 12L, 16L), .Names = c("to_return", "", "", "",
"", "")), unique_tokens = structure(list(to_return = 23L, 18L,
14L, 10L, 12L, 16L), .Names = c("to_return", "", "", "",
"", "")), text = structure(list(to_return = c("saw", "viola",
"beach", "support", "courteeners", "in", "w", "ton", "to", "see",
"coldplay", "do", "the", "tribute", "to", "them", "at", "glastonbury",
"was", "amazing", "so", "well", "thought", "out"), c("glastonbury",
"coldplay", "elo", "break", "viewing", "records", "muse", "s",
"audience", "doubles", "https", "t", "co", "eocvqnoeen", "coldplay",
"muse", "https", "t", "co", "yd", "ie", "xr", "n"), c("another",
"cheeky", "glastonbury", "pic", "coldplay", "pyramidstage", "https",
"t", "co", "qttz", "xgjpx", "https", "t", "co", "rm", "y", "pbvml"
), c("i", "m", "having", "my", "very", "own", "glastonbury",
"tonight", "coldplay", "adele"), c("that", "was", "awesome",
"coldplay", "glastonbury", "glasto", "https", "t", "co", "fz",
"ly", "cvx"), c("beegees", "barry", "gibb", "stayin", "alive",
"and", "coldplay", "en", "glastonbury", "https", "t", "co", "hoj",
"u", "j", "yz")), .Names = c("to_return", "", "", "", "", "")),
favoriteCount = structure(list(to_return = 2, 1, 0, 0, 0,
1), .Names = c("to_return", "", "", "", "", "")), id = structure(list(
to_return = "747938975621521408", "747938533290049537",
"747934687696420864", "747934531756384256", "747931753373892608",
"747928260835696640"), .Names = c("to_return", "", "",
"", "", "")), retweetCount = structure(list(to_return = 1,
0, 0, 0, 0, 0), .Names = c("to_return", "", "", "", "",
"")), isRetweet = structure(list(to_return = FALSE, FALSE,
FALSE, FALSE, FALSE, FALSE), .Names = c("to_return",
"", "", "", "", ""))), .Names = c("num_tokens", "unique_tokens",
"text", "favoriteCount", "id", "retweetCount", "isRetweet"), row.names = c(NA,
6L), class = "data.frame")
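As a rough sketch of the for loop described in the question, here is one way to walk over the tokens of a list column. The data frame `demo` and its values are made up for illustration and are much smaller than the dput above; only base R functions are used.

```r
# Hypothetical miniature of head(ZCP): `demo` and its values are made up.
demo <- data.frame(id = c("1", "2"), stringsAsFactors = FALSE)
demo$text <- list(c("saw", "viola", "beach"), c("coldplay", "glastonbury"))

# Loop over tweets, then over the tokens of each tweet.
for (i in seq_along(demo$text)) {
  tokens <- demo$text[[i]]          # character vector for tweet i
  for (tok in tokens) {
    cat(i, tok, "\n")               # print tweet index and token
  }
}

n_tokens   <- lengths(demo$text)    # tokens per tweet, without a loop
all_tokens <- unlist(demo$text)     # every token in one character vector
```

Depending on what the sentiment step needs, the vectorised forms at the end (lengths and unlist) may make the explicit loop unnecessary.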