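I have a list of more than 50,000 rows of tweets. I have already extracted the hashtags from that list, but I now have several thousand rows of hashtags that look like this: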
hashtag1; hashtag2; hashtag3; hashtag4
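Since I want to do a co-hashtag analysis, I am looking for a way to connect these multiple hashtags with each other without having to transform the rows into undirected edges by hand. For example: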
hashtag1; hashtag2
hashtag1; hashtag3
hashtag1; hashtag4
hashtag2; hashtag3
hashtag2; hashtag4
hashtag3; hashtag4
So, do you have any idea how to accomplish this (e.g. via R)? I am an R noob and even less "proficient" with other languages, but I am eager to learn.
structure(list(V1 = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 7L, 8L, 8L, 9L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 13L,
13L, 13L, 13L, 14L, 14L), .Label = c("profitkapital", "resupply",
"robotik", "rudidutschke", "russland", "sanktionen", "sanktionieren",
"schiller", "siegertyp", "snowden", "sockeleinkommen", "solidarity",
"sozialismus", "sozialphilosoph"), class = "factor"), V2 = structure(c(4L,
3L, 2L, 7L, 7L, 7L, 7L, 17L, 6L, 8L, 9L, 10L, 10L, 11L, 12L,
13L, 18L, 18L, 1L, 15L, 15L, 14L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 5L, 5L, 4L, 4L, 4L, 4L, 16L, 16L), .Label = c("alltag",
"arbeit", "bbq", "bge", "blockupy", "deutschland", "digitalisierung",
"griechenland", "grundeinkommen", "hartziv", "kenfm", "kirche",
"kopf", "kraft", "marx", "negt", "piraten", "sanktion"), class = "factor"),
V3 = structure(c(1L, 3L, 2L, 4L, 4L, 4L, 4L, 4L, 5L, 4L,
4L, 4L, 13L, 10L, 13L, 4L, 14L, 14L, 7L, 6L, 6L, 15L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 1L, 1L, 1L, 1L, 12L, 12L, 11L, 11L,
11L, 11L, 9L, 9L), .Label = c("", "abitur", "bbqrub", "bge",
"brd", "brecht", "deutschen", "fsa", "grundeinkommen", "hartziv",
"linkezukunft", "ows", "vatikan", "widerspruch", "würde"
), class = "factor"), V4 = structure(c(1L, 3L, 6L, 1L, 1L,
1L, 1L, 1L, 8L, 1L, 2L, 1L, 9L, 5L, 9L, 10L, 4L, 4L, 7L,
3L, 3L, 11L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
12L, 12L, 1L, 1L, 1L, 1L, 3L, 3L), .Label = c("", "bank",
"bge", "eilantrag", "haarp", "job", "jobcentern", "merkel",
"pastor", "probleme", "super", "unibrennt"), class = "factor"),
V5 = structure(c(1L, 3L, 5L, 1L, 1L, 1L, 1L, 1L, 7L, 1L,
10L, 1L, 2L, 9L, 2L, 4L, 8L, 8L, 6L, 1L, 1L, 6L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("", "bge", "bgenation", "fliegen", "geld",
"hartziv", "hitler", "sg", "ttip", "vorbild"), class = "factor"),
V6 = structure(c(1L, 5L, 2L, 1L, 1L, 1L, 1L, 1L, 6L, 1L,
1L, 1L, 8L, 4L, 8L, 7L, 4L, 4L, 4L, 1L, 1L, 4L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("", "altersarmut", "antifa", "bge", "deeznuts",
"holocaust", "klatsch", "sex"), class = "factor"), V7 = structure(c(1L,
1L, 2L, 1L, 1L, 1L, 1L, 1L, 6L, 1L, 1L, 1L, 1L, 3L, 1L, 1L,
4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 5L, 5L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "bge",
"cia", "hartz", "spanishrevolution", "wahre"), class = "factor"),
V8 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("", "cityoflondon", "grund", "peace"), class = "factor"),
V9 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("", "bge", "occupy", "rothschild"), class = "factor"),
V10 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("", "ard", "gezi"), class = "factor"), V11 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "refugeeswelcome",
"zdf"), class = "factor"), V12 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "nolegida",
"wdr"), class = "factor"), V13 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "nopegida",
"swr"), class = "factor"), V14 = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "nocastor",
"zukunft"), class = "factor")), .Names = c("V1", "V2", "V3",
"V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13",
"V14"), class = "data.frame", row.names = c(NA, -41L))

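Answer 0 (score: 0)
You can try combn from the combinat package, which generates all combinations of a given size (here, every unordered pair of hashtags):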
library(combinat)
combn(c("hashtag1", "hashtag2", "hashtag3", "hashtag4"), 2)
     [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,] "hashtag1" "hashtag1" "hashtag1" "hashtag2" "hashtag2" "hashtag3"
[2,] "hashtag2" "hashtag3" "hashtag4" "hashtag3" "hashtag4" "hashtag4"
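
To go from one row to the whole table, the same call can be applied row by row and the results stacked into an edge list. The following is only a minimal sketch under a few assumptions: the data frame is laid out like the dput output in the question (one tweet per row, unused columns holding empty strings), the helper name row_to_edges and the small df stand-in are hypothetical, and base R's utils::combn is used, which returns the same kind of two-row matrix as shown above.

```r
# Hypothetical helper: turn one row of hashtags into all unordered pairs.
row_to_edges <- function(tags) {
  # Drop empty cells and duplicated hashtags within the same tweet.
  tags <- unique(tags[!is.na(tags) & tags != ""])
  if (length(tags) < 2) {
    # A tweet with fewer than two hashtags contributes no edge.
    return(data.frame(from = character(0), to = character(0),
                      stringsAsFactors = FALSE))
  }
  # Sort so that a-b and b-a are always written the same way (undirected edge).
  tags <- sort(tags)
  pairs <- utils::combn(tags, 2)  # 2 x choose(n, 2) character matrix
  data.frame(from = pairs[1, ], to = pairs[2, ], stringsAsFactors = FALSE)
}

# Small stand-in for the real data; empty strings mark unused columns.
df <- data.frame(
  V1 = c("hashtag1", "hashtag1", "hashtag5"),
  V2 = c("hashtag2", "hashtag3", ""),
  V3 = c("hashtag3", "", ""),
  V4 = c("hashtag4", "", ""),
  stringsAsFactors = FALSE
)

# as.matrix() turns factor columns into text, so the dput data works as well.
m <- as.matrix(df)
edges <- do.call(rbind, lapply(seq_len(nrow(m)), function(i) row_to_edges(m[i, ])))

# Optional: count how often each pair co-occurs, to use as an edge weight.
weighted <- aggregate(weight ~ from + to,
                      data = transform(edges, weight = 1L),
                      FUN = sum)
```

Here edges holds one row per pair per tweet, and weighted collapses repeated pairs into a single row with a count. Whether duplicated hashtags inside one tweet should be dropped, as this sketch does, depends on the analysis.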