Question

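I am analyzing the posts on my Facebook page to see what kind of posts attract the most people, so I want to create a column containing the hashtags each post uses. Here is a sample of the exported data: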

Post              Likes
Blah   #a          10
Blah Blah #b       12
Blah Bleh #a       10
Bleh   #b           9
Bleh Blah #a #b    15

I would like to create this:

Post              Likes   tags
Blah   #a          10      #a
Blah Blah #b       12      #b
Blah Bleh #a       10      #a
Bleh   #b           9      #b
Bleh Blah #a #b    15      #a #b
Bleh #b Blah #a    14      #a #b

Is this possible? I thought of using grepl to check which posts contain "#", but I am stuck on what to do next.

Answer 1

This seems to work:

#random data
DF <- data.frame(Post = c("asd wer #a", "dfg #b gg", 
                          "wer #c qwe qweeee #a #b", "asd asd, ioi #a #c"),
                 Likes = sample(1:50, 4), stringsAsFactors = FALSE)

#find tags: split each post on spaces and keep the words containing "#"
#sapply (rather than lapply) returns a character vector instead of a list
Tags <- sapply(DF$Post, function(x) { spl <- unlist(strsplit(x, " "))
                                      paste(spl[grep("#", spl)], collapse = ",") },
               USE.NAMES = FALSE)

DF$Tags <- Tags

> DF
                     Post Likes     Tags
1              asd wer #a     9       #a
2               dfg #b gg    10       #b
3 wer #c qwe qweeee #a #b    46 #c,#a,#b
4      asd asd, ioi #a #c    31    #a,#c
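
As a rough sketch, the same split-and-filter idea can be applied to the sample posts from the question, joining the tags with a space so the result resembles the requested column. The helper name `extract_tags` and the data frame name `posts` are made up for this illustration, and the sketch assumes that words are separated by spaces.

```r
# Sample posts and likes copied from the question.
posts <- data.frame(
  Post  = c("Blah #a", "Blah Blah #b", "Blah Bleh #a", "Bleh #b", "Bleh Blah #a #b"),
  Likes = c(10, 12, 10, 9, 15),
  stringsAsFactors = FALSE
)

# extract_tags is a hypothetical helper, not part of the original answer.
# It splits a post on spaces and keeps the words that contain "#".
extract_tags <- function(post) {
  words <- unlist(strsplit(post, " ", fixed = TRUE))
  paste(words[grepl("#", words, fixed = TRUE)], collapse = " ")
}

# vapply guarantees a plain character vector with one string per post.
posts$tags <- vapply(posts$Post, extract_tags, character(1), USE.NAMES = FALSE)
print(posts)
```

Using `vapply` with `character(1)` is a design choice here: it fails loudly if the helper ever returns something other than a single string, which keeps the new column an ordinary character column.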

Answer 2

You can use gregexpr to find the desired pattern and regmatches to extract it:

txt = c('Bleh Blah #a #b','Blah Bleh #a')
regmatches(txt,gregexpr('#[a-z]',txt))   ## I assume a tag is # followed by a lowercase letter
[[1]]
[1] "#a" "#b"

[[2]]
[1] "#a"

Using alexis's example, you can write:

DF$tag <- regmatches(DF$Post, gregexpr('#[a-z]', DF$Post))
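
Note that regmatches returns a list with one character vector per post, so the assignment above creates a list column. If a single string per post is preferred, as in the question's desired output, a possible sketch is to collapse each vector with paste. The sample data below is made up for illustration, and sorting the tags is an assumption based on the question's last desired row, where "Bleh #b Blah #a" maps to "#a #b".

```r
# Illustrative data; the third post has no tags at all.
DF <- data.frame(Post  = c("Bleh Blah #a #b", "Bleh #b Blah #a", "no tags here"),
                 Likes = c(15, 14, 3),
                 stringsAsFactors = FALSE)

# One character vector of matches per post (empty when nothing matches).
tag_list <- regmatches(DF$Post, gregexpr("#[a-z]", DF$Post))

# Sort and join each vector into one space-separated string.
DF$tag <- vapply(tag_list,
                 function(tags) paste(sort(tags), collapse = " "),
                 character(1), USE.NAMES = FALSE)
print(DF)
```

Posts without any tag end up with an empty string in this sketch, which may be easier to filter on than a zero-length list element.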

Edit: in case the tags look like #hi (more than one letter):

txt = c('Bleh Blah #hi allo #b','Blah Bleh #a')
regmatches(txt,gregexpr('#[a-z]+',txt))

[[1]]
[1] "#hi" "#b"

[[2]]
[1] "#a"
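
Real hashtags often contain uppercase letters, digits, or underscores, which '#[a-z]+' would cut short or miss. A hedged variation, assuming a tag is "#" followed by one or more alphanumeric characters or underscores (this definition and the sample text are assumptions, not from the original answer), could look like this:

```r
# Made-up posts with mixed-case tags, digits, and an underscore.
txt <- c("Launch day #R2D2 #Big_News", "Blah Bleh #a")

# [[:alnum:]_] matches letters, digits, and the underscore.
tags <- regmatches(txt, gregexpr("#[[:alnum:]_]+", txt))
print(tags)
```

This pattern would also pick up things like "#1" in ordinary text, so it may need tightening depending on how the posts are written.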

R: Create a column containing hashtags from exported Facebook .csv data
