Detect sentences and extract them into the same data frame

Posted: 2015-03-03 11:14:57

Tags: regex r

I have the following data frame:

reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
                               "Inexpensive. An improvement over integrated graphics.",
                               "I love that product so excite. I will order again if I need more .",
                               "Excellent card, great graphics."),
                      user = c(1,2,3,4),
                      Review_Id = c("101968","101968","210546","112546"), 
                      stringsAsFactors = FALSE)

This is the output I need:

        user     review_Id                                 sentence
           1        101968        Made with high quality materials.
           1        101968                        Very Good product
           2        101968                             Inexpensive.
           2        101968 An improvement over integrated graphics.
           3        210546           I love that product so excite.
           3        210546      I will order again if I need more .
           4        112546          Excellent card, great graphics.

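I was thinking of something like sent_detect(reviews$value) (from the qdap package).

But how do I combine that function with the rest of the data frame to get the desired output?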
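One way to attach the sentences back to the other columns is to split each `value` into a list of sentences and then repeat every row once per sentence it contains. Below is a minimal base-R sketch, assuming a sentence ends at a period followed by whitespace. That rule is a simplification of what a dedicated sentence detector such as `sent_detect` does.

```r
reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
                                "Inexpensive. An improvement over integrated graphics.",
                                "I love that product so excite. I will order again if I need more .",
                                "Excellent card, great graphics."),
                      user = c(1, 2, 3, 4),
                      Review_Id = c("101968", "101968", "210546", "112546"),
                      stringsAsFactors = FALSE)

# Split each review at whitespace that follows a period.
# The lookbehind (?<=\.) keeps the period attached to its sentence.
pieces <- strsplit(reviews$value, "(?<=\\.)\\s+", perl = TRUE)

# Number of sentences found in each review
n <- lengths(pieces)

# Repeat each row's user and Review_Id once per sentence,
# then line them up with the flattened list of sentences.
result <- data.frame(user      = rep(reviews$user, n),
                     Review_Id = rep(reviews$Review_Id, n),
                     sentence  = unlist(pieces),
                     stringsAsFactors = FALSE)
print(result)
```

This keeps the first sentence of review 1 ("Product was received in excellent condition."), which the desired output table above does not list. Drop that row afterwards if the omission was intentional.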

1 answer:

Answer 0 (score: 0)

If your data really is that tidy, you can use cSplit from my "splitstackshape" package:

library(splitstackshape)
cSplit(reviews, "value", ".", direction = "long")
#                                          value user Review_Id
# 1: Product was received in excellent condition    1    101968
# 2:            Made with high quality materials    1    101968
# 3:                           Very Good product    1    101968
# 4:                                 Inexpensive    2    101968
# 5:     An improvement over integrated graphics    2    101968
# 6:               I love that product so excite    3    210546
# 7:           I will order again if I need more    3    210546
# 8:              Excellent card, great graphics    4    112546
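The "tidy data" caveat matters because splitting on every literal "." also breaks decimals such as "4.5" and abbreviations such as "e.g.", and it discards the period itself, as the output above shows. The sketch below contrasts a naive split on "." with a stricter regular expression. `split_sentences()` is a hypothetical helper, not part of splitstackshape. It assumes a sentence boundary is ".", "!" or "?" followed by whitespace and then an uppercase letter.

```r
# Hypothetical helper: split text only where ., ! or ? is followed by
# whitespace and then an uppercase ASCII letter; punctuation stays attached.
split_sentences <- function(x) {
  strsplit(x, "(?<=[.!?])\\s+(?=[A-Z])", perl = TRUE)
}

x <- "Rated 4.5 stars. Works fine, e.g. with Linux. Would buy again!"

# Naive split on every literal period (similar in spirit to splitting on ".")
naive <- strsplit(x, ".", fixed = TRUE)[[1]]

# Stricter split that leaves "4.5" and "e.g." intact
strict <- split_sentences(x)[[1]]

print(naive)
print(strict)
```

This heuristic still splits after an abbreviation that precedes a capitalized word (for example "Dr. Smith"), so a proper sentence detector remains the safer choice for messy text.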