Question

I am using sparklyr, and I have a vector of words:

word_list <- c("toto", "tata")

and I would like to know whether any of these words appears at least once in any of my three text variables (text_1, text_2, text_3):

library(data.table)

table_1 <- data.table(id = 1:3,
                      text_1 = c("(table  012 APM325)", "(JUI524 toto KIO879)", "(pink car in the field KJU547 MPO362/JHY879)"),
                      text_2 = c("(chips train)", "(toto)", "(coco loco)"),
                      text_3 = c("(train)", "(125 LMP)", "(yid tata)"))

So far I have been using this loop-based approach:

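library(sparklyr)
library(dplyr)

# Assumption: a local Spark connection; adjust master for your cluster
sc <- spark_connect(master = "local")

sdf_table_1 <- copy_to(sc, table_1, "table_1", overwrite = TRUE)

# Start with a numeric flag set to 0 for every row
sdf_table_2 <- sdf_table_1 %>%
    mutate(found_word = '0') %>%
    mutate(found_word = as.numeric(found_word))

# For each word, set the flag to 1 when the word occurs in any text column
# (!!k inlines the current word as a literal in the generated SQL)
for (k in word_list) {
    sdf_table_2 <- sdf_table_2 %>%
        mutate(found_word = ifelse(locate(!!k, text_1) > 0 |
                                   locate(!!k, text_2) > 0 |
                                   locate(!!k, text_3) > 0, 1, found_word))
}

glimpse(sdf_table_2)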


id         <int> 1, 2, 3
text_1     <chr> "(table  012 APM325)", "(JUI524 toto KIO879)", "(pink car in the field KJU547 MPO362/JHY879)"
text_2     <chr> "(chips train)", "(toto)", "(coco loco)"
text_3     <chr> "(train)", "(125 LMP)", "(yid tata)"
found_word <dbl> 0, 1, 1

Since Spark does not handle loops well, I would like to know whether I can do the same thing another way, using spark_apply?

Thanks for your help!
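
For reference, the check described above (does any word in word_list occur in any of the three text columns?) can be expressed without a loop by collapsing the words into one regular expression. The following is only a minimal sketch in plain base R on a local data frame, not Spark code; it assumes the words contain no regex metacharacters, and the names pattern and all_text are introduced here purely for illustration.

```r
# Hypothetical sketch: plain base R on a local data frame, no Spark involved.
word_list <- c("toto", "tata")

table_1 <- data.frame(
  id     = 1:3,
  text_1 = c("(table  012 APM325)", "(JUI524 toto KIO879)",
             "(pink car in the field KJU547 MPO362/JHY879)"),
  text_2 = c("(chips train)", "(toto)", "(coco loco)"),
  text_3 = c("(train)", "(125 LMP)", "(yid tata)"),
  stringsAsFactors = FALSE
)

# Collapse the words into a single alternation pattern: "toto|tata".
# Assumption: the words contain no regex metacharacters.
pattern <- paste(word_list, collapse = "|")

# Concatenate the three columns so that one match test covers all of them.
all_text <- paste(table_1$text_1, table_1$text_2, table_1$text_3)

# 1 when at least one word occurs in at least one column, 0 otherwise.
table_1$found_word <- as.numeric(grepl(pattern, all_text))
```

On a Spark table the same idea might carry over to a single mutate() call that relies on Spark SQL's regular-expression matching (rlike) over the concatenated columns, which would avoid both the loop and spark_apply. That translation is an untested assumption here and would need checking against the Spark version in use.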

textmining sparklyr spark_apply function locate

0 answers: