Check for words in a data frame

Time: 2018-06-03 14:38:49

Tags: r lapply

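I have two data frames, A and B. I want to check whether the unique words of data frame A occur in data frame B. If a word occurs, keep it; otherwise remove it from each row of data frame B.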

A <- data.frame(name = c(
  "X-ray right leg arteries",
  "consultation of gynecologist",
  "x-ray leg arteries",
  "x-ray leg with 20km distance"
), stringsAsFactors = FALSE)

B <- data.frame(name = c(
  "X-ray left leg arteries",
  "consultation (inspection) of gynecalogist",
  "MRI right leg arteries",
  "X-ray right leg arteries with special care"
), stringsAsFactors = FALSE)


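# Unique words across all rows of A
k <- unique(unlist(strsplit(A$name, " ")))
# For each row of B, keep the words of k that occur (as substrings) in any word of that row
d <- do.call(rbind, lapply(B$name, function(z) {
  xx <- lapply(lapply(k, function(x) grepl(x, unlist(strsplit(z, " ")), fixed = TRUE)), which)
  paste(k[sapply(xx, function(x) length(x) > 0)], collapse = " ")
}))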

I have already solved it. I just want to know whether there is a more efficient way, since my real dataset has more than 15K rows.

1 answer:

Answer 0: (score: 3)

We can extract the unique words in 'k' from 'B' and then paste those elements together, rather than running multiple loops.
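The idea of extracting the known words and pasting them back together can be sketched in base R as follows. This is not necessarily the code from the original answer: it assumes whole-word matching with `%in%` is acceptable (the question's `grepl(..., fixed = TRUE)` matches substrings instead), and `keep_known_words` is a hypothetical helper name introduced only for illustration.

```r
# Sample data copied from the question.
A <- data.frame(name = c(
  "X-ray right leg arteries",
  "consultation of gynecologist",
  "x-ray leg arteries",
  "x-ray leg with 20km distance"
), stringsAsFactors = FALSE)

B <- data.frame(name = c(
  "X-ray left leg arteries",
  "consultation (inspection) of gynecalogist",
  "MRI right leg arteries",
  "X-ray right leg arteries with special care"
), stringsAsFactors = FALSE)

# Hypothetical helper: for each string in `text`, keep only the
# space-separated words that appear exactly in `vocab`.
keep_known_words <- function(text, vocab) {
  tokens <- strsplit(text, " ", fixed = TRUE)   # split every row once
  vapply(
    tokens,
    function(w) paste(w[w %in% vocab], collapse = " "),
    character(1)
  )
}

# Unique words of A, computed once.
k <- unique(unlist(strsplit(A$name, " ", fixed = TRUE)))

# One pass over the rows of B instead of a nested loop over every word of k.
d <- keep_known_words(B$name, k)
print(d)
```

Unlike the code in the question, this keeps the words in the order they appear in each row of B rather than in the order of `k`, and it is case-sensitive, so "x-ray" and "X-ray" count as different words. On the sample data both orderings happen to coincide.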
