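I have two data frames, A and B. I want to check whether each unique word from data frame A appears in data frame B. If it does, the word should be kept; otherwise it should be removed from each row of data frame B.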
A <- data.frame(name = c(
  "X-ray right leg arteries",
  "consultation of gynecologist",
  "x-ray leg arteries",
  "x-ray leg with 20km distance"
), stringsAsFactors = FALSE)
B <- data.frame(name = c(
  "X-ray left leg arteries",
  "consultation (inspection) of gynecalogist",
  "MRI right leg arteries",
  "X-ray right leg arteries with special care"
), stringsAsFactors = FALSE)
# Unique words that appear anywhere in A
k <- unique(unlist(strsplit(A$name, " ")))
# For each row of B, keep the words of k that occur (as fixed substrings) in that row
d <- do.call(rbind, lapply(B$name, function(z) {
  words <- unlist(strsplit(z, " "))
  found <- sapply(k, function(x) any(grepl(x, words, fixed = TRUE)))
  paste(k[found], collapse = " ")
}))
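I already have a working solution. I am just wondering whether there is a more efficient approach, because my real dataset has more than 15K rows.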
Answer 0 (score: 3)
We can use 'k' to extract the matching words from each row of 'B' and then paste those elements together, rather than running multiple loops.
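As a rough illustration of replacing the nested loops with a single pass per row, here is a minimal base R sketch. It is not necessarily the exact approach this answer had in mind: it swaps in whole-word matching with `%in%`, whereas the question's code matches fixed substrings with `grepl`, and it keeps the words in the order they appear in B rather than in the order of `k`. The names `b_words` and `kept` are introduced here purely for illustration.

```r
# Sample data from the question
A <- data.frame(name = c(
  "X-ray right leg arteries",
  "consultation of gynecologist",
  "x-ray leg arteries",
  "x-ray leg with 20km distance"
), stringsAsFactors = FALSE)
B <- data.frame(name = c(
  "X-ray left leg arteries",
  "consultation (inspection) of gynecalogist",
  "MRI right leg arteries",
  "X-ray right leg arteries with special care"
), stringsAsFactors = FALSE)

# Unique words that appear anywhere in A
k <- unique(unlist(strsplit(A$name, " ", fixed = TRUE)))

# Split every row of B once (hypothetical helper name: b_words)
b_words <- strsplit(B$name, " ", fixed = TRUE)

# Keep only the words that are exact members of k, then paste them back together
kept <- vapply(
  b_words,
  function(w) paste(w[w %in% k], collapse = " "),
  character(1)
)

print(kept)
```

On the sample data this gives the same rows as the question's code, but the two can differ in general: a word of `k` such as "of" would be found inside a longer word by the substring version and not by this one. Whether it is fast enough for 15K rows is something to measure on the real data.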
version: '3.4'
services:
  foo1:
    ports:
      - target: 8081
        published: 8084
        mode: host
    networks:
      - dev-net
    command: make foo1.start
    logging:
      driver: gelf
      options:
        gelf-address: udp://localhost:12201
  some-mongo:
    image: "mongo:3"
    networks:
      - dev-net
  some-elasticsearch:
    image: "elasticsearch:2"
    command: "elasticsearch -Des.cluster.name='graylog'"
    networks:
      - dev-net
  graylog:
    image: graylog2/server:2.1.1-1
    environment:
      GRAYLOG_PASSWORD_SECRET: somepasswordpepper
      GRAYLOG_ROOT_PASSWORD_SHA2: 8c6976e5b5410415bde908bd4dee15dfb1
      GRAYLOG_WEB_ENDPOINT_URI: http://127.0.0.1:9000/api
    links:
      - some-mongo:mongo
      - some-elasticsearch:elasticsearch
    ports:
      - "9000:9000"
      - "12201:12201/udp"
    networks:
      - dev-net
networks:
  dev-net:
    ipam:
      config:
        - subnet: 192.168.12.0/24