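I have two data frames. One, named "train", contains keywords that I want to look up in the other, "test". Each "train" keyword can occur many times in "test", in different rows. For each keyword in "train", I want the matching rows of "test" and the values stored in those rows.
I used grep to extract the data, but I can't work out how to loop over every keyword in train.
Sample code: test[grep("pharma", test$org_name), ]
This returns the rows of test that contain "pharma". Can you help me loop over all the keywords in "train", not just pharma?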
Sample data:
train:
dput(train)
structure(list(name = structure(c(1L, 4L, 7L, 22L, 29L, 32L,
34L, 35L, 36L, 37L, 42L, 46L, 57L, 58L, 54L, 55L, 9L, 59L, 16L,
41L, 33L, 17L, 3L, 5L, 6L, 8L, 11L, 12L, 10L, 13L, 14L, 15L,
18L, 19L, 20L, 21L, 23L, 24L, 25L, 26L, 27L, 28L, 30L, 31L, 2L,
36L, 38L, 39L, 40L, 41L, 43L, 44L, 44L, 45L, 47L, 48L, 49L, 50L,
51L, 52L, 53L, 55L, 56L, 59L, 60L, 61L, 62L, 63L, 64L), .Label = c("3b",
"3m", "acadia", "advanced", "ajanta", "alexion", "als", "altana",
"astellas", "aurobindo", "avella", "axcan", "bayer", "beximco",
"bluepharma", "chugai", "dainippon", "diffusion", "ego", "elder",
"endo", "eusa", "ferring", "getz", "glenmark", "gulf", "hikma",
"incepta", "index", "intas", "janssen", "jhp", "mitsubishi",
"navidea biopharmaceuticals", "newbridge", "novabay", "nymox",
"octapharma", "orion", "ortho-mcneil", "otsuka", "pamlico", "par",
"pharma", "pharmacosmos", "pharmaxis", "purdue", "regeneron",
"respa", "salix", "sigma", "square", "sun", "takeda", "teva",
"torrent", "tragara", "tribute", "valeant", "veloxis", "vertex",
"vion", "wallace", "zandu"), class = "factor")), .Names = "name", row.names = c(NA,
-69L), class = "data.frame")
test:
dput(head(test,100))
structure(list(org_name = c("reassign to novo nordisk pharma ltd.",
"acadia pharmaceuticals as", "pharma medica research", "institute of pharmaco economics",
"ucb pharma s.a.", "charles university - faculty of pharmacy",
"pharmasotique", "laboratoires hra pharma", "jacomm pharma aktiebolaget",
"wyeth-lederle pharma gmbh", "cyathus exquirere pharmaforschungsgmbh",
"unison pharmaceuticals", "pharmacetical compaany", "octapharma ag",
"otsuka pharmaceuticals ltd", "genus pharmaceuticals ltd", "dabur dabur pharma",
"ftip002859996 pharmalink consulting ltd", "pharmaqualityeuropesrl",
"smemlt2013022073965_complete solutions pharmacy gen merchan",
"2-8560138_astrazeneca pharmaceuticals phils. in", "2-4772798_qualimed pharma inc.",
"2-8437748_pryce pharmaceuticals inc", "otsuka pharmaceutical italy srl",
"sanitpharma public subnet", "dr falk pharma benelux b.v.", "18-4178048_rose pharmacy inc.-cebu",
"32-5203564_jehu-nissi pharma", "laboratorio drag pharma", "ntt data for egis pharmaceuticals plc",
"sasakawa pharmacy", "pharmathen lan", "pharma square co. ltd.",
"kobayashi pharmaceutical co. ltd.", "dr. kade pharmazeutische fabrik",
"merlion pharmaceuticals pte ltd", "alphalytik pharmaservice gmbh",
"pharmacontrol electronics gmbh", "millennium pharmaceuticals inc.",
"oldens pharmacy inc", "bentley pharmaceuticals", "taisho pharmaceutical rd inc.",
"meda pharmaceuticals inc-somerset- data", "contract pharmacal corp",
"pharma smart international inc", "shanghai china pharmaceutical co. ltd.",
"jebix corp dba mt olivet pharmacy", "changzhou 100 pharmaceutical network technology co. ltd.",
"jingjiang shutaibao pharmacy co. ltd", "pharmaceutical product development inc",
"banner pharmacaps inc", "salix pharmaceuticals", "pharmavail benefit management",
"zhejiang zhejiang university life pharmacy ltd.", "pharmacy monument",
"ann's pharmacy and discount", "fjfz-haihuapharmacy-corp", "pharmaceutical trade services",
"frank s pharmacy adva", "cadila pharmacy", "grand river pharmacy",
"pharmacy support services", "pharmakon solutions", "akorn pharmaceuticals",
"cabernet pharmaceuticals", "galloway pharmacy ii", "strawberry family pharmacy inc.",
"penitas family pharmacy", "apothecare pharmacy", "ream's pharmacy- accounting office bluffdale",
"united pharmacists network inc", "owl specialty pharmacy", "american surgical pharmacy",
"alkemists pharmaceuticals", "unisel pharma 122261", "j.b.chemicals pharmaceuticals ltd",
"medical arts pharmacy", "alcon pharmaceuticals ltd. parstavnieciba latvija",
"affinium pharmaceuticals inc", "medical pharmacies group ltd",
"natural pharmaceuticals sp.z.o.o", "grossiste pharmaceutique",
"laboratoire pharmaceutique de faconnage", "selarl pharmacie dagher",
"pharmaserve alliance sdn bhd", "leo pharma ges.m.b.h.", "ppd pharmaceutical develo",
"novartis pharma zrich interxion", "ftip003144466 pharma mix ltd",
"the royal pharmaceutical society of gb", "pharmanet ltd - ho",
"alpharmaxim limited", "western pharmaceutical centrum", "rbgqpo2013122756673_paramed pharmaceuticals inc.",
"solvay pharma lan", "18-4231499_coson pharmacy", "20-83752_metro pharma",
"2-8462367_medchoice pharma inc.", "germed pharma s p a", "smejup2013092625312_universalgenerics pharmacy inc."
)), .Names = "org_name", row.names = c(NA, 100L), class = "data.frame")
Answer 0 (score: 2)
You can use the apply family of functions in base R, or plyr.
lapply(train$name, grep, test$org_name)
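The call above returns an unnamed list with one element per keyword, each holding the row indices of test that matched. As a minimal sketch, assuming small toy versions of train and test cut down from the question's data, the list can be named by keyword so the indices are easier to use. Here `fixed = TRUE` is an added choice, not part of the original call: it treats each keyword as literal text rather than a regular expression.

```r
# Toy subsets of the question's data, for illustration only
train <- data.frame(name = c("pharma", "otsuka", "salix"),
                    stringsAsFactors = FALSE)
test <- data.frame(
  org_name = c("ucb pharma s.a.", "otsuka pharmaceuticals ltd",
               "salix pharmaceuticals", "sasakawa pharmacy"),
  stringsAsFactors = FALSE
)

# Work with plain character keywords (train$name is a factor in the question)
keywords <- as.character(train$name)

# One element per keyword: row indices of test whose org_name contains it
idx <- lapply(keywords, grep, x = test$org_name, fixed = TRUE)
names(idx) <- keywords

# Pull the matching rows of test for a single keyword
otsuka_rows <- test[idx[["otsuka"]], , drop = FALSE]
```

Indexing test with an element of `idx` gives the same kind of result as the single-keyword grep call in the question.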
Or, if you want a single string for each keyword:
sapply(train$name, function(x) paste(grep(x, test$org_name), collapse = ","))
If needed, you can assign the result to a new column, e.g.
train$matched <- sapply(train$name, function(x) paste(grep(x, test$org_name), collapse = ","))
If you want the names rather than the row numbers:
train$matched <- sapply(train$name, function(x) paste(grep(x, test$org_name, value = TRUE), collapse = ","))
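If one row per keyword/match pair is more convenient than a comma-separated string, the same grep call can feed a long-format data frame. This is a sketch under assumptions, not the only way to do it: the toy train and test frames are cut down from the question, the column names `keyword` and `org_name` in the result are illustrative, and `fixed = TRUE` (literal matching) is an added choice.

```r
# Toy subsets of the question's data, for illustration only
train <- data.frame(name = c("pharma", "otsuka", "salix", "teva"),
                    stringsAsFactors = FALSE)
test <- data.frame(
  org_name = c("ucb pharma s.a.", "otsuka pharmaceuticals ltd",
               "salix pharmaceuticals", "sasakawa pharmacy"),
  stringsAsFactors = FALSE
)

# train contains repeated keywords in the question, so de-duplicate first
keywords <- unique(as.character(train$name))

# For each keyword, build a small data frame of (keyword, matching org_name)
pairs <- lapply(keywords, function(k) {
  hits <- grep(k, test$org_name, fixed = TRUE, value = TRUE)
  data.frame(keyword = rep(k, length(hits)), org_name = hits,
             stringsAsFactors = FALSE)
})

# Stack the pieces; keywords with no match contribute zero rows
matches <- do.call(rbind, pairs)
```

Keywords that never occur in test (here "teva") simply drop out of `matches`, whereas the sapply version keeps them with an empty string.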