I have a data frame named Test1 that contains 230,000 companies. What I need to do is subset Test1 into a new DF named FinalDS.
I have created a list named Customers that contains several name variations (about 100k) of the clients I need to put into the FinalDS DF.
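What I want is for R to go through my Customers list and look for each customer name in the Test1 DF. But! What I actually need is for R to scan the Customers list and check whether part of a Customers name matches a name in the Test1 DF.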
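For example:
I have this customer in the Customers list:
Centrica PLC
But in the Test1 DF I have Centrica, so by definition there will be no match. I know I could get all the customers to match by removing the PLC part from the names in Customers, but I have a list of about 100k customers.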
This is the code I wrote:
Customers = c("Adidas","ADIDAS GROUP","ALIBABA GROUP","ALIBABA.COM (EUROPE) LTD"
,"Apple Asia Pacific Pte Ltd" ,"APPLE DISTRIBUTION INTERNATIONAL"
,"APPLE EUROPE LTD","Apple Sales International"
,"AVIVA-PLC","Aviva -Norwich Union"
,"Aviva -Norwich Union-MSP","AVIVA PLC"
,"AXA TECHNOLOGY SERVICES UK LTD","AXA UK PLC"
,"Bank of Baroda","Bank of Baroda"
,"BARCLAYS","BARCLAYS BANK PLC"
,"BARCLAYS PLC","BRAVURA SOLUTIONS LTD"
,"CENTRICA PLC","CISCO"
,"Cisco Systems LTD","CSC (NG)-MSP"
,"CSC COMPUTER SCIENCES LTD","EMC CORPORATION"
,"GE Infrastructure UK Limited","GE MEDICAL SYSTEMS INFORMATION TECHNOLOGIES GMBH")
FinalDS = subset(Test1, grepl(paste(Customers, collapse = "|"), Test1$Customer_Name))
All this does is try to match each name in my Customers list word for word against the names in the Test1 DF.
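A small illustration of the problem, using made-up values rather than the real data: the pattern is built from the full Customers names, so a shorter or differently cased name in Test1 cannot match it. The names `pattern` and `hits` are only for this sketch.

```r
# Hypothetical values, only to show the direction of the match
pattern <- paste(c("CENTRICA PLC", "CISCO"), collapse = "|")

# grepl() is case-sensitive by default and looks for the whole
# alternative ("CENTRICA PLC") inside each Test1 name
hits <- grepl(pattern, c("Centrica", "CENTRICA PLC LONDON", "CISCO"))
hits
# "Centrica" is not matched; the other two are
```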
Please help!
Answer 0 (score: 1)
How about this?
FinalDS = subset(
Test1,
grepl(paste0("(", paste(Customers, collapse = "|"), ")"), Customer_Name))
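If the pattern above still misses cases such as Centrica versus CENTRICA PLC, one possible alternative is to normalize both sides before comparing them. This is a different technique from the regex above, not a drop-in fix. The sketch below assumes a hypothetical helper `normalize_name()` and an assumed list of legal suffixes (PLC, LTD, and so on) that you would need to adjust for your data. The tiny `Customers` and `Test1` objects are made up for illustration.

```r
# Hypothetical helper: reduce a company name to a comparable "core" form
normalize_name <- function(x) {
  x <- toupper(x)
  # turn punctuation (hyphens, dots, brackets) into spaces
  x <- gsub("[^A-Z0-9]+", " ", x, perl = TRUE)
  # drop legal suffixes; this suffix list is an assumption, extend as needed
  x <- gsub("\\b(PLC|LTD|LIMITED|GMBH|CORPORATION|GROUP)\\b", " ", x, perl = TRUE)
  # collapse repeated whitespace and trim the ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

# Made-up sample data standing in for the real objects
Customers <- c("CENTRICA PLC", "AVIVA-PLC", "Cisco Systems LTD")
Test1 <- data.frame(
  Customer_Name = c("Centrica", "Aviva", "Unrelated Co"),
  stringsAsFactors = FALSE
)

# Compare normalized names exactly instead of building one huge regex
keys <- unique(normalize_name(Customers))
FinalDS <- Test1[normalize_name(Test1$Customer_Name) %in% keys, , drop = FALSE]
FinalDS
```

Because this compares whole normalized names with `%in%`, it avoids pasting about 100k alternatives into a single regular expression and is not affected by regex metacharacters in names such as "CSC (NG)-MSP". The trade-off is that it only matches when the normalized names are identical, so a Test1 entry like "Cisco" would not match "Cisco Systems LTD".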