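I have two Excel files and I want to match two strings: the company name in column 2 of the first dataset against column 1 of the second Excel file. For example, BPET LIMITED in one file should match BPET LTD in the other. The Excel files look like this: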
**ywOExport22** Company name "year" X Y Z
1. BLAFARMERS LIMITED 2017 1234 1 5
2. COTTONBALLS GROUP LIMITED 2017 1254 2 8
3. RIO JANEIRO LIMITED 2017 5233
4. BPET LIMITED 2017 6954 7 2
5. TELOPSTRA CORPORATION 2017 4569 5 1
**X20131403** Name ABN Income $ Taxable $
21ST AGE HOLDINGS PTY LTD 555454 464
A.C.N.A.BPTY LIMITED 546546 5553
ABBA HOLDINGS PTY LTD 455564 56 54646
BPET LTD 546454 6546 44545
ACCOLADE PTY LIMITED 464651 5456
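I want to create a match column in both Excel files by "fuzzy matching" one name column against the other, and then left-join the second file onto the first using that match. I tried the following code: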
X20131403$match <- 0
ywOExport22$match <- 0
ywOExport22$match <- mapply(grepl(ywOExport22[,2], X20131403[,1], ignore.case = TRUE, perl = FALSE, fixed = FALSE, useBytes = FALSE))
X20131403$match <- X20131403[,1]
ywOExport22 <- left_join(ywOExport22, X20131403, by="match")
Output:
> ywOExport22$match <- mapply(grepl(ywOExport22[,2], X20131403[,1], ignore.case = TRUE, perl = FALSE,
+ fixed = FALSE, useBytes = FALSE))
Error in match.fun(FUN) :
c("'grepl(ywOExport22[, 2], X20131403[, 1], ignore.case = TRUE, ' ist nicht Funktion, Zeichen oder Symbol", "' perl = FALSE, fixed = FALSE, useBytes = FALSE)' ist nicht Funktion, Zeichen oder Symbol")
In addition: Warning message:
In grepl(ywOExport22[, 2], X20131403[, 1], ignore.case = TRUE, :
argument 'pattern' has length > 1 and only the first element will be used
>
> X20131403$match <- X20131403[,1]
> ywOExport22 <- left_join(ywOExport22, X20131403, by="match")
Error in left_join_impl(x, y, by_x, by_y, aux_x, aux_y, na_matches) :
Can't join on 'match' x 'match' because of incompatible types (character / numeric)
Desired output:
Company name MATCH ABN Income $ Taxable$
BLAFARMERS LIMITED
COTTONBALLS GROUP LIMITED
RIO JANEIRO LIMITED
BPET LIMITED BPET LTD 5464545452 65466 445
TELOPSTRA CORP LIMITED
Any suggestions on how to fix my code?
Answer 0 (score: 0)
set.seed(101)
firstSet <- data.frame(
Company = c('BLAFARMERS LIMITED', 'COTTONBALLS GROUP LIMITED',
'RIO JANEIRO LIMITED', 'BPET LIMITED',
'TELOPSTRA CORPORATION'),
Year = rep(2017, times = 5),
X = runif(5)
)
secondSet <- data.frame(
Company = c('ST AGE HOLDINGS PTY LTD', 'A.C.N.A.BPTY LIMITED',
'ABBA HOLDINGS PTY LTD', 'BPET LTD',
'ACCOLADE PTY LIMITED'),
Income = floor(runif(5, 0, 100))
)
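# Keep the original spelling so it can be shown in the MATCH column
secondSet$MATCH <- secondSet$Company
# Normalise the abbreviation so that both name columns agree exactly
secondSet$Company <- gsub(
  pattern = 'LTD',
  replacement = 'LIMITED',
  x = secondSet$Company)
# Inner join on the normalised company name
merge(firstSet, secondSet, by = 'Company')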
# output
# Company Year X Income MATCH
# 1 BPET LIMITED 2017 0.6576904 62 BPET LTD
This is easy to modify so that the unmatched companies also appear in the output as rows with empty values.
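One way to get those empty rows is a left join, which `merge` performs when `all.x = TRUE` is set. The following is a minimal sketch rather than the answerer's own code: it reuses the `firstSet` and `secondSet` names from the answer, the `Income` values are made up for illustration, and the word-boundary pattern `\\bLTD\\b` is an added assumption so that only the standalone abbreviation is replaced.

```r
# Illustrative data modelled on the answer; the Income values are made up.
firstSet <- data.frame(
  Company = c("BLAFARMERS LIMITED", "COTTONBALLS GROUP LIMITED",
              "RIO JANEIRO LIMITED", "BPET LIMITED",
              "TELOPSTRA CORPORATION"),
  Year = rep(2017, times = 5),
  stringsAsFactors = FALSE
)
secondSet <- data.frame(
  Company = c("ST AGE HOLDINGS PTY LTD", "A.C.N.A.BPTY LIMITED",
              "ABBA HOLDINGS PTY LTD", "BPET LTD",
              "ACCOLADE PTY LIMITED"),
  Income = c(30, 58, 33, 62, 54),
  stringsAsFactors = FALSE
)

# Keep the original spelling, then normalise LTD -> LIMITED for the join key.
secondSet$MATCH <- secondSet$Company
secondSet$Company <- gsub("\\bLTD\\b", "LIMITED", secondSet$Company, perl = TRUE)

# all.x = TRUE keeps every row of firstSet (a left join);
# companies without a partner in secondSet get NA in Income and MATCH.
result <- merge(firstSet, secondSet, by = "Company", all.x = TRUE)
print(result)
```

This only handles the LTD/LIMITED spelling difference. For names that differ in less predictable ways, base R's `agrep` or `adist` (approximate string matching) could be used to build the join key instead, at the cost of having to choose a distance threshold.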