I have the data frame shown below:
ID      Value1  Value2
AAA-01  Ert we  ert-We
AAA-02  ATT ER  ATT ER
AAA-03  Accept  accepted
AAA-04  Apple   Apple
AAA-05  VEETR   veetr
AAA-06  EERTT   RRFTF
AAA-07  ETYuU   RTTRR
Using this data frame, I want to match text that looks similar across Value1 and Value2 and assign TRUE or FALSE to each row accordingly.
Desired output:
ID      Value1  Value2    Status
AAA-01  Ert we  ert-We    TRUE
AAA-02  ATT ER  ATT ER    TRUE
AAA-03  Accept  accepted  TRUE
AAA-04  Apple   Apple     TRUE
AAA-05  VEETR   veetr     TRUE
AAA-06  EERTT   RRFTF     FALSE
AAA-07  ETYuU   RTTRR     FALSE
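Answer 0 (score: 1)
Here is one possible approach. I'm not sure whether it meets your criteria for "similar-looking text" beyond this example, but it may get you started.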
df = read.table(text="ID Value1 Value2
AAA-01 Ert_we ert-We
AAA-02 ATT_ER ATT_ER
AAA-03 Accept accepted
AAA-04 Apple Apple
AAA-05 VEETR veetr
AAA-06 EERTT RRFTF
AAA-07 ETYuU RTTRR",header=T)
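# Strip everything except letters and spaces, then lowercase
Value1_txt = tolower(gsub('[^[:alpha:] ]','',df$Value1))
Value2_txt = tolower(gsub('[^[:alpha:] ]','',df$Value2))
# Similar if either cleaned string contains the other (literal match, not regex)
df$similar = mapply(function(x,y) grepl(x,y,fixed=TRUE) | grepl(y,x,fixed=TRUE),Value1_txt,Value2_txt)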
Output:
      ID Value1   Value2 similar
1 AAA-01 Ert_we   ert-We    TRUE
2 AAA-02 ATT_ER   ATT_ER    TRUE
3 AAA-03 Accept accepted    TRUE
4 AAA-04  Apple    Apple    TRUE
5 AAA-05  VEETR    veetr    TRUE
6 AAA-06  EERTT    RRFTF   FALSE
7 AAA-07  ETYuU    RTTRR   FALSE
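If substring containment turns out to be too strict or too loose for real data, one alternative (not part of the answer above) is to compare the cleaned strings by edit distance using base R's adist(). The sketch below is only an illustration: the normalize() helper and the 0.3 threshold are assumptions chosen for this example and would need tuning on real data.

```r
# Example data; underscores stand in for spaces so read.table can parse it
df <- read.table(text = "ID Value1 Value2
AAA-01 Ert_we ert-We
AAA-02 ATT_ER ATT_ER
AAA-03 Accept accepted
AAA-04 Apple Apple
AAA-05 VEETR veetr
AAA-06 EERTT RRFTF
AAA-07 ETYuU RTTRR", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: keep letters only and lowercase them
normalize <- function(s) tolower(gsub("[^[:alpha:]]", "", s))

a <- normalize(df$Value1)
b <- normalize(df$Value2)

# Levenshtein distance for each row's pair of cleaned strings
edits <- mapply(function(x, y) as.vector(adist(x, y)), a, b, USE.NAMES = FALSE)

# Scale by the longer string so the score lies between 0 and 1
ratio <- edits / pmax(nchar(a), nchar(b))

# Assumed cut-off: rows with at most 30% of characters edited count as similar
df$Status <- ratio <= 0.3
df
```

A lower threshold makes the match stricter. With this cut-off, "accept" and "accepted" differ by two edits out of eight characters, so they are still treated as similar.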
Answer 1 (score: 0)
For this example, assume that "similar-looking text" means the first three characters are identical after converting to lowercase:
df$Status <- with(
df,
tolower(substr(Value1, 1, 3)) == tolower(substr(Value2, 1, 3))
)
df
      ID Value1   Value2 Status
1 AAA-01 Ert we   ert-We   TRUE
2 AAA-02 ATT ER   ATT ER   TRUE
3 AAA-03 Accept accepted   TRUE
4 AAA-04  Apple    Apple   TRUE
5 AAA-05  VEETR    veetr   TRUE
6 AAA-06  EERTT    RRFTF  FALSE
7 AAA-07  ETYuU    RTTRR  FALSE
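The prefix rule can also be wrapped in a small function so that the number of compared characters is adjustable and punctuation is ignored before comparing. This is a minimal sketch, not part of the answer above: same_prefix() and its n argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: compare the first n letters of two character vectors,
# ignoring case and any non-letter characters
same_prefix <- function(a, b, n = 3) {
  clean <- function(s) tolower(gsub("[^[:alpha:]]", "", as.character(s)))
  substr(clean(a), 1, n) == substr(clean(b), 1, n)
}

# Small made-up inputs to show the behaviour
res <- same_prefix(c("Ert we", "EERTT", "A-pple"),
                   c("ert-We", "RRFTF", "apple"))
res
```

Applied to the data frame, this would be df$Status <- same_prefix(df$Value1, df$Value2).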