I have two datasets and want to fuzzy-join them.
Here are the two datasets.
library(data.table)
# data1 (comma-separated so that names containing spaces stay in one field)
dt1 <- fread("NAME,State,type
ABERCOMBIE TOWNSHIP,ND,TS
ABERDEEN TOWNSHIP,NJ,TS
ABERDEEN TOWNSHIP,SD,TS
ABBOTSFORD CITY,WI,CI
ABERDEEN CITY,WA,CI
ADA TOWNSHIP,MI,TS
ADAMS,IL,TS", header = TRUE)
# data2
dt2 <- fread("NAME,State,type
ABERDEEN TWP N J,NJ,TS
ABERDEEN WASH,WA,CI
ABBOTSFORD WIS,WI,CI
ADA TWP MICH,MI,TS
ADA OHIO,OH,CI
ADAMS MASS,MA,CI
ADAMSVILLE ALA,AL,CI", header = TRUE)
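Both datasets have identical values in State and type; however, the NAME column differs. The names are only similar.
I could take the first 3 or 4 characters of NAME in each dataset and merge on that, but with a large number of observations the share of correct matches is unlikely to be high.
dt1$NameSubstr <- substr(dt1$NAME, 1, 4)
dt2$NameSubstr <- substr(dt2$NAME, 1, 4)
merge(dt1, dt2, by = c("NameSubstr", "State", "type"), all = TRUE)
This approach is not good.
I looked at the fuzzyjoin package, but I am not sure I am using it correctly.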
library(fuzzyjoin)
fuzzy_full_join(dt1, dt2, by = c("NAME" = "NAME", "State" = "State", "type" = "type"), match_fun = list(`!=`, `==`, `==`))
# Results
                 NAME.x State.x type.x           NAME.y State.y type.y
 1:   ABERDEEN TOWNSHIP      NJ     TS ABERDEEN TWP N J      NJ     TS
 2:     ABBOTSFORD CITY      WI     CI   ABBOTSFORD WIS      WI     CI
 3:       ABERDEEN CITY      WA     CI    ABERDEEN WASH      WA     CI
 4:        ADA TOWNSHIP      MI     TS     ADA TWP MICH      MI     TS
 5: ABERCOMBIE TOWNSHIP      ND     TS             <NA>    <NA>   <NA>
 6:   ABERDEEN TOWNSHIP      SD     TS             <NA>    <NA>   <NA>
 7:               ADAMS      IL     TS             <NA>    <NA>   <NA>
 8:                <NA>    <NA>   <NA>         ADA OHIO      OH     CI
 9:                <NA>    <NA>   <NA>       ADAMS MASS      MA     CI
10:                <NA>    <NA>   <NA>   ADAMSVILLE ALA      AL     CI
The result above is correct for this example. However, if any row has the same NAME in both datasets, the answer is wrong.
To show this, I added one new observation to each dataset.
dt1 <- fread("NAME,State,type
ABERCOMBIE TOWNSHIP,ND,TS
ABERDEEN TOWNSHIP,NJ,TS
ABERDEEN TOWNSHIP,SD,TS
ABBOTSFORD CITY,WI,CI
ABERDEEN CITY,WA,CI
ADA TOWNSHIP,MI,TS
ADAMS,IL,TS
THE SAME,AA,BB
", header = TRUE)
dt2 <- fread("NAME,State,type
ABERDEEN TWP N J,NJ,TS
ABERDEEN WASH,WA,CI
ABBOTSFORD WIS,WI,CI
ADA TWP MICH,MI,TS
ADA OHIO,OH,CI
ADAMS MASS,MA,CI
ADAMSVILLE ALA,AL,CI
THE SAME,AA,BB
", header = TRUE)
fuzzy_full_join(dt1, dt2, by = c("NAME" = "NAME", "State" = "State", "type" = "type"), match_fun = list(`!=`, `==`, `==`))
                 NAME.x State.x type.x           NAME.y State.y type.y
 1:   ABERDEEN TOWNSHIP      NJ     TS ABERDEEN TWP N J      NJ     TS
 2:     ABBOTSFORD CITY      WI     CI   ABBOTSFORD WIS      WI     CI
 3:       ABERDEEN CITY      WA     CI    ABERDEEN WASH      WA     CI
 4:        ADA TOWNSHIP      MI     TS     ADA TWP MICH      MI     TS
 5: ABERCOMBIE TOWNSHIP      ND     TS             <NA>    <NA>   <NA>
 6:   ABERDEEN TOWNSHIP      SD     TS             <NA>    <NA>   <NA>
 7:               ADAMS      IL     TS             <NA>    <NA>   <NA>
 8:            THE SAME      AA     BB             <NA>    <NA>   <NA>
 9:                <NA>    <NA>   <NA>         ADA OHIO      OH     CI
10:                <NA>    <NA>   <NA>       ADAMS MASS      MA     CI
11:                <NA>    <NA>   <NA>   ADAMSVILLE ALA      AL     CI
12:                <NA>    <NA>   <NA>         THE SAME      AA     BB
This result is incorrect: the two THE SAME rows (8 and 12) should have been joined. Any suggestions?
It seems I cannot use != for this.
Answer 0 (score: 0)
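This happens because you asked fuzzy_full_join for rows whose names do not match (using !=) while State and type do match (using == and ==). So a pair of rows in which all three columns match is not returned as a match.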
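To make that rule concrete, here is a minimal base-R sketch of how the three comparisons combine for a single pair of rows. It does not call fuzzyjoin; the helper name `pair_matches` is hypothetical and only mimics the "every comparison must be TRUE" logic described above.

```r
# Hypothetical helper: apply one comparison per column and require all of them
# to hold, mimicking match_fun = list(`!=`, `==`, `==`) for one pair of rows.
pair_matches <- function(x, y, funs = list(`!=`, `==`, `==`)) {
  all(mapply(function(f, a, b) f(a, b), funs, x, y))
}

# Similar but not identical names, same State and type: accepted
similar <- pair_matches(c("ABERDEEN TOWNSHIP", "NJ", "TS"),
                        c("ABERDEEN TWP N J", "NJ", "TS"))

# Identical in all three columns: rejected, because NAME != NAME is FALSE
identical_rows <- pair_matches(c("THE SAME", "AA", "BB"),
                               c("THE SAME", "AA", "BB"))

print(c(similar = similar, identical_rows = identical_rows))
```

The second call is the THE SAME case from the question: State and type agree, but the != test on NAME fails, so the pair is dropped.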
You can run the join twice, once with each of the following:
match_fun = list(`!=`, `==`, `==`))
match_fun = list(`==`, `==`, `==`))
Created on 2019-03-17 by the reprex package (v0.2.1)
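A possible shortcut, offered as my own assumption rather than something the answer states: a NAME pair always passes either != or ==, so the two runs combined should give the same matched pairs as an ordinary full join on State and type alone. The sketch below uses base R `merge()` on plain data frames; `df1`, `df2` and `joined` are illustrative names, and the data are the extended example from the question.

```r
# Extended example data from the question, as plain data frames
df1 <- data.frame(
  NAME  = c("ABERCOMBIE TOWNSHIP", "ABERDEEN TOWNSHIP", "ABERDEEN TOWNSHIP",
            "ABBOTSFORD CITY", "ABERDEEN CITY", "ADA TOWNSHIP", "ADAMS",
            "THE SAME"),
  State = c("ND", "NJ", "SD", "WI", "WA", "MI", "IL", "AA"),
  type  = c("TS", "TS", "TS", "CI", "CI", "TS", "TS", "BB"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  NAME  = c("ABERDEEN TWP N J", "ABERDEEN WASH", "ABBOTSFORD WIS",
            "ADA TWP MICH", "ADA OHIO", "ADAMS MASS", "ADAMSVILLE ALA",
            "THE SAME"),
  State = c("NJ", "WA", "WI", "MI", "OH", "MA", "AL", "AA"),
  type  = c("TS", "CI", "CI", "TS", "CI", "CI", "CI", "BB"),
  stringsAsFactors = FALSE
)

# Full join on State and type only; NAME is carried along as NAME.x / NAME.y
joined <- merge(df1, df2, by = c("State", "type"), all = TRUE)
print(joined)
```

This only reproduces the exact-match logic on State and type. If several places share a State and type, the pairs would still need filtering by name similarity, for example with a string distance such as base R's `adist()`.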