Search for or match the words of a dataset field in a lookup table, and assign the lookup table's values to the dataset in R

Time: 2017-03-07 17:19:41

Tags: mysql r

Dataset 1

Location              State
Haldwani              Uttarakhand
Bangalore             Karnataka
vishakhapatnam        Andhra Pradesh
Mumbai                Maharashtra

Lookup

text  Location                     State
txt1  Haldwani, India              Uttarakhand
txt2  Vishakhapatnam, India        Andhra Pradesh
txt3  Bangalore Nagarcoil          Karnataka
txt4  India                        NA
Txt5  Dadar, Navi Mumbai           Maharashtra

I want the result to work as follows:

For each row of Dataset 1 (for example txt1), search for every word of its Location field in the Location and State fields of the lookup table, and assign only the corresponding State to Dataset 1. Repeat this for the whole of Dataset 1. I tried using match and joins, but it did not work. I think a for loop would work.

1 Answer:

Answer 0: (score: 0)

If the datasets are not so large that they strain memory, consider a cross join followed by a filter using %in% or grepl():

# ASSIGN JOIN KEYS
dataset$key = 1
lookupdf$key = 1

# CROSS JOIN
newdataset <- merge(dataset, lookupdf, by="key")
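# FILTER (ROW-WISE, CASE-INSENSITIVE SUBSTRING MATCH)
# grepl() uses only the first element of a pattern vector, so apply it per row
newdataset <- newdataset[mapply(function(p, x) grepl(p, x, ignore.case = TRUE),
                                as.character(newdataset$Location.y),
                                as.character(newdataset$Location.x)),
                         c("text", "Location.x", "State")]
# OR, IF WHOLE-VALUE EXACT MATCHES ARE ENOUGH:
# newdataset <- newdataset[newdataset$Location.x %in% lookupdf$Location,
#                          c("text", "Location.x", "State")]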

# RENAME COLUMNS
names(newdataset) <- c("text", "Location", "State")

# RE-MERGE FOR NA MATCHES
newdataset <- merge(dataset, newdataset, by=c("text", "Location"), all.x=TRUE)
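
To make the steps above concrete, here is a minimal end-to-end sketch on the sample values from the question. It assumes, based on the column names the code above uses, that dataset holds the text and Location columns and that lookupdf holds Location and State; the names crossed, hit, matched and result are illustrative only. The filter is applied row by row and ignores case, which is a choice made here so that "vishakhapatnam" matches "Vishakhapatnam, India".

```r
# Sample data (the role of each table is an assumption, see the note above)
dataset <- data.frame(
  text = c("txt1", "txt2", "txt3", "txt4", "Txt5"),
  Location = c("Haldwani, India", "Vishakhapatnam, India",
               "Bangalore Nagarcoil", "India", "Dadar, Navi Mumbai"),
  stringsAsFactors = FALSE
)
lookupdf <- data.frame(
  Location = c("Haldwani", "Bangalore", "vishakhapatnam", "Mumbai"),
  State = c("Uttarakhand", "Karnataka", "Andhra Pradesh", "Maharashtra"),
  stringsAsFactors = FALSE
)

# Cross join through a constant key: every dataset row meets every lookup row
dataset$key <- 1
lookupdf$key <- 1
crossed <- merge(dataset, lookupdf, by = "key")

# Row-wise test: does the lookup Location (.y) occur inside the dataset Location (.x)?
hit <- mapply(function(p, x) grepl(p, x, ignore.case = TRUE),
              crossed$Location.y, crossed$Location.x, USE.NAMES = FALSE)
matched <- crossed[hit, c("text", "Location.x", "State")]
names(matched) <- c("text", "Location", "State")

# Left join back so that rows without any match keep State = NA
result <- merge(dataset[, c("text", "Location")], matched,
                by = c("text", "Location"), all.x = TRUE)
print(result)
```

The lookup values are used as regular expressions here, so a location name containing characters such as "." or "(" would need escaping or a fixed-string comparison instead.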

Alternatively, in SQL (MySQL dialect):

SELECT d.text, d.Location, c.State
FROM
  dataset d
LEFT JOIN
  (SELECT t.text, t.Location, l.State
   FROM dataset t
   CROSS JOIN lookuptbl l
   WHERE t.Location LIKE CONCAT('%', l.Location, '%')) c
ON d.text = c.text AND d.Location = c.Location
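
If the cross join would be too large for memory, the loop the question has in mind can be sketched in R without materializing every row pair. This is only an illustration under the same assumptions about the two tables (dataset with text and Location, lookupdf with Location and State); find_state is a hypothetical helper, not a function from any package, and it returns the first matching State when several lookup rows match.

```r
# Sample data (the role of each table is an assumption)
dataset <- data.frame(
  text = c("txt1", "txt2", "txt3", "txt4", "Txt5"),
  Location = c("Haldwani, India", "Vishakhapatnam, India",
               "Bangalore Nagarcoil", "India", "Dadar, Navi Mumbai"),
  stringsAsFactors = FALSE
)
lookupdf <- data.frame(
  Location = c("Haldwani", "Bangalore", "vishakhapatnam", "Mumbai"),
  State = c("Uttarakhand", "Karnataka", "Andhra Pradesh", "Maharashtra"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: State of the first lookup Location found inside x, else NA
find_state <- function(x, lookup) {
  hit <- vapply(lookup$Location,
                function(p) grepl(p, x, ignore.case = TRUE),
                logical(1))
  if (any(hit)) lookup$State[which(hit)[1]] else NA_character_
}

# One pass over the dataset rows; only one row's comparisons exist at a time
dataset$State <- vapply(dataset$Location, find_state, character(1),
                        lookup = lookupdf, USE.NAMES = FALSE)
print(dataset)
```

This trades the memory cost of the cross join for one scan of the lookup table per dataset row, so it may be slower on large inputs.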