How to merge two data frames based on a range of values in one table

Time: 2018-07-27 17:51:33

Tags: r

DF1

SIC     Value
350     100
460     500
140     200
290     400
506     450

DF2

SIC1          AREA
100-200      Forest
201-280      Hospital
281-350      Education
351-450      Government
451-550      Land

Note: SIC1 is of class character, so it needs to be converted to numeric ranges.

I am trying to get output like the following.

Desired output:

DF3

SIC     Value    AREA
350     100      Education
460     500      Land
140     200      Forest
290     400      Education
506     450      Land

I first tried converting SIC1 from character to numeric and then merging, but had no luck. Can anyone offer guidance on this?

2 Answers:

Answer 0: (score: 3)

We can do a non-equi join. Split the 'SIC1' column in 'DF2' into two numeric columns (with tstrsplit), then do a non-equi join with the first dataset.

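library(data.table)
# Split "100-200" into integer columns 'start' and 'end' (setDT converts DF2 by reference)
setDT(DF2)[, c('start', 'end') := tstrsplit(SIC1, '-', type.convert = TRUE)]
# Non-equi join: match each SIC to the DF2 row with start <= SIC <= end.
# After the join, 'start' holds the SIC value from DF1, so it is renamed back to SIC.
DF2[, -1, with = FALSE][DF1, on = .(start <= SIC, end >= SIC), 
        mult = 'last'][, .(SIC = start, Value, AREA)]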
#  SIC Value      AREA
#1: 350   100 Education
#2: 460   500      Land
#3: 140   200    Forest
#4: 290   400 Education
#5: 506   450      Land
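
For comparison, the same lookup can be sketched in base R without data.table. This is not part of the answer above; it assumes the ranges in DF2 are sorted by their lower bound and do not overlap, and the helper names lower, upper, idx and inside are made up for illustration.

```r
# The question's data, rebuilt with base R only.
DF1 <- data.frame(SIC = c(350L, 460L, 140L, 290L, 506L),
                  Value = c(100L, 500L, 200L, 400L, 450L))
DF2 <- data.frame(SIC1 = c("100-200", "201-280", "281-350", "351-450", "451-550"),
                  AREA = c("Forest", "Hospital", "Education", "Government", "Land"),
                  stringsAsFactors = FALSE)

# Split "100-200" into integer lower and upper bounds (hypothetical helper names).
bounds <- do.call(rbind, strsplit(DF2$SIC1, "-", fixed = TRUE))
lower <- as.integer(bounds[, 1])
upper <- as.integer(bounds[, 2])

# findInterval() gives the index of the last lower bound <= SIC,
# or 0 when SIC is below every range. It requires `lower` to be sorted.
idx <- findInterval(DF1$SIC, lower)
idx[idx == 0L] <- NA

# Keep the match only when SIC is also at or below that range's upper bound.
inside <- !is.na(idx) & DF1$SIC <= upper[idx]
DF3 <- DF1
DF3$AREA <- ifelse(inside, DF2$AREA[idx], NA_character_)
DF3
```

Codes that fall in a gap between ranges or outside all of them get NA for AREA instead of being dropped.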

Or, as @Frank mentioned, we can do a rolling join to extract 'AREA' and add it to the first dataset by reference.

setDT(DF1)[, AREA := DF2[DF1, on=.(start = SIC), roll=TRUE, x.AREA]]
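
One thing to keep in mind with the rolling join: it matches on start only, so a code above the last range still rolls back to the last AREA. Below is a minimal sketch of a guard against that. The extra row with SIC 600 and the temporary column range_end are assumptions added for illustration; neither appears in the question.

```r
library(data.table)

# The question's data plus one hypothetical out-of-range code (600).
DF1 <- data.table(SIC = c(350L, 460L, 140L, 290L, 506L, 600L),
                  Value = c(100L, 500L, 200L, 400L, 450L, 0L))
DF2 <- data.table(SIC1 = c("100-200", "201-280", "281-350", "351-450", "451-550"),
                  AREA = c("Forest", "Hospital", "Education", "Government", "Land"))
DF2[, c("start", "end") := tstrsplit(SIC1, "-", type.convert = TRUE)]

# Roll each SIC back to the nearest start at or below it, and pull
# both AREA and the matched range's end into DF1.
DF1[, c("AREA", "range_end") := DF2[DF1, on = .(start = SIC), roll = TRUE,
                                    .(x.AREA, x.end)]]

# Blank out matches where SIC lies past the end of the matched range.
DF1[SIC > range_end, AREA := NA_character_]
DF1[, range_end := NULL]
DF1
```

Without the guard, the hypothetical code 600 would be labelled Land even though the last range stops at 550.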

Data

DF1 <- structure(list(SIC = c(350L, 460L, 140L, 290L, 506L), Value = c(100L, 
500L, 200L, 400L, 450L)), .Names = c("SIC", "Value"), 
 class = "data.frame", row.names = c(NA, -5L))

DF2 <- structure(list(SIC1 = c("100-200", "201-280", "281-350", "351-450", 
"451-550"), AREA = c("Forest", "Hospital", "Education", "Government", 
"Land")), .Names = c("SIC1", "AREA"), class = "data.frame",
 row.names = c(NA, -5L))

Answer 1: (score: 3)

One option is to use tidyr::separate together with sqldf to join the two tables on a range of values.

library(sqldf)
library(tidyr)

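# convert = TRUE turns the split pieces into integers instead of leaving them as character
DF2 <- separate(DF2, "SIC1", c("Start", "End"), sep = "-", convert = TRUE)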

sqldf("select DF1.*, DF2.AREA from DF1, DF2 
      WHERE DF1.SIC between DF2.Start AND DF2.End")

#   SIC Value      AREA
# 1 350   100 Education
# 2 460   500      Land
# 3 140   200    Forest
# 4 290   400 Education
# 5 506   450      Land

Data:

DF1 <- read.table(text =
"SIC     Value
350     100
460     500
140     200
290     400
506     450",
header = TRUE, stringsAsFactors = FALSE)

DF2 <- read.table(text =
"SIC1          AREA
100-200      Forest
201-280      Hospital
281-350      Education
351-450      Government
451-550      Land",
header = TRUE, stringsAsFactors = FALSE)
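
The query in this answer is an inner join, so a SIC that falls in no range would be dropped from the result. A hedged sketch of a LEFT JOIN variant that keeps such rows follows. The table name ranges, the column names lo and hi, and the extra row with SIC 600 are assumptions made for illustration and are not in the original answer.

```r
library(sqldf)

# The question's data plus one hypothetical out-of-range code (600).
DF1 <- data.frame(SIC = c(350L, 460L, 140L, 290L, 506L, 600L),
                  Value = c(100L, 500L, 200L, 400L, 450L, 0L))

# Ranges already split into integer bounds (hypothetical names lo and hi).
ranges <- data.frame(lo = c(100L, 201L, 281L, 351L, 451L),
                     hi = c(200L, 280L, 350L, 450L, 550L),
                     AREA = c("Forest", "Hospital", "Education", "Government", "Land"),
                     stringsAsFactors = FALSE)

# LEFT JOIN keeps every DF1 row; AREA is NA where no range contains SIC.
res <- sqldf("SELECT DF1.*, ranges.AREA
              FROM DF1
              LEFT JOIN ranges
                ON DF1.SIC BETWEEN ranges.lo AND ranges.hi")
res
```

Whether unmatched codes should be kept or dropped depends on the use case; the WHERE form in the answer is the right choice when only matched rows are wanted.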