Unable to join by ID

Time: 2017-12-20 16:10:14

Tags: r data.table

Joining two data.frames

data.table(df_del)
       KEY           place_Name
    1:  200039  BUFFALO/ROCHESTER   
    2:  200171  MILWAUKEE           
    3:  200197  PEORIA/SPRINGFLD.   
    4:  200233  OKLAHOMA CITY       
    5:  200272  LOS ANGELES      


data.table(df)
        firm_id brand_id   KEY UNITS DOLLARS       DATE
     1:     511      263  647840     1    7.29 2001-01-01
     2:     511      265  647840     2   14.58 2001-01-01
     3:     511      265  532733     1    6.39 2001-01-01
     4:      23      417  263939     1    4.79 2001-01-01
     5:      23      417  648768     5   24.45 2001-01-01

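I am trying to join them by KEY but am running into problems. The df file has about 500,000 rows and the df_del file has about 12,000.

The df_del file has unique product keys. Several of them can be bought in the same city, so one city might have 10 KEY values (i.e. products were delivered to that city 10 times).

The df file also has the KEY column, but a given key is not always found in it. When I copy a random KEY from the df_del data frame and paste it into a search of the df frame, I sometimes get no result. (This is because I am using only a snapshot of the df data but all of the df_del data.) Going the other way, copying a KEY number from the df data frame and pasting it into df_del, does return results (sometimes appearing multiple times in both data.frames).

My problem:

When I try to run: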

library(plyr)
df_test <- join(df, df_del,
     type = "left")

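I get 500,000 rows containing all of the df results, but the merged place_Name column holds only NA values. I have tried the right and left join types, among others. I also tried inner and got zero results.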

Any help would be appreciated.

merge(df, df_del, by = "KEY") should look like this:

df

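2 Answers:

Answer 0: (score: 0)

The problem is that your KEY columns have no matching values. That is probably because the two tables share no common numbers, as in my example: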

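library(data.table)  # provides data.table() and its merge() method
library(dplyr)       # provides left_join()
# Two tables whose KEY columns share no values
df_del <- data.table(KEY=c(1,2,3,4,5,6,7,8,9,10),place_name=c("NY","LONDON","PARIS","MELBOURNE","TOKYO","NY","LONDON","PARIS","MELBOURNE","TOKYO"))
df <- data.table(KEY=c(11,15,16,21,52),UNITS=c(1,5,20,2,4))
# Inner join: no common keys, so the result is empty
merge(df,df_del,by="KEY")

Empty data.table (0 rows) of 3 cols: KEY,UNITS,place_name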

left_join(df,df_del,by="KEY") 
KEY UNITS place_name  
1  11     1       <NA>
2  15     5       <NA>
3  16    20       <NA>
4  21     2       <NA>
5  52     4       <NA>
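Before changing the join call, it may help to measure how many keys the two tables actually share. The following is a minimal sketch on the toy tables from this answer, not a diagnosis of the asker's real data; the variable names key_types, shared_keys and match_rate are made up for illustration.

```r
library(data.table)

# Toy tables mirroring the example above (no shared KEY values).
df_del <- data.table(
  KEY = as.numeric(1:10),
  place_name = rep(c("NY", "LONDON", "PARIS", "MELBOURNE", "TOKYO"), 2)
)
df <- data.table(KEY = c(11, 15, 16, 21, 52), UNITS = c(1, 5, 20, 2, 4))

# Compare the column types; a mismatch (e.g. character vs. numeric)
# is worth ruling out before looking at the values themselves.
key_types <- c(df = class(df$KEY), df_del = class(df_del$KEY))

# Keys present in both tables. An empty result is consistent with
# an all-NA left join and a zero-row inner join.
shared_keys <- intersect(df$KEY, df_del$KEY)

# Share of df rows whose KEY has a match in df_del.
match_rate <- mean(df$KEY %in% df_del$KEY)

print(key_types)
print(shared_keys)
print(match_rate)
```

If match_rate comes out well above zero on the real data while the join still returns only NA, the key values are probably not the cause and the join call itself deserves a closer look.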

Answer 1: (score: 0)

Your data:

   library(data.table)

df <- structure(list(is = c(1, 2, 3, 4, 5), firm_id = c(511, 511, 511, 
23, 23), brand_id = c(263, 265, 265, 417, 417), KEY = c(647840, 
647840, 532733, 263939, 648768), UNITS = c(1, 2, 1, 1, 5), DOLLARS = c(7.29, 
14.58, 6.39, 4.79, 24.45), DATE = c("2001-01-01", "2001-01-01", 
"2001-01-01", "2001-01-01", "2001-01-01")), .Names = c("is", 
"firm_id", "brand_id", "KEY", "UNITS", "DOLLARS", "DATE"), 
class = c("data.table", "data.frame"), row.names = c(NA, -5L))


df_del <- structure(list(KEY = c(200039, 200171, 200197, 200233, 200272, 647840, 532733, 263939, 648768
), place_Name = c("BUFFALO/ROCHESTER", "MILWAUKEE", "PEORIA/SPRINGFLD.", 
"OKLAHOMA CITY", "LOS ANGELES", "NYC", "Los Angeles", "Chicago", "Houston")), class = c("data.table", "data.frame"), .Names = c("KEY", 
"place_Name"), row.names = c(NA, -9L))

The beauty of data.table is its concise join syntax.

setkey(df, KEY)
setkey(df_del, KEY)

df_del[df]

This produces the table you wanted to see:

      KEY  place_Name is firm_id brand_id UNITS DOLLARS       DATE
1: 263939     Chicago  4      23      417     1    4.79 2001-01-01
2: 532733 Los Angeles  3     511      265     1    6.39 2001-01-01
3: 647840         NYC  1     511      263     1    7.29 2001-01-01
4: 647840         NYC  2     511      265     2   14.58 2001-01-01
5: 648768     Houston  5      23      417     5   24.45 2001-01-01
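The same left join can also be written without setting keys, which leaves the row order of both tables untouched. This is a minimal sketch on a cut-down version of the data above, and the place names for the 6xxxxx keys are the illustrative values invented in this answer, not real data. The names joined and merged are hypothetical.

```r
library(data.table)

df <- data.table(
  firm_id  = c(511, 511, 511, 23, 23),
  brand_id = c(263, 265, 265, 417, 417),
  KEY      = c(647840, 647840, 532733, 263939, 648768),
  UNITS    = c(1, 2, 1, 1, 5),
  DOLLARS  = c(7.29, 14.58, 6.39, 4.79, 24.45)
)
df_del <- data.table(
  KEY        = c(200039, 647840, 532733, 263939, 648768),
  place_Name = c("BUFFALO/ROCHESTER", "NYC", "Los Angeles", "Chicago", "Houston")
)

# Left join: every row of df, with place_Name looked up from df_del.
# on = "KEY" names the join column, so neither table needs setkey().
joined <- df_del[df, on = "KEY"]

# A comparable left join with merge(); all.x = TRUE keeps df rows
# that have no match in df_del (their place_Name would be NA).
merged <- merge(df, df_del, by = "KEY", all.x = TRUE)

print(joined)
print(merged)
```

The bracket form returns rows in the order of df, while merge() sorts by KEY by default, so the two results hold the same rows in a different order.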