Joining two data.frames
data.table(df_del)
KEY place_Name
1: 200039 BUFFALO/ROCHESTER
2: 200171 MILWAUKEE
3: 200197 PEORIA/SPRINGFLD.
4: 200233 OKLAHOMA CITY
5: 200272 LOS ANGELES
data.table(df)
firm_id brand_id KEY UNITS DOLLARS DATE
1: 511 263 647840 1 7.29 2001-01-01
2: 511 265 647840 2 14.58 2001-01-01
3: 511 265 532733 1 6.39 2001-01-01
4: 23 417 263939 1 4.79 2001-01-01
5: 23 417 648768 5 24.45 2001-01-01
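I am trying to join them by KEY but am running into problems. The df file has around 500,000 rows and the df_del file has around 12,000.
The df_del file holds unique product keys. Several of them can be bought in the same city, so one city may have 10 KEY values (i.e. products were delivered to that city 10 times).
The df file also has a KEY column, but a given key is not always found in it. When I copy a random KEY from the df_del data frame and paste it into a search of the df frame, I sometimes get no results. (This is because I am only using a snapshot of the df data and all of the df_del data.) The other way round, copying a KEY number from the df data frame and pasting it into df_del, does return a result (sometimes more than once in both data.frames).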
My problem:
When I try to run:
library(plyr)
df_test <- join(df, df_del,
type = "left")
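I get 500,000 rows containing all of the df results, but in the merged place_Name column I only get NA values. I have tried right, left, and so on. I have also tried inner and got zero results.
Any help would be greatly appreciated.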
merge(df, df_del, by = "KEY")
The result should look like df, with place_Name filled in for every matching KEY.
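Answer 0 (score: 0)
The problem is that your KEY columns have no matching values. This is probably because the two tables share no common key values, as in my example: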
library(data.table)
library(dplyr)
# Toy tables whose KEY columns share no values
df_del <- data.table(KEY=c(1,2,3,4,5,6,7,8,9,10),place_name=c("NY","LONDON","PARIS","MELBOURNE","TOKYO","NY","LONDON","PARIS","MELBOURNE","TOKYO"))
df <- data.table(KEY=c(11,15,16,21,52),UNITS=c(1,5,20,2,4))
# Inner join: no common keys, so nothing is returned
merge(df,df_del,by="KEY")
Empty data.table (0 rows) of 3 cols: KEY,UNITS,place_name
left_join(df,df_del,by="KEY")
KEY UNITS place_name
1 11 1 <NA>
2 15 5 <NA>
3 16 20 <NA>
4 21 2 <NA>
5 52 4 <NA>
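To check whether this is what is happening with the real data, one can count how many keys the two tables actually share before joining. This is a minimal sketch, assuming both tables have a column named KEY; the toy values are made up for illustration. Comparing the column classes is included because a type mismatch (for example character versus numeric keys) is one possible cause worth ruling out, not something the question confirms.

```r
library(data.table)

# Toy stand-ins for the real tables (values are illustrative only)
df     <- data.table(KEY = c(647840, 647840, 532733), UNITS = c(1, 2, 1))
df_del <- data.table(KEY = c(200039, 200171, 647840),
                     place_Name = c("BUFFALO/ROCHESTER", "MILWAUKEE", "NYC"))

# 1. Are the key columns stored as the same type?
class(df$KEY)
class(df_del$KEY)

# 2. Which distinct keys appear in both tables?
shared_keys <- intersect(df$KEY, df_del$KEY)
length(shared_keys)

# 3. Which rows of df have no partner in df_del?
unmatched <- df[!(KEY %in% df_del$KEY)]
unmatched
```

If `length(shared_keys)` is zero on the real data, every join type will behave as described in the question: a left join returns only NA in the place name column and an inner join returns no rows.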
Answer 1 (score: 0)
Your data:
library(data.table)
df <- structure(list(is = c(1, 2, 3, 4, 5), firm_id = c(511, 511, 511,
23, 23), brand_id = c(263, 265, 265, 417, 417), KEY = c(647840,
647840, 532733, 263939, 648768), UNITS = c(1, 2, 1, 1, 5), DOLLARS = c(7.29,
14.58, 6.39, 4.79, 24.45), DATE = c("2001-01-01", "2001-01-01",
"2001-01-01", "2001-01-01", "2001-01-01")), .Names = c("is",
"firm_id", "brand_id", "KEY", "UNITS", "DOLLARS", "DATE"),
class = c("data.table", "data.frame"), row.names = c(NA, -5L))
df_del <- structure(list(KEY = c(200039, 200171, 200197, 200233, 200272, 647840, 532733, 263939, 648768
), place_Name = c("BUFFALO/ROCHESTER", "MILWAUKEE", "PEORIA/SPRINGFLD.",
"OKLAHOMA CITY", "LOS ANGELES", "NYC", "Los Angeles", "Chicago", "Houston")), class = c("data.table", "data.frame"), .Names = c("KEY",
"place_Name"), row.names = c(NA, -9L))
The beauty of data.table is its succinct join syntax.
setkey(df, KEY)
setkey(df_del, KEY)
df_del[df]
This produces the table you wanted to see:
KEY place_Name is firm_id brand_id UNITS DOLLARS DATE
1: 263939 Chicago 4 23 417 1 4.79 2001-01-01
2: 532733 Los Angeles 3 511 265 1 6.39 2001-01-01
3: 647840 NYC 1 511 263 1 7.29 2001-01-01
4: 647840 NYC 2 511 265 2 14.58 2001-01-01
5: 648768 Houston 5 23 417 5 24.45 2001-01-01
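The same join can also be written without setting keys first, using the `on` argument of data.table. The sketch below uses small made-up tables (the key 111111 is a hypothetical value with no match) to show how unmatched rows behave in a left join versus an inner join. It assumes data.table 1.12.0 or later for `nomatch = NULL`.

```r
library(data.table)

# Toy tables; 111111 is a hypothetical key that is absent from df_del
df     <- data.table(KEY = c(647840, 532733, 111111), UNITS = c(1, 1, 3))
df_del <- data.table(KEY = c(647840, 532733),
                     place_Name = c("NYC", "Los Angeles"))

# Left join: every row of df is kept, place_Name is NA where no key matches
left <- df_del[df, on = "KEY"]

# Inner join: rows of df without a matching key are dropped
inner <- df_del[df, on = "KEY", nomatch = NULL]

left
inner
```

Unlike `setkey()`, the `on` form does not reorder either table, which can be convenient when the original row order of df matters.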