I'm doing an analysis with SQL and R, and I want to join two tables like these:
Table 1:
ID date
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310
Table 2:
ID date
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310
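The situation: a customer is called (table 1), and the customer calls back for more information (table 2).
So what I want the analysis to produce is:
Result: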
ID   table1 date  table2 date
a11  20150302
a11  20150302     20150303
a22  20150303     20150304
a22  20150304     20150305
a44  20150306     20150306
a66  20150308     20150308
a66  20150309
a66  20150310     20150310
Is there any solution for this many-to-many match/join, where I don't want the n * m result but a 1-to-1 match instead? I need a solution in R or SQL.
Thanks
Answer 0 (score: 1)
Join the two tables on ID, then remove the rows where the Table 2 date is earlier than the Table 1 date. Then use ROW_NUMBER() OVER (PARTITION BY ID, Date1 ORDER BY Date2 ASC) to find the closest date, and keep that match with a WHERE RowNumber = 1 clause.
This produces output consistent with the criteria you listed:
+-----+----------+----------+
| ID | Date1 | Date2 |
+-----+----------+----------+
| a11 | 20150302 | 20150303 |
| a22 | 20150303 | 20150304 |
| a22 | 20150304 | 20150304 |
| a44 | 20150306 | 20150306 |
| a66 | 20150308 | 20150308 |
| a66 | 20150309 | 20150310 |
| a66 | 20150310 | 20150310 |
+-----+----------+----------+
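As a hedged sketch, the steps described in this answer (join on ID, drop callbacks dated before the call, keep row number 1 per ID and call date) can be mirrored in base R. It assumes the date1/date2 column names and the sample data used in the dplyr answer; ave(..., FUN = seq_along) stands in for ROW_NUMBER(). This is an illustration of the described steps, not the answerer's own code.

```r
# Sample data (column names date1 / date2 are an assumption borrowed from the dplyr answer)
table1 <- read.table(text = "ID date1
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310", header = TRUE, stringsAsFactors = FALSE)

table2 <- read.table(text = "ID date2
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310", header = TRUE, stringsAsFactors = FALSE)

# Step 1: join on ID (many-to-many: every call paired with every callback of that ID)
joined <- merge(table1, table2, by = "ID")

# Step 2: drop pairs where the callback is earlier than the call
joined <- joined[joined$date1 <= joined$date2, ]

# Step 3: number callbacks within each (ID, date1) group, earliest first;
# this plays the role of ROW_NUMBER() OVER (PARTITION BY ID, Date1 ORDER BY Date2 ASC)
joined <- joined[order(joined$ID, joined$date1, joined$date2), ]
joined$rn <- ave(joined$date2, joined$ID, joined$date1, FUN = seq_along)

# Step 4: keep only the closest callback per group (the WHERE RowNumber = 1 filter)
closest <- joined[joined$rn == 1, c("ID", "date1", "date2")]
rownames(closest) <- NULL
print(closest)
```

For the sample data this gives the same seven rows as the table above. Note that, like the SQL description, it lets one callback serve several calls (a66 20150310 is used twice), so it is not a strict 1-to-1 match.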
Answer 1 (score: 1)
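I used dplyr in R and got the same result as markmanguy. For a22, the closest callback to the 20150304 initial call is 20150304, not 20150305. You would need a time component to distinguish between these.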
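library(dplyr)
inner_join(table1, table2, by = "ID") %>%
  group_by(ID, date1) %>%
  filter(date1 <= date2) %>%              # drop callbacks made before the call
  arrange(date2, .by_group = TRUE) %>%    # earliest callback first within each group
  filter(row_number() == 1)               # keep the closest callback per (ID, date1)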
Source: local data frame [7 x 3]
Groups: ID, date1 [7]
ID date1 date2
(chr) (int) (int)
1 a11 20150302 20150303
2 a22 20150303 20150304
3 a22 20150304 20150304
4 a44 20150306 20150306
5 a66 20150308 20150308
6 a66 20150309 20150310
7 a66 20150310 20150310
Data
table1 <-read.table(text="ID date1
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310", header = TRUE, stringsAsFactors = FALSE)
table2 <-read.table(text="ID date2
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310", header = TRUE, stringsAsFactors = FALSE)
Answer 2 (score: 1)
This doesn't solve it, but it's close and may give you an idea.
SqlFiddleDemo
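-- Number each ID's calls, latest first, keeping only IDs that have a callback
With t_left as (
SELECT *, row_number() over (partition by "ID" order by date desc ) as rn
FROM Table1 T
WHERE EXISTS (SELECT 1 FROM Table2 P WHERE T."ID" = P."ID")
),
-- Number each ID's callbacks, latest first
t_right as (
SELECT *, row_number() over (partition by "ID" order by date desc) as rn
FROM Table2
)
-- Pair the n-th latest call with the n-th latest callback of the same ID;
-- the dates themselves are not compared, which is why some pairs are off
SELECT t_left."ID", t_left."date", t_right."date"
FROM t_left
LEFT JOIN t_right
on t_left.rn = t_right.rn
and t_left."ID" = t_right."ID"
ORDER BY t_left."ID", t_left."date"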
Output
| ID | date | date |
|-----|----------|----------|
| a11 | 20150302 | 20150303 |
| a11 | 20150302 | (null) |
| a22 | 20150303 | 20150304 |
| a22 | 20150304 | 20150305 |
| a44 | 20150306 | 20150306 |
| a66 | 20150308 | (null) |
| a66 | 20150309 | 20150308 |
| a66 | 20150310 | 20150310 |
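None of the outputs above reproduces the question's result exactly: the ROW_NUMBER/dplyr results reuse the 20150310 callback for two a66 calls, and this query pairs a66 20150309 with the earlier 20150308 callback. The desired result appears consistent with a greedy rule: within each ID, take the callbacks from latest to earliest and give each one to the latest not-yet-matched call dated on or before it. The following is a minimal base R sketch under that assumption; match_callbacks() is a hypothetical helper (not from any package), and the rule is inferred from the example data rather than stated in the question.

```r
# Sample data (column names date1 / date2 assumed, as in the dplyr answer)
table1 <- read.table(text = "ID date1
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310", header = TRUE, stringsAsFactors = FALSE)

table2 <- read.table(text = "ID date2
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: one-to-one matching of callbacks to calls.
# Assumed rule: per ID, walk callbacks from latest to earliest and give each
# one to the latest still-unmatched call dated on or before it.
match_callbacks <- function(calls, callbacks) {
  # Keep only IDs that have at least one callback (a33 and a55 drop out)
  calls <- calls[calls$ID %in% callbacks$ID, ]
  calls <- calls[order(calls$ID, calls$date1), ]
  calls$date2 <- NA_integer_
  for (id in unique(calls$ID)) {
    rows <- which(calls$ID == id)  # row indices in ascending date1 order
    for (cb in sort(callbacks$date2[callbacks$ID == id], decreasing = TRUE)) {
      # Unmatched calls for this ID made on or before the callback date
      free <- rows[is.na(calls$date2[rows]) & calls$date1[rows] <= cb]
      if (length(free) > 0) {
        calls$date2[free[length(free)]] <- cb  # latest eligible call gets it
      }
    }
  }
  rownames(calls) <- NULL
  calls
}

result <- match_callbacks(table1, table2)
print(result)
```

For the sample data this returns the eight rows of the desired result, with NA where the question leaves the table 2 date blank. Each callback is used at most once, which is what makes it 1-to-1; the explicit loop is easy to follow but will be slow on large tables.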