Matching date columns of unequal length in R

Time: 2017-11-09 10:53:30

Tags: r matching

My data is in the following format, with 3 date columns

X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")

The output I want is as follows

X                     Output
24/02/2016          12/02/2016
25/02/2016             NA
26/02/2016             NA
29/02/2016             NA
01/03/2016             NA
02/03/2016             NA
03/03/2016             NA
04/03/2016             NA
07/03/2016             NA
08/03/2016             NA
09/03/2016         29/02/2016
10/03/2016             NA
11/03/2016             NA
14/03/2016             NA
15/03/2016             NA

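Basically, wherever a value in X matches a value in Y, I need the corresponding value of Z in a new column next to X, and NA everywhere else. I'm not very good at R, so I can't figure out how to come up with a solution. Any ideas?

3 Answers:

Answer 0: (score: 1)

You can do this in base R with match, but I find it cleaner to use the dplyr package and left_join.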

library(dplyr)

# make a data frame with X as a column
X.df <- data.frame(X = c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016"), stringsAsFactors = F)

# make a data frame with Y and Z as columns
YZ.df <- data.frame(Y = c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016"), Z = c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016"), stringsAsFactors = F)

# do a left join, specifying variables X and Y
left_join(X.df, YZ.df, by = c("X" = "Y"))

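Note that if several Y values match the same X value, the above will create duplicate rows for that X.

Answer 1: (score: 1)

For the sake of completeness, here is a data.table version to supplement gatsky's answer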
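The duplication can be reproduced on a small example. The sketch below is only an illustration: the two-row tables are hypothetical, and it uses base R's merge with all.x = TRUE as a stand-in for left_join so that it runs without extra packages (merge may also reorder rows by key, which left_join does not do). One possible remedy, keeping only the first row per key with duplicated, is an assumption about what the asker wants, not part of the original answer.

```r
# Hypothetical data: the key 24/02/2016 appears twice in the lookup table
X.df  <- data.frame(X = c("24/02/2016", "25/02/2016"), stringsAsFactors = FALSE)
YZ.df <- data.frame(Y = c("24/02/2016", "24/02/2016"),
                    Z = c("12/02/2016", "13/02/2016"),
                    stringsAsFactors = FALSE)

# Left join in base R: every matching row of YZ.df is kept,
# so the single X value 24/02/2016 ends up on two rows
joined <- merge(X.df, YZ.df, by.x = "X", by.y = "Y", all.x = TRUE)
nrow(joined)

# Keep only the first row for each key before joining
YZ.unique <- YZ.df[!duplicated(YZ.df$Y), ]
deduped <- merge(X.df, YZ.unique, by.x = "X", by.y = "Y", all.x = TRUE)
nrow(deduped)
```

With the data from the question every Y value is unique, so the join returns exactly one row per X value and no de-duplication is needed.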

library(data.table)
data.table(Y, Z)[data.table(X), on = .(Y == X), .(X, Z)]
             X          Z
 1: 24/02/2016 12/02/2016
 2: 25/02/2016         NA
 3: 26/02/2016         NA
 4: 29/02/2016         NA
 5: 01/03/2016         NA
 6: 02/03/2016         NA
 7: 03/03/2016         NA
 8: 04/03/2016         NA
 9: 07/03/2016         NA
10: 08/03/2016         NA
11: 09/03/2016 29/02/2016
12: 10/03/2016         NA
13: 11/03/2016         NA
14: 14/03/2016         NA
15: 15/03/2016         NA

Data

Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")

Answer 2: (score: 0)

Using match

# Construct data
Z = c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
Y = c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
df <- data.frame(X = c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016"), stringsAsFactors = F)

# Match df$X to Y and return that index of Z
df$Output <- Z[match(df$X, Y)]
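To see why this works, it can help to split the one-liner into its two steps. The sketch below uses the vectors from the question; the names idx, Xd, Yd, Zd and out_date are introduced here for illustration only. The second half is an optional variant, not part of the original answer, which parses the strings as Date objects first, assuming the format is day/month/year.

```r
X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016",
       "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016",
       "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016",
       "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016",
       "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
df <- data.frame(X = X, stringsAsFactors = FALSE)

# Step 1: for each X, the position of its first occurrence in Y (NA if absent)
idx <- match(df$X, Y)

# Step 2: index Z by those positions; an NA index yields an NA result
df$Output <- Z[idx]

# Rows of df that found a match
which(!is.na(df$Output))

# Optional variant: do the same lookup on Date objects (assumes %d/%m/%Y)
Xd <- as.Date(X, format = "%d/%m/%Y")
Yd <- as.Date(Y, format = "%d/%m/%Y")
Zd <- as.Date(Z, format = "%d/%m/%Y")
out_date <- Zd[match(Xd, Yd)]
```

Because match returns only the first position at which each value is found, this approach never adds rows to df, even if Y contains repeated values.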