Question

I have two CSV files that I am working with in R:

https://drive.google.com/open?id=1CSLDs9qQXPMqMegdsWK2cQI_64B9org7

https://drive.google.com/open?id=1mVp1s0m4OZNNctVBn5JXIYK1JPsp-aiw

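As the files show, each one has a list of dates from 2008 to the present, along with other columns.

I want my output to be two files, but each should contain only the rows for dates that are present in both files.

For example, if date X is missing from one file, it should be removed from the other file that does contain it. Only the dates common to both files, and their corresponding rows, should appear in the two output files.

I tried the inner_join function from the dplyr library, but it did not work because the dates are stored as factors.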

Answer 1

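You can avoid the conversion of strings to factors by adding stringsAsFactors = F. Also, your dataset encodes NA as the string null, so you should specify that string as well when calling read.csv: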

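library(dplyr)  # provides inner_join

path1 <- "the path for the first dataset KS"
path2 <- "the path for the second dataset 105560.KS"
# Read the dates as character strings rather than factors
df1 <- read.csv(path1, stringsAsFactors = FALSE)
# This file encodes missing values as the string "null"
df2 <- read.csv(path2, stringsAsFactors = FALSE, na.strings = "null")

# Keep only the rows whose Date appears in both data frames
df_comb <- inner_join(df1, df2, by = "Date")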
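The inner_join call above produces one combined table, whereas the question asks for two separate output files. One possible way to get them is a filtering join. The following is only a sketch: the small data frames, the Close column, and the output file names are made-up placeholders standing in for the real CSV files, and only the Date column name is taken from the answer.

```r
library(dplyr)

# Hypothetical stand-ins for the two CSV files (illustrative values only).
df1 <- data.frame(Date  = c("2008-01-02", "2008-01-03", "2008-01-04"),
                  Close = c(10, 11, 12),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Date  = c("2008-01-03", "2008-01-04", "2008-01-07"),
                  Close = c(20, 21, 22),
                  stringsAsFactors = FALSE)

# semi_join keeps the rows of the first table that have a match in the
# second table, without adding any columns from the second table.
df1_common <- semi_join(df1, df2, by = "Date")
df2_common <- semi_join(df2, df1, by = "Date")

# Write each filtered table to its own file (placeholder file names,
# written to the session's temporary directory).
out1 <- file.path(tempdir(), "file1_common.csv")
out2 <- file.path(tempdir(), "file2_common.csv")
write.csv(df1_common, out1, row.names = FALSE)
write.csv(df2_common, out2, row.names = FALSE)
```

With real data, df1 and df2 would come from the read.csv calls shown in the answer. If dplyr is not available, a base R filter such as df1[df1$Date %in% df2$Date, ] should give the same rows.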
