Using R

Time: 2018-10-18 11:45:12

Tags: r

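I am trying to compare two text files and list any differences in a log file. To do that I used the "diffr" library in the command below, but the comparison result only shows up in the "Viewer" tab of RStudio. Can anyone help me write better code to compare text files and list the differences?

Also, how would I compare files in a loop? I have multiple files saved for the same query in different environments.

Code: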

library(diffr)

setwd("C:/Users/squraishi/Desktop/OnDemand/R_ExtractDataSnapshot/Results")

prod_file <- read.csv2(file = "F_Query_Prod_7 .txt", header = TRUE, sep = "")
beta_file <- read.csv2(file = "F_Query_Beta_7 .txt", header = TRUE, sep = "")

diffr("F_Query_Prod_7 .txt", "F_Query_Beta_7 .txt", contextSize = 0, minJumpSize = 500)

1 answer:

Answer 0: (score: 2)

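That htmlwidget package does not give you any output back, but it is based on a javascript library which is, in turn, based on a python module.

We'll use the Python version, but we won't use the reticulate package, b/c I'm not going to show how to iterate over Python structures in R. Instead, we'll take the pointer from the Python page to a script located at Tools/scripts/diff.py and grab that script from github, to avoid trying to find it on your system. This does mean you need python installed. Python 3, to be exact (since that's a fragile, fragmented ecosystem).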

tf <- tempfile(fileext = ".py")
on.exit(unlink(tf), add = TRUE)
writeLines(
  readLines("https://raw.githubusercontent.com/python/cpython/master/Tools/scripts/diff.py"),
  tf
)

Now, we'll find the python3 binary and the pip3 binary on your system:

python <- Sys.which("python3")
pip <- Sys.which("pip3")
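Sys.which() returns an empty string when it cannot find a command, so it may be worth failing early rather than passing "" to system2(). A minimal sketch of such a check follows; the helper name find_binary is made up for illustration and is not part of any package:

```r
# Hypothetical helper: look a command up on the PATH and stop with a clear
# message if it is missing (Sys.which() returns "" in that case).
find_binary <- function(name) {
  path <- unname(Sys.which(name))
  if (!nzchar(path)) {
    stop("could not find '", name, "' on the PATH", call. = FALSE)
  }
  path
}

# Intended use, in place of the bare Sys.which() calls:
# python <- find_binary("python3")
# pip    <- find_binary("pip3")
```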

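And make sure a very critical module is installed, one that should always be installed but, python being as daft as it is, is not. (Note that the datetime module is part of Python's standard library, so this step should normally be unnecessary.)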

# just in case you don't have it
system2(command = pip, args = c("install", "datetime"))

Now run the diff on my two made-up files:

system2(
  command = python, 
  args = c(
    tf, 
    path.expand("~/Data/so.txt"), 
    path.expand("~/Data/so1.txt")
  ),
  stdout = TRUE
) -> res

And take a look at the output, which you now need to parse:

res
##  [1] "*** /Users/bob/Data/so.txt\t2018-10-15T06:38:07.169832-04:00" 
##  [2] "--- /Users/bob/Data/so1.txt\t2018-10-18T08:50:51.745551-04:00"
##  [3] "***************"                                              
##  [4] "*** 6,29 ****"                                                
##  [5] "  QX = X-ray|NRW"                                             
##  [6] "  UI = Q000000981"                                            
##  [7] "  "                                                           
##  [8] "- *NEWRECORD"                                                 
##  [9] "- RECTYPE = Q"                                                
## [10] "- SH = analogs & derivatives"                                 
## [11] "- QE = ANALOGS"                                               
## [12] "- QA = AA"                                                    
## [13] "- QT = 1"                                                     
## [14] "- "                                                           
## [15] "- *NEWRECORD"                                                 
## [16] "- RECTYPE = Q"                                                
## [17] "- SH = abnormalities"                                         
## [18] "- QE = ABNORM"                                                
## [19] "- QX = agenesis|NRW"                                          
## [20] "- QX = anomalies|EQV"                                         
## [21] "- QX = aplasia|NRW"                                           
## [22] "- QX = atresia|NRW"                                           
## [23] "- QX = birth defects|NRW"                                     
## [24] "- QX = congenital defects|NRW"                                
## [25] "- QX = defects|NRW"                                           
## [26] "- QX = deformities|NRW"                                       
## [27] "- QX = hypoplasia|NRW"                                        
## [28] "- UI = Q000002"                                               
## [29] "--- 6,8 ----"    
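One possible way to do that parsing is sketched below. It assumes the context-diff layout shown above, where a changed line starts with a two-character marker ("- ", "+ " or "! ") and the header lines start with "***" or "---". The function name parse_context_diff and the shortened res vector are made up for illustration:

```r
# A shortened stand-in for the `res` vector printed above
res <- c(
  "*** /Users/bob/Data/so.txt\t2018-10-15T06:38:07.169832-04:00",
  "--- /Users/bob/Data/so1.txt\t2018-10-18T08:50:51.745551-04:00",
  "***************",
  "*** 6,29 ****",
  "  QX = X-ray|NRW",
  "  UI = Q000000981",
  "  ",
  "- *NEWRECORD",
  "- RECTYPE = Q",
  "- UI = Q000002",
  "--- 6,8 ----"
)

# Hypothetical helper: keep only the lines carrying a change marker and
# split them into the kind of change and the text of the line.
parse_context_diff <- function(x) {
  marker <- substr(x, 1, 2)
  keep <- marker %in% c("- ", "+ ", "! ")
  labels <- c("- " = "removed", "+ " = "added", "! " = "changed")
  data.frame(
    change = unname(labels[marker[keep]]),
    text = substring(x[keep], 3),
    stringsAsFactors = FALSE
  )
}

changes <- parse_context_diff(res)
print(changes)
```

The header lines are skipped because their first two characters are "**" or "--", neither of which matches a marker followed by a space.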

Having done all of that ^^, you can also just use tools::Rdiff():

(res <- tools::Rdiff("~/Data/so.txt", "~/Data/so1.txt", Log=TRUE))
## $status
## [1] 1
## 
## $out
##  [1] "files differ in number of lines" "9,29d8"                         
##  [3] "< *NEWRECORD"                    "< RECTYPE = Q"                  
##  [5] "< SH = analogs & derivatives"    "< QE = ANALOGS"                 
##  [7] "< QA = AA"                       "< QT = 1"                       
##  [9] "< "                              "< *NEWRECORD"                   
## [11] "< RECTYPE = Q"                   "< SH = abnormalities"           
## [13] "< QE = ABNORM"                   "< QX = agenesis|NRW"            
## [15] "< QX = anomalies|EQV"            "< QX = aplasia|NRW"             
## [17] "< QX = atresia|NRW"              "< QX = birth defects|NRW"       
## [19] "< QX = congenital defects|NRW"   "< QX = defects|NRW"             
## [21] "< QX = deformities|NRW"          "< QX = hypoplasia|NRW"          
## [23] "< UI = Q000002"                 
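If only the differing lines are needed, the $out vector can be filtered on its prefixes. This is a sketch that assumes the layout printed above, where lines found only in the first file start with "< " and, by the usual diff convention, lines found only in the second file start with "> ". The out vector here is a shortened stand-in for res$out:

```r
# A shortened stand-in for res$out as printed above
out <- c(
  "files differ in number of lines",
  "9,29d8",
  "< *NEWRECORD",
  "< RECTYPE = Q",
  "< UI = Q000002"
)

# Keep the marked lines and strip the two-character prefix
only_in_first  <- sub("^< ", "", out[startsWith(out, "< ")])
only_in_second <- sub("^> ", "", out[startsWith(out, "> ")])

print(only_in_first)
print(only_in_second)
```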

But I wanted to show the convoluted path first :-)
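As for comparing files in a loop, a rough sketch with tools::Rdiff() could look like the following. It assumes the files for the two environments live in one directory and differ only by a tag in the file name ("Prod" vs "Beta", as in the question). The function name compare_pairs, the demo directory and the demo file names are made up for illustration. Note that Rdiff() may fall back to the system's diff program when the two files differ in length, so that needs to be available:

```r
# Hypothetical helper: pair every "Prod" file with its "Beta" counterpart
# and run tools::Rdiff() on each pair, collecting the results in a list.
compare_pairs <- function(dir, from_tag = "Prod", to_tag = "Beta") {
  from_files <- list.files(dir, pattern = from_tag, full.names = TRUE)
  results <- lapply(from_files, function(from) {
    to <- file.path(dir, sub(from_tag, to_tag, basename(from), fixed = TRUE))
    if (!file.exists(to)) {
      return(list(status = NA_integer_, out = "no matching file"))
    }
    tools::Rdiff(from, to, Log = TRUE)
  })
  names(results) <- basename(from_files)
  results
}

# Made-up demo files in a temporary directory
demo_dir <- file.path(tempdir(), "diff_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("a", "b"), file.path(demo_dir, "F_Query_Prod_1.txt"))
writeLines(c("a", "b"), file.path(demo_dir, "F_Query_Beta_1.txt"))
writeLines(c("a", "b"), file.path(demo_dir, "F_Query_Prod_2.txt"))
writeLines(c("a", "c"), file.path(demo_dir, "F_Query_Beta_2.txt"))

results <- compare_pairs(demo_dir)

# status is 0 when a pair matches and non-zero when it differs
statuses <- vapply(results, function(r) as.integer(r$status), integer(1))
print(statuses)
```

Each element of results keeps the full $out vector, so the differences for any one pair can be inspected afterwards by file name.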