Question

Here is what the data looks like:

          V1       V2   V3               V4 V5 V6      V7
1 1002000000 20180317 1PAC 000000000011000+ 33 33 6171985
2 1002000001 20050202 1PRM 000000000017376+ 20 20 7011985
3 1002000001 20050503 1PRM 000000000017376+ 20 20 7011985
4 1002000001 20050803 1PRM 000000000017376+ 21 21 7011985
5 1002000001 20051031 1PRM 000000000017376+ 21 21 7011985
6 1002000001 20060130 1PRM 000000000017376+ 21 21 7011985

Here is my code:

library(lubridate)  # provides ymd() and mdy()

ULtrans <- read.table("ULTRANS.txt", sep = "", header = FALSE)
ULtrans <- ULtrans[, -c(5, 6)]  # remove unused columns
names(ULtrans) <- c("pol_num", "trans_date", "type", "amt", "iss_date")
# Amounts carry a trailing sign character: strip it, then apply it as +/-.
convert_amt <- function(x) {
  as.integer(substr(x, 1, nchar(x) - 1)) * ifelse(substr(x, nchar(x), nchar(x)) == "-", -1, 1)
}
ULtrans$amt <- convert_amt(as.character(ULtrans$amt))
ULtrans$trans_date <- ymd(ULtrans$trans_date)
ULtrans$iss_date <- mdy(ULtrans$iss_date)

I tried to put this into a function because I have to clean multiple files the same way.
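When I call the function, I get the warning "All formats failed to parse. No formats found." When I run the same code outside a function, I get the cleaned dates and transaction amounts.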

Answer 1

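There are two problems in your code. First, you used the wrong syntax to remove the unneeded columns. Second, you never return the cleaned data.frame at the end, so it exists only in the function's environment.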
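The second point can be illustrated with a minimal sketch in base R. The data frame `toy` and both function names are hypothetical and are not taken from the question; they only show what a function hands back when its last line is an assignment.

```r
# Hypothetical toy data frame standing in for the transaction table.
toy <- data.frame(x = c(1, 2, 3))

# Ends with an assignment: the call yields only the assigned values
# (invisibly), and the modified data frame is discarded on exit.
clean_no_return <- function(df) {
  df$x <- df$x * 10
}

# Ends with the data frame itself, so the caller receives the cleaned copy.
clean_with_return <- function(df) {
  df$x <- df$x * 10
  df
}

res_a <- clean_no_return(toy)    # a plain numeric vector
res_b <- clean_with_return(toy)  # a data frame
```

In both cases `toy` itself is left unchanged, because the function works on its own local copy; the result has to be assigned by the caller.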

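I pasted your example data into a file. One suggestion: since you have many files in the same format to clean, you can put read.table inside the function. You can also select the columns you need on that same line.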

read.and.clean <- function(file) {

  # Stops with an error if lubridate is not installed
  library(lubridate)

  # Read the file and keep only columns 1 to 4 and 7
  ULtrans <- read.table(file)[c(1:4, 7)]
  names(ULtrans) <- c("pol_num", "trans_date", "type", "amt", "iss_date")

  # Strip the trailing sign character and apply it as +/-
  convert_amt <- function(x) {
    as.integer(substr(x, 1, nchar(x) - 1)) * ifelse(substr(x, nchar(x), nchar(x)) == "-", -1, 1)
  }

  ULtrans$amt <- convert_amt(as.character(ULtrans$amt))
  ULtrans$trans_date <- ymd(ULtrans$trans_date)
  ULtrans$iss_date <- mdy(ULtrans$iss_date)

  # Return the cleaned data.frame to the caller
  return(ULtrans)
}

df <- read.and.clean("example.txt")

> df
     pol_num trans_date type   amt   iss_date
1 1002000000 2018-03-17 1PAC 11000 1985-06-17
2 1002000001 2005-02-02 1PRM 17376 1985-07-01
3 1002000001 2005-05-03 1PRM 17376 1985-07-01
4 1002000001 2005-08-03 1PRM 17376 1985-07-01
5 1002000001 2005-10-31 1PRM 17376 1985-07-01
6 1002000001 2006-01-30 1PRM 17376 1985-07-01
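
Since the goal is to clean many files with the same layout, here is a rough sketch of applying one cleaning function to several files with lapply. So that it runs without extra packages, it uses a base-R stand-in named `clean_file` (a hypothetical name, not the function above) and two temporary files built from rows of the sample data. The zero-padding of `iss_date` assumes the column is MMDDYYYY with the leading zero dropped.

```r
# Base-R stand-in for the cleaning function (hypothetical name).
clean_file <- function(file) {
  d <- read.table(file, header = FALSE, stringsAsFactors = FALSE)[c(1:4, 7)]
  names(d) <- c("pol_num", "trans_date", "type", "amt", "iss_date")
  n <- nchar(d$amt)
  sign <- ifelse(substr(d$amt, n, n) == "-", -1, 1)
  d$amt <- as.integer(substr(d$amt, 1, n - 1)) * sign
  d$trans_date <- as.Date(as.character(d$trans_date), format = "%Y%m%d")
  # Assumption: iss_date is MMDDYYYY without its leading zero, so pad to 8 digits.
  d$iss_date <- as.Date(sprintf("%08d", d$iss_date), format = "%m%d%Y")
  d
}

# Two temporary files, each holding one row of the question's sample data.
rows <- c("1002000000 20180317 1PAC 000000000011000+ 33 33 6171985",
          "1002000001 20050202 1PRM 000000000017376+ 20 20 7011985")
files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(rows[1], files[1])
writeLines(rows[2], files[2])

cleaned <- lapply(files, clean_file)  # one cleaned data frame per file
combined <- do.call(rbind, cleaned)   # stack them into a single data frame
```

With real data, `files` would be the vector of actual file paths, and the same lapply call should work with read.and.clean in place of the stand-in.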

Note: it is always good to mention any additional packages your example needs in order to run.
