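Processing a CSV with R to evaluate whether (ColA != ColB), taking ColC into account

Date: 2015-05-14 17:49:37

Tags: r csv string-comparison

I am trying to implement a simple string comparison across two columns. A sample of the (mock) data: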

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015

The logic I want to apply is:

If From_DeptCode = To_DeptCode 
      then ChangeType="No Change" 
ElseIf From_DeptCode != To_DeptCode AND TransactionType = "Reorg" 
      then ChangeType="Reorg"
Else ChangeType="Transfer"

So my output would look like:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012,Transfer
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013,No Change
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014,No Change
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011,Reorg
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010,Transfer
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015,No Change

Here is what I have so far:

transfers <- read.csv(file="Transfers.csv", head=TRUE,
    sep=",",colClasses=c(NA,NA,NA,NA,NA,NA,NA,"Date",NA))

At this point, I think, I would implement my logic:

If From_DeptCode = To_DeptCode 
      then ChangeType="No Change" 
ElseIf From_DeptCode != To_DeptCode AND TransactionType = "Reorg" 
      then ChangeType="Reorg"
Else ChangeType="Transfer"

I think this is where I would write out my new CSV:

write.csv(transfers, file="transfersprocessed.csv", row.names=FALSE)

Any suggestions for completing the rest?

Update

Following @josilber's answer, I ran the following code:

transfers <- read.csv(file="Transfers.csv", head=TRUE, sep=",", colClasses=c(NA,NA,NA,NA,NA,NA,NA,"Date",NA))

dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))

View(transfers)

On the following data:

EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015

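The ChangeType variable is still "NA".

Is the syntax of the nested ifelse statement correct? Any idea why ChangeType isn't working?

1 Answer:

Answer 0 (score: 3)

You can do this with nested ifelse statements: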

dat$ChangeType <- ifelse(dat$From_DeptCode == dat$To_DeptCode, "No Change",
                         ifelse(dat$TransactionType == "Reorg", "Reorg", "Transfer"))
dat
#       EMPLID From_DeptCode     FromDept To_DeptCode         To_Dept TransactionTypeCode
# 1  239583290            21        Sales          43 CustomerService                  10
# 2 1230495829            21        Sales          21           Sales                  10
# 3 4059503918            93   Operations          93      Operations                  10
# 4 3040593021            19 Headquarters          23   International                  11
# 5 7029406920            15    Marketing          84     Development                  19
# 6 2039052819            19 Headquarters          19    Headquarters                  10
#   TransactionType EffectiveDate ChangeType
# 1       Promotion    12/12/2012   Transfer
# 2       Promotion      9/1/2013  No Change
# 3        Demotion    11/18/2014  No Change
# 4           Reorg    12/13/2011      Reorg
# 5    Reassignment    01/05/2010   Transfer
# 6       Promotion     4/15/2015  No Change

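ifelse takes a vector of TRUE/FALSE values as its first argument, uses the second argument for the TRUE cases, and uses the third argument for the FALSE cases. For your FALSE case you actually want to run another ifelse, which is why the logic is nested here.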
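As a minimal sketch, the same nested ifelse can be applied to the data frame that read.csv actually creates. The code in the update reads the file into transfers but assigns ChangeType on dat, so View(transfers) would not show the new values; that is one likely reason the column still looks like NA. The version below inlines the sample rows with read.csv(text = ...) so it runs without Transfers.csv. Reading EMPLID as character (to keep leading zeros) and writing to a temporary directory are assumptions added for illustration, not part of the original question.

```r
# Sample rows from the question, inlined so the sketch is self-contained.
csv_text <- "EMPLID,From_DeptCode,FromDept,To_DeptCode,To_Dept,TransactionTypeCode,TransactionType,EffectiveDate,ChangeType
0239583290,21,Sales,43,CustomerService,10,Promotion,12/12/2012
1230495829,21,Sales,21,Sales,10,Promotion,9/1/2013
4059503918,93,Operations,93,Operations,10,Demotion,11/18/2014
3040593021,19,Headquarters,23,International,11,Reorg,12/13/2011
7029406920,15,Marketing,84,Development,19,Reassignment,01/05/2010
2039052819,19,Headquarters,19,Headquarters,10,Promotion,4/15/2015"

# Assumption: EMPLID is read as character so leading zeros are preserved.
# The data rows have no ChangeType field; read.csv fills that column with NA.
transfers <- read.csv(text = csv_text, header = TRUE,
                      stringsAsFactors = FALSE,
                      colClasses = c(EMPLID = "character"))

# Assign into the same object that was read (transfers, not dat).
transfers$ChangeType <- ifelse(
  transfers$From_DeptCode == transfers$To_DeptCode, "No Change",
  ifelse(transfers$TransactionType == "Reorg", "Reorg", "Transfer")
)

print(transfers[, c("EMPLID", "From_DeptCode", "To_DeptCode",
                    "TransactionType", "ChangeType")])

# Assumption: written to a temporary directory here; use your own path instead.
out_file <- file.path(tempdir(), "transfersprocessed.csv")
write.csv(transfers, file = out_file, row.names = FALSE)
```

With a real file, replacing text = csv_text with file = "Transfers.csv" should leave the rest unchanged.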

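Note that for large data frames this will be much faster than looping through the data and executing a nested if statement one row at a time.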
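For comparison, here is a rough sketch of the row-by-row version next to the vectorized one, using three rows taken from the sample data. The variable names (from_code, to_code, txn_type, looped, vectorized) are made up for this illustration. Both approaches produce the same labels; the loop simply evaluates the conditions one element at a time.

```r
# Three rows from the sample data (rows 1, 2 and 4), as plain vectors.
from_code <- c(21, 21, 19)
to_code   <- c(43, 21, 23)
txn_type  <- c("Promotion", "Promotion", "Reorg")

# Vectorized: one nested ifelse call handles every row at once.
vectorized <- ifelse(from_code == to_code, "No Change",
                     ifelse(txn_type == "Reorg", "Reorg", "Transfer"))

# Row-by-row: an explicit loop with a nested if statement.
looped <- character(length(from_code))
for (i in seq_along(from_code)) {
  if (from_code[i] == to_code[i]) {
    looped[i] <- "No Change"
  } else if (txn_type[i] == "Reorg") {
    looped[i] <- "Reorg"
  } else {
    looped[i] <- "Transfer"
  }
}

# Both approaches should agree.
stopifnot(identical(vectorized, looped))
print(vectorized)
```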