Different results from transforms and transformFunc in rxDataStep

Posted: 2017-05-15 09:05:47

Tags: r microsoft-r

I want to add a new column to an xdf file. I tested both transforms and transformFunc in rxDataStep.

This line works fine for me:

rxDataStep(nyc_jan_xdf,transforms = list(newCol5=ifelse(payment_type==1,10,20)))
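This version works because the transforms expression is evaluated on whole column vectors for each chunk of rows, and ifelse() is vectorized, so it returns one value per row. The base-R sketch below (no RevoScaleR needed) applies the same expression to a hypothetical three-row stand-in for a chunk; the payment_type values are made up for illustration.

```r
# Hypothetical stand-in for one chunk: a named list of column vectors.
chunk <- list(payment_type = c(1L, 2L, 1L))

# ifelse() tests every element, so newCol5 gets one value per row.
chunk$newCol5 <- ifelse(chunk$payment_type == 1, 10, 20)

print(chunk$newCol5)
```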

But if I use transformFunc instead:

CashVsCard<-function(x)
{
  if(x$payment_type==1){
    x$newCol13=10
  } else {
    x$newCol13=20
  }
  return(x)
}
rxDataStep(nyc_jan_xdf,transformFunc = CashVsCard)

it does not work and returns this error:

Error in doTryCatch(return(expr), name, parentenv, handler) : 
  The variable 'newCol13' has a different number of rows than other columns in the data: 1 vs. 10
In addition: Warning message:
In if (x$payment_type == 1) { :
  the condition has length > 1 and only the first element will be used

Why doesn't transformFunc work?

A sample of my data:

structure(list(VendorID = c(2L, 2L, 2L, 1L, 1L, 1L), tpep_pickup_datetime = c("2016-01-01 00:00:00", 
"2016-01-01 00:00:00", "2016-01-01 00:00:03", "2016-01-01 00:00:04", 
"2016-01-01 00:00:05", "2016-01-01 00:00:06"), tpep_dropoff_datetime = c("2016-01-01 00:00:00", 
"2016-01-01 00:00:00", "2016-01-01 00:15:49", "2016-01-01 00:14:32", 
"2016-01-01 00:14:27", "2016-01-01 00:04:44"), passenger_count = c(5L, 
1L, 6L, 1L, 2L, 1L), trip_distance = c(4.90000009536743, 10.539999961853, 
2.4300000667572, 3.70000004768372, 2.20000004768372, 1.70000004768372
), pickup_longitude = c(-73.9807815551758, -73.9845504760742, 
-73.9693298339844, -74.0043029785156, -73.9919967651367, -73.9821014404297
), pickup_latitude = c(40.7299118041992, 40.6795654296875, 40.7635383605957, 
40.7422409057617, 40.718578338623, 40.7746963500977), RatecodeID = c(1L, 
1L, 1L, 1L, 1L, 1L), store_and_fwd_flag = c("N", "N", "N", "N", 
"N", "Y"), dropoff_longitude = c(-73.9444732666016, -73.9502716064453, 
-73.9956893920898, -74.0073623657227, -74.0051345825195, -73.9709396362305
), dropoff_latitude = c(40.7166786193848, 40.7889251708984, 40.7442512512207, 
40.7069358825684, 40.7399444580078, 40.7967071533203), payment_type = c(1L, 
1L, 1L, 1L, 1L, 1L), fare_amount = c(18, 33, 12, 14, 11, 7), 
    extra = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5), mta_tax = c(0.5, 
    0.5, 0.5, 0.5, 0.5, 0.5), tip_amount = c(0, 0, 3.99000000953674, 
    3.04999995231628, 1.5, 1.64999997615814), tolls_amount = c(0, 
    0, 0, 0, 0, 0), improvement_surcharge = c(0.300000011920929, 
    0.300000011920929, 0.300000011920929, 0.300000011920929, 
    0.300000011920929, 0.300000011920929), total_amount = c(19.2999992370605, 
    34.2999992370605, 17.2900009155273, 18.3500003814697, 13.8000001907349, 
    9.94999980926514)), .Names = c("VendorID", "tpep_pickup_datetime", 
"tpep_dropoff_datetime", "passenger_count", "trip_distance", 
"pickup_longitude", "pickup_latitude", "RatecodeID", "store_and_fwd_flag", 
"dropoff_longitude", "dropoff_latitude", "payment_type", "fare_amount", 
"extra", "mta_tax", "tip_amount", "tolls_amount", "improvement_surcharge", 
"total_amount"), row.names = c(NA, 6L), class = "data.frame")

1 answer:

Answer 0 (score: 0)

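I figured it out. It's not the best solution, but it works. The cause is that transformFunc receives a whole chunk of rows at a time (a named list of column vectors), not a single row. So if() only checks the first element of payment_type (hence the warning), and x$newCol13 = 10 creates a column of length 1 while the other columns have one value per row (hence the error). Looping over every row fixes this; I only had to change the function like this: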

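CashVsCard <- function(x)
{
  # x is a named list of column vectors holding one chunk of rows
  p <- length(x$payment_type)
  x$cash_vs_Card4 <- character(p)   # preallocate one value per row
  for (i in seq_len(p))             # seq_len() also handles an empty chunk
  {
    if (x$payment_type[i] == 1)
    {
      x$cash_vs_Card4[i] <- "Card"
    } else {
      x$cash_vs_Card4[i] <- "Others"
    }
  }
  return(x)
}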
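A shorter alternative, sketched here under the assumption that each chunk arrives as a named list of equal-length vectors (it has not been run against RevoScaleR), replaces the loop with ifelse(), which evaluates the condition for every row at once and therefore always yields a column of the right length. The name CashVsCardVec is hypothetical, and the stand-in chunk uses made-up payment_type values.

```r
# Vectorized variant of CashVsCard (hypothetical name CashVsCardVec).
# Assumes x is a named list of equal-length column vectors, i.e. one chunk.
CashVsCardVec <- function(x)
{
  # ifelse() returns one label per row, so the new column matches the chunk length
  x$cash_vs_Card4 <- ifelse(x$payment_type == 1, "Card", "Others")
  x
}

# Stand-in chunk with made-up payment_type values.
chunk <- list(payment_type = c(1L, 1L, 2L))
result <- CashVsCardVec(chunk)
print(result$cash_vs_Card4)

# In RevoScaleR it would be passed the same way as in the question:
# rxDataStep(nyc_jan_xdf, transformFunc = CashVsCardVec)
```

This is essentially the transforms = list(... ifelse(...)) expression from the question moved into a function, which is why it behaves the same way.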