I want to add a new column to an xdf file. I tested both transforms and transformFunc in rxDataStep.
This line works fine for me:
rxDataStep(nyc_jan_xdf,transforms = list(newCol5=ifelse(payment_type==1,10,20)))
But if I use transformFunc:
CashVsCard<-function(x)
{
if(x$payment_type==1){
x$newCol13=10
} else {
x$newCol13=20
}
return(x)
}
rxDataStep(nyc_jan_xdf,transformFunc = CashVsCard)
it doesn't work and returns this error:
Error in doTryCatch(return(expr), name, parentenv, handler) :
The variable 'newCol13' has a different number of rows than other columns in the data: 1 vs. 10
In addition: Warning message:
In if (x$payment_type == 1) { :
the condition has length > 1 and only the first element will be used
Why doesn't transformFunc work?
A sample of my data:
structure(list(VendorID = c(2L, 2L, 2L, 1L, 1L, 1L), tpep_pickup_datetime = c("2016-01-01 00:00:00",
"2016-01-01 00:00:00", "2016-01-01 00:00:03", "2016-01-01 00:00:04",
"2016-01-01 00:00:05", "2016-01-01 00:00:06"), tpep_dropoff_datetime = c("2016-01-01 00:00:00",
"2016-01-01 00:00:00", "2016-01-01 00:15:49", "2016-01-01 00:14:32",
"2016-01-01 00:14:27", "2016-01-01 00:04:44"), passenger_count = c(5L,
1L, 6L, 1L, 2L, 1L), trip_distance = c(4.90000009536743, 10.539999961853,
2.4300000667572, 3.70000004768372, 2.20000004768372, 1.70000004768372
), pickup_longitude = c(-73.9807815551758, -73.9845504760742,
-73.9693298339844, -74.0043029785156, -73.9919967651367, -73.9821014404297
), pickup_latitude = c(40.7299118041992, 40.6795654296875, 40.7635383605957,
40.7422409057617, 40.718578338623, 40.7746963500977), RatecodeID = c(1L,
1L, 1L, 1L, 1L, 1L), store_and_fwd_flag = c("N", "N", "N", "N",
"N", "Y"), dropoff_longitude = c(-73.9444732666016, -73.9502716064453,
-73.9956893920898, -74.0073623657227, -74.0051345825195, -73.9709396362305
), dropoff_latitude = c(40.7166786193848, 40.7889251708984, 40.7442512512207,
40.7069358825684, 40.7399444580078, 40.7967071533203), payment_type = c(1L,
1L, 1L, 1L, 1L, 1L), fare_amount = c(18, 33, 12, 14, 11, 7),
extra = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5), mta_tax = c(0.5,
0.5, 0.5, 0.5, 0.5, 0.5), tip_amount = c(0, 0, 3.99000000953674,
3.04999995231628, 1.5, 1.64999997615814), tolls_amount = c(0,
0, 0, 0, 0, 0), improvement_surcharge = c(0.300000011920929,
0.300000011920929, 0.300000011920929, 0.300000011920929,
0.300000011920929, 0.300000011920929), total_amount = c(19.2999992370605,
34.2999992370605, 17.2900009155273, 18.3500003814697, 13.8000001907349,
9.94999980926514)), .Names = c("VendorID", "tpep_pickup_datetime",
"tpep_dropoff_datetime", "passenger_count", "trip_distance",
"pickup_longitude", "pickup_latitude", "RatecodeID", "store_and_fwd_flag",
"dropoff_longitude", "dropoff_latitude", "payment_type", "fare_amount",
"extra", "mta_tax", "tip_amount", "tolls_amount", "improvement_surcharge",
"total_amount"), row.names = c(NA, 6L), class = "data.frame")
Answer 0 (score: 0)
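I figured it out. It isn't the best solution, but it works. transformFunc receives each chunk of data as a list of column vectors, so if() only tested the first element of payment_type and the new column ended up with a single value. The check has to be done row by row, so I just needed to change the function like this: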
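CashVsCard <- function(x)
{
  # x is one chunk of data: a list of column vectors
  n <- length(x$payment_type)
  for (i in seq_len(n))
  {
    # Test one row at a time so that if() gets a length-1 condition
    if (x$payment_type[i] == 1)
    {
      x$cash_vs_Card4[i] <- "Card"
    } else {
      x$cash_vs_Card4[i] <- "Others"
    }
  }
  return(x)
}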
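The loop can probably be avoided. As a minimal sketch, assuming the chunk really does arrive as a list of equal-length vectors, ifelse() evaluates the condition element-wise and returns a vector as long as payment_type, which is what the transforms version in the question relies on. The column name cash_vs_card and the small chunk list below are hypothetical stand-ins so the function can be tried in plain R without an xdf file:

```r
# Vectorized variant of the transform function (a sketch, not tested on xdf data)
CashVsCard <- function(x) {
  # ifelse() checks every element, so the new column has the same
  # length as the other columns in the chunk
  x$cash_vs_card <- ifelse(x$payment_type == 1, "Card", "Others")
  x
}

# Hypothetical stand-in for one chunk passed by rxDataStep
chunk <- list(payment_type = c(1L, 2L, 1L, 3L))
result <- CashVsCard(chunk)
print(result$cash_vs_card)

# With RevoScaleR loaded and nyc_jan_xdf defined as in the question:
# rxDataStep(nyc_jan_xdf, transformFunc = CashVsCard)
```

Because the result is built in one vectorized call, the function does not grow the column one element at a time, which should matter on large chunks.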