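Some time ago I asked the following question:
I have a list of deals with trade dates and market values. Every (trading) day new positions enter the list, but old positions never disappear (when a position expires, its value stays constant). The list looks like this: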
Deal   Trade_Date  MktValue  Desired_Col
Deal1  31.08.2012  10        +10
Deal2  31.08.2012  21        +21
Deal1  03.09.2012  12        +2
Deal2  03.09.2012  19        -2
Deal3  03.09.2012  2         +2
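For each deal I would like the difference in market value from the previous trade date (Desired_Col in the example above). On a deal's first date, the value itself is used.
The following solution was given to me by Roland: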
df <- read.table(text = "
Deal Trade_Date MktValue Desidered_Col
Deal1 31.08.2012 10 +10
Deal2 31.08.2012 21 +21
Deal1 03.09.2012 12 +2
Deal2 03.09.2012 19 -2
Deal3 03.09.2012 2 +2", header = TRUE)

library(data.table)
DT <- as.data.table(df)

# Keep the first value, then take successive differences
diff.padded <- function(x) c(x[1], diff(x))
DT[, Desidered_Col2 := diff.padded(MktValue), by = Deal]

    Deal Trade_Date MktValue Desidered_Col Desidered_Col2
1: Deal1 31.08.2012       10            10             10
2: Deal2 31.08.2012       21            21             21
3: Deal1 03.09.2012       12             2              2
4: Deal2 03.09.2012       19            -2             -2
5: Deal3 03.09.2012        2             2              2
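For readers without data.table at hand, the same grouped "padded difference" can be sketched in base R with ave(). This is only an in-memory illustration on the sample values from the question, not an ffdf solution; the data frame name `deals` and the column name `delta` are chosen here for illustration. It assumes the rows of each deal are already in date order.

```r
# Padded difference: keep the first element, then successive differences.
diff.padded <- function(x) c(x[1], diff(x))

# Sample values from the question (dates omitted; rows are in date order).
deals <- data.frame(
  Deal     = c("Deal1", "Deal2", "Deal1", "Deal2", "Deal3"),
  MktValue = c(10, 21, 12, 19, 2)
)

# ave() applies FUN within each Deal and returns the results
# in the original row order.
deals$delta <- ave(deals$MktValue, deals$Deal, FUN = diff.padded)
print(deals)
```

Because ave() preserves row order, the result lines up with the original rows in the same way the `:=` assignment by group does in the data.table version.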
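This solution works perfectly with data.table. However, given the size of my table, I decided to try ffdf objects. My data now lives in an ffdf, and I tried to reproduce the same solution there, unfortunately without success. Do you have any suggestions on how to reproduce it with ffdf? Thanks for your help.
Here is the complete code I am running: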
# Load needed packages
library(RODBC)
library(data.table)
library(ETLUtils)
library(RSQLite)
library(ffbase)
calendar <- read.csv("Trading_Calendar.csv",sep=";",stringsAsFactors=FALSE)
calendar$STICHTAG <- as.Date(calendar$STICHTAG,"%d.%m.%Y")
ST_a=Sys.Date()-2
rd_a=as.Date("13.11.2012","%d.%m.%Y")
ST=paste("'",as.character(format(ST_a,"%d.%m.%Y")),"'",sep="")
rd=paste("'",as.character(format(rd_a,"%d.%m.%Y")),"'",sep="")
gc(TRUE)
st.strom <- calendar[calendar$STICHTAG>=rd_a & calendar$STICHTAG<=ST_a & calendar$BR_Strom==1,"STICHTAG"]
st.strom <- format(st.strom,"%d.%m.%Y")
st.strom.s <- paste("('",do.call(paste, c(as.list(as.character(st.strom)), sep="','")),"')",sep="")
started.at=proc.time()
Sys.sleep(1)
memory.limit(size=4095)
query <- paste("select * from is_bewertung_data where commodity in ('CASH','COAL','CO2','ELEC','GCERT')
and stichtag in ",st.strom.s,sep="")
deals.strom <- read.odbc.ffdf(query = query,
                              odbcConnect.args = list(dsn = "dsn", uid = "id", pwd = "pwd"),
                              first.rows = 100000, next.rows = 500000, VERBOSE = TRUE)
result <- ffdfdply(deals.strom, deals.strom$DEALID, FUN = function(x) {
  x <- split(x, x$DEALID)
  x <- lapply(x, FUN = function(onlyonedeal) {
    onlyonedeal$Desidered_Col2 <- c(NA, -diff(onlyonedeal$STICHTAG))
    onlyonedeal
  })
  x <- do.call(rbind, x)
  x
})
cat("Finished in",timetaken(started.at),"\n")
Here is the result of str(deals.strom[1:5,]):
'data.frame': 5 obs. of 39 variables:
$ ABBREVIATION : Factor w/ 33553 levels " C 251"," TÜV EE Donaustrom",..: 1893 1892 1894 1895 1896
$ TRADEDATE : POSIXct, format: "2007-06-19" "2007-06-19" "2007-06-19" ...
$ BOOK : Factor w/ 30 levels "CR_RIR_RISKRED",..: 10 10 10 10 10
$ CONTRACT : Factor w/ 20 levels "Base","DNULL",..: 1 5 5 1 1
$ BUYSELL : Factor w/ 2 levels "BUY","SELL": 2 1 2 1 1
$ RATE : num 54.2 57.2 57.3 54.2 55.1
$ AMOUNT : num 474792 501072 501773 474792 964476
$ CUR : Factor w/ 2 levels "EUR","USD": 1 1 1 1 1
$ VOLUME : num 8760 8760 8760 8760 17520
$ UNIT : Factor w/ 2 levels "MWH","t": 1 1 1 1 1
$ STARTDATE : POSIXct, format: "2010-01-01" "2010-01-01" "2010-01-01" ...
$ ENDDATE : POSIXct, format: "2011-01-01" "2011-01-01" "2011-01-01" ...
$ BROKERAGE : num 0 0 0 0 175
$ DV : num 85078 -98218 98919 -85078 -185048
$ REALIZED : num 85078 -98218 98919 -85078 -185048
$ PV : num 0 0 0 0 0
$ DV_DAY : num 0 0 0 0 0
$ DV_MONTH : num 0 0 0 0 0
$ DV_YEAR : num 0 0 0 0 0
$ TRADER : Factor w/ 16 levels "Adolf Plentz",..: 7 7 7 7 12
$ ACTIVE : Factor w/ 2 levels "LONGTERM","SHORTTERM": 2 2 2 2 2
$ STATUS : Factor w/ 2 levels "GCPTY","INT": 1 1 2 2 1
$ PV_MIN : num 0 0 0 0 0
$ PV_PLUS : num 0 0 0 0 0
$ VERTRAGSPARTY : Factor w/ 21 levels "EDL_G059","EDL_G097",..: 10 10 3 3 10
$ GESELLSCHAFT : Factor w/ 1 level "24/7 Trading": 1 1 1 1 1
$ COMMODITY : Factor w/ 5 levels "CASH","CO2","COAL",..: 4 4 4 4 4
$ TO_BE_DELIVERED: num 0 0 0 0 0
$ ACCOUNT : Factor w/ 8 levels "CR_RISKRED","HO_COAL",..: 5 5 5 5 5
$ VERW_PREIS : num 0 0 0 0 0
$ PV_ND : num 0 0 0 0 0
$ BILANZIERUNG : Factor w/ 2 levels "JA","NEIN": 1 1 1 1 1
$ MOTIV : Factor w/ 8 levels "Emissionszertifikate",..: 4 4 4 4 4
$ STICHTAG : POSIXct, format: "2012-11-13" "2012-11-13" "2012-11-13" ...
$ DEALID : Factor w/ 59704 levels "FUX.E.EEX.K.20090622.002",..: 7175 7103 12584 12500 17985
$ COUNTERPARTY : Factor w/ 174 levels "24sieben GmbH",..: 171 171 53 53 141
$ COMMODITY2 : Factor w/ 8 levels "CASH","CER","COAL",..: 4 4 4 4 4
$ MARKTGEBIET : Factor w/ 3 levels "Kohle","Strom",..: 2 2 2 2 2
$ INSTRUMENT : Factor w/ 88 levels "-","Elektrizität FUX EEX Base Apr11 EEXFUT",..: 1 1 1 1 1
My attempt following Jan's answer, which did not work:
test <- as.ffdf(deals.strom[,c("DEALID","STICHTAG","PV")])
test <- transform(test,chg=c(NA,diff(PV)),chg2=c(NA,-diff(PV)))
fdd <- as.ff(!duplicated(test$DEALID))
test[fdd,c("chg","chg2")] <- test[fdd,"PV"]
I get the following error message: Error: is.null(rownames(x)) is not TRUE. Somehow I cannot manage to subset the ffdf.
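The idea behind that attempt (take one difference over the whole column, then overwrite the first row of each deal with its own value) can be sketched on an ordinary data frame. This is an in-memory illustration only and does not address the ffdf subsetting error; the deal IDs and values below are made up for the example. Note that the trick relies on the rows being sorted by deal and then by date, a step the snippet above does not perform.

```r
# Toy data with the same column names as the attempt above (values invented).
d <- data.frame(
  DEALID   = c("A", "B", "A", "B", "C"),
  STICHTAG = as.Date(c("2012-08-31", "2012-08-31",
                       "2012-09-03", "2012-09-03", "2012-09-03")),
  PV       = c(10, 21, 12, 19, 2),
  stringsAsFactors = FALSE
)

# Sort so that each deal's rows are adjacent and in date order.
d <- d[order(d$DEALID, d$STICHTAG), ]

# One difference over the whole column; values at deal boundaries are wrong.
d$chg <- c(NA, diff(d$PV))

# Repair the boundaries: the first row of each deal gets its own PV.
first <- !duplicated(d$DEALID)
d$chg[first] <- d$PV[first]
print(d)
```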
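Answer 0 (score: 1)
Hi, I found the following solution. It works, but I would appreciate a more elegant one. I am still forced to hold objects in RAM, and I am afraid that if the data grows I will have to process it piecewise (an even less elegant solution). The data is stored in an ffdf with roughly 21 million rows and 39 columns.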
# deals: ffdf with about 21 million rows and 39 columns
deals <- ffdfsort(deals)
deals <- transform(deals, delta_MktValue = 0)
diff.padded <- function(x) c(x[1], diff(x))
# Pull only the needed columns into RAM as a data.table
delta <- data.table(deals[, c("Deal", "Trade_Date", "MktValue")])
# Named delta_values rather than diff to avoid masking base::diff
delta_values <- delta[, diff.padded(MktValue), by = Deal]
deals[, "delta_MktValue"] <- delta_values[, V1]
rm(delta_values)
rm(delta)
gc()
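It does work, but I would be grateful if someone could suggest a more elegant solution. In particular, I would like to perform the computation directly on the ffdf. Thanks!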
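Answer 1 (score: 1)
Have you tried ffdfdply from the ffbase package? For an example of how to use it, see: R language: problems computing "group by" or split with ff package.
So in your case, do something like the following (I am freewheeling here based on your example script, but you should get the idea of split-apply-combine in an ffdf setting):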
require(ffbase)
result <- ffdfdply(deals[c("Deal","Trade_Date")], deals$Deal, FUN = function(x) {
  x$Deal <- as.character(x$Deal)
  x <- split(x, x$Deal)
  x <- lapply(x, FUN = function(onlyonedeal) {
    onlyonedeal$Desidered_Col2 <- c(NA, -diff(onlyonedeal$Trade_Date))
    onlyonedeal
  })
  x <- do.call(rbind, x)
  x
})
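To make the split-apply-combine step concrete, here is a sketch of a chunk function that computes the market-value difference asked for in the question, rather than the date difference used above. The name `delta_per_deal` and the column `delta_MktValue` are hypothetical. The sketch assumes that FUN receives an ordinary data.frame holding all rows of the deals in that chunk, and it is exercised here only on a small in-memory data frame, not on a real ffdf.

```r
# Hypothetical chunk function: split by deal, compute the padded
# difference of MktValue within each deal, then bind the pieces together.
delta_per_deal <- function(chunk) {
  chunk$Deal <- as.character(chunk$Deal)
  pieces <- split(chunk, chunk$Deal)
  pieces <- lapply(pieces, function(one_deal) {
    # Make sure the rows of this deal are in date order.
    one_deal <- one_deal[order(one_deal$Trade_Date), ]
    one_deal$delta_MktValue <- c(one_deal$MktValue[1], diff(one_deal$MktValue))
    one_deal
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Small in-memory stand-in for one chunk (sample data from the question).
chunk <- data.frame(
  Deal       = factor(c("Deal1", "Deal2", "Deal1", "Deal2", "Deal3")),
  Trade_Date = as.Date(c("31.08.2012", "31.08.2012", "03.09.2012",
                         "03.09.2012", "03.09.2012"), "%d.%m.%Y"),
  MktValue   = c(10, 21, 12, 19, 2)
)
res <- delta_per_deal(chunk)
print(res)
```

If that assumption about FUN holds, the function could presumably be passed as FUN in an ffdfdply call whose first argument also includes the MktValue column; that has not been verified here against a real ffdf.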
Another solution is the following, which does not explicitly use split-apply-rbind inside FUN:
require(ffbase)
require(doBy)
result <- ffdfdply(deals[c("DEALID","STICHTAG")], deals$DEALID, FUN = function(x) {
  x <- orderBy(~ DEALID + STICHTAG, data = x)
  x$Desidered_Col2 <- c(NA, -diff(as.Date(x$STICHTAG)))
  firstdealdate <- !duplicated(x$DEALID)
  x$Desidered_Col2[firstdealdate] <- NA
  x
})