来自R Loop的图使用两个不同大小的数据帧

时间:2018-03-27 20:10:07

标签: r ggplot2

我有2个不同大小的数据帧 - 一个约300行,另一个约30行。尺寸将根据所选输入而变化。我已经成功构建了R代码,它将为每个输入绘制一个循环的输出,但我无法弄清楚如何将所有迭代放到一个图表上。有很多关于多个图的文章,但到目前为止我没有取得任何成功,即不同大小的数据框,在一个图表上绘制所有迭代(不同大小)(一页上不是多个图表 - 一页上有一个图表) 。下面是用于生成单个图表的代码 - 我无法弄清楚如何将它们全部放在同一个图表上....

WellS <- rep(WellSelect[i], length(EW))
WellC <- rep(WellSelect[i], length(X))
dfSurvey <- data.frame(Well = WellS, MD = MD, EW = EW, NS = NS, TVD = TVD)
dfCalc <- data.frame(Well = WellC, Perf = P, X = X, Y = Y, TVDp = TVDp)

上面的代码将此处未显示的计算编译为数据帧dfSurvey和dfCalc。请注意,“WellSelect”是驱动计算原始数据输入的主要变量。可能有2到4000多个独特的“WellSelect”可能性,每个都有2个第一句中提到的大小的数据帧 - 所有这些都是“WellSelect”独有的。除了提到的图表外,一切都有效。我试图绑定数据帧,但不知道如何在不同大小的df上进行。

pname <- paste0(dfSurvey$Well[i])
p <- ggplot() + geom_point(data = dfSurvey, aes(x=EW, y=NS), shape = 2,    size = 2, color = "blue1") +
  geom_point(data = dfCalc, aes(x=X, y=Y), shape = 17, size = 5, color = "Chartreuse3") +
  ggtitle(pname)
ggsave(paste0(pname, ".png"), p)
print(p)

请注意,“dfSurvey”是较大的数据帧,“dfCalc”较小。我很感激一些指导。

已编辑包含数据集和示例图

以下是我目前通过编码获得的情节:

Each "WellSelect" currently generates its own plot

这就是我想要实现的目标:

Combined plot, note red triangles represent "dfCalc" and solid lines are "dfSurvey"

这些链接上有“DS”和“Perf”的缩写示例数据集:

DS(请注意,绘制的变量是EW与NS): https://drive.google.com/open?id=0B5pFHCTpv6BWTUh3MWJoaVhaT0kxZzJFVWJ4QTFaM0Q5S29j

逆足: https://drive.google.com/open?id=0B5pFHCTpv6BWMjhLZnF3Zk9mM0hZaXYxLWVKUlBnWXlPQ0xB

我已经包含了下面代码的全部广度,这些代码应该使用上面的文件运行,其高潮是如前所示的各个图。我知道,代码效率不高,但我是新手,所以我只需要现在有用的东西。

library(ggplot2)

DS <- read.csv(file = "DirectionalSurveys.csv")
Perf <- read.csv(file = "Perforation.csv")

colnames(DS) <- c "IDWELL", "API", "WellName", "Division", "MD", "INCL", "AZIM", "NS", "EW", "TVD", "DLS")
colnames(Perf) <- c("IDWELL", "API", "WellName", "County", "MidPerfMD", "MidPerfTVD")

WellSelect <- c("LINDA GREATHOUSE BRK 1", "LINDA GREATHOUSE BRK 3", "LINDA GREATHOUSE BRK 5", "LINDA GREATHOUSE BRK 205",
            "BARRY GREATHOUSE A 5", "BARRY GREATHOUSE A 10", "BARRY GREATHOUSE B 3")

for(i in seq_along(WellSelect)) {

    S <- DS$MD[DS$WellName == WellSelect[i]]
    P <- Perf$MidPerfMD[Perf$WellName == WellSelect[i]]
    INCL <- DS$INCL[DS$WellName == WellSelect[i]]
    AZIM <- DS$AZIM[DS$WellName == WellSelect[i]]
    NS <- DS$NS[DS$WellName == WellSelect[i]]
    EW <- DS$EW[DS$WellName == WellSelect[i]]
    TVD <- DS$TVD[DS$WellName == WellSelect[i]]

    #Subset to get the survey depths deeper than "P"
    resultGT <- outer(S, P, '>=')
    resultGT[resultGT == FALSE] <- 50
    rownames(resultGT) <- paste0(S)
    colnames(resultGT) <- paste0("P=", P)
    minGT <- as.numeric(rownames(resultGT)[apply(resultGT , 2, which.min)])

    #P is mid-perf MD for each stage, Deep is Survey depth below P, Shallow is Survey depth above P

    deep <- S[match(minGT, S)]
    shallow <- S[match(minGT, S) - 1]

    #Subset "DS" to WellSelect
    Sub1 <- DS[DS$WellName == WellSelect[i], ]

    #Subset Sub1 to get the Survey data
    Sub2 <- Sub1[ , 5]

    #Match deep and shallow to the Survey depths to get location in DS
    deepRow <- match(deep, Sub2)
    shallowRow <- match (shallow, Sub2)

    #Pull the other data for deep and shallow from DS
    deepData <- Sub1[deepRow, ]
    shallowData <- Sub1[shallowRow, ]

    #Calculate Survey Variables

    AA29 <- 2*3.1416/360
    AY <- shallowData[ , "INCL"] + ((P - shallowData[ , "MD"]) / (shallowData[ , "MD"] - deepData[ , "MD"]) * (shallowData[ , "INCL"] - deepData[ , "INCL"] ))
    AZ <- shallowData[ , "AZIM"] + ((P - shallowData[ , "MD"]) / (shallowData[ , "MD"] - deepData[ , "MD"]) * (shallowData[ , "AZIM"] - deepData[ , "AZIM"] ))
    BA <- 0.000001 + acos(cos(AY * AA29 - shallowData[ , "INCL"] * AA29) - sin(shallowData[ , "INCL"] * AA29) * sin(AY * AA29) * (1 - cos(shallowData[ , "AZIM"] * AA29 - AZ * AA29)))
    BB <- 2 / BA * (tan(BA / 2))

    ##NOTE:  "X" and "Y" below are the plotted variables for the red triangles shown on the plots previously##

    Y <- (P - shallowData[ , "MD"]) * ((sin(AY * AA29) * cos(AZ * AA29)) + (sin(shallowData[ , "INCL"] * AA29) * cos(shallowData[ , "AZIM"] * AA29))) / 2 * BB + shallowData[ , "NS"]
    X <- (P - shallowData[ , "MD"]) * ((sin(AY * AA29) * sin(AZ * AA29)) + (sin(shallowData[ , "INCL"] * AA29) * sin(shallowData[ , "AZIM"] * AA29))) / 2 * BB + shallowData[ , "EW"]
    TVDp <- (P - shallowData[ , "MD"]) * (cos(AY * AA29) + cos(shallowData[ , "INCL"] * AA29)) / 2 * BB + shallowData[ , "TVD"]

    #***********************************************************#
    #Calculations all done, now on to the graphing process......#
    #***********************************************************#

    #fill in "WellSelect to match length of dataframe
    WellS <- rep(WellSelect[i], length(EW))
    WellC <- rep(WellSelect[i], length(X))

    #build dataframes for plots        
    dfSurvey <- data.frame(Well = WellS, MD = S, EW = EW, NS = NS, TVD = TVD)
    dfCalc <- data.frame(Well = WellC, Perf = P, X = X, Y = Y, TVDp = TVDp)
    dfSurvey <- dfSurvey[order(dfSurvey$Well, dfSurvey$MD), ]
    dfCalc <- dfCalc[order(dfCalc$Well, dfCalc$Perf), ]

    ###WORKS!!!! but just coded to save each plot and not combine
    pname <- paste0(dfSurvey$Well[i])
    p <- ggplot() + geom_point(data = dfSurvey, aes(x=EW, y=NS), shape = 2, size = 2, color = "blue1") +
    geom_point(data = dfCalc, aes(x=X, y=Y), shape = 17, size = 5, color = "Chartreuse3") + ggtitle(pname)
    ggsave(paste0(pname, ".png"), p)
    print(p)
}

希望这很有用。如果您还有其他需要,请告诉我。谢谢你的帮助!

1 个答案:

答案 0 :(得分:0)

考虑在已编译的单数据框架中绑定所有数据框,并使用ggplot group color 参数:

专门替换for循环:

for(i in seq_along(WellSelect)) {
    ...
}

使用lapply构建数据框列表并删除所有绘图线(稍后完成):

df_lists <- lapply(seq_along(WellSelect), function(i) {
    # ... same code

    # build dataframes for plots        
    dfSurvey <- data.frame(Well = WellS, MD = S, EW = EW, NS = NS, TVD = TVD)
    dfCalc <- data.frame(Well = WellC, Perf = P, X = X, Y = Y, TVDp = TVDp)
    dfSurvey <- dfSurvey[order(dfSurvey$Well, dfSurvey$MD), ]
    dfCalc <- dfCalc[order(dfCalc$Well, dfCalc$Perf), ]

    return(list(dfSurvey, dfCalc))   
}

# COMPILED DATAFRAMES
dfSurveyAll <- do.call(rbind, lapply(df_lists, "[[", 1))

dfCalcAll <- do.call(rbind, lapply(df_lists, "[[", 2))

然后使用 group color 参数运行一个单一的图

p <- ggplot() + 
       geom_point(data = dfSurveyAll, aes(x=EW, y=NS, group="Well", colour="Well"), 
                 shape = 2, size = 2) +
       geom_point(data = dfCalcAll, aes(x=X, y=Y,  group="Well", colour="Well"), 
                  shape = 17, size = 5) + ggtitle(pname)    
p

当你通过 WellName 因子对 DS 数据框进行子集化时,甚至可以使用by。所以在for循环中的块下面:

for(i in seq_along(WellSelect)) {
    S <- DS$MD[DS$WellName == WellSelect[i]]
    P <- Perf$MidPerfMD[Perf$WellName == WellSelect[i]]
    INCL <- DS$INCL[DS$WellName == WellSelect[i]]
    AZIM <- DS$AZIM[DS$WellName == WellSelect[i]]
    NS <- DS$NS[DS$WellName == WellSelect[i]]
    EW <- DS$EW[DS$WellName == WellSelect[i]]
    TVD <- DS$TVD[DS$WellName == WellSelect[i]]
    ...
    Sub1 <- DS[DS$WellName == WellSelect[i], ]
    ...
    WellS <- rep(WellSelect[i], length(EW))
    WellC <- rep(WellSelect[i], length(X)
    ...
}

可以替换为by,其参数 sub 是一个子集化的数据帧,但 Perf (一个单独的数据帧)除外。在这里,by返回两个数据文件的内部列表的命名列表或上面的lapply等效结构。

df_lists <- by(DS, DS$WellName, FUN=function(sub) {

    S <- sub$MD
    P <- Perf$MidPerfMD[Perf$WellName == sub$WellName[1]]
    INCL <- sub$INCL
    AZIM <- sub$AZIM
    NS <- sub$NS
    EW <- sub$EW
    TVD <- sub$TVD

    ...
    Sub1 <- sub
    ...
    WellS <- rep(sub$WellName[1], length(EW))
    WellC <- rep(sub$WellName[1], length(X)

    # build dataframes for plots        
    # ... same as lapply above

})