I have a tricky printout format that I'm trying to achieve. Here is my data frame, which is built up with a for loop and rbind:
bets<- data.frame(status=character(), f_name=character(), d_name=character(), type_bet=character(), sec=character(),
spread=character(), total=character(), deriv=character(), book=character(), edge=character(),
my_f_price=character(), book_f_price=character(), my_d_price=character(), book_d_price=character())
Sample printout:
status      f_name           d_name          type_bet  sec  spread  total  deriv  book  edge  my_f_price  book_f_price  my_d_price  book_d_price
9:00 PM ET  San Diego State  Colorado State  total     h1   3.5     138.5  65     pin   12    120         -108          -120        -108
9:00 PM ET  San Diego State  Colorado State  total     h1   3.5     138.5  65     5d    10    120         -110          -120        -110
6:00 PM ET  Cincinnati       SMU             total     h1   8       125.5  59     pin   9     122         -103          -122        -113
8:00 PM ET  Temple           Rutgers         total     h1   1.5     150    70.5   pin   8     116         -108          -116        -108
8:00 PM ET  Temple           Rutgers         total     h1   1.5     150    70.5   5d    6     116         -110          -116        -110
8:05 PM ET  Drake            Evansville      ml        h1   7       136    0      5d    4     -214        -210          214         175
8:00 PM ET  Northern Iowa    Bradley         total     h1   12      133    62     5d    3     113         -110          -113        -110
6:00 PM ET  Cincinnati       SMU             ml        h1   8       125.5  0      5d    2     -242        -240          242         200
6:00 PM ET  Cincinnati       SMU             total     h1   8       125.5  58.5   5d    2     112         -110          -112        -110
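It's a bit hard to see, but note how the edge column is ordered: 12, 10, 9, 8, 6, 4, 3, 2, 2. What I'd like to do, though, is group some entries together: when f_name, d_name, type_bet and sec are all the same, and the only column that differs is book, those rows should be treated as one group (with the groups still ordered by their highest edge). Ideally, I'd like the printout to look like this: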
status      f_name           d_name          type_bet  sec  spread  total  deriv  book  edge  my_f_price  book_f_price  my_d_price  book_d_price
9:00 PM ET  San Diego State  Colorado State  total     h1   3.5     138.5  65     pin   12    120         -108          -120        -108
9:00 PM ET  San Diego State  Colorado State  total     h1   3.5     138.5  65     5d    10    120         -110          -120        -110
6:00 PM ET  Cincinnati       SMU             total     h1   8       125.5  59     pin   9     122         -103          -122        -113
6:00 PM ET  Cincinnati       SMU             total     h1   8       125.5  58.5   5d    2     112         -110          -112        -110
8:00 PM ET  Temple           Rutgers         total     h1   1.5     150    70.5   pin   8     116         -108          -116        -108
8:00 PM ET  Temple           Rutgers         total     h1   1.5     150    70.5   5d    6     116         -110          -116        -110
8:05 PM ET  Drake            Evansville      ml        h1   7       136    0      5d    4     -214        -210          214         175
8:00 PM ET  Northern Iowa    Bradley         total     h1   12      133    62     5d    3     113         -110          -113        -110
6:00 PM ET  Cincinnati       SMU             ml        h1   8       125.5  0      5d    2     -242        -240          242         200
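Right now the only approach I can think of is printing line by line to a txt file: loop over the data frame (sorted by the edge column), and for each entry search the rest of the data frame for another entry with the same f_name, d_name, type_bet and sec, print that one too, and remove it from the data frame. But surely there is a better way?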
Answer 0 (score: 0)
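I'm using my own data frame, since that is less work than parsing the text strings above.
Suppose the variables that together form a group are called formGroupVarX
(in your case "f_name", "d_name", "type_bet", "sec") and the remaining variables are called FreeVarX
(all the other columns). Then you can display the data like this: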
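formGroupVars = c("formGroupVar1","formGroupVar2","formGroupVar3")
freeVars = c("FreeVar1")
# Random example data: three grouping columns plus one free column
frameToShow <- data.frame(cbind(sample(LETTERS[1:3],20,replace=TRUE),sample(LETTERS[4:6],20,replace=TRUE),
sample(LETTERS[7:9],20,replace=TRUE),sample(letters,20,replace=TRUE) ))
colnames(frameToShow) = c(formGroupVars,freeVars)
# Build one key string per row from the grouping columns and order the rows by it,
# so rows with identical group values end up next to each other.
# The "|" separator keeps keys unambiguous (e.g. "AB"+"C" vs "A"+"BC").
frameToShow[order(apply(frameToShow,1,function(X) { paste(X[formGroupVars],collapse="|") } )),]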
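Basically, you create a temporary key (a factor level) that is a function of all the variables that define a group, and order the display by that key. In both your example and mine, simply concatenating the values does the job, but in principle the function could be a mathematical function or anything else.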
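As a sketch of the same idea applied to the question's own columns (a hypothetical extension, not part of the answer above): the key is built with `paste` over the grouping columns, and `ave` attaches each group's highest edge to every row so that the groups also come out in descending-edge order, as the question asks. Only a few rows and columns of the sample data are used, and the names `groupVars`, `key` and `groupMax` are illustrative.

```r
# A few rows copied from the question's sample (subset of columns only)
df <- data.frame(
  f_name   = c("San Diego State", "San Diego State", "Cincinnati", "Temple", "Cincinnati"),
  d_name   = c("Colorado State", "Colorado State", "SMU", "Rutgers", "SMU"),
  type_bet = c("total", "total", "total", "total", "total"),
  sec      = c("h1", "h1", "h1", "h1", "h1"),
  book     = c("pin", "5d", "pin", "pin", "5d"),
  edge     = c(12, 10, 9, 8, 2),
  stringsAsFactors = FALSE
)
groupVars <- c("f_name", "d_name", "type_bet", "sec")

# Temporary key: one string per row built from the grouping columns
key <- do.call(paste, c(df[groupVars], sep = "|"))

# Highest edge within each group, repeated on every row of that group
groupMax <- ave(df$edge, key, FUN = max)

# Groups by descending top edge (key breaks ties), then rows by descending edge
out <- df[order(-groupMax, key, -df$edge), ]
print(out)
```

Here the two Cincinnati/SMU rows (edges 9 and 2) are pulled together right after the San Diego State group, ahead of Temple (edge 8), which matches the layout requested in the question.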
Answer 1 (score: 0)
Your example data (you can use dput(yourData) to produce this, which makes it easier for others to help):
df <- structure(list(status = c("9:00 PM ET", "9:00 PM ET", "6:00 PM ET",
"8:00 PM ET", "8:00 PM ET", "8:05 PM ET", "8:00 PM ET", "6:00 PM ET",
"6:00 PM ET"), f_name = c("San Diego State", "San Diego State",
"Cincinnati", "Temple", "Temple", "Drake", "Northern Iowa", "Cincinnati",
"Cincinnati"),
d_name = c("Colorado State", "Colorado State", "SMU", "Rutgers", "Rutgers",
"Evansville", "Bradley", "SMU", "SMU"), type_bet = c("total", "total", "total",
"total", "total",
"ml", "total", "ml", "total"), sec = c("h1", "h1", "h1", "h1",
"h1", "h1", "h1", "h1", "h1"), spread = c(3.5, 3.5, 8, 1.5, 1.5,
7, 12, 8, 8), total = c(138.5, 138.5, 125.5, 150, 150, 136, 133,
125.5, 125.5), deriv = c(65, 65, 59, 70.5, 70.5, 0, 62, 0, 58.5),
book = c("pin", "5d", "pin", "pin", "5d", "5d", "5d", "5d",
"5d"), edge = c(12L, 10L, 9L, 8L, 6L, 4L, 3L, 2L, 2L), my_f_price = c(120L,
120L, 122L, 116L, 116L, -214L, 113L, -242L, 112L), book_f_price = c(-108L,
-110L, -103L, -108L, -110L, -210L, -110L, -240L, -110L), my_d_price = c(-120L,
-120L, -122L, -116L, -116L, 214L, -113L, 242L, -112L), book_d_price = c(-108L,
-110L, -113L, -108L, -110L, 175L, -110L, 200L, -110L)), .Names = c("status",
"f_name", "d_name", "type_bet", "sec", "spread", "total", "deriv",
"book", "edge", "my_f_price", "book_f_price", "my_d_price", "book_d_price" ),
class = "data.frame", row.names = c(NA, -9L))
#You can sort your data on the required columns - but doesn't produce exactly the output you want
df2 <- df[order(df$f_name, df$d_name, df$type_bet, df$sec) , ]
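I'm not sure about the exact output structure you want (e.g. what the blank space between the groups should be), but you can get close to it with a list: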
#Split data by the grouping columns (drop = TRUE removes the empty combinations produced by the interaction)
df.grp <- split(df, list(df$f_name, df$d_name, df$type_bet, df$sec), drop = TRUE)
#Highest edge in each group, then group names in order of decreasing edge
max.edge <- unlist(lapply(df.grp, function(x) max(x[, 'edge'])))
list.names <- names(sort(max.edge, decreasing = TRUE))
#Reorder the list by name so the group with the highest edge comes first
(out <- df.grp[list.names])
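If the goal is a text file like the one described in the question, with a blank line between groups, one possible sketch is to render each data frame in the ordered list with `capture.output(print(..., row.names = FALSE))` and join the pieces with empty lines. The two-group list below is a hypothetical stand-in for `out`, and the file name comes from `tempfile()`; in practice you would pass your own path.

```r
# Hypothetical stand-in for the ordered list of groups
out <- list(
  a = data.frame(book = c("pin", "5d"), edge = c(12, 10)),
  b = data.frame(book = "5d", edge = 4)
)

# Render each group as text lines, followed by an empty line as a separator
lines <- unlist(lapply(out, function(g) {
  c(capture.output(print(g, row.names = FALSE)), "")
}), use.names = FALSE)

# Write the combined printout to a file and show it
outfile <- tempfile(fileext = ".txt")
writeLines(lines, outfile)
cat(readLines(outfile), sep = "\n")
```

Building the text first and writing it in one call avoids the row-by-row search-and-delete loop from the question.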