I have a data frame that looks like this:
library(data.table)
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
Type=c("put","call","put","call","call","put","call","put","call","put","call"),
Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
Other=sample(20,11))
Tickers Type Strike Other
1: A put 35.0 6
2: A call 37.5 5
3: A put 37.5 13
4: B call 10.0 15
5: B call 11.0 12
6: B put 11.0 4
7: B call 12.0 20
8: D put 40.0 7
9: D call 40.0 11
10: D put 42.0 10
11: D call 42.0 1
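I am trying to analyze a subset of my data. The subset I want is the rows where Tickers and Strike are the same. In addition, I only want those rows if both a put and a call exist under Type for that combination. Using the data above as an example, I would like to return the following result: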
x[c(2,3,5,6,8:11),]
Tickers Type Strike Other
1: A call 37.5 5
2: A put 37.5 13
3: B call 11.0 12
4: B put 11.0 4
5: D put 40.0 7
6: D call 40.0 11
7: D put 42.0 10
8: D call 42.0 1
I am not sure what the best way to do this is. My thought process is that I should create another column vector, like this:
x$id <- paste(x$Tickers,x$Strike,sep="_")
and then use this vector to pull out only the rows whose id occurs more than once:
x[x$id %in% x$id[duplicated(x$id)],]
Tickers Type Strike Other id
1: A call 37.5 5 A_37.5
2: A put 37.5 13 A_37.5
3: B call 11.0 12 B_11
4: B put 11.0 4 B_11
5: D put 40.0 7 D_40
6: D call 40.0 11 D_40
7: D put 42.0 10 D_42
8: D call 42.0 1 D_42
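I am not sure how efficient this is, since my actual data contains many more rows.
Also, this solution does not check the Type condition, namely that there is one put and one call.
The title could probably be worded much better, my apologies.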
EDIT: After looking at the post Finding ALL duplicate rows, including "elements with smaller subscripts", I could also use this solution:
x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id, fromLast=TRUE),]
Answer 0 (score: 2)
Modifying your data to include a case where a put and a call are not both present (I changed the last "call" to "put"):
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
Type=c("put","call","put","call","call","put","call","put","call","put","put"),
Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
Other=sample(20,11))
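Since you are using data.table, you can use the built-in counter .N together with by to count within groups and then subset. If counting the unique values of Type is a reliable way to tell that both a put and a call are present, this may work: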
x[, `:=`(n = .N, types = uniqueN(Type)), by = c('Tickers', 'Strike')][n > 1 & types == 2]
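The part in the first [] does the counting per group (adding the columns n and types to x by reference), and [n > 1 & types == 2] then does the subsetting.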
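If adding helper columns to x is not wanted, the same filter can be sketched by returning .SD only for groups that contain both types. This is a minimal sketch, not the approach used above: the Other column is fixed to 1:11 here (an assumption, replacing sample()) so the result is reproducible, and the name both is only illustrative.

```r
library(data.table)

# Modified example data from this answer; Other is fixed (assumption)
# instead of sample(20, 11) so the output is reproducible.
x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","put"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = 1:11
)

# Keep a (Tickers, Strike) group only if it holds at least one put and one call.
# Groups for which the if() is FALSE return NULL and are dropped.
both <- x[, if (all(c("put", "call") %in% Type)) .SD, by = .(Tickers, Strike)]
both
```

Unlike types == 2, this check names the two required values explicitly, so it would still behave as intended if Type ever contained a third value.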
Answer 1 (score: 0)
I am not a user of the data.table package, so this code is base R only.
agg <- aggregate(Type ~ Tickers + Strike, data = x, length)
result <- merge(x, subset(agg, Type > 1)[1:2], by = c("Tickers", "Strike"))[, c(1, 3, 2, 4)]
result
# Tickers Type Strike Other
#1: A call 37.5 17
#2: A put 37.5 7
#3: B call 11.0 14
#4: B put 11.0 20
#5: D put 40.0 15
#6: D call 40.0 2
#7: D put 42.0 8
#8: D call 42.0 1
rm(agg) # final clean up
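The aggregate() call above counts rows per group, so it does not by itself check that both a put and a call are present. A hedged base R variation that adds that check might look like the following. It uses a plain data.frame with the modified data from the other answer (last "call" changed to "put"), fixes Other to 1:11 as an assumption for reproducibility, and the names agg2, keep and result2 are only illustrative.

```r
# Example data as a plain data.frame; Other is fixed (assumption)
# so the result is reproducible.
x <- data.frame(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","put"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = 1:11,
  stringsAsFactors = FALSE
)

# One logical per (Tickers, Strike) group: TRUE when both types occur.
agg2 <- aggregate(Type ~ Tickers + Strike, data = x,
                  FUN = function(t) all(c("put", "call") %in% t))

# Keys of the groups to keep, then join back to the original rows.
keep <- agg2[agg2$Type, c("Tickers", "Strike")]
result2 <- merge(x, keep, by = c("Tickers", "Strike"))
result2
```

With this data the D / 42 group (two puts) is dropped, while the A / 37.5, B / 11 and D / 40 groups are kept.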