I have data.tables in this format:
dt1 <- data.table(row_names=1:5, perf=c(2,NA,NA,3,NA), ticker=rep("aa",5))
dt2 <- data.table(row_names=1:5, perf=c(NA,1,2,5,NA), ticker=rep("aapl",5))
   row_names perf ticker
1:         1    2     aa
2:         2   NA     aa
3:         3   NA     aa
4:         4    3     aa
5:         5   NA     aa
   row_names perf ticker
1:         1   NA   aapl
2:         2    1   aapl
3:         3    2   aapl
4:         4    5   aapl
5:         5   NA   aapl
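I have N of these data tables and would like to combine them so that I take the mean of perf. However, if one of the data tables has an NA value for a row, I do not want it to be taken into account. In the example above, I would like to get this data.table: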
> res <- data.table(row_names=1:5,perf=c(2,1,2,4,NA),tickers=c("aa","aapl","aapl","aa,aapl",NA))
> res
   row_names perf tickers
1:         1    2      aa
2:         2    1    aapl
3:         3    2    aapl
4:         4    4 aa,aapl
5:         5   NA      NA
I know I can do something like this to take the mean while removing the NAs:
rbind(dt1,dt2)[,list("perf"=mean(perf,na.rm=T)),by=row_names]
   row_names perf
1:         1    2
2:         2    1
3:         3    2
4:         4    4
5:         5  NaN
How can I set a condition on the tickers column so that the tickers are pasted together depending on which perf values are NA? Also, is rbind-ing all the data tables the most efficient way to compute the mean? Thanks!
Answer 0 (score: 4)
Use:
res <- rbind(dt1,dt2)[, .(perf = mean(perf, na.rm = TRUE),
tickers = toString(ticker[!is.na(perf)]))
, by = row_names]
which gives:
> res
   row_names perf  tickers
1:         1    2       aa
2:         2    1     aapl
3:         3    2     aapl
4:         4    4 aa, aapl
5:         5  NaN
Instead of toString, you can also use paste or paste0 with the argument collapse = ','.
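As a minimal sketch of the paste alternative, the same grouping can be written with an explicit separator. The name res_paste is only illustrative, and dt1 and dt2 are re-created from the question so the snippet runs on its own:

```r
library(data.table)

# Example tables copied from the question
dt1 <- data.table(row_names = 1:5, perf = c(2, NA, NA, 3, NA), ticker = rep("aa", 5))
dt2 <- data.table(row_names = 1:5, perf = c(NA, 1, 2, 5, NA), ticker = rep("aapl", 5))

# Same grouping as with toString, but joining the tickers with paste()
# and an explicit separator (no space after the comma)
res_paste <- rbind(dt1, dt2)[, .(perf = mean(perf, na.rm = TRUE),
                                 tickers = paste(ticker[!is.na(perf)], collapse = ",")),
                             by = row_names]
```

With paste the separator is exactly what you pass to collapse, whereas toString always joins with a comma followed by a space.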
As suggested by @Frank, you can adapt the code to:
res <- rbind(dt1,dt2)[, .(perf = if (all(is.na(perf))) NA_real_ else mean(perf, na.rm = TRUE),
tickers = if (all(is.na(perf))) NA_character_ else toString(ticker[!is.na(perf)]))
, by = row_names]
which gives:
> res
   row_names perf  tickers
1:         1    2       aa
2:         2    1     aapl
3:         3    2     aapl
4:         4    4 aa, aapl
5:         5   NA       NA
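For the case of N tables, one possible sketch is to keep them in a list and bind them with data.table's rbindlist before grouping. The names dt_list and res_all are assumptions for illustration, only two tables are shown, and whether this is the fastest option for a given data set has not been measured here:

```r
library(data.table)

# Example tables copied from the question
dt1 <- data.table(row_names = 1:5, perf = c(2, NA, NA, 3, NA), ticker = rep("aa", 5))
dt2 <- data.table(row_names = 1:5, perf = c(NA, 1, 2, 5, NA), ticker = rep("aapl", 5))

# Hypothetical list holding all N tables (only two here)
dt_list <- list(dt1, dt2)

# Bind every table in one call, then aggregate per row_names.
# The NA check is computed once per group and reused for both columns.
res_all <- rbindlist(dt_list)[, {
  keep <- !is.na(perf)  # rows in this group that carry a value
  if (any(keep)) {
    .(perf = mean(perf[keep]), tickers = toString(ticker[keep]))
  } else {
    .(perf = NA_real_, tickers = NA_character_)
  }
}, by = row_names]
```

Both branches return the same column types (numeric and character), which data.table needs in order to combine the per-group results.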