How to merge multiple data.tables with a specific condition on NA

Date: 2018-05-06 21:06:59

Tags: r dataframe data.table

I have data.tables in this format:

library(data.table)
dt1 <- data.table(row_names=1:5, perf=c(2,NA,NA,3,NA), ticker=rep("aa",5))
dt2 <- data.table(row_names=1:5, perf=c(NA,1,2,5,NA), ticker=rep("aapl",5))

   row_names perf ticker
1:         1    2     aa
2:         2   NA     aa
3:         3   NA     aa
4:         4    3     aa
5:         5   NA     aa  

   row_names perf ticker
1:         1   NA   aapl
2:         2    1   aapl
3:         3    2   aapl
4:         4    5   aapl
5:         5   NA   aapl  

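I have N of these data.tables and would like to combine them so that I take the mean of perf for each row_names value. However, where one of the tables has NA for a row, that value should be left out of the mean, and the tickers column should list only the tickers that actually contributed. In the example above, I would like to get this data.table: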

> res <- data.table(row_names=1:5,perf=c(2,1,2,4,NA),tickers=c("aa","aapl","aapl","aa,aapl",NA))
> res
   row_names perf tickers
1:         1    2      aa
2:         2    1    aapl
3:         3    2    aapl
4:         4    4 aa,aapl
5:         5   NA      NA

I know I can do something like this to take the mean with the NAs removed:

rbind(dt1,dt2)[,list("perf"=mean(perf,na.rm=T)),by=row_names]

   row_names perf
1:         1    2
2:         2    1
3:         3    2
4:         4    4
5:         5  NaN

How can I build the tickers column so that tickers are pasted together only where perf is not NA? Also, is stacking all the data.tables (as with rbind above) the most efficient way to compute the mean? Thanks!

1 Answer:

Answer 0: (score: 4)

Use:

res <- rbind(dt1,dt2)[, .(perf = mean(perf, na.rm = TRUE),
                          tickers = toString(ticker[!is.na(perf)]))
                      , by = row_names]

which gives:

> res
   row_names perf  tickers
1:         1    2       aa
2:         2    1     aapl
3:         3    2     aapl
4:         4    4 aa, aapl
5:         5  NaN

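Instead of toString you can also use paste or paste0 with the argument collapse = ','. Note that toString separates the entries with ", " (comma plus space), which is why row 4 reads "aa, aapl" rather than "aa,aapl".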
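As a minimal sketch of the paste variant, reusing the question's dt1 and dt2 (the name res_paste is only illustrative), the separator can be chosen to match the "aa,aapl" format the question asked for:

```r
library(data.table)

# Example tables from the question
dt1 <- data.table(row_names = 1:5, perf = c(2, NA, NA, 3, NA), ticker = rep("aa", 5))
dt2 <- data.table(row_names = 1:5, perf = c(NA, 1, 2, 5, NA), ticker = rep("aapl", 5))

# Same aggregation as above, but with paste() and an explicit separator
res_paste <- rbind(dt1, dt2)[, .(perf = mean(perf, na.rm = TRUE),
                                 tickers = paste(ticker[!is.na(perf)], collapse = ",")),
                             by = row_names]
res_paste
```

For a row where every perf is NA, this still yields NaN for perf and an empty string for tickers, just like the toString version.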

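Following @Frank's suggestion, you can adapt the code so that rows where every perf is NA return NA instead of NaN and an empty string: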

res <- rbind(dt1,dt2)[, .(perf = if (all(is.na(perf))) NA_real_ else mean(perf, na.rm = TRUE),
                          tickers = if (all(is.na(perf))) NA_character_ else toString(ticker[!is.na(perf)]))
                      , by = row_names]

which gives:

> res
   row_names perf  tickers
1:         1    2       aa
2:         2    1     aapl
3:         3    2     aapl
4:         4    4 aa, aapl
5:         5   NA       NA
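
The question mentions N tables rather than two. One possible way to extend the approach, sketched here under the assumption that the tables are collected in a list (the names dts and res_n are hypothetical, not from the original post), is to stack them with data.table's rbindlist and compute the NA check once per group:

```r
library(data.table)

# Hypothetical list holding the N tables; only the two example tables are shown
dts <- list(
  data.table(row_names = 1:5, perf = c(2, NA, NA, 3, NA), ticker = rep("aa", 5)),
  data.table(row_names = 1:5, perf = c(NA, 1, 2, 5, NA), ticker = rep("aapl", 5))
)

# Stack all tables, then aggregate per row_names using only non-NA perf values
res_n <- rbindlist(dts)[, {
  ok <- !is.na(perf)                      # which rows of this group contribute
  if (any(ok)) {
    .(perf = mean(perf[ok]), tickers = toString(ticker[ok]))
  } else {
    .(perf = NA_real_, tickers = NA_character_)
  }
}, by = row_names]
res_n
```

The data.table documentation describes rbindlist as a faster equivalent of do.call(rbind, l), so it is a natural choice for many tables; whether it is the most efficient option for a particular data set would need to be measured.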