Question

This is a basic question, but I'm struggling with it:

I have the following R data.table:

library(data.table)
DT <- fread('unique_point biased    data_points   team   groupID
up1          FALSE     3             A      xy28352
up1          TRUE      4             A      xy28352
up2          FALSE     1             A      xy28352
up2          TRUE      0             X      xy28352
up3          FALSE     12            Y      xy28352
up3          TRUE      35            Z      xy28352')

which prints as

> DT
   unique_point biased data_points team groupID
1:          up1  FALSE           3    A xy28352
2:          up1   TRUE           4    A xy28352
3:          up2  FALSE           1    A xy28352
4:          up2   TRUE           0    X xy28352
5:          up3  FALSE          12    Y xy28352
6:          up3   TRUE          35    Z xy28352

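The values of the column team are the letters A to Z, so there are currently 26 possibilities. If I count the rows per value with the following code: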

DT[, counts := .N, by=c("team")]

this gives

> DT
   unique_point biased data_points team groupID counts
1:          up1  FALSE           3    A xy28352      3
2:          up1   TRUE           4    A xy28352      3
3:          up2  FALSE           1    A xy28352      3
4:          up2   TRUE           0    X xy28352      1
5:          up3  FALSE          12    Y xy28352      1
6:          up3   TRUE          35    Z xy28352      1

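I would like to create 26 new columns in DT, one per team (A, B, C, and so on), each containing the size of that team.

The resulting data.table would look like this: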

> DT
   unique_point biased data_points team groupID    A   B   C ... Z
1:          up1  FALSE           3    A xy28352    3   0   0 ... 1
2:          up1   TRUE           4    A xy28352    3   0   0 ... 1
3:          up2  FALSE           1    A xy28352    3   0   0 ... 1
4:          up2   TRUE           0    X xy28352    3   0   0 ... 1
5:          up3  FALSE          12    Y xy28352    3   0   0 ... 1
6:          up3   TRUE          35    Z xy28352    3   0   0 ... 1

I'm not sure how to do this with data.table syntax.

EDIT: I'm also happy to do this with base R or dplyr.

Answer 1

How about this? Would that work?

plyr
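
A minimal sketch of one way to get these columns in data.table syntax, assuming the 26 possible team labels are exactly `LETTERS`. It is an illustration and not necessarily the approach this answer had in mind; the helper name `team_counts` is made up for the example, and the table is rebuilt inline so the snippet runs on its own.

```r
library(data.table)

# Same data as in the question, rebuilt so the snippet is self-contained
DT <- data.table(
  unique_point = c("up1", "up1", "up2", "up2", "up3", "up3"),
  biased       = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  data_points  = c(3, 4, 1, 0, 12, 35),
  team         = c("A", "A", "A", "X", "Y", "Z"),
  groupID      = "xy28352"
)

# Count rows per team over all 26 possible labels. Labels that never
# occur still get a count of 0 because they are declared as factor levels.
team_counts <- table(factor(DT$team, levels = LETTERS))

# Add one column per label; each length-1 count is recycled down all rows.
DT[, (LETTERS) := as.list(as.integer(team_counts))]
```

If the sizes should instead be computed within each groupID, using `team` in place of `DT$team` inside `DT[...]` together with `by = groupID` would be the natural extension.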

Answer 2

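This is an unusual solution, but it works. I used dplyr and tidyr.

library(dplyr)
library(tidyr)
DT[, counts := .N, by = "team"]
# One row per possible team label, A to Z
x <- data.frame(team = LETTERS)
y <- DT %>% select(team, counts) %>% unique()
# Teams absent from DT get 0; spread() yields a single row with columns A..Z
df <- x %>% left_join(y, by = "team") %>% spread(team, counts, fill = 0)
cbind(DT, df)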

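Note: left_join may emit a warning message (for example on R versions before 4.0, where data.frame() converts strings to factors by default), but it does not alter the output. For how to resolve it, see "dplyr join warning: joining factors with different levels".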
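
Since the question also accepts base R, the same result can be sketched without any packages. This is only an illustration under the assumption that the team labels are exactly `LETTERS`; the names `df` and `counts` are chosen for the example, and the data is rebuilt as a plain data.frame.

```r
# Same data as in the question, as a plain data.frame
df <- data.frame(
  unique_point = c("up1", "up1", "up2", "up2", "up3", "up3"),
  biased       = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE),
  data_points  = c(3, 4, 1, 0, 12, 35),
  team         = c("A", "A", "A", "X", "Y", "Z"),
  groupID      = "xy28352",
  stringsAsFactors = FALSE
)

# Tabulate over all 26 levels so that absent teams are counted as 0
counts <- table(factor(df$team, levels = LETTERS))

# Assign one new column per letter; each single count is recycled to all rows
df[LETTERS] <- as.list(as.integer(counts))
```

Declaring the levels up front is what produces the zero columns; a plain `table(df$team)` would only return the teams that actually occur.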

R data.table: subsetting a data.table / dataframe based on the size of row values
