I have a dataset like this:
data(CO2, package = 'datasets')
## Plant Type Treatment conc uptake
## 1 Qn1 Quebec nonchilled 95 16.0
## 2 Qn1 Quebec nonchilled 175 30.4
## ...
## 17 Qn3 Quebec nonchilled 250 40.3
## 18 Qn3 Quebec nonchilled 350 42.1
## ...
## 27 Qc1 Quebec chilled 675 35.4
## 28 Qc1 Quebec chilled 1000 38.7
## ...
## 36 Qc3 Quebec chilled 95 15.1
## 37 Qc3 Quebec chilled 175 21.0
## ...
## 47 Mn1 Mississippi nonchilled 500 30.9
## ...
## 53 Mn2 Mississippi nonchilled 350 31.8
## 54 Mn2 Mississippi nonchilled 500 32.4
## ...
## 62 Mn3 Mississippi nonchilled 675 28.1
## 63 Mn3 Mississippi nonchilled 1000 27.8
## ...
## 70 Mc1 Mississippi chilled 1000 21.9
## 71 Mc2 Mississippi chilled 95 7.7
## 72 Mc2 Mississippi chilled 175 11.4
## ...
## 83 Mc3 Mississippi chilled 675 18.9
## 84 Mc3 Mississippi chilled 1000 19.9
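I want to group by all variables except conc and uptake, so I would like to specify the variables I do NOT want to use for grouping. The result should be a new column, GroupID, in which all observations belonging to the same group have the same GroupID.
I found a working solution, but it is a monstrosity: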
library(dplyr)
CO2 %>%
  mutate(GroupID =
    do.call( group_indices
           , c( list(.data = .)
              , colnames(.) %>%
                  setdiff(c('conc', 'uptake')) %>%
                  lapply(as.name)  # one symbol per grouping column
              )
           )
  )
## Plant Type Treatment conc uptake GroupID
## 1 Qn1 Quebec nonchilled 95 16.0 1
## 2 Qn1 Quebec nonchilled 175 30.4 1
## ...
## 8 Qn2 Quebec nonchilled 95 13.6 2
## 9 Qn2 Quebec nonchilled 175 27.3 2
## ...
## 15 Qn3 Quebec nonchilled 95 16.2 3
## 16 Qn3 Quebec nonchilled 175 32.4 3
## ...
## 22 Qc1 Quebec chilled 95 14.2 4
## 23 Qc1 Quebec chilled 175 24.1 4
## ...
## 29 Qc2 Quebec chilled 95 9.3 6
## 30 Qc2 Quebec chilled 175 27.3 6
## ...
## 36 Qc3 Quebec chilled 95 15.1 5
## 37 Qc3 Quebec chilled 175 21.0 5
## ...
## 43 Mn1 Mississippi nonchilled 95 10.6 9
## 44 Mn1 Mississippi nonchilled 175 19.2 9
## ...
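Is there a simpler solution?
Bonus: it would be great if there were a solution that could group by all variables of the same type (for example, all factor variables).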
Answer 0 (score: 4)
We can use group_by_if to group by the variables that satisfy a condition. In this case, is.factor tests whether each column is a factor. After that, group_indices generates an ID for each group.
library(dplyr)
CO2_2 <- CO2 %>%
mutate(GroupID = CO2 %>%
group_by_if(is.factor) %>%
group_indices())
head(CO2_2)
# Plant Type Treatment conc uptake GroupID
# 1 Qn1 Quebec nonchilled 95 16.0 1
# 2 Qn1 Quebec nonchilled 175 30.4 1
# 3 Qn1 Quebec nonchilled 250 34.8 1
# 4 Qn1 Quebec nonchilled 350 37.2 1
# 5 Qn1 Quebec nonchilled 500 35.3 1
# 6 Qn1 Quebec nonchilled 675 39.2 1
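On dplyr 1.0.0 or later, the scoped verb group_by_if() is superseded by across(), and cur_group_id() can supply the group number inside mutate(). The following is a minimal sketch of the same idea under that version assumption. The names co2_plain and CO2_3 are illustrative only, and the as.data.frame() call is a precaution of this sketch (CO2 carries extra groupedData classes), not part of the answer above.

```r
library(dplyr)

data(CO2, package = "datasets")

# Assumption: drop the extra groupedData classes so dplyr sees a plain data frame.
co2_plain <- as.data.frame(CO2)

CO2_3 <- co2_plain %>%
  group_by(across(where(is.factor))) %>%  # group by every factor column
  mutate(GroupID = cur_group_id()) %>%    # one integer per group
  ungroup()

head(CO2_3)
```

Because Plant, Type and Treatment are the only factor columns, this should produce the same partition into 12 groups as the group_by_if version.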
We can also use group_by_at to group the data frame by column names.
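The group_by_at approach is mentioned here without code, so the following is only a sketch of how it might be applied to the original question (grouping by everything except conc and uptake). The names skip, co2_plain and CO2_4 are hypothetical, and passing a character vector of column names to group_by_at() is one of several accepted forms.

```r
library(dplyr)

data(CO2, package = "datasets")

# Hypothetical helper: the columns that should NOT be used for grouping.
skip <- c("conc", "uptake")

# Assumption: work on a plain data frame copy of CO2.
co2_plain <- as.data.frame(CO2)

CO2_4 <- co2_plain %>%
  mutate(GroupID = co2_plain %>%
           group_by_at(setdiff(names(co2_plain), skip)) %>%  # group by the remaining columns
           group_indices())                                  # integer ID per row

head(CO2_4)
```

Listing the excluded columns rather than the grouping columns keeps the call stable if more grouping variables are added to the data later.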
Answer 1 (score: 3)
You can use .GRP from data.table:
library(data.table)
setDT(CO2)[, GroupID := .GRP, setdiff(names(CO2), c('conc','uptake'))]
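The one-liner above converts CO2 in place. A slightly more spelled-out sketch, working on a copy and also covering the "all factor columns" bonus, might look like the following. The names dt, by_cols, factor_cols and FactorGroupID are illustrative and not from the answer. With by =, data.table numbers groups in order of first appearance, so the IDs can differ from the dplyr ones even though the partition is the same.

```r
library(data.table)

data(CO2, package = "datasets")

# Work on a copy so the original CO2 object is left untouched.
dt <- as.data.table(as.data.frame(CO2))

# Group by everything except conc and uptake.
by_cols <- setdiff(names(dt), c("conc", "uptake"))
dt[, GroupID := .GRP, by = by_cols]

# Bonus variant: group by all factor columns.
factor_cols <- names(dt)[vapply(dt, is.factor, logical(1))]
dt[, FactorGroupID := .GRP, by = factor_cols]

head(dt)
```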
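Answer 2 (score: 0)
Using base R, we can do the following. Note that it assigns IDs by row position, so it works for CO2 only because the rows of each group are contiguous and every group has the same number of rows (7):
A <- aggregate(cbind(conc, uptake) ~ ., CO2, length)[, "uptake"]  # either conc or uptake works here
transform(CO2, ID = rep(seq_along(A), A))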
Plant Type Treatment conc uptake ID
1 Qn1 Quebec nonchilled 95 16.0 1
2 Qn1 Quebec nonchilled 175 30.4 1
:
8 Qn2 Quebec nonchilled 95 13.6 2
9 Qn2 Quebec nonchilled 175 27.3 2
:
15 Qn3 Quebec nonchilled 95 16.2 3
16 Qn3 Quebec nonchilled 175 32.4 3
As a one-liner:
transform(CO2, ID = rep(seq_along(d <- aggregate(cbind(conc, uptake) ~ ., CO2, length)[, "uptake"]), d))
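Since the aggregate-based approach depends on row order, a base R alternative that does not is sketched below. It swaps in a different technique, interaction() with drop = TRUE, and is not what the answer above proposes. The names df and group_cols are illustrative.

```r
data(CO2, package = "datasets")

# Assumption: work on a plain data frame copy of CO2.
df <- as.data.frame(CO2)

# Group by everything except conc and uptake.
group_cols <- setdiff(names(df), c("conc", "uptake"))
# Bonus variant: group by all factor columns instead.
# group_cols <- names(df)[vapply(df, is.factor, logical(1))]

# Combine the grouping columns into one factor, drop unused combinations,
# and use the integer codes as group IDs.
df$GroupID <- as.integer(interaction(df[group_cols], drop = TRUE))

head(df)
```

The ID values follow the level order of the combined factor, so they are arbitrary labels, but rows with the same values in all grouping columns always receive the same ID regardless of how the data frame is sorted.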