I am trying to create a summarise / filter dplyr pipeline that would be equivalent to the following:
iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
  filter(Sepal.Area < 17) %>%
  tally() %>%
  filter(Sepal.Area > 17 & Sepal.Area < 22) %>%
  tally() %>%
  filter(Sepal.Area > 22) %>%
  tally()
Or another possible approach:
iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
  summarise(n(Sepal.Area < 17),
            n(Sepal.Area > 17 & Sepal.Area < 22),
            n(Sepal.Area > 22))
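What is the simplest way to get counts from multiple filters on a group? Or should I just run each filter separately and join the results afterwards?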
Answer 0 (score: 3)
You can try cut.
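As a minimal sketch of the cut idea, assuming the breakpoints 17 and 22 from the question (the column name size and the variable counts are hypothetical, chosen only for illustration), the ranges can be turned into a factor and then counted per species in one pass:

```r
library(dplyr)

counts <- iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  # cut() bins each area into (0,17], (17,22] or (22,Inf]
  mutate(size = cut(Sepal.Area, breaks = c(0, 17, 22, Inf))) %>%
  # one row per Species/size combination that actually occurs
  count(Species, size)

counts
```

Because every row falls into exactly one bin, the counts add up to the 150 rows of iris, and no joining of separate filter results is needed.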
Answer 1 (score: 1)
You have to create groups for the different Sepal.Area ranges you want, then group by those ranges and count. Try this:
iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  mutate(Sepal.Area.Groups = ifelse(Sepal.Area < 17, 'Sep_less_17',
                             ifelse(Sepal.Area > 17 & Sepal.Area < 22, 'Sep_bet_1722',
                             ifelse(Sepal.Area > 22, 'Sep_gre_22', 'other')))) %>%
  group_by(Sepal.Area.Groups) %>%
  tally()
# A tibble: 4 x 2
  Sepal.Area.Groups     n
  <chr>             <int>
1 Sep_bet_1722         74
2 Sep_gre_22           13
3 Sep_less_17          61
4 other                 2
With dplyr, if you apply a filter after performing the count, you are filtering the table of counts rather than the original data.
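To illustrate that point, here is a small sketch (the variable name counted is hypothetical) that runs only the first filter and tally from the question and then inspects which columns are left:

```r
library(dplyr)

counted <- iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  filter(Sepal.Area < 17) %>%
  tally()

# After tally() only the grouping column and the count remain,
# so a later filter(Sepal.Area > 17) has no Sepal.Area column to use.
names(counted)
```

This is why chaining filter and tally repeatedly, as in the question's first attempt, cannot work: each tally discards the data that the next filter needs.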
Answer 2 (score: 1)
I think using cut is the right approach. I don't have enough reputation to comment on that answer, but you can also supply labels.
iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  mutate(size = cut(Sepal.Area, breaks = c(0, 17, 22, Inf),
                    labels = c("small", "medium", "large"))) %>%
  group_by(size) %>%
  summarize(count = n())
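If one row per species is preferred, as in the question's second attempt, a possible alternative is to sum logical conditions inside summarise, since n() itself takes no arguments. This is only a sketch: the column names small, medium and large and the variable wide_counts are assumptions made for illustration.

```r
library(dplyr)

wide_counts <- iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  # TRUE counts as 1 and FALSE as 0, so sum() counts the matching rows
  summarise(small  = sum(Sepal.Area < 17),
            medium = sum(Sepal.Area > 17 & Sepal.Area < 22),
            large  = sum(Sepal.Area > 22))

wide_counts
```

Note that the strict inequalities leave values equal to exactly 17 or 22 uncounted, whereas cut uses right-closed intervals by default and so places a value of exactly 17 in the lower bin.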