dplyr: tally / count by multiple filters

Time: 2017-03-29 14:20:19

Tags: r dplyr

I am trying to create a summarise / filter dplyr pipeline that would be equivalent to the following:

iris %>%
mutate(Sepal.Area = Sepal.Length * Sepal.Width,
       Petal.Area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
      filter(Sepal.Area < 17) %>%
        tally() %>%
      filter(Sepal.Area > 17 & Sepal.Area < 22) %>%
        tally() %>%
      filter(Sepal.Area > 22) %>%
        tally()

Or, as another possible approach:

iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  group_by(Species) %>%
    summarise(n(Sepal.Area < 17),
              n(Sepal.Area > 17 & Sepal.Area < 22),
              n(Sepal.Area > 22))

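What is the simplest way to get counts by multiple filters within a grouping? Or should I just run each filter separately and join the results afterwards?

3 Answers:

Answer 0: (score: 3)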

You can try cut.
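
A minimal sketch of how cut could be applied to this problem (not necessarily the answerer's original code). It assumes the three ranges from the question; the column name size and the labels are illustrative choices. Each row is assigned to a bin, and count() then tallies rows per Species and bin in one pass:

```r
library(dplyr)

# Sketch: bin Sepal.Area with cut(), then count rows per Species and bin.
# The column name `size` and the labels are illustrative, not from the question.
counts <- iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         size = cut(Sepal.Area,
                    breaks = c(0, 17, 22, Inf),
                    labels = c("small", "medium", "large"))) %>%
  count(Species, size)

counts
```

The result is in long format, one row per Species and bin combination that actually occurs in the data.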

Answer 1: (score: 1)

You have to create groups for the different Sepal.Area ranges you want, then group by those ranges and count. Try this:

iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  mutate(Sepal.Area.Groups = ifelse(Sepal.Area < 17, 'Sep_less_17',
                             ifelse(Sepal.Area > 17 & Sepal.Area < 22, 'Sep_bet_1722',
                             ifelse(Sepal.Area > 22, 'Sep_gre_22', 'other')))) %>%
  group_by(Sepal.Area.Groups) %>%
  tally()

# A tibble: 4 x 2
  Sepal.Area.Groups     n
              <chr> <int>
1      Sep_bet_1722    74
2        Sep_gre_22    13
3       Sep_less_17    61
4             other     2

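With dplyr, if you apply a filter after performing the tally, you are filtering the table of counts rather than the original rows. After tally() the result contains only the grouping column and n, so Sepal.Area is no longer available to filter on.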
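
As an alternative to creating a grouping column, the second attempt in the question can be made to work by replacing n(condition) with sum(condition), since n() takes no arguments while summing a logical vector counts its TRUE values. This is a sketch under that assumption, with illustrative output column names, rather than code from any of the answers:

```r
library(dplyr)

# Sketch: sum() over a logical vector counts its TRUE values,
# so each range gets its own column within a single summarise().
# The output column names are illustrative.
range_counts <- iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  summarise(less_17   = sum(Sepal.Area < 17),
            bet_17_22 = sum(Sepal.Area > 17 & Sepal.Area < 22),
            gre_22    = sum(Sepal.Area > 22))

range_counts
```

This yields one row per Species with one count column per range, so no join is needed afterwards.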

Answer 2: (score: 1)

I think using cut is the right approach. I don't have enough reputation to comment on that answer, but you can also use labels.

iris %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width,
         Petal.Area = Petal.Length * Petal.Width) %>%
  mutate(size = cut(Sepal.Area, breaks = c(0, 17, 22, Inf),
                    labels = c("small", "medium", "large"))) %>%
  group_by(size) %>%
  summarize(count = n())
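
One detail worth checking when using cut this way is how values that fall exactly on a break are binned. By default cut uses right-closed intervals, which differs slightly from the strict < and > comparisons in the question. The following small sketch uses two hypothetical edge values to show the default and the right = FALSE variant:

```r
# Hypothetical values that sit exactly on the breaks.
edge <- c(17, 22)

# Default (right = TRUE): intervals are (0, 17], (17, 22], (22, Inf].
default_bins <- cut(edge, breaks = c(0, 17, 22, Inf),
                    labels = c("small", "medium", "large"))

# right = FALSE: intervals are [0, 17), [17, 22), [22, Inf).
left_closed <- cut(edge, breaks = c(0, 17, 22, Inf),
                   labels = c("small", "medium", "large"), right = FALSE)

as.character(default_bins)  # "small"  "medium"
as.character(left_closed)   # "medium" "large"
```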