PySpark: group data by year range

Time: 2018-08-01 15:18:49

Tags: dataframe pyspark pyspark-sql

I have a PySpark DataFrame similar to the following:

>>> df.show()
+----+---+---+-----+
|Year|Sex|NOC|count|
+----+---+---+-----+
|1924|  M|BUL|   31|
|1948|  M|EGY|  166|
|1980|  F|POL|  127|
|1984|  M|SYR|    9|
|1992|  F|NGR|   30|
|1992|  M|PER|   15|
|1996|  M|BUL|  128|
|1976|  M|ISL|   26|
|2004|  F|GRN|    2|
|2012|  M|SVK|   35|
|2002|  F|SLO|   41|
|2008|  F|SLO|   27|
|2008|  M|HKG|   22|
|2012|  M|MLT|    3|
|2000|  F|SEN|   23|
|1964|  M|GRE|   26|
|2006|  M|ESP|   12|
|2008|  M|MON|    4|
|2002|  M|DEN|    5|
|1964|  F|ISL|    1|
+----+---+---+-----+

I want a new DataFrame in which the years are grouped into ranges and the count column is summed within each range. What I have tried so far does not work correctly. For example, the expected output would be:

Year       Sex  NOC  count
1921-1940  M    BUL  153
1921-1940  F    BUL  132
1941-1960  M    BUL  984
1941-1960  F    BUL  112
....
2001-2016  M    GER  651
2001-2016  F    GER  322
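
To make the intended grouping concrete, here is a minimal sketch of the bucketing rule in plain Python rather than PySpark. The bucket boundaries are assumptions read off the expected output above (20-year ranges starting at 1921, with the last range capped at 2016), and the names `year_range` and `sum_by_year_range` are hypothetical helpers used only for illustration.

```python
from collections import defaultdict

# Assumptions inferred from the expected output: 20-year buckets that start
# at 1921, with the final bucket capped at 2016.
BASE_YEAR = 1921
BUCKET_WIDTH = 20
LAST_YEAR = 2016


def year_range(year):
    """Return a label such as '1921-1940' for the bucket containing `year`."""
    start = BASE_YEAR + ((year - BASE_YEAR) // BUCKET_WIDTH) * BUCKET_WIDTH
    end = min(start + BUCKET_WIDTH - 1, LAST_YEAR)
    return f"{start}-{end}"


def sum_by_year_range(rows):
    """Sum `count` per (year range, Sex, NOC) for rows of (Year, Sex, NOC, count)."""
    totals = defaultdict(int)
    for year, sex, noc, count in rows:
        totals[(year_range(year), sex, noc)] += count
    return dict(totals)


# A few rows copied from the sample DataFrame above.
rows = [
    (1924, "M", "BUL", 31),
    (1996, "M", "BUL", 128),
    (2002, "F", "SLO", 41),
    (2008, "F", "SLO", 27),
    (2012, "M", "SVK", 35),
]

for key, total in sorted(sum_by_year_range(rows).items()):
    print(*key, total)
```

In PySpark the same arithmetic could presumably be expressed with column functions such as `floor`, `least`, and `concat_ws` from `pyspark.sql.functions`, followed by `groupBy(...).agg(sum("count"))`; that translation is untested here and is only a suggested direction.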

Any help would be greatly appreciated. Thanks!

0 answers:

No answers