Question

My data frame looks like this:

datetime       | ID  |
======================
20180201000000 | 275 |
20171231113024 | 534 |
20180201220000 | 275 |
20170205000000 | 28  |

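What I want to do is count the rows per ID per month.

The following approach works fine.

First, add a month column by extracting it from the datetime column:

new_df = df.withColumn('month', df.datetime.substr(0,6))

Then group by ID and month: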

count_df = new_df.groupBy('ID','month').count()

But is there a way to use a substring of a column's values directly as an argument to groupBy()? Something like:

`count_df = df.groupBy('ID', df.datetime.substr(0,6)).count()`

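At least, this code did not work for me.

If there were a way to group on a substring of the values, there would be no need to add a new column, which would save a lot of resources (when the data is large).

And even if this approach is wrong, do you have a better idea for getting the same result?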

Answer 1

Try this:

$.ajax({
  type: 'POST',
  url: 'php1.php',
  dataType: "json",
  data: {
    id: id
  },
  success: function(data) {
  }
}).done(function(data) {
  var tdata = [];
  for (var z = 0; z < data.length; z++) {
    (function(index) {
      console.log(index);
      tdata[index] = data[index];
      $.ajax({
        type: 'POST',
        url: 'php2.php',
        dataType: "json",
        data: {
          id: tdata[index].id
        },
        success: function(data) {
          console.log(index);
        }
      }).done(function(data) {
        console.log(tdata);
        console.log(index); // value is 1 should be 0
        console.log(tdata[index].id);
      });
    })(z);
  }
});

pyspark - Can I use a substring of a value as the key of the groupBy() function?

1 answer: