Pandas: create a new row in each group based on a condition

Time: 2020-07-25 11:34:43

Tags: python pandas loops dataframe group-by

I have a dataframe (df), built by the code shown below.


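For each person (each "ID" group), I want to insert a new duplicate row above the group's first row. The new row's "ID", "From_num" and "To_num" values should be the same as in that first row, but its "Date" should be the first row's date plus one day. For example, for James the new row is "James", "78", "96", "2020-05-13". All other rows stay the same. The input data is: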

import pandas as pd

df = pd.DataFrame({
    'ID': ['James', 'James', 'James','Max', 'Max', 'Max', 'Max','Park','Tom', 'Tom', 'Tom', 'Tom','Wong'],
    'From_num': [78, 420, 'Started', 298, 36, 298, 'Started', 'Started', 60, 520, 99, 'Started', 'Started'],
    'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18',
             '2019-06-20', '2019-01-30', '2018-10-23',
             '2018-08-29', '2020-05-21', '2019-11-22',
             '2019-08-26', '2018-12-11', '2018-10-09', '2019-02-01']})

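I want the row order/sequence to stay the same as in the original data, with each new row placed directly above its group's first row. Any good ideas would be much appreciated. Thanks a lot.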

1 answer:

Answer 0 (score: 1)

Use:

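import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])
# number the rows of each ID group 1, 2, 3, ... in their current order
df['order'] = df.groupby('ID').cumcount().add(1)

# first row of each group, with Date moved one day later and order set to 0
df1 = (
    df.groupby('ID', as_index=False).first()
    .assign(Date=lambda x: x['Date'] + pd.Timedelta(days=1), order=0)
)

# add the new rows, sort each group into place (groups end up alphabetical by ID),
# then drop the helper column (axis must be passed by keyword in pandas 2.x)
df1 = pd.concat([df, df1]).sort_values(['ID', 'order'], ignore_index=True).drop(columns='order')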

Details:

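Convert the Date column to a pandas datetime series, then use DataFrame.groupby on column ID together with groupby.cumcount to impose a total ordering on the rows within each group of the dataframe.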

print(df)
       ID From_num  To_num       Date  order
0   James       78      96 2020-05-12      1
1   James      420      78 2020-02-02      2
2   James  Started     420 2019-06-18      3
3     Max      298      36 2019-06-20      1
4     Max       36      78 2019-01-30      2
5     Max      298      36 2018-10-23      3
6     Max  Started     298 2018-08-29      4
7    Park  Started     311 2020-05-21      1
8     Tom       60     150 2019-11-22      1
9     Tom      520     520 2019-08-26      2
10    Tom       99      78 2018-12-11      3
11    Tom  Started      99 2018-10-09      4
12   Wong  Started      39 2019-02-01      1
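The numbering that groupby.cumcount produces can be checked on a small, made-up frame (the IDs "a" and "b" are hypothetical): the counter restarts for each group and follows the existing row order even when groups are interleaved, and .add(1) leaves 0 free for the rows inserted later.

```python
import pandas as pd

# hypothetical toy data: two groups whose rows are interleaved
toy = pd.DataFrame({'ID': ['a', 'a', 'b', 'a', 'b']})

# cumcount numbers rows within each group 0, 1, 2, ... in their current order;
# .add(1) shifts this to 1, 2, 3, ... so that 0 is free for the inserted rows
toy['order'] = toy.groupby('ID').cumcount().add(1)
print(toy['order'].tolist())  # [1, 2, 1, 3, 2]
```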

Create a new dataframe df1 by calling DataFrame.groupby on column ID and aggregating with groupby.first, then assign order=0 and increment Date by 1 day using pd.Timedelta.

print(df1)
      ID From_num  To_num       Date  order
0  James       78      96 2020-05-13      0 # Date incremented by 1 day
1    Max      298      36 2019-06-21      0 # and order set to 0
2   Park  Started     311 2020-05-22      0
3    Tom       60     150 2019-11-23      0
4   Wong  Started      39 2019-02-02      0
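One detail of the groupby.first step: first() returns the first non-null value of each column separately, not necessarily the literal first row. The question's data has no missing values, so it makes no difference there, but the sketch below (on a hypothetical toy frame) contrasts it with groupby.head(1), which keeps the actual first row of each group, before applying the same one-day pd.Timedelta shift.

```python
import numpy as np
import pandas as pd

# hypothetical frame where the first row of group 'a' has a missing From_num
toy = pd.DataFrame({
    'ID': ['a', 'a', 'b'],
    'From_num': [np.nan, 5.0, 7.0],
    'Date': pd.to_datetime(['2020-01-10', '2020-01-05', '2020-03-01']),
})

# groupby.first() takes the first NON-null value of each column separately,
# so group 'a' gets From_num=5.0 from its second row
via_first = toy.groupby('ID', as_index=False).first()

# groupby.head(1) keeps the literal first row of each group (NaN included)
via_head = toy.groupby('ID').head(1).reset_index(drop=True)

# the chosen rows can then be shifted by one day, as in the answer
shifted = via_head.assign(Date=lambda x: x['Date'] + pd.Timedelta(days=1))

print(via_first['From_num'].tolist())  # [5.0, 7.0]
print(via_head['From_num'].tolist())   # [nan, 7.0]
print(shifted['Date'].dt.strftime('%Y-%m-%d').tolist())  # ['2020-01-11', '2020-03-02']
```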

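Use pd.concat to combine the dataframes df and df1, use DataFrame.sort_values to sort the result on columns ID and order, and finally drop the helper order column.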

print(df1)
       ID From_num  To_num       Date
0   James       78      96 2020-05-13
1   James       78      96 2020-05-12
2   James      420      78 2020-02-02
3   James  Started     420 2019-06-18
4     Max      298      36 2019-06-21
5     Max      298      36 2019-06-20
6     Max       36      78 2019-01-30
7     Max      298      36 2018-10-23
8     Max  Started     298 2018-08-29
9    Park  Started     311 2020-05-22
10   Park  Started     311 2020-05-21
11    Tom       60     150 2019-11-23
12    Tom       60     150 2019-11-22
13    Tom      520     520 2019-08-26
14    Tom       99      78 2018-12-11
15    Tom  Started      99 2018-10-09
16   Wong  Started      39 2019-02-02
17   Wong  Started      39 2019-02-01
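Because sort_values(['ID', 'order']) sorts on ID, the groups come out in alphabetical order. That happens to match the question's data (James, Max, Park, Tom, Wong), but it would reorder groups whose IDs are not already sorted. A possible variant, sketched under that assumption, sorts on a group-position key from groupby(sort=False).ngroup() instead, so groups keep their order of first appearance; it also takes the literal first row with drop_duplicates rather than groupby.first. The helper name prepend_next_day_rows and the toy data are hypothetical.

```python
import pandas as pd


def prepend_next_day_rows(df: pd.DataFrame, key: str = 'ID') -> pd.DataFrame:
    """Hypothetical helper: before the first row of each group, insert a copy of
    that row with Date + 1 day, keeping groups in order of first appearance."""
    out = df.assign(Date=pd.to_datetime(df['Date']))  # new frame; caller's df untouched
    # _grp: 0 for the first ID seen, 1 for the next, ... (not alphabetical)
    out['_grp'] = out.groupby(key, sort=False).ngroup()
    # _pos: 1, 2, 3, ... within each group, like the answer's 'order' column
    out['_pos'] = out.groupby(key).cumcount() + 1
    # literal first row of each group, shifted by one day and placed at position 0
    new_rows = out.drop_duplicates(key).assign(
        Date=lambda x: x['Date'] + pd.Timedelta(days=1), _pos=0)
    return (pd.concat([out, new_rows])
              .sort_values(['_grp', '_pos'], ignore_index=True)
              .drop(columns=['_grp', '_pos']))


# usage with hypothetical IDs that are NOT in alphabetical order
toy = pd.DataFrame({'ID': ['Zed', 'Zed', 'Amy'],
                    'To_num': [1, 2, 3],
                    'Date': ['2020-01-02', '2020-01-01', '2020-02-01']})
result = prepend_next_day_rows(toy)
print(result['ID'].tolist())  # ['Zed', 'Zed', 'Zed', 'Amy', 'Amy']
print(result['Date'].dt.strftime('%Y-%m-%d').tolist())
# ['2020-01-03', '2020-01-02', '2020-01-01', '2020-02-02', '2020-02-01']
```

On the question's data this should give the same 18 rows as the answer, since the IDs there are already in alphabetical order.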