Add columns to a dataframe based on values (timestamps)

Date: 2015-08-04 22:26:15

Tags: python pandas

I have two dataframes:

import pandas as pd

df = pd.DataFrame([
    [1, '20150601T060000', 1, 3],
    [2, '20150601T061500', 1, 3],
    [3, '20150601T061500', 2, 3],
    [4, '20150601T063000', 2, 3],
    [5, '20150602T060000', 1, 3],
    [6, '20150602T061500', 1, 3],
    [7, '20150602T060000', 2, 3],
    [8, '20150602T061500', 2, 3],
    [9, '20150603T061500', 2, 3],
    ],columns='A B C D'.split())
df2 = pd.DataFrame([
    [1, '20150601T060000', '20150601T070000', 1, 0],
    [2, '20150601T061500', '20150601T070000', 2, 0],
    [3, '20150602T060000', '20150602T070000', 1, 0],
    [4, '20150602T060000', '20150602T070000', 2, 0],
    [5, '20150603T060000', '20150603T070000', 2, 0],
    ],columns='A B1 B2 C D'.split())

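How can I add columns B1 and B2 from df2 to df, such that column C has the same value in both dataframes and the value in column B lies between B1 and B2?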

The result should be:

   A                B  C  D               B1               B2
0  1  20150601T060000  1  3  20150601T060000  20150601T070000
1  2  20150601T061500  1  3  20150601T060000  20150601T070000
2  3  20150601T061500  2  3  20150601T061500  20150601T070000
3  4  20150601T063000  2  3  20150601T061500  20150601T070000
4  5  20150602T060000  1  3  20150602T060000  20150602T070000
5  6  20150602T061500  1  3  20150602T060000  20150602T070000
6  7  20150602T060000  2  3  20150602T060000  20150602T070000
7  8  20150602T061500  2  3  20150602T060000  20150602T070000
8  9  20150603T061500  2  3  20150603T060000  20150603T070000

2 answers:

Answer 0 (score: 1)

How about this:


merged = pd.merge(df, df2[['C', 'B1', 'B2']], on='C')
result = merged.query('B1 <= B <= B2')
#     A                B  C  D               B1               B2
# 0   1  20150601T060000  1  3  20150601T060000  20150601T070000
# 2   2  20150601T061500  1  3  20150601T060000  20150601T070000
# 5   5  20150602T060000  1  3  20150602T060000  20150602T070000
# 7   6  20150602T061500  1  3  20150602T060000  20150602T070000
# 8   3  20150601T061500  2  3  20150601T061500  20150601T070000
# 11  4  20150601T063000  2  3  20150601T061500  20150601T070000
# 15  7  20150602T060000  2  3  20150602T060000  20150602T070000
# 18  8  20150602T061500  2  3  20150602T060000  20150602T070000
# 22  9  20150603T061500  2  3  20150603T060000  20150603T070000

Update

If you want to sort by column 'A' (as shown in your desired result), then just use:

result = merged.query('B1 <= B <= B2').sort_values('A')

Aside

I don't usually work with datetime columns, but it would be safer to explicitly convert these columns to datetime dtypes before running the query (plus the data is easier to read).
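A minimal sketch of the same merge-then-filter approach with the timestamps parsed to datetime dtypes first. It assumes pandas 0.17 or later (for `sort_values`) and swaps the `query` string for an explicit boolean mask; the format string is an assumption based on the sample values.

```python
import pandas as pd

df = pd.DataFrame([
    [1, '20150601T060000', 1, 3],
    [2, '20150601T061500', 1, 3],
    [3, '20150601T061500', 2, 3],
    [4, '20150601T063000', 2, 3],
    [5, '20150602T060000', 1, 3],
    [6, '20150602T061500', 1, 3],
    [7, '20150602T060000', 2, 3],
    [8, '20150602T061500', 2, 3],
    [9, '20150603T061500', 2, 3],
], columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([
    [1, '20150601T060000', '20150601T070000', 1, 0],
    [2, '20150601T061500', '20150601T070000', 2, 0],
    [3, '20150602T060000', '20150602T070000', 1, 0],
    [4, '20150602T060000', '20150602T070000', 2, 0],
    [5, '20150603T060000', '20150603T070000', 2, 0],
], columns=['A', 'B1', 'B2', 'C', 'D'])

# Parse the compact timestamp strings (assumed format) into datetime64 columns
fmt = '%Y%m%dT%H%M%S'
df['B'] = pd.to_datetime(df['B'], format=fmt)
for col in ['B1', 'B2']:
    df2[col] = pd.to_datetime(df2[col], format=fmt)

# Pair every row of df with every interval that has the same C ...
merged = pd.merge(df, df2[['C', 'B1', 'B2']], on='C')
# ... then keep only the pairs where B falls inside [B1, B2]
mask = (merged['B1'] <= merged['B']) & (merged['B'] <= merged['B2'])
result = merged[mask].sort_values('A').reset_index(drop=True)
print(result)
```

The inner merge on C builds every candidate pair, so memory grows with the number of intervals per C value; for small tables like these that is not a concern.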

Answer 1 (score: 0)

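Edit: since you changed your question after my original answer, you need a slightly more complicated solution. Basically, in this case you need to:

  1. Convert the time series to a datetime dtype and set it as the index.
  2. Resample so that you have regular intervals, and fill the null values.
  3. Do a left merge.
  4. If needed, mask by the original times.

Convert and resample: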

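    # Step 1: parse the timestamps and use them as the index
    df.index = pd.to_datetime(df.B)
    df2.index = pd.to_datetime(df2.B1)
    # Step 2 (pandas < 0.18 API, where resample() returned the per-bin mean):
    # empty 15-minute bins are then forward-filled
    df_resampled = df.resample('15min').fillna(method='pad')
    df2_resampled = df2.resample('15min').fillna(method='pad')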
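Steps 1 and 2 can be sketched with the current resample API, assuming pandas 1.x or later. This version selects the numeric columns A, C and D explicitly, because averaging the string column B would fail; it averages rows that share a timestamp and forward-fills empty bins.

```python
import pandas as pd

df = pd.DataFrame([
    [1, '20150601T060000', 1, 3],
    [2, '20150601T061500', 1, 3],
    [3, '20150601T061500', 2, 3],
    [4, '20150601T063000', 2, 3],
    [5, '20150602T060000', 1, 3],
    [6, '20150602T061500', 1, 3],
    [7, '20150602T060000', 2, 3],
    [8, '20150602T061500', 2, 3],
    [9, '20150603T061500', 2, 3],
], columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([
    [1, '20150601T060000', '20150601T070000', 1, 0],
    [2, '20150601T061500', '20150601T070000', 2, 0],
    [3, '20150602T060000', '20150602T070000', 1, 0],
    [4, '20150602T060000', '20150602T070000', 2, 0],
    [5, '20150603T060000', '20150603T070000', 2, 0],
], columns=['A', 'B1', 'B2', 'C', 'D'])

fmt = '%Y%m%dT%H%M%S'  # assumed timestamp format
# Step 1: parse the timestamps and use them as the index
df.index = pd.to_datetime(df['B'], format=fmt)
df2.index = pd.to_datetime(df2['B1'], format=fmt)

# Step 2: regular 15-minute grid; duplicate timestamps are averaged,
# empty bins are filled from the previous bin
df_resampled = df[['A', 'C', 'D']].resample('15min').mean().ffill()
df2_resampled = df2[['A', 'C', 'D']].resample('15min').mean().ffill()
print(df_resampled.head())
```

Averaging also blends the C values of rows that share a timestamp (for example 1.5 at 2015-06-01 06:15), which is why this route does not reproduce the desired result exactly.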
    

Left-join on the index and on column C, then pull the values back at the original index:

    merged = pd.merge(df_resampled, df2_resampled,
                      left_index=True, right_index=True, on='C', how='left')
    merged.loc[df.index]
    Out[182]: 
                         A_x    C  D_x  A_y  D_y
    B                                           
    2015-06-01 06:00:00  1.0  1.0    3  1.0    0
    2015-06-01 06:15:00  2.5  1.5    3  2.0    0
    2015-06-01 06:15:00  2.5  1.5    3  2.0    0
    2015-06-01 06:30:00  4.0  2.0    3  2.0    0
    2015-06-02 06:00:00  6.0  1.5    3  3.5    0
    2015-06-02 06:15:00  7.0  1.5    3  3.5    0
    2015-06-02 06:00:00  6.0  1.5    3  3.5    0
    2015-06-02 06:15:00  7.0  1.5    3  3.5    0
    2015-06-03 06:15:00  9.0  2.0    3  5.0    0
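The same idea (a left join on C followed by a check against the original times, as in step 4) can also be sketched without resampling. This uses `pd.merge_asof`, a different technique swapped in here and not part of this answer. It assumes pandas 0.20 or later (for `direction=`) and that each B falls inside at most one interval per C. For every row it takes the latest interval start B1 <= B with the same C, then blanks B1/B2 where B is past the interval end.

```python
import pandas as pd

df = pd.DataFrame([
    [1, '20150601T060000', 1, 3],
    [2, '20150601T061500', 1, 3],
    [3, '20150601T061500', 2, 3],
    [4, '20150601T063000', 2, 3],
    [5, '20150602T060000', 1, 3],
    [6, '20150602T061500', 1, 3],
    [7, '20150602T060000', 2, 3],
    [8, '20150602T061500', 2, 3],
    [9, '20150603T061500', 2, 3],
], columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([
    [1, '20150601T060000', '20150601T070000', 1, 0],
    [2, '20150601T061500', '20150601T070000', 2, 0],
    [3, '20150602T060000', '20150602T070000', 1, 0],
    [4, '20150602T060000', '20150602T070000', 2, 0],
    [5, '20150603T060000', '20150603T070000', 2, 0],
], columns=['A', 'B1', 'B2', 'C', 'D'])

fmt = '%Y%m%dT%H%M%S'  # assumed timestamp format
# merge_asof requires both sides sorted by their join keys
left = df.assign(B=pd.to_datetime(df['B'], format=fmt)).sort_values('B')
right = df2[['C', 'B1', 'B2']].copy()
right['B1'] = pd.to_datetime(right['B1'], format=fmt)
right['B2'] = pd.to_datetime(right['B2'], format=fmt)
right = right.sort_values('B1')

# Left join: for each row, the latest B1 <= B among intervals with the same C
matched = pd.merge_asof(left, right, left_on='B', right_on='B1',
                        by='C', direction='backward')
# Mask rows whose B lies after the matched interval's end
outside = matched['B'] > matched['B2']
matched.loc[outside, ['B1', 'B2']] = pd.NaT

result = matched.sort_values('A').reset_index(drop=True)
print(result)
```

Because `merge_asof` keeps every left row, rows with no enclosing interval stay in the result with NaT in B1 and B2, matching the left-merge semantics of step 3.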