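I am trying to find, at each timestamp, the name of the column in a DataFrame whose value matches the value of a time series at the same timestamp.
Here is my DataFrame: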
>>> df
col5 col4 col3 col2 col1
1979-01-01 00:00:00 1181.220328 912.154923 648.848635 390.986156 138.185861
1979-01-01 06:00:00 1190.724461 920.767974 657.099560 399.395338 147.761352
1979-01-01 12:00:00 1193.414510 918.121482 648.558837 384.632475 126.254342
1979-01-01 18:00:00 1171.670276 897.585930 629.201469 366.652033 109.545607
1979-01-02 00:00:00 1168.892579 900.375126 638.377583 382.584568 132.998706
>>> df.to_dict()
{'col4': {<Timestamp: 1979-01-01 06:00:00>: 920.76797370744271, <Timestamp: 1979-01-01 00:00:00>: 912.15492332839756, <Timestamp: 1979-01-01 18:00:00>: 897.58592995700656, <Timestamp: 1979-01-01 12:00:00>: 918.1214819496729}, 'col5': {<Timestamp: 1979-01-01 06:00:00>: 1190.7244605667831, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 1171.6702763228691, <Timestamp: 1979-01-01 12:00:00>: 1193.4145103184442}, 'col2': {<Timestamp: 1979-01-01 06:00:00>: 399.39533771666561, <Timestamp: 1979-01-01 00:00:00>: 390.98615646597591, <Timestamp: 1979-01-01 18:00:00>: 366.65203285812231, <Timestamp: 1979-01-01 12:00:00>: 384.63247469269874}, 'col3': {<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 648.84863460462293, <Timestamp: 1979-01-01 18:00:00>: 629.20146872682449, <Timestamp: 1979-01-01 12:00:00>: 648.55883747413225}, 'col1': {<Timestamp: 1979-01-01 06:00:00>: 147.7613518219286, <Timestamp: 1979-01-01 00:00:00>: 138.18586102094068, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}}
And the time series containing the value I want to match at each timestamp:
>>> ts
1979-01-01 00:00:00 1181.220328
1979-01-01 06:00:00 657.099560
1979-01-01 12:00:00 126.254342
1979-01-01 18:00:00 109.545607
Freq: 6H
>>> ts.to_dict()
{<Timestamp: 1979-01-01 06:00:00>: 657.09956023625466, <Timestamp: 1979-01-01 00:00:00>: 1181.2203275146587, <Timestamp: 1979-01-01 18:00:00>: 109.54560722575859, <Timestamp: 1979-01-01 12:00:00>: 126.25434189361377}
The result would then be:
>>> df_result
value Column
1979-01-01 00:00:00 1181.220328 col5
1979-01-01 06:00:00 657.099560 col3
1979-01-01 12:00:00 126.254342 col1
1979-01-01 18:00:00 109.545607 col1
I hope my question is clear enough. Does anyone know how to get df_result?
Thanks,
Greg
Answer 0 (score: 8)
Here is one, perhaps inelegant, way to do it:
df_result = pd.DataFrame(ts, columns=['value'])
Set up a function that gets the name of the column containing the value (from ts):
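def get_col_name(row):
    # Compare the df row at this timestamp with the target value
    # (.loc replaces the deprecated .ix indexer).
    b = (df.loc[row.name] == row['value'])
    # Position of the first True, mapped back to its column label.
    return b.index[b.argmax()]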
For each row, this tests which elements equal the value and extracts the column name of the first True.
Then apply it (row-wise):
In [3]: df_result.apply(get_col_name, axis=1)
Out[3]:
1979-01-01 00:00:00 col5
1979-01-01 06:00:00 col3
1979-01-01 12:00:00 col1
1979-01-01 18:00:00 col1
That is, use df_result['Column'] = df_result.apply(get_col_name, axis=1).
Note: there is quite a lot going on in get_col_name, so it may warrant further explanation:
In [4]: row = df_result.iloc[0] # an example row to pass to get_col_name (.iloc replaces the removed .irow)
In [5]: row
Out[5]:
value 1181.220328
Name: 1979-01-01 00:00:00
In [6]: row.name # use to get rows of df
Out[6]: <Timestamp: 1979-01-01 00:00:00>
In [7]: df.loc[row.name]
Out[7]:
col5 1181.220328
col4 912.154923
col3 648.848635
col2 390.986156
col1 138.185861
Name: 1979-01-01 00:00:00
In [8]: b = (df.loc[row.name] == row['value'])
# checks whether each element equals row['value'] = 1181.220328
In [9]: b
Out[9]:
col5 True
col4 False
col3 False
col2 False
col1 False
Name: 1979-01-01 00:00:00
In [10]: b.argmax() # index of a True value
Out[10]: 0
In [11]: b.index[b.argmax()] # the index value (column name)
Out[11]: 'col5'
There may be more efficient ways to do this...
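One possible vectorized variant of the same idea is sketched below. It is not from the original answer: it rebuilds the question's sample data from the rounded values shown in the display, and it assumes exact float equality is acceptable, as the answer above does. If ts were computed independently of df, a tolerance-based comparison such as numpy.isclose might be safer.

```python
import pandas as pd

# Sample data copied from the question (rounded as displayed there).
idx = pd.to_datetime([
    "1979-01-01 00:00", "1979-01-01 06:00", "1979-01-01 12:00",
    "1979-01-01 18:00", "1979-01-02 00:00",
])
df = pd.DataFrame(
    {
        "col5": [1181.220328, 1190.724461, 1193.414510, 1171.670276, 1168.892579],
        "col4": [912.154923, 920.767974, 918.121482, 897.585930, 900.375126],
        "col3": [648.848635, 657.099560, 648.558837, 629.201469, 638.377583],
        "col2": [390.986156, 399.395338, 384.632475, 366.652033, 382.584568],
        "col1": [138.185861, 147.761352, 126.254342, 109.545607, 132.998706],
    },
    index=idx,
)
ts = pd.Series([1181.220328, 657.099560, 126.254342, 109.545607], index=idx[:4])

# Compare every row of df with the ts value at the same timestamp.
matches = df.loc[ts.index].eq(ts, axis=0)

# idxmax(axis=1) gives the label of the first True in each row.
# Caveat: a row with no match at all would still report the first column.
df_result = pd.DataFrame({"value": ts, "Column": matches.idxmax(axis=1)})
print(df_result)
```

This avoids the Python-level function call per row, at the cost of silently returning the first column for rows without a match, so checking matches.any(axis=1) first may be worthwhile.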
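Answer 1 (score: 6)
Building on Andy's detailed answer, the solution for selecting the name of the column with the highest value in each row can be simplified to one line: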
df['column'] = df.apply(lambda x: df.columns[x.argmax()], axis = 1)
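Answer 2 (score: 4)
Just wanted to add that if several columns may hold the value and you want all of their names in a list, you can do the following (e.g. to get all columns whose value equals 1):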
df.apply(lambda row: row[row == 1].index, axis=1)
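The idea is that each row is passed to the function as a Series (by setting axis=1), in which the column names become the index of that Series. You then filter the Series with a condition (e.g. row == 1) and take the index values of what remains (i.e. the column names!).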
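A small illustration of that filtering step follows. The toy frame and its column names a, b, c are hypothetical, and the index is converted to a plain list here only to make the result easier to read.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
df = pd.DataFrame({"a": [1, 0, 1], "b": [1, 1, 0], "c": [0, 0, 0]})

# For each row, keep the entries equal to 1 and collect their column labels.
matching = df.apply(lambda row: list(row[row == 1].index), axis=1)
print(matching.tolist())  # [['a', 'b'], ['b'], ['a']]
```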
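Answer 3 (score: 1)
I was trying to create a new column indicating which existing column holds the largest value in each row. This gave me the column label I needed, as a string: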
df['column_with_biggest_value'] = df.idxmax(axis=1)
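As a quick check of what idxmax(axis=1) returns, here is a minimal sketch on a hypothetical three-column frame (the names x, y, z are made up for illustration). On ties, the first column holding the maximum is reported.

```python
import pandas as pd

# Hypothetical numeric frame; the column names are illustrative only.
df = pd.DataFrame({"x": [3, 1, 2], "y": [1, 5, 2], "z": [2, 4, 9]})

# Label of the column holding the row-wise maximum.
df["column_with_biggest_value"] = df.idxmax(axis=1)
print(df["column_with_biggest_value"].tolist())  # ['x', 'y', 'z']
```

Once the string column has been added, calling idxmax on the whole frame again would mix strings with numbers, so it is probably best to select the numeric columns first in that case.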