I want to group the df below by Date and ItemId:
Id Timestamp Data ItemId Date
2012-04-21 19389576 2012-04-21 00:04:03.533 39.0 1 2012-04-21
2012-04-21 19389577 2012-04-21 00:04:04.870 38.5 1 2012-04-21
2012-04-21 19389608 2012-04-21 00:07:03.450 38.0 1 2012-04-21
...
2012-04-22 19389609 2012-04-21 00:03:04.817 37.5 2 2012-04-21
2012-04-22 19389620 2012-04-21 00:10:04.400 37.0 2 2012-04-21
...
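I want to get all combinations of Date and ItemId, then use each combination to select rows from the original dataframe df, for example the rows where Date == 2012-04-21 and ItemId == 1.
...
How can I select data using 2 columns at once in a for loop?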
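A minimal sketch of the loop the question describes, assuming a small hand-built frame with only the `Data`, `ItemId` and `Date` columns (values copied from the sample above; `Id` and `Timestamp` are omitted for brevity). It collects the distinct (`Date`, `ItemId`) pairs that occur in the data, then selects each subset with a two-column boolean mask.

```python
import pandas as pd

# Small frame mirroring the sample rows above (Id/Timestamp omitted for brevity)
df = pd.DataFrame({
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ItemId": [1, 1, 1, 2, 2],
    "Date": ["2012-04-21"] * 5,
})

# All distinct (Date, ItemId) combinations that actually occur in the data
combos = df[["Date", "ItemId"]].drop_duplicates()

subsets = {}
for date, item_id in combos.itertuples(index=False):
    # Each condition in its own parentheses, combined with &
    mask = (df["Date"] == date) & (df["ItemId"] == item_id)
    subsets[(date, item_id)] = df[mask]
    print(date, item_id)
    print(df[mask])
```

Using `drop_duplicates` rather than a full cartesian product means only combinations present in the data are visited, so no empty subsets are produced.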
Answer 0 (score: 2)
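After a groupby, each row index label is a tuple such as (2012-04-21, 1), (2012-04-21, 2) or (2012-04-22, 1). The code below uses the column name ProductId in place of ItemId: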
from datetime import datetime
import pandas as pd
import io
s_e=""" Id Timestamp Data ProductId Date
2012-04-21 19389576 2012-04-21 00:04:03.533 39.0 1 2012-04-21
2012-04-21 19389577 2012-04-21 00:04:04.870 38.5 1 2012-04-21
2012-04-21 19389608 2012-04-21 00:07:03.450 38.0 1 2012-04-22
2012-04-22 19389609 2012-04-21 00:03:04.817 37.5 2 2012-04-21
2012-04-22 19389620 2012-04-21 00:10:04.400 37.0 2 2012-04-22
"""
pd.set_option('display.max_columns', None )
df = pd.read_csv(io.StringIO(s_e), sep=' ', parse_dates=[1,4], engine='python')
df=df.groupby(['Date','ProductId']).agg(list)
print('df:\n',df)
print('df.index.values:\n',df.index.values)
Output:
>>>df:
Timestamp Data
Date ProductId
2012-04-21 1 [2012-04-21 00:04:03.533000, 2012-04-21 00:04:04.870000] [39.0, 38.5]
2 [2012-04-21 00:03:04.817000] [37.5]
2012-04-22 1 [2012-04-21 00:07:03.450000] [38.0]
2 [2012-04-21 00:10:04.400000] [37.0]
>>>df.index.values:
[(Timestamp('2012-04-21 00:00:00'), 1)
(Timestamp('2012-04-21 00:00:00'), 2)
(Timestamp('2012-04-22 00:00:00'), 1)
(Timestamp('2012-04-22 00:00:00'), 2)]
You can try the following to select a specific combination, for example Date == 2012-04-21 and ProductId == 2:
datetoselect=(datetime.strptime('2012-04-21','%Y-%m-%d'),2) # Date == 2012-04-21 and ProductId == 2
print(df[[i==datetoselect for i in df.index.values]])
Output:
Id Timestamp Data
Date ProductId
2012-04-21 2 [2012-04-22 19389609] [2012-04-21 00:03:04.817000] [37.5]
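Because the grouped frame has a (`Date`, `ProductId`) MultiIndex, a single combination can also be looked up directly with `.loc` and a tuple, instead of a list comprehension over `index.values`. A minimal sketch on a hand-built frame, assuming only the `Data`, `ProductId` and `Date` values from the sample above (`Id` and `Timestamp` omitted):

```python
import pandas as pd

raw = pd.DataFrame({
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ProductId": [1, 1, 1, 2, 2],
    "Date": pd.to_datetime(["2012-04-21", "2012-04-21", "2012-04-22",
                            "2012-04-21", "2012-04-22"]),
})

# One row per (Date, ProductId); each Data cell holds the list of values
grouped = raw.groupby(["Date", "ProductId"]).agg(list)

# Wrapping the tuple in a list keeps the result a DataFrame, not a Series
selected = grouped.loc[[(pd.Timestamp("2012-04-21"), 2)]]
print(selected)

# A full key plus a column label returns the single cell
print(grouped.loc[(pd.Timestamp("2012-04-21"), 1), "Data"])
```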
Answer 1 (score: 2)
IIUC, if you just want to print the data for each group, use:
for key, group in df.groupby(['ItemId', 'Date']):
    print(key)
    print(group)
This prints:
(1, '2012-04-21')
Id Timestamp Data ItemId Date
2012-04-21 19389576 2012-04-21 00:04:03.533 39.0 1 2012-04-21
2012-04-21 19389577 2012-04-21 00:04:04.870 38.5 1 2012-04-21
2012-04-21 19389608 2012-04-21 00:07:03.450 38.0 1 2012-04-21
(2, '2012-04-21')
Id Timestamp Data ItemId Date
2012-04-22 19389609 2012-04-21 00:03:04.817 37.5 2 2012-04-21
2012-04-22 19389620 2012-04-21 00:10:04.400 37.0 2 2012-04-21
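If only one combination is needed rather than all of them, `GroupBy.get_group` accepts the same key tuple that the loop prints. A minimal sketch, assuming a hand-built frame with the `Id`, `Data`, `ItemId` and `Date` values from the sample (`Timestamp` omitted):

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [19389576, 19389577, 19389608, 19389609, 19389620],
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ItemId": [1, 1, 1, 2, 2],
    "Date": ["2012-04-21"] * 5,
})

groups = df.groupby(["ItemId", "Date"])

# The key tuple follows the order of the grouping columns: (ItemId, Date)
item1 = groups.get_group((1, "2012-04-21"))
print(item1)

# Number of rows in every (ItemId, Date) group
print(groups.size())
```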
Answer 2 (score: 1)
Try a double selector by putting each condition in its own set of parentheses and joining them with the & operator:
df[(df["Date"] == "2012-04-21") & (df["ItemId"] == 2)]
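The parentheses are required because Python's `&` binds more tightly than `==`; without them the expression would try to evaluate `"2012-04-21" & df["ItemId"]` first. As an alternative, `DataFrame.query` accepts the same condition as a string. A minimal sketch, assuming a hand-built frame with sample values (the variable names `date` and `item_id` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Data": [39.0, 38.5, 38.0, 37.5, 37.0],
    "ItemId": [1, 1, 1, 2, 2],
    "Date": ["2012-04-21"] * 5,
})

# Boolean-mask form: each comparison parenthesised, combined with &
by_mask = df[(df["Date"] == "2012-04-21") & (df["ItemId"] == 2)]

# Equivalent query() form; @ refers to local Python variables
date, item_id = "2012-04-21", 2
by_query = df.query("Date == @date and ItemId == @item_id")

print(by_mask)
print(by_mask.equals(by_query))
```

The `query` form is convenient inside a loop because the loop variables can be referenced with `@` without rebuilding the mask expression by hand.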