Group a DataFrame and look up values in the resulting DataFrame

Time: 2017-11-19 20:07:16

Tags: python-3.x pandas pandas-groupby

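I have a DataFrame, and I need to check, for each marketplace, whether there is an entry dated last_saturday that is also that marketplace's latest (maximum) date.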

import pandas as pd

data = {
    'marketplace': [3, 3, 4, 4, 5, 3, 4],
    'date': ['2017-11-11', '2017-11-10', '2017-11-07', '2017-11-08', '2017-11-10', '2017-11-09', '2017-11-10']
}
last_saturday = '2017-11-11'

df = pd.DataFrame(data, columns=['marketplace', 'date'])

df_sub = df.groupby(['marketplace'])['date'].max()
print(df_sub)

I get df_sub =

marketplace
3    2017-11-11
4    2017-11-10
5    2017-11-10
Name: date, dtype: object

How do I iterate over df_sub to check whether a marketplace's date matches last_saturday?

When I try to print the dates with print(df_sub['date']), I get the following error:

TypeError: an integer is required
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/_libs/index.pyx", line 83, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 91, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 141, in pandas._libs.index.IndexEngine.get_loc
KeyError: 'date'

I think I have to use iloc or loc to access the data in df_sub, but I am not sure how.

1 answer:

Answer 0 (score: 0)

I think you need to compare the Series with the scalar value to get a boolean mask, and then use any to check whether at least one value is True:

print ((df_sub == last_saturday).any())
True

print (df_sub == last_saturday)
3     True
4    False
5    False
Name: date, dtype: bool
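
If you need to know which marketplaces match, rather than only whether any of them does, the same mask can be used to filter the index, and all covers the "every marketplace" reading of the question. A minimal sketch, reusing the data from the question (the names mask, matching and missing are only illustrative):

```python
import pandas as pd

data = {
    'marketplace': [3, 3, 4, 4, 5, 3, 4],
    'date': ['2017-11-11', '2017-11-10', '2017-11-07', '2017-11-08',
             '2017-11-10', '2017-11-09', '2017-11-10'],
}
last_saturday = '2017-11-11'

df = pd.DataFrame(data, columns=['marketplace', 'date'])
df_sub = df.groupby('marketplace')['date'].max()

# Boolean Series indexed by marketplace
mask = df_sub == last_saturday

# Marketplaces whose latest date is last_saturday, and those where it is not
matching = mask[mask].index.tolist()
missing = mask[~mask].index.tolist()
print(matching)
print(missing)

# True only if every marketplace has last_saturday as its latest date
print(mask.all())

# Explicit iteration also works, although the vectorized mask is usually preferable
for marketplace, date in df_sub.items():
    print(marketplace, date == last_saturday)
```

Note that max works here on plain strings only because the dates are in ISO YYYY-MM-DD format, which sorts the same way lexicographically and chronologically.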

Alternatively, first create a DataFrame, either with reset_index or with the parameter as_index=False:

df_sub = df.groupby(['marketplace'], as_index=False)['date'].max()
#df_sub = df.groupby(['marketplace'])['date'].max().reset_index()
print(df_sub)
   marketplace        date
0            3  2017-11-11
1            4  2017-11-10
2            5  2017-11-10

Then compare the column:

print ((df_sub['date'] == last_saturday).any())
True
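
As for the KeyError in the question: the result of groupby(...)['date'].max() is a Series whose index holds the marketplace ids and whose name is date, so 'date' is not a label you can look up in it. A small sketch of label-based and position-based access on that Series, assuming the same data as in the question (by_label, by_position and all_dates are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'marketplace': [3, 3, 4, 4, 5, 3, 4],
    'date': ['2017-11-11', '2017-11-10', '2017-11-07', '2017-11-08',
             '2017-11-10', '2017-11-09', '2017-11-10'],
})
df_sub = df.groupby('marketplace')['date'].max()

# 'date' is the name of the Series, not one of its index labels
print(df_sub.name)
print(df_sub.index.tolist())

# loc looks up by index label (here: the marketplace id)
by_label = df_sub.loc[3]

# iloc looks up by integer position (here: the first group)
by_position = df_sub.iloc[0]
print(by_label, by_position)

# All dates as a plain Python list
all_dates = df_sub.tolist()
print(all_dates)
```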