Question: How to filter a pandas DataFrame, then groupby and aggregate a list of values?

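I am trying to use groupby and get the values as a list.

The final df should have "bid" as the index and the scores as a list in the second column (e.g. [85, 58] if both rows have the same "bid").

This is my df:

When I use merged.groupby("bid")['score_y'].apply(list)

I get TypeError: 'Series' objects are mutable, thus they cannot be hashed.

Does anyone know why I am getting this error?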

Edit 1:

This is the data source: https://data.sfgov.org/Health-and-Social-Services/Restaurant-Scores-LIVES-Standard/pyih-qa8i

The df "ins" yields the following, where "bid" is the number before the "_" in "iid".

My code so far:

ins2018 = ins[ins['year'] == 2018] #.drop(["iid", 'date', 'type', 'timestamp', 'year', 'Missing Score'], axis = 1)
    
# new = ins2018.loc[ins2018["score"] > 0].sort_values("date").groupby("bid").count()
# new = new.loc[new["iid"] == 2]
# merge = pd.merge(new, ins2018, how = "left", on = "bid").sort_values('date_y')
# merged = merge.loc[merge['score_y'] > 0].drop(['iid_x', 'date_x', 'score_x', 'type_x', 'timestamp_x', 'year_x', 'Missing Score_x', 'iid_y', 'type_y', 'timestamp_y', 'year_y', 'Missing Score_y', "date_y"], axis = 1)

Answer 1

Use pandas.DataFrame.aggregate to aggregate a list onto score_y.
Depending on merged, the index may need to be reset.

Example

# reset the index of merged
merged = merged.reset_index(drop=True)

# groupby bid and aggregate a list onto score_y
merged.groupby('bid').agg({'score_y': list})
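
Since merged is not shown in full in the question, here is a minimal runnable sketch of the same two steps. The small frame below is a hypothetical stand-in: its bid and score_y values and its index labels are made up purely for illustration.

```python
import pandas as pd

# hypothetical stand-in for `merged`; all values and index labels are invented
merged = pd.DataFrame(
    {"bid": [10, 10, 24, 24], "score_y": [85, 58, 92, 96]},
    index=[5, 2, 9, 4],  # arbitrary labels, e.g. left over from sorting or filtering
)

# replace the old index with a fresh 0..n-1 integer index
merged = merged.reset_index(drop=True)

# one row per bid, with that bid's scores collected into a list
result = merged.groupby("bid").agg({"score_y": list})
print(result)
```

The result is a DataFrame indexed by bid with a single score_y column holding lists, which is the shape asked for in the question.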

The examples below use randomly generated data first, then the data from Restaurant Scores - LIVES Standard,
attempting to follow the code in the OP.

import pandas as pd
import numpy as np
import random

np.random.seed(365)
random.seed(365)
rows = 100
data = {'a': np.random.randint(10, size=(rows)),
        'groups': [random.choice(['1-5', '6-25', '26-100', '100-500', '500-1000', '>1000']) for _ in range(rows)]}
df = pd.DataFrame(data)

# groupby and aggregate a list
dfg = df.groupby('groups').agg({'a': list})

dfg
[out]: 
                                                                                 a
groups                                                                            
1-5                                     [7, 8, 4, 3, 1, 7, 9, 3, 2, 7, 6, 4, 4, 6]
100-500                        [4, 3, 2, 8, 6, 3, 1, 5, 7, 7, 3, 5, 4, 7, 2, 2, 4]
26-100                   [4, 2, 2, 9, 5, 3, 1, 0, 7, 9, 7, 7, 9, 9, 9, 7, 0, 0, 4]
500-1000                                      [2, 8, 0, 7, 6, 6, 8, 4, 6, 2, 2, 5]
6-25      [5, 9, 7, 0, 6, 5, 7, 9, 9, 9, 6, 5, 6, 0, 2, 7, 4, 0, 3, 9, 0, 5, 0, 3]
>1000                                   [2, 1, 3, 6, 7, 6, 0, 5, 9, 9, 3, 2, 6, 0]
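
For comparison with the expression in the question, this sketch puts .agg({'a': list}) next to ['a'].apply(list) on a clean frame. The five rows are made up and are not the random data above; the only point is that both forms collect the same lists, one as a DataFrame and one as a Series.

```python
import pandas as pd

# small made-up frame, not the random data from the example above
df = pd.DataFrame({"groups": ["1-5", "6-25", "1-5", "6-25", "1-5"],
                   "a": [7, 5, 8, 9, 4]})

# dict-style aggregation: returns a DataFrame with one column 'a'
as_frame = df.groupby("groups").agg({"a": list})

# the form used in the question: returns a Series named 'a'
as_series = df.groupby("groups")["a"].apply(list)

# both hold the same lists per group
same = as_frame["a"].tolist() == as_series.tolist()
print(same)
```

If the apply(list) form raises on your own frame but works here, that suggests the problem lies in how the frame was built rather than in the groupby call itself.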

Applied to the inspection data:

import pandas as pd

# load data
ins = pd.read_csv('data/Restaurant_Scores_-_LIVES_Standard.csv')

# convert inspection_date to a datetime format
ins.inspection_date = pd.to_datetime(ins.inspection_date)

# add a year column
ins['year'] = ins.inspection_date.dt.year

# select data for 2018
ins2018 = ins[ins['year'] == 2018] 

################################################################
# this is where you run into issues
# new is the counts for every column

# this is what you could have done to get the number of inspection counts
# just count the occurrences of business_id
counts = ins2018.groupby('business_id').agg({'business_id': 'count'}).rename(columns={'business_id': 'inspection_counts'}).reset_index()

# don't do this: get dataframe of counts
# new = ins2018.loc[ins2018["inspection_score"] > 0].sort_values("inspection_date").groupby("business_id").count()

# don't do this: select data
# new = new.loc[new["inspection_id"] == 2].reset_index()

# merge updated
merge = pd.merge(counts, ins2018, how = "left", on = "business_id")

################################################################

# select data again
# counts and ins2018 share only the key column, so the merge adds no _x/_y
# suffixes and the score column keeps the name inspection_score
merged = merge.loc[(merge['inspection_score'] > 0) & (merge.inspection_counts >= 2)]

# groupby and aggregate list
mg = merged.groupby('business_id').agg({'inspection_score': list})

# display(mg)

            inspection_score
business_id                 
31              [96.0, 96.0]
54              [94.0, 94.0]
61              [94.0, 94.0]
66              [98.0, 98.0]
101             [92.0, 92.0]
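
As an alternative to building a counts frame and merging it back, the filtering can be sketched with groupby(...).transform('count'). This is a different technique from the merge above, not the code the answer ran. The frame is made up (only the column names mirror the dataset), and it counts scored rows per business rather than all 2018 rows.

```python
import pandas as pd

# made-up inspection rows; column names mirror the dataset, values are invented
ins2018 = pd.DataFrame({
    "business_id": [31, 31, 54, 61, 61, 61],
    "inspection_score": [96.0, 90.0, 94.0, float("nan"), 88.0, 92.0],
})

# keep rows that actually have a positive score (NaN > 0 is False)
scored = ins2018[ins2018["inspection_score"] > 0]

# number of scored rows per business, broadcast back onto every row
n = scored.groupby("business_id")["inspection_score"].transform("count")

# keep businesses with at least two scored rows, then collect scores into lists
mg = scored[n >= 2].groupby("business_id").agg({"inspection_score": list})
print(mg)
```

Because transform returns a Series aligned with the rows of scored, it can be used directly as a boolean mask, so no suffixed columns are introduced.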
