I have the following data frame:
df =
city code qty year
hyd 1 10 2016
hyd 2 12 2016
pune 2 15 2016
pune 4 25 2016
hyd 1 10 2017
hyd 3 12 2017
pune 1 15 2017
pune 2 25 2017
hyd 2 10 2018
hyd 4 10 2018
hyd 6 12 2018
pune 1 15 2018
pune 4 25 2018
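To reproduce the frame above, something like the following should work. It is a minimal sketch: the values are copied from the table, and the integer column types are an assumption.

```python
import pandas as pd

# Values copied row by row from the table above
df = pd.DataFrame({
    "city": ["hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "hyd", "pune", "pune"],
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 6, 1, 4],
    "qty":  [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15, 25],
    "year": [2016] * 4 + [2017] * 4 + [2018] * 5,
})
```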
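I want to add one column per unique year (2016, 2017, 2018) and, for each row, check whether the same city and code also appear in the other years it is compared with (i.e. 2018 compared with 2017 and 2016, 2017 compared with 2016, 2015, and so on). If the same city and code exist in that other year, mark it Y; if not, mark it N. The column for the year being compared must be left blank.
The result must be the following data frame: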
city  code  qty  year  year_2016  year_2017  year_2018
hyd   1     10   2016
hyd   2     12   2016
pune  2     15   2016
pune  4     25   2016
hyd   1     10   2017  Y
hyd   3     12   2017  N
pune  1     15   2017  N
pune  2     25   2017  Y
hyd   2     10   2018  Y          N
hyd   4     12   2018  N          N
hyd   6     12   2018  N          N
pune  1     15   2018  N          Y
pune  4     25   2018  Y          N
Thanks in advance.
Answer 0 (score: 1)
import pandas as pd

# Collect every year so we know which year_... columns to create and which to mark as N
all_years = sorted(df.year.unique())
columns = ["city", "code", "year"] + ["year_" + str(y) for y in all_years]

def my_func(x):
    # Build the year_... columns for one (city, code) group; x holds that group's years
    city, code = x.name
    rows = []
    # Loop through each year in which this (city, code) pair occurs
    for key, year in x.items():  # Series.iteritems() was removed in pandas 2.0
        row = {"city": city, "code": code, "year": year}
        # With several years, compare this year against the others;
        # with a single year, the only year to mark is the year itself
        iterate = [year]
        if len(x) > 1:
            iterate = x.drop(key).values
        for other_year in iterate:
            row["year_" + str(other_year)] = "Y"
        # Any year in which the pair never occurs is marked N
        for missing_year in set(all_years) - set(x.values):
            row["year_" + str(missing_year)] = "N"
        rows.append(row)
    # DataFrame.append() was removed in pandas 2.0, so build the frame from a list of dicts
    return pd.DataFrame(rows).reindex(columns=columns)

# Built this way, code and year keep their integer dtype (1, 2016 rather than 1.0, 2016.0)
df.groupby(['city', 'code'])['year'].apply(my_func).reset_index(drop=True).fillna("")
Out[]:
    city  code    year  year_2016  year_2017  year_2018
0   hyd   1.0   2016.0             Y          N
1   hyd   1.0   2017.0  Y                     N
2   hyd   2.0   2016.0             N          Y
3   hyd   2.0   2018.0  Y          N
4   hyd   3.0   2017.0  N          Y          N
5   hyd   4.0   2018.0  N          N          Y
6   hyd   6.0   2018.0  N          N          Y
7   pune  1.0   2017.0  N                     Y
8   pune  1.0   2018.0  N          Y
9   pune  2.0   2016.0             Y          N
10  pune  2.0   2017.0  Y                     N
11  pune  4.0   2016.0             N          Y
12  pune  4.0   2018.0  Y          N
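Note that the output above compares each row against all other years and drops qty, whereas the expected table in the question keeps every original row and fills in only the columns for earlier years. A minimal sketch of that variant follows. It is not the answerer's method; it assumes "compare" means "look at strictly earlier years", and the names pairs_by_year and result are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "city": ["hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "hyd", "pune", "pune"],
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 6, 1, 4],
    "qty":  [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15, 25],
    "year": [2016] * 4 + [2017] * 4 + [2018] * 5,
})

# For every year, the set of (city, code) pairs that occur in it
pairs_by_year = {
    year: set(zip(group["city"], group["code"]))
    for year, group in df.groupby("year")
}

result = df.copy()
for year in sorted(pairs_by_year):
    # Blank for the row's own year and for later years (assumption:
    # only earlier years are compared); otherwise Y/N by membership
    result[f"year_{year}"] = [
        "" if row_year <= year
        else ("Y" if (city, code) in pairs_by_year[year] else "N")
        for city, code, row_year in zip(df["city"], df["code"], df["year"])
    ]

print(result.to_string(index=False))
```

Because the lookup uses plain sets, each row costs one membership test per year column. The row order and the qty column of the original frame are left untouched.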