Question

I have the following dataframe:

df=
 city    code     qty    year
 hyd     1        10    2016
 hyd     2        12    2016
 pune    2        15    2016
 pune    4        25    2016
 hyd     1        10    2017
 hyd     3        12    2017
 pune    1        15    2017
 pune    2        25    2017
 hyd     2        10    2018
 hyd     4        10    2018
 hyd     6        12    2018
 pune    1        15    2018
 pune    4        25    2018

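I want to add a column for each unique year (2016, 2017, 2018) and compare each row's city and code against the other years (i.e. 2018 compared with 2017 and 2016, 2017 compared with 2016, and so on). If the same city and code exist in the other year, mark it Y; if not, mark it N. The column for the year being compared (the row's own year, and any later year) must be left blank.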

The resulting dataframe should look like this:

city    code    qty    year    year_2016    year_2017    year_2018
hyd     1       10     2016
hyd     2       12     2016
pune    2       15     2016
pune    4       25     2016
hyd     1       10     2017    Y
hyd     3       12     2017    N
pune    1       15     2017    N
pune    2       25     2017    Y
hyd     2       10     2018    Y            N
hyd     4       10     2018    N            N
hyd     6       12     2018    N            N
pune    1       15     2018    N            Y
pune    4       25     2018    Y            N
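A minimal sketch of the requirement above, comparing each row only against earlier years, assuming the sample data from the question is rebuilt as `df`. The function name `mark_earlier_years` is hypothetical, and this is not the answer's method: it collects every existing (city, code, year) triple into a Python set and looks each row up in it, leaving the row's own year and later years blank.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "city": ["hyd", "hyd", "pune", "pune", "hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "hyd", "pune", "pune"],
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 6, 1, 4],
    "qty":  [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15, 25],
    "year": [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017,
             2018, 2018, 2018, 2018, 2018],
})

def mark_earlier_years(df):
    # Every (city, code, year) triple that exists, for constant-time lookups
    seen = set(zip(df["city"], df["code"], df["year"]))
    out = df.copy()
    for y in sorted(int(v) for v in df["year"].unique()):
        # Only rows from a later year are compared against year y;
        # the row's own year and later years stay blank
        out[f"year_{y}"] = [
            ("Y" if (c, k, y) in seen else "N") if yr > y else ""
            for c, k, yr in zip(out["city"], out["code"], out["year"])
        ]
    return out

result = mark_earlier_years(df)
print(result)
```

Because `qty` is carried over unchanged, the result keeps every original column and appends one `year_...` column per year.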

Thanks in advance.

Answer 1

import pandas as pd

# Get a list of all years; this tells us how many year_... columns to make
# and which years to mark as N
all_years = df.year.unique()

def my_func(x):
    # Build one output row per year in which this (city, code) pair appears

    # x.name holds the group key, i.e. the (city, code) tuple
    city, code = x.name

    rows = []

    # Loop through each year of this (city, code) group
    for key, year in x.items():
        row = {"city": city, "code": code, "year": year}

        # If this (city, code) occurs in several years, compare this year
        # against the other years; otherwise only its own year is marked
        iterate = [year]
        if len(x.values) > 1:
            iterate = x.drop(key).values

        # Mark every other year in which the pair occurs as Y
        for other_year in iterate:
            row["year_" + str(other_year)] = "Y"

        # Any year in which the pair does not occur is marked as N
        for missing_year in set(all_years) - set(x.values):
            row["year_" + str(missing_year)] = "N"

        rows.append(row)

    # DataFrame.append was removed in pandas 2.0, so build the frame in one go
    return pd.DataFrame(rows)

year_cols = ["year_" + str(y) for y in sorted(all_years)]
(df.groupby(['city', 'code'])['year'].apply(my_func)
   .reset_index(drop=True)
   .reindex(columns=['city', 'code', 'year'] + year_cols)
   .fillna(""))


Out[]:
    city  code    year year_2016 year_2017 year_2018
0    hyd   1.0  2016.0                   Y         N
1    hyd   1.0  2017.0         Y                   N
2    hyd   2.0  2016.0                   N         Y
3    hyd   2.0  2018.0         Y         N          
4    hyd   3.0  2017.0         N         Y         N
5    hyd   4.0  2018.0         N         N         Y
6    hyd   6.0  2018.0         N         N         Y
7   pune   1.0  2017.0         N                   Y
8   pune   1.0  2018.0         N         Y          
9   pune   2.0  2016.0                   Y         N
10  pune   2.0  2017.0         Y                   N
11  pune   4.0  2016.0                   N         Y
12  pune   4.0  2018.0         Y         N
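A vectorized sketch of the same "compare against every other year" idea, assuming the question's sample data as `df`. It swaps the per-group loop for `pd.crosstab`, which builds a (city, code) by year presence table. Unlike the output above, it leaves each row's own year blank, as the question asks, and keeps `qty`. The function name `mark_other_years` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "city": ["hyd", "hyd", "pune", "pune", "hyd", "hyd", "pune", "pune",
             "hyd", "hyd", "hyd", "pune", "pune"],
    "code": [1, 2, 2, 4, 1, 3, 1, 2, 2, 4, 6, 1, 4],
    "qty":  [10, 12, 15, 25, 10, 12, 15, 25, 10, 10, 12, 15, 25],
    "year": [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017,
             2018, 2018, 2018, 2018, 2018],
})

def mark_other_years(df):
    # Boolean presence table: rows are (city, code), columns are years
    present = pd.crosstab([df["city"], df["code"]], df["year"]).gt(0)
    out = df.copy()
    # Each row's (city, code) key, used to look it up in the presence table
    keys = pd.MultiIndex.from_frame(df[["city", "code"]])
    for y in present.columns:
        # Y/N: does the same (city, code) exist in year y?
        flags = present[y].reindex(keys).map({True: "Y", False: "N"}).to_numpy()
        # Leave the row's own year blank
        out[f"year_{y}"] = np.where(df["year"].to_numpy() == y, "", flags)
    return out

result = mark_other_years(df)
print(result)
```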

Compare rows of columns in Python