How to create new columns based on row values in pandas

Date: 2018-07-12 15:48:12

Tags: python pandas

I want to iterate over the rows of a pandas DataFrame and create new columns based on their values. Here is my data:

  Political Entity  Recipient ID           Recipient Recipient last name  \
0       Candidates          4350       Whelan, Susan              Whelan   
1       Candidates          4350       Whelan, Susan              Whelan   
2       Candidates          4350       Whelan, Susan              Whelan   
3       Candidates          4350       Whelan, Susan              Whelan   
4       Candidates         15453  Mastroianni, Steve         Mastroianni   

  Recipient first name Recipient middle initial Political Party of Recipient  \
0                Susan                      NaN      Liberal Party of Canada   
1                Susan                      NaN      Liberal Party of Canada   
2                Susan                      NaN      Liberal Party of Canada   
3                Susan                      NaN      Liberal Party of Canada   
4                Steve                      NaN      Liberal Party of Canada   

  Electoral District        Electoral event Fiscal/Election date  \
0              Essex  38th general election           2004-06-28   
1              Essex  38th general election           2004-06-28   
2              Essex  38th general election           2004-06-28   
3              Essex  38th general election           2004-06-28   
4  Windsor--Tecumseh  40th general election           2008-10-14   

        ...       Monetary amount Non-Monetary amount  \
0       ...                 800.0                 0.0   
1       ...                1280.0                 0.0   
2       ...                 250.0                 0.0   
3       ...                1000.0                 0.0   
4       ...                 800.0                 0.0   

I want to create new columns named after the year and political party, and add up the monetary amounts under them. For example:

+------------------------------+----------------------------+--+--+--+
| 2004 Liberal Party of Canada | 2004 Green Party of Canada |  |  |  |
+------------------------------+----------------------------+--+--+--+
| 8000                         | 0                          |  |  |  |
+------------------------------+----------------------------+--+--+--+
|                              |                            |  |  |  |
+------------------------------+----------------------------+--+--+--+
|                              |                            |  |  |  |
+------------------------------+----------------------------+--+--+--+

I wrote two functions to get started:

def year_political_column(row):
    return row['Fiscal/Election date'][:4] + ' ' + row['Political Party of Recipient']


def monetary(row):
    return row['Monetary amount']

Every solution I find seems to require the columns to already exist. Can anyone point me in the right direction?

The sample output should be:

  Political Entity  Recipient ID           Recipient Recipient last name  \
0       Candidates          4350       Whelan, Susan              Whelan   
1       Candidates          4350       Whelan, Susan              Whelan   
2       Candidates          4350       Whelan, Susan              Whelan   
3       Candidates          4350       Whelan, Susan              Whelan   
4       Candidates         15453  Mastroianni, Steve         Mastroianni   

  Recipient first name Recipient middle initial Political Party of Recipient  \
0                Susan                      NaN      Liberal Party of Canada   
1                Susan                      NaN      Liberal Party of Canada   
2                Susan                      NaN      Liberal Party of Canada   
3                Susan                      NaN      Liberal Party of Canada   
4                Steve                      NaN      Liberal Party of Canada   

  Electoral District        Electoral event Fiscal/Election date  \
0              Essex  38th general election           2004-06-28   
1              Essex  38th general election           2004-06-28   
2              Essex  38th general election           2004-06-28   
3              Essex  38th general election           2004-06-28   
4  Windsor--Tecumseh  40th general election           2008-10-14   

        ...       Monetary amount Non-Monetary amount  \
0       ...                 800.0                 0.0   
1       ...                1280.0                 0.0   
2       ...                 250.0                 0.0   
3       ...                1000.0                 0.0   
4       ...                 800.0                 0.0   

  Contribution given through Ontario first name Ontario last name  \
0                        NaN                J M            
1                        NaN                  J             
2                        NaN                  B            
3                        NaN                  H            
4                        NaN                  H            

   Ontario Address Ontario city Ontario Province Ontario Postal Code  \
0                

  Ontario Phone #  
0      
1      
2      
3      
4      

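All the political data I am looking for would be appended on the right.

2 answers:

Answer 0 (score: 1)

Create a column from the election year and the party name, then group by it and transpose: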

df['year_political'] = df['Fiscal/Election date'].astype(str).str.slice(0,4) + ' '+ df['Political Party of Recipient']
df.groupby('year_political')['Monetary amount'].sum().reset_index().transpose()
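
As a rough, self-contained sketch of the same idea, the snippet below rebuilds only the three relevant columns from the five sample rows in the question (the reduced frame is an assumption made for illustration). It uses `to_frame().T` instead of `reset_index().transpose()`, a small variation that keeps the year/party labels as column headers rather than as a data row.

```python
import pandas as pd

# Five rows mirroring the question's sample; only the columns needed here.
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': ['2004-06-28'] * 4 + ['2008-10-14'],
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})

# Label each row with "<year> <party>", taking the year from the date string.
df['year_political'] = (
    df['Fiscal/Election date'].astype(str).str.slice(0, 4)
    + ' '
    + df['Political Party of Recipient']
)

# Sum per label, then turn the resulting Series into a one-row frame
# whose column headers are the labels.
totals = df.groupby('year_political')['Monetary amount'].sum().to_frame().T
print(totals)
```

The result has a single row labelled `Monetary amount` and one column per year/party combination.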

Answer 1 (score: 1)

This can be achieved in several ways:

  • pivot
  • pivot_table
  • groupby

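However, most of these need some polishing to produce the desired output format. If you do not want an aggregation function and want the input values as they are, only option 2 below (pivot) works.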

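import pandas as pd

def column_name(row):
    # Build a label such as "2004 Liberal Party of Canada" from one row.
    return '{} {}'.format(row['Fiscal/Election date'].year, row['Political Party of Recipient'])

# Convert to datetime first so that .year is available inside column_name.
df['Fiscal/Election date'] = pd.to_datetime(df['Fiscal/Election date'])

df['Column Name'] = df.apply(column_name, axis=1)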
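
The row-wise `apply` above could also be written as a vectorized expression, which tends to be faster on large frames. This is only a sketch: the reduced three-column frame and the variable name `labels` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Reduced stand-in for the question's data, with the date already parsed.
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': pd.to_datetime(['2004-06-28'] * 4 + ['2008-10-14']),
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})

# Extract the year through the .dt accessor and join it with the party name,
# with no Python-level loop over rows.
labels = (
    df['Fiscal/Election date'].dt.year.astype(str)
    + ' '
    + df['Political Party of Recipient']
)
df['Column Name'] = labels
print(df['Column Name'].unique())
```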

1)pivot_table

In [4]: df[['Column Name', 'Monetary amount']].pivot_table(columns='Column Name',
   ...:                                                    values='Monetary amount',
   ...:                                                    aggfunc='sum')
Out[4]: 
Column Name      2004 Liberal Party of Canada  2008 Liberal Party of Canada
Monetary amount                          3330                           800

2)pivot

In [5]: (df[['Column Name', 'Monetary amount']]
   ...: .pivot(columns='Column Name', values='Monetary amount'))
Out[5]: 
Column Name  2004 Liberal Party of Canada  2008 Liberal Party of Canada
0                                   800.0                           NaN
1                                  1280.0                           NaN
2                                   250.0                           NaN
3                                  1000.0                           NaN
4                                     NaN                         800.0
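
Since the question asks for the new columns to sit to the right of the existing data, one possible follow-up is to join the pivoted frame back onto the original by index. The sketch below assumes a reduced three-column version of the sample data; filling the gaps with 0 is also an assumption, and the names `wide` and `result` are made up for illustration.

```python
import pandas as pd

# Reduced stand-in for the question's data.
df = pd.DataFrame({
    'Political Party of Recipient': ['Liberal Party of Canada'] * 5,
    'Fiscal/Election date': pd.to_datetime(['2004-06-28'] * 4 + ['2008-10-14']),
    'Monetary amount': [800.0, 1280.0, 250.0, 1000.0, 800.0],
})
df['Column Name'] = (
    df['Fiscal/Election date'].dt.year.astype(str)
    + ' '
    + df['Political Party of Recipient']
)

# Spread the amounts into one column per label, keeping the original row index.
# Cells with no value for a given label become NaN, replaced here by 0.
wide = df.pivot(columns='Column Name', values='Monetary amount').fillna(0)

# Append the new columns to the right of the original frame (index-aligned).
result = df.join(wide)
print(result)
```

Note that `pivot` performs no aggregation, so each row keeps its own amount; for per-label totals, the `pivot_table` or `groupby` variants are the ones to use.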

3)groupby

In [6]: pd.DataFrame(df.groupby('Column Name')['Monetary amount'].sum()).transpose()
Out[6]: 
Column Name      2004 Liberal Party of Canada  2008 Liberal Party of Canada
Monetary amount                          3330                           800