Question

I have the following DataFrame in pandas, and I want to pivot the dates into columns and sum the total hours for each person.

# Current Pandas DF

| Full Name |    Date    | Hours| 
 ----------- ------------ ------
|  John A   | 2019-01-01 |  5.7 |
|  John A   | 2019-01-02 |  NaN |
|  John A   | 2019-01-03 |  6.0 |
|  John B   | 2019-01-01 |  8.0 |
|  John B   | 2019-01-02 |  3.5 |
|  John C   | 2019-01-01 |  1.0 |
|  John C   | 2019-01-02 |  1.0 |
|  John C   | 2019-01-03 |  NaN |

# Desired result

| Full Name | 2019-01-01 | 2019-01-02 | 2019-01-03 | Total | 
 ----------- ------------ ------------ ------------ -------
|  John A   |    5.7     |     0.0    |    6.0     |  11.7 |
|  John B   |    8.0     |     3.5    |    0.0     |  11.5 |  
|  John C   |    1.0     |     1.0    |    0.0     |   2.0 |

I manually cleaned the NaN values out of the original dataset, replacing them with 0, and then came up with the following snippet:

pd.pivot_table(sheet_data_cleaned, values = sheet_data_cleaned.groupby('Full Name')[['Hours']].sum(), index=['Full Name'], columns = 'Date').reset_index()

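The problem with my code is that it doesn't show the total hours. Also, cleaning NaN by hand is not the best approach, especially when there are many records.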

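I would like to know how to replace NaN in pandas and get the desired DataFrame. Feel free to improve this question; any help is appreciated.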
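For reference, a minimal sketch of the sample data and of replacing NaN without manual cleanup. The names `df` and `df_clean` are assumptions chosen for illustration; `fillna` is the standard pandas method for this.

```python
import numpy as np
import pandas as pd

# Sample data from the question (NaN marks missing hours)
df = pd.DataFrame({
    "Full Name": ["John A"] * 3 + ["John B"] * 2 + ["John C"] * 3,
    "Date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-01", "2019-01-02",
             "2019-01-01", "2019-01-02", "2019-01-03"],
    "Hours": [5.7, np.nan, 6.0, 8.0, 3.5, 1.0, 1.0, np.nan],
})

# Replace every missing value in Hours with 0 in one step,
# leaving the original frame untouched
df_clean = df.assign(Hours=df["Hours"].fillna(0))
```

Note that a person with no row at all for a given date (John B on 2019-01-03) still produces a gap after pivoting, so the pivoted result may need filling as well.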

Answer 1

Using crosstab with margins:

pd.crosstab(df['Full Name'],df.Date,df.Hours,margins=True,aggfunc='sum',margins_name='Total').drop('Total').fillna(0)
Out[628]: 
Date          2019-01-01    2019-01-02    2019-01-03   Total
Full Name                                                   
  John A              5.7           0.0           6.0   11.7
  John B              8.0           3.5           0.0   11.5
  John C              1.0           1.0           0.0    2.0
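The same margins idea also appears to be available on `pivot_table`, which accepts `fill_value` directly. The sketch below assumes `df` holds the question's sample data; the name `wide` is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed layout)
df = pd.DataFrame({
    "Full Name": ["John A"] * 3 + ["John B"] * 2 + ["John C"] * 3,
    "Date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-01", "2019-01-02",
             "2019-01-01", "2019-01-02", "2019-01-03"],
    "Hours": [5.7, np.nan, 6.0, 8.0, 3.5, 1.0, 1.0, np.nan],
})

# margins=True adds a "Total" row and a "Total" column;
# only the column is wanted here, so the row is dropped.
wide = pd.pivot_table(
    df, values="Hours", index="Full Name", columns="Date",
    aggfunc="sum", fill_value=0,
    margins=True, margins_name="Total",
).drop(index="Total")
```

Passing `fill_value=0` fills the missing name/date combinations during the pivot, so no trailing `fillna(0)` should be needed.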

Answer 2

You can do this in two steps:

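import pandas as pd

# Step 1: pivot dates into columns, summing hours and filling gaps with 0
df2 = pd.pivot_table(df, values='Hours', index=['Full Name'],
             columns=['Date'], aggfunc='sum').fillna(0).reset_index()
# Step 2: add a per-person total across the date columns
df2['Total'] = df2.apply(lambda row : sum([row[x] for x in df.Date.unique()]), axis = 1)
# Drop the leftover 'Date' name on the columns index
df2.columns.name = None
df2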

Output

+-------+------------+-------------+-------------+-------------+-------+
|       | Full Name  | 2019-01-01  | 2019-01-02  | 2019-01-03  | Total |
+-------+------------+-------------+-------------+-------------+-------+
|    0  | John A     |        5.7  |        0.0  |        6.0  |  11.7 |
|    1  | John B     |        8.0  |        3.5  |        0.0  |  11.5 |
|    2  | John C     |        1.0  |        1.0  |        0.0  |   2.0 |
+-------+------------+-------------+-------------+-------------+-------+

Edit: removed the index name (Date) from df2's columns.
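As a possible variation on the two-step approach, the total can be computed with a vectorized row sum instead of `apply`. This is a sketch under the assumption that `df` matches the question's sample data; `wide` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed layout)
df = pd.DataFrame({
    "Full Name": ["John A"] * 3 + ["John B"] * 2 + ["John C"] * 3,
    "Date": ["2019-01-01", "2019-01-02", "2019-01-03",
             "2019-01-01", "2019-01-02",
             "2019-01-01", "2019-01-02", "2019-01-03"],
    "Hours": [5.7, np.nan, 6.0, 8.0, 3.5, 1.0, 1.0, np.nan],
})

# Step 1: pivot, letting fill_value handle missing combinations
wide = df.pivot_table(values="Hours", index="Full Name",
                      columns="Date", aggfunc="sum", fill_value=0)

# Step 2: sum across the date columns before the name column is restored
wide["Total"] = wide.sum(axis=1)

# Turn "Full Name" back into a column and clear the "Date" columns name
wide = wide.reset_index()
wide.columns.name = None
```

Computing the total before `reset_index()` keeps the string column out of the sum, so no explicit list of date columns is required.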

How to pivot a table and get totals using Pandas?
