pandas将表格转换为数据框

时间:2017-03-10 00:48:07

标签: python pandas reshape

我有一个如下所示的数据框(df):

+---------+-------+------------+----------+
| subject | pills |    date    | strength |
+---------+-------+------------+----------+
|       1 |     4 | 10/10/2012 |      250 |
|       1 |     4 | 10/11/2012 |      250 |
|       1 |     2 | 10/12/2012 |      500 |
|       2 |     1 | 1/6/2014   |     1000 |
|       2 |     1 | 1/7/2014   |      250 |
|       2 |     1 | 1/7/2014   |      500 |
|       2 |     3 | 1/8/2014   |      250 |
+---------+-------+------------+----------+

当我在R中使用重塑时,我得到了我想要的东西:

reshape(df, idvar = c("subject","date"), timevar = 'strength', direction = "wide")

+---------+------------+--------------+--------------+---------------+
| subject |    date    | strength.250 | strength.500 | strength.1000 |
+---------+------------+--------------+--------------+---------------+
|       1 | 10/10/2012 | 4            | NA           | NA            |
|       1 | 10/11/2012 | 4            | NA           | NA            |
|       1 | 10/12/2012 | NA           | 2            | NA            |
|       2 | 1/6/2014   | NA           | NA           | 1             |
|       2 | 1/7/2014   | 1            | 1            | NA            |
|       2 | 1/8/2014   | 3            | NA           | NA            |
+---------+------------+--------------+--------------+---------------+

使用pandas:

df.pivot_table(df, index=['subject','date'],columns='strength')

+---------+------------+-------+----+-----+
|         |            | pills            |
+---------+------------+-------+----+-----+
|         | strength   | 250   | 500| 1000|
+---------+------------+-------+----+-----+
| subject | date       |       |    |     |
+---------+------------+-------+----+-----+
| 1       | 10/10/2012 | 4     | NA | NA  |
|         | 10/11/2012 | 4     | NA | NA  |
|         | 10/12/2012 | NA    | 2  | NA  |
+---------+------------+-------+----+-----+
| 2       | 1/6/2014   | NA    | NA | 1   |
|         | 1/7/2014   | 1     | 1  | NA  |
|         | 1/8/2014   | 3     | NA | NA  |
+---------+------------+-------+----+-----+

如何使用pandas获得与R完全相同的输出?我只想要1个标题。

1 个答案:

答案 0 :(得分:28)

转动后,将数据框转换为记录,然后再转换回dataframe:

flattened = pd.DataFrame(pivoted.to_records())
#   subject        date  ('pills', 250)  ('pills', 500)  ('pills', 1000)
#0        1  10/10/2012             4.0             NaN              NaN
#1        1  10/11/2012             4.0             NaN              NaN
#2        1  10/12/2012             NaN             2.0              NaN
#3        2    1/6/2014             NaN             NaN              1.0
#4        2    1/7/2014             1.0             1.0              NaN
#5        2    1/8/2014             3.0             NaN              NaN

如果需要,您现在可以“修复”列名:

flattened.columns = [hdr.replace("('pills', ", "strength.").replace(")", "") \
                     for hdr in flattened.columns]
flattened
#   subject        date  strength.250  strength.500  strength.1000
#0        1  10/10/2012           4.0           NaN            NaN
#1        1  10/11/2012           4.0           NaN            NaN
#2        1  10/12/2012           NaN           2.0            NaN
#3        2    1/6/2014           NaN           NaN            1.0
#4        2    1/7/2014           1.0           1.0            NaN
#5        2    1/8/2014           3.0           NaN            NaN

这很尴尬,但确实有效。