Python Pandas Dataframe赋值

时间:2017-02-06 20:02:32

标签: python python-3.x pandas dataframe

我正在关注Lynda教程,他们使用以下代码:

index.html

它完美无缺。但是,在我的情况下,似乎代码没有编译,对于最后一行我一直收到错误。

  

TypeError:无法将项目插入尚未成为现有类别的CategoricalIndex

我在视频中知道他们使用的是Python 2,但是因为我正在学习工作(使用Python 3),所以我有Python 3。我已经能够弄清楚大多数差异,但是我无法弄清楚如何用乘客的总和来创建这个名为import pandas as pd import seaborn flights = seaborn.load_dataset('flights') flights_indexed = flights.set_index(['year','month']) flights_unstacked = flights_indexed.unstack() flights_unstacked['passengers','total'] = flights_unstacked.sum(axis=1) 的新列。

1 个答案:

答案 0 :(得分:8)

此错误消息的根本原因是month列的分类性质:

In [42]: flights.dtypes
Out[42]:
year             int64
month         category
passengers       int64
dtype: object

 In [43]: flights.month.cat.categories
Out[43]: Index(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'], d
type='object')

并且您正在尝试添加类别total - Pandas不喜欢这样。

解决方法:

In [45]: flights.month.cat.add_categories('total', inplace=True)

In [46]: x = flights.pivot(index='year', columns='month', values='passengers')

In [47]: x['total'] = x.sum(1)

In [48]: x
Out[48]:
month  January  February  March  April    May   June   July  August  September  October  November  December   total
year
1949     112.0     118.0  132.0  129.0  121.0  135.0  148.0   148.0      136.0    119.0     104.0     118.0  1520.0
1950     115.0     126.0  141.0  135.0  125.0  149.0  170.0   170.0      158.0    133.0     114.0     140.0  1676.0
1951     145.0     150.0  178.0  163.0  172.0  178.0  199.0   199.0      184.0    162.0     146.0     166.0  2042.0
1952     171.0     180.0  193.0  181.0  183.0  218.0  230.0   242.0      209.0    191.0     172.0     194.0  2364.0
1953     196.0     196.0  236.0  235.0  229.0  243.0  264.0   272.0      237.0    211.0     180.0     201.0  2700.0
1954     204.0     188.0  235.0  227.0  234.0  264.0  302.0   293.0      259.0    229.0     203.0     229.0  2867.0
1955     242.0     233.0  267.0  269.0  270.0  315.0  364.0   347.0      312.0    274.0     237.0     278.0  3408.0
1956     284.0     277.0  317.0  313.0  318.0  374.0  413.0   405.0      355.0    306.0     271.0     306.0  3939.0
1957     315.0     301.0  356.0  348.0  355.0  422.0  465.0   467.0      404.0    347.0     305.0     336.0  4421.0
1958     340.0     318.0  362.0  348.0  363.0  435.0  491.0   505.0      404.0    359.0     310.0     337.0  4572.0
1959     360.0     342.0  406.0  396.0  420.0  472.0  548.0   559.0      463.0    407.0     362.0     405.0  5140.0
1960     417.0     391.0  419.0  461.0  472.0  535.0  622.0   606.0      508.0    461.0     390.0     432.0  5714.0

更新:或者,如果您不想触摸原始DF,则可以删除flights_unstacked DF中的分类列:

In [76]: flights_unstacked.columns = \
    ...:     flights_unstacked.columns \
    ...:     .set_levels(flights_unstacked.columns.get_level_values(1).categories,
    ...:                 level=1)
    ...:

In [77]: flights_unstacked['passengers','total']  = flights_unstacked.sum(axis=1)

In [78]: flights_unstacked
Out[78]:
      passengers
month    January February March April  May June July August September October November December total
year
1949         112      118   132   129  121  135  148    148       136     119      104      118  1520
1950         115      126   141   135  125  149  170    170       158     133      114      140  1676
1951         145      150   178   163  172  178  199    199       184     162      146      166  2042
1952         171      180   193   181  183  218  230    242       209     191      172      194  2364
1953         196      196   236   235  229  243  264    272       237     211      180      201  2700
1954         204      188   235   227  234  264  302    293       259     229      203      229  2867
1955         242      233   267   269  270  315  364    347       312     274      237      278  3408
1956         284      277   317   313  318  374  413    405       355     306      271      306  3939
1957         315      301   356   348  355  422  465    467       404     347      305      336  4421
1958         340      318   362   348  363  435  491    505       404     359      310      337  4572
1959         360      342   406   396  420  472  548    559       463     407      362      405  5140
1960         417      391   419   461  472  535  622    606       508     461      390      432  5714