Question

I'm trying to sort the following Pandas DataFrame in Python 2.7:

import numpy as np
import pandas as pd

heading_cols = ["Video Title", "Up Ratings", "Down Ratings", "Views", "User Name", "Subscribers"]
column_1 = ["Adelaide","Brisbane","Darwin","Hobart","Sydney","Melbourne","Perth"]
column_2 = [1295, 5905, 112, 1357, 2058, 1566, 5386]
column_3 = [1158259, 1857594, 120900, 205556, 4336374, 3806092, 1554769]
column_4 = [600.5, 1146.4, 1714.7, 619.5, 1214.8, 646.9, 869.4]
column_5 = ["Bob","Tom","Dave","Sally","Rick","Mary","Roberta"]
column_6 = [25000,30000,15000,15005,20000,31111,11000]

# Generate data:
xdata_arr = np.array([column_1,column_2,column_3,column_4,column_5,column_6]).T

# Generate the DataFrame:
df = pd.DataFrame(xdata_arr, columns=heading_cols)
print df
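A likely cause, sketched below: np.array cannot store mixed strings and numbers in one ordinary array, so it converts everything to a string dtype. pandas then treats every column as text (dtype object in the pandas versions discussed here). Building the DataFrame from a dict of lists lets pandas infer a type for each column separately. This sketch assumes Python 3 and a recent pandas, and it uses only three of the question's columns for brevity.

```python
import numpy as np
import pandas as pd

titles = ["Adelaide", "Brisbane", "Darwin"]
up = [1295, 5905, 112]
views = [600.5, 1146.4, 1714.7]

# np.array converts the mixed lists to one string dtype (kind "U"),
# so every number becomes text such as "1295".
arr = np.array([titles, up, views]).T
print(arr.dtype.kind)  # U

# Every column of this frame holds strings, not numbers.
df_from_array = pd.DataFrame(arr, columns=["Video Title", "Up Ratings", "Views"])
print(df_from_array.dtypes)

# A dict of lists lets pandas infer a dtype per column instead.
df_from_dict = pd.DataFrame({"Video Title": titles, "Up Ratings": up, "Views": views})
print(df_from_dict.dtypes)  # Up Ratings is an integer dtype, Views is float64
```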

The next two lines of code cause the problem:

# Print DataFrame and basic stats:
print df["Up Ratings"].describe()
print df.sort('Views', ascending=False)
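Two separate things probably affect that sort line. DataFrame.sort was deprecated and later removed from pandas; its replacement is DataFrame.sort_values. More importantly, a column of strings sorts lexicographically, character by character, so in descending order "869.4" comes before "1714.7". A minimal sketch, assuming Python 3 and a recent pandas, using the Views values from the question:

```python
import pandas as pd

views = [600.5, 1146.4, 1714.7, 619.5, 1214.8, 646.9, 869.4]

# The same values stored as text, as they end up in the question's DataFrame,
# and stored as real numbers.
as_text = pd.DataFrame({"Views": [str(v) for v in views]})
as_numbers = pd.DataFrame({"Views": views})

# Text sorts character by character: "8..." > "6..." > "1...".
text_order = as_text.sort_values("Views", ascending=False)["Views"].tolist()
print(text_order)
# ['869.4', '646.9', '619.5', '600.5', '1714.7', '1214.8', '1146.4']

# Numbers sort by value.
number_order = as_numbers.sort_values("Views", ascending=False)["Views"].tolist()
print(number_order)
# [1714.7, 1214.8, 1146.4, 869.4, 646.9, 619.5, 600.5]
```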

Problems:

Sorting doesn't work on any column.
The statistics should include mean, std, min, max and so on. None of these show up.
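The missing statistics have the same root cause: describe() reports count, unique, top and freq for a non-numeric Series, and reports mean, std, min, the quartiles and max only for a numeric one. A minimal sketch, assuming Python 3 and a recent pandas, using the Up Ratings values:

```python
import pandas as pd

up = [1295, 5905, 112, 1357, 2058, 1566, 5386]

# Same values, once as text and once as integers.
text_stats = pd.Series([str(u) for u in up]).describe()
num_stats = pd.Series(up).describe()

print(text_stats.index.tolist())  # ['count', 'unique', 'top', 'freq']
print(num_stats.index.tolist())   # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```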

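The problem is that df.dtypes returns "object" for all columns, which is wrong. Some of them should be integers, but I can't figure out how to change only the numeric ones. I tried: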

df.convert_objects(convert_numeric=True)

But that didn't work. So then I went back to the NumPy array and tried to change the dtypes there:

dt = np.dtype([(heading_cols[0], np.str_), (heading_cols[1], np.int16), (heading_cols[2], np.int16), (heading_cols[3], np.int16), (heading_cols[4], np.str_), (heading_cols[5], np.int16) ])

But that didn't work either.
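The structured-dtype approach can work with a few changes, sketched below under these assumptions (Python 3, a recent pandas). The dtype has to be passed to np.array, and the rows must be supplied as a list of tuples, one per row, rather than as the already stringified 2-D array. np.int16 tops out at 32767, which is too small for values such as 1158259, so int64 is used. Views holds decimals, so it needs a float type. A bare np.str_ field has no fixed width, so the fixed width "U20" (an arbitrary choice here) is used for the text fields.

```python
import numpy as np
import pandas as pd

heading_cols = ["Video Title", "Up Ratings", "Down Ratings", "Views", "User Name", "Subscribers"]
column_1 = ["Adelaide", "Brisbane", "Darwin", "Hobart", "Sydney", "Melbourne", "Perth"]
column_2 = [1295, 5905, 112, 1357, 2058, 1566, 5386]
column_3 = [1158259, 1857594, 120900, 205556, 4336374, 3806092, 1554769]
column_4 = [600.5, 1146.4, 1714.7, 619.5, 1214.8, 646.9, 869.4]
column_5 = ["Bob", "Tom", "Dave", "Sally", "Rick", "Mary", "Roberta"]
column_6 = [25000, 30000, 15000, 15005, 20000, 31111, 11000]

# One field per column, with types wide enough for the data.
dt = np.dtype([
    (heading_cols[0], "U20"),
    (heading_cols[1], np.int64),
    (heading_cols[2], np.int64),
    (heading_cols[3], np.float64),
    (heading_cols[4], "U20"),
    (heading_cols[5], np.int64),
])

# A structured array is built from one tuple per row.
rows = list(zip(column_1, column_2, column_3, column_4, column_5, column_6))
xdata_arr = np.array(rows, dtype=dt)

# pandas takes the column names and dtypes from the structured fields.
df = pd.DataFrame(xdata_arr)
print(df.dtypes)
print(df.sort_values("Views", ascending=False))
```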

Is there a way to manually change the dtypes to numeric?
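One manual option, sketched here assuming Python 3 and a recent pandas (where DataFrame.astype accepts a dict mapping column names to dtypes), is to name the numeric columns and convert each one explicitly. The two-row, all-string frame below is only a stand-in for the question's data.

```python
import pandas as pd

# Stand-in for the question's frame: every value is a string.
df = pd.DataFrame(
    [["Adelaide", "1295", "1158259", "600.5", "Bob", "25000"],
     ["Brisbane", "5905", "1857594", "1146.4", "Tom", "30000"]],
    columns=["Video Title", "Up Ratings", "Down Ratings", "Views", "User Name", "Subscribers"],
)

# Convert only the numeric columns; astype returns a new DataFrame, so reassign.
df = df.astype({
    "Up Ratings": "int64",
    "Down Ratings": "int64",
    "Views": "float64",
    "Subscribers": "int64",
})
print(df.dtypes)
```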

Answer 1

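Like most methods in pandas, convert_objects returns a NEW object instead of modifying df in place, so you have to assign the result back: df = df.convert_objects(convert_numeric=True)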

In [20]: df.convert_objects(convert_numeric=True)
Out[20]: 
  Video Title  Up Ratings  Down Ratings   Views User Name  Subscribers
0    Adelaide        1295       1158259   600.5       Bob        25000
1    Brisbane        5905       1857594  1146.4       Tom        30000
2      Darwin         112        120900  1714.7      Dave        15000
3      Hobart        1357        205556   619.5     Sally        15005
4      Sydney        2058       4336374  1214.8      Rick        20000
5   Melbourne        1566       3806092   646.9      Mary        31111
6       Perth        5386       1554769   869.4   Roberta        11000

In [21]: df.convert_objects(convert_numeric=True).dtypes
Out[21]: 
Video Title      object
Up Ratings        int64
Down Ratings      int64
Views           float64
User Name        object
Subscribers       int64
dtype: object
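As the answer notes, convert_objects returns a new object, so df itself changes only if the result is reassigned. In newer pandas releases convert_objects has been deprecated and removed. A rough stand-in, sketched here for Python 3 and a recent pandas, applies pd.to_numeric column by column. to_numeric_where_possible is a hypothetical helper, not part of pandas, and it only approximates convert_objects: a column is converted when all of its values parse as numbers and is left as-is otherwise.

```python
import pandas as pd


def to_numeric_where_possible(frame):
    """Hypothetical helper: return a copy of frame with fully numeric columns converted."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # Text such as "Adelaide" cannot be parsed; keep the column unchanged.
            pass
    return out


df = pd.DataFrame({
    "Video Title": ["Adelaide", "Brisbane"],
    "Up Ratings": ["1295", "5905"],
    "Views": ["600.5", "1146.4"],
})

converted = to_numeric_where_possible(df)
print(df.dtypes)         # unchanged: every column still holds strings
print(converted.dtypes)  # Up Ratings is an integer dtype, Views is float64

df = converted  # reassign to keep the converted version
```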

Pandas DataFrame from NumPy array - incorrect dtypes that can't be changed
