Question

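I'm pulling some data from an Excel file and processing it in Python. However, the column seems to contain some strings, and I need the values to be integers. I'm trying to sort the data, but it raises an error because it is trying to sort numbers mixed with strings.

I'm trying to count the number of murders for each perpetrator age in the file.

Here is my code: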

import collections
import json

import pandas as pd

xl = pd.ExcelFile('Murders.xlsx')
df = xl.parse('Sheet1')
#df = df[df["Perpetrator Age"].ne("Blanks")]
age = df['Perpetrator Age']

#print(df["Perpetrator Age"].dtype)
freq1 = collections.Counter(df['Perpetrator Age'].sort_values())
freq = [{'Perpetrator_Age': m, 'Freq': f} for m, f in freq1.items()]
file = open("MurderPerpAge.js", "w+")
file.write(json.dumps(freq))
file.close()

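I tried using Excel's built-in Filter button, but the data still seems to contain strings. Here is the error/output:

TypeError: '<' not supported between instances of 'int' and 'str'

I'd like the output sorted by age, as in the example below: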

[{"Perpetrator_Age": 15, "Freq": 5441}, {"Perpetrator_Age": 17, "Freq": 14196},...

Answer 1

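I'd suggest converting the column with pandas' Series.astype('int16'), as shown below:

(int16 because you're dealing with ages, which span a very limited range.)

df['Perpetrator Age'] = df['Perpetrator Age'].astype('int16')
df = df.sort_values('Perpetrator Age')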
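
Before casting, it may help to check which values in the column are not integers, since astype raises a ValueError on text it cannot parse. Here is a minimal sketch; the sample Series is made up to stand in for your column, and "Blanks" and "unknown" are only assumed examples of stray text:

```python
import pandas as pd

# Hypothetical sample standing in for df['Perpetrator Age'] as read from Excel.
ages = pd.Series([15, "17", "Blanks", 15, "unknown", 42], name="Perpetrator Age")

# Count how many values of each Python type the column holds.
type_counts = ages.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Attempt a numeric conversion; anything that cannot be parsed becomes NaN.
converted = pd.to_numeric(ages, errors="coerce")

# List the original values that failed to convert.
bad_values = ages[converted.isna()]
print(bad_values.tolist())
```

Numeric strings such as "17" convert fine, so only the genuinely non-numeric entries show up in bad_values.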

In [14]: df['Perpetrator Age'].astype('int16').sort_values(axis=0).head()                                 
Out[14]: 
83    15
62    15
64    15
27    15
48    17
Name: Perpetrator Age, dtype: int16
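
If the column also holds text that is not a number, the cast above will fail. One possible workaround is pd.to_numeric with errors="coerce", followed by dropping the rows that could not be converted. The sketch below uses a made-up DataFrame in place of your sheet, and it assumes that discarding the unparseable rows is acceptable for your counts:

```python
import json

import pandas as pd

# Hypothetical sample standing in for the 'Perpetrator Age' column.
df = pd.DataFrame({"Perpetrator Age": [17, "15", "Blanks", 15, "17", 17]})

# Convert to numbers; unparseable entries become NaN instead of raising.
ages = pd.to_numeric(df["Perpetrator Age"], errors="coerce")

# Drop the NaN rows, then cast to a compact integer type.
ages = ages.dropna().astype("int16")

# Count each age and order the result by age (the index).
counts = ages.value_counts().sort_index()

# int() turns NumPy integers into plain ints so json.dumps can serialize them.
freq = [{"Perpetrator_Age": int(a), "Freq": int(n)} for a, n in counts.items()]
print(json.dumps(freq))
```

value_counts().sort_index() replaces the collections.Counter step from the question and yields the ages in ascending order, which matches the output format you asked for.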

Hope this helps!

How to filter strings out of an integer column from Excel for processing in Python
