How to convert a column to numbers for sorting in Python

Time: 2017-09-17 14:12:53

Tags: python

I am new to Python (a learner). Please look at my question and help me solve it.

My CSV file contains the following:

test,cycle,date,status
func,2,09/07/17,pass
func,10,09/08/17,fail
func,3,09/08/17,pass
func,1,09/08/17,no run
func,22,09/08/17,in progress
func,11,09/08/17,on hold

When I sort on the second column (cycle), I get the following output:

['func', '1', '09/08/17', 'no run']
['func', '10', '09/08/17', 'fail']
['func', '11', '09/08/17', 'on hold']
['func', '2', '09/07/17', 'pass']
['func', '22', '09/08/17', 'in progress']
['func', '3', '09/08/17', 'pass']

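The problem is that the column is sorted as strings, so the output comes out as 1, 10, 11, 2, 22, 3. I want it sorted numerically (int / float) so that I get 1, 2, 3, 10, 11, 22.

Below is my small script. Could you help me modify it so that the column is converted to a number before sorting?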

import csv
import operator

with open('C:\Automation\sample.csv') as csvfile:
    readCSVfile = csv.reader(csvfile, delimiter=',')
    for row in readCSVfile:
        sort = sorted(readCSVfile, key=operator.itemgetter(1), reverse=False)
        for eachline in sort:
            print eachline

4 answers:

Answer 0: (score: 0)

You can preprocess the rows as you read them:

#!python2
import csv
import operator

with open ('sample.csv','rb') as csvfile:
    readCSVfile = csv.reader(csvfile)
    header = next(readCSVfile)
    rows = []
    for row in readCSVfile:
        test,cycle,date,status = row
        rows.append([test,int(cycle),date,status])
rows.sort(key=operator.itemgetter(1))
for row in rows:
    print row

Output:

['func', 1, '09/08/17', 'no run']
['func', 2, '09/07/17', 'pass']
['func', 3, '09/08/17', 'pass']
['func', 10, '09/08/17', 'fail']
['func', 11, '09/08/17', 'on hold']
['func', 22, '09/08/17', 'in progress']

You can also use a different sort key and keep the column as strings:

#!python2
import csv
import operator

with open ('sample.csv','rb') as csvfile:
    readCSVfile = csv.reader(csvfile)
    header = next(readCSVfile)
    rows = [row for row in readCSVfile]
rows.sort(key=lambda row: int(row[1]))
for row in rows:
    print row

Output:

['func', '1', '09/08/17', 'no run']
['func', '2', '09/07/17', 'pass']
['func', '3', '09/08/17', 'pass']
['func', '10', '09/08/17', 'fail']
['func', '11', '09/08/17', 'on hold']
['func', '22', '09/08/17', 'in progress']
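Both snippets above are Python 2 (binary mode 'rb' and the print statement). As a rough sketch of the same key-based sort on Python 3, assuming the file is opened in text mode with newline='': the helper name sort_rows_by_cycle and the temporary file are made up for this illustration, and float() is used as the key only because the question mentions int / float.

```python
import csv
import os
import tempfile

# The sample data from the question.
SAMPLE = (
    "test,cycle,date,status\n"
    "func,2,09/07/17,pass\n"
    "func,10,09/08/17,fail\n"
    "func,3,09/08/17,pass\n"
    "func,1,09/08/17,no run\n"
    "func,22,09/08/17,in progress\n"
    "func,11,09/08/17,on hold\n"
)


def sort_rows_by_cycle(path):
    """Hypothetical helper: return (header, rows) ordered numerically by column 1."""
    # Python 3: text mode with newline='' instead of Python 2's 'rb'.
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        header = next(reader)  # keep the header out of the sort
        rows = list(reader)
    # float() accepts '3' as well as '3.5'; the stored values remain strings.
    rows.sort(key=lambda row: float(row[1]))
    return header, rows


# Demo on a temporary copy of the sample data (not the asker's real path).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

header, rows = sort_rows_by_cycle(path)
os.remove(path)

for row in rows:
    print(row)
```

Because only the key converts the value, the printed rows still hold the cycle as a string, as in the second output above.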

Answer 1: (score: 0)

You have to convert the column to a number yourself. The Python csv module does not detect data types automatically.
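A small sketch of what that means in practice, assuming Python 3 and an in-memory stand-in for the file (io.StringIO with two rows from the question): with the default settings, csv.reader hands back every field as a string, and strings compare character by character.

```python
import csv
import io

# In-memory stand-in for the CSV file (two rows from the question's sample).
sample = io.StringIO("func,2,09/07/17,pass\nfunc,10,09/08/17,fail\n")

rows = list(csv.reader(sample))
cycle_values = [row[1] for row in rows]

print(cycle_values)                     # ['2', '10']
print(type(cycle_values[0]).__name__)   # str
print('10' < '2')                       # True: compared character by character
print(int('10') < int('2'))             # False: compared by numeric value
```

This is why '10' and '11' end up ahead of '2' in the output shown in the question.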

You can do the conversion like this:

numberedCSV = []
for row in readCSVfile:
    row[1] = int(row[1])
    numberedCSV.append(row)

Then sort numberedCSV.
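Put together, the conversion loop and the sort might look like the sketch below. It assumes Python 3 and uses io.StringIO as a stand-in for the file. The next(reader) call is an addition that the snippet above does not show: without it, int() would be applied to the header text 'cycle' and raise ValueError.

```python
import csv
import io
import operator

# In-memory stand-in for sample.csv, using the data from the question.
sample = io.StringIO(
    "test,cycle,date,status\n"
    "func,2,09/07/17,pass\n"
    "func,10,09/08/17,fail\n"
    "func,3,09/08/17,pass\n"
    "func,1,09/08/17,no run\n"
    "func,22,09/08/17,in progress\n"
    "func,11,09/08/17,on hold\n"
)

readCSVfile = csv.reader(sample)
header = next(readCSVfile)  # skip the header so int() never sees 'cycle'

numberedCSV = []
for row in readCSVfile:
    row[1] = int(row[1])  # convert the cycle column in place
    numberedCSV.append(row)

# Column 1 now holds integers, so itemgetter(1) sorts numerically.
numberedCSV.sort(key=operator.itemgetter(1))

for row in numberedCSV:
    print(row)
```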

By the way, I don't understand the intent of the code you posted. Why do you need two loops?
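For what it is worth, here is a sketch of what the nested loops in the question appear to do, assuming Python 3 and a shortened in-memory copy of the data. A csv.reader is an iterator that can only be consumed once, so the outer loop takes the header row, sorted() drains everything else, and the outer loop then ends after a single pass. The list names are made up for this illustration.

```python
import csv
import io

# Shortened in-memory copy of the question's data.
sample = io.StringIO(
    "test,cycle,date,status\n"
    "func,2,09/07/17,pass\n"
    "func,10,09/08/17,fail\n"
)
reader = csv.reader(sample)

outer_rows = []      # rows seen by the outer loop
sorted_batches = []  # one entry per call to sorted()

for row in reader:
    outer_rows.append(row)
    # sorted() consumes every row the reader has left.
    sorted_batches.append(sorted(reader, key=lambda r: r[1]))

print(outer_rows)      # only the header row
print(sorted_batches)  # a single batch with the remaining rows, sorted as strings
```

So the outer loop mostly acts as an accidental way of skipping the header; a single next(reader) followed by one sorted() call would express the same thing more directly.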

Answer 2: (score: 0)

This may be what you are looking for.

# take second element for sort
def takeSecond(elem):
    return int(elem[1])

# random list
stuff = [['func', '1', '09/08/17', 'no run'],
 ['func', '10', '09/08/17', 'fail'],
 ['func', '11', '09/08/17', 'on hold'],
 ['func', '2', '09/07/17', 'pass'],
 ['func', '22', '09/08/17', 'in progress'],
 ['func', '3', '09/08/17', 'pass']]

# sort list with key
sortedList = sorted(stuff, key=takeSecond)

# print list
print('Sorted list:', sortedList)

Cheers.

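Answer 3: (score: 0)

As the other answers say, you can

  • convert the values to int while sorting, by using a key function other than operator.itemgetter
  • or use a for loop to convert the data before sorting.

However, if you work with this kind of tabular data often, it is better to use pandas. You need to install it, but again: if you do this often, it is worth it.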

import pandas as pd

df = pd.read_csv('sample.csv')

df['cycle'] = df['cycle'].astype(int)

print(df.sort_values(by='cycle'))

# or reverse
print(df.sort_values(by='cycle', ascending=False))