Pandas CSV averaging and sorting

Date: 2015-07-18 13:33:35

Tags: python database sorting csv pandas

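For my final computing task in Python, I was asked to write a database program that lets me access three class databases, each containing three scores for every student who took an arithmetic quiz. The data must be sortable in three ways: alphabetically by name; by average, a single value found by adding the three scores and dividing by three; and with the scores ordered from highest to lowest. So, suppose the following is one of the CSV files: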

name1       name2 score1 score2 score3
Atticus     Finch 9      8      10
Jem         Finch 5      7      6
Jean Louise Finch 3      2      4

If the end user wants it sorted alphabetically, this is how it should look in the Python IDLE GUI:

Atticus     Finch 9      8      10
Jean Louise Finch 3      2      4
Jem         Finch 5      7      6

If the end user wants it sorted by average, it should look like this:

Atticus     Finch 9
Jem         Finch 6
Jean Louise Finch 3

If the end user wants it sorted from highest to lowest, it should look like this:

Atticus     Finch 10     9      8
Jem         Finch 7      6      5
Jean Louise Finch 4      3      2

This is what my code currently looks like:

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")

classanddatasorter =''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. ")
if classanddatasorter == "A1":
 df = pd.read_csv('classa.csv')
 df[["score1", "score2","score3"]].mean(axis=1)

elif classanddatasorter == "A2":
 df = pd.read_csv('classa.csv')
 saved_column = df.column_name
 name = df.name
 name.sort 

elif classanddatasorter == "A3":
 df = pd.read_csv('classa.csv')
 df.sort[('score1','score2','score3'], ascending=False) 

elif classanddatasorter == "B1":
 df = pd.read_csv('classb.csv')
 df[["score1", "score2","score3"]].mean(axis=1)  

elif classanddatasorter == "B2":
 df = pd.read_csv('classb.csv')
 saved_column = df.column_name
 name = df.name

elif classanddatasorter == "B3":
 df = pd.read_csv('classb.csv')
 df.sort[('score1','score2','score3'], ascending=False)

elif classanddatasorter == "C1":
 df = pd.read_csv('classc.csv')
 df[["score1", "score2","score3"]].mean(axis=1)

elif classanddatasorter == "C2":
 bamboo = pd.read_csv('classc.csv')
 saved_column = df.column_name
 name = df.name
 name.sort 

elif classanddatasorter == "C3":
 df = pd.read_csv('classc.csv')
 df.sort[('score1','score2','score3'], ascending=False)

So far I have received the following errors:

When trying to sort by average:

 Traceback (most recent call last):
  File "C:\Users\MVMCJK\Downloads\Python code\Seperate independent draft of Task 3 (not intergated with Task 1 and 2) draft 3.py", line 70, in <module>
df[["score1", "score2","score3"]].mean(axis=1)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1791, in __getitem__
return self._getitem_array(key)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1835, in _getitem_array
indexer = self.ix._convert_to_indexer(key, axis=1)
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1112, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: "['score1' 'score2' 'score3'] not in index"

When trying to sort alphabetically:

Traceback (most recent call last):
  File "C:\Users\MVMCJK\Downloads\Python code\Seperate independent draft of Task 3 (not intergated with Task 1 and 2) draft 3.py", line 74, in <module>
saved_column = df.column_name
  File "C:\Users\MVMCJK\Anaconda3\lib\site-packages\pandas\core\generic.py", line 2150, in __getattr__
(type(self).__name__, name))
AttributeError: 'DataFrame' object has no attribute 'column_name'

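The last part doesn't even remotely work: the program refuses to run at all because of invalid syntax, and I have to remove that part to get it to run; even then, it gives no response when I enter A3. I have tried googling the KeyError and the AttributeError, but I couldn't find anything relevant to my problem that pointed me to a fix. Does anyone know what is wrong with my program? Any help would be greatly appreciated.

Edit: updated code that still doesn't run: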

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")
classanddatasorter =''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. ")
if classanddatasorter == "A1":
 df = pd.read_csv('classa.csv')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

elif classanddatasorter == "A2":
 df = pd.read_csv('classa.csv', index_col='name1')
 saved_column = df.column_name
 name = df.name
 name.sort 

elif classanddatasorter == "A3":
 df = pd.read_csv('classa.csv')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)


elif classanddatasorter == "B1":
 df = pd.read_csv('classb.csv')
 df['average'] = df[["score1", "score2","score3"]].mean(axis=1)


elif classanddatasorter == "B2":
 df = pd.read_csv('classb.csv',index_col='name1')
 saved_column = df.column_name
 name = df.name

elif classanddatasorter == "B3":
 df = pd.read_csv('classb.csv')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

elif classanddatasorter == "C1":
 df = pd.read_csv('classc.csv')
 df['average'] = df[["score1", "score2","score3"]].mean(axis=1)

elif classanddatasorter == "C2":
 df = pd.read_csv('classc.csv',index_col='name1')
 saved_column = df.column_name
 name = df.name
 df = name.sort 

elif classanddatasorter == "C3":
 df = pd.read_csv('classc.csv')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

Edit 2: updated with some of bakkal's code samples.

print("Welcome to the Database sorter. The system works based on the following functions. Choose your class by inputting a letter, and choose the method of sorting the data by inputing a number afterwards. A is for Class A, B is for Class B and C is the Class C.1 is for soritng the data as an average, 2 is for sorting the data in alphabetical order and 3 is for sorting the data from highest to lowest.")
classanddatasorter =''
while classanddatasorter not in ["A1","A2","A3","B1","B2","B3","C1","C2","C3"]:
 classanddatasorter = input("You have the following nine options. Input A1 to sort the results of Class A as an average. Input A2 to sort the results of Class A in alphabetical order. Input A3 to sort the results of Class A from highest to lowest. Input B1 to sort the results of Class B as an average. Input B2 to sort the results of Class B in alphabetical order. Input B3 to sort the results of Class B from highest to lowest. Input C1 to sort the results of Class C as an average. Input C2 to sort the results of Class C in alphabetical order. Input C3 to sort the results of Class C from highest to lowest. ")

if classanddatasorter == "A1":
 df = pd.read_csv('classa.csv')
 print('Sorted by name1')
 df.sort('name1')
 print(df)
elif classanddatasorter == "A2":
 df = pd.read_csv('classa.csv')
 print('Sorted by average column')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
 print(df)
 print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "A3":
 df = pd.read_csv('classa.csv')
 print('Sorted scores')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

 for i in xrange(0, scores.shape[1]):
     column_name = 'rank{}'.format(i)
     df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])
elif classanddatasorter == "B1":
 df = pd.read_csv('classb.csv')
 print('Sorted by name1')
 df.sort('name1')
 print(df)
elif classanddatasorter == "B2":
 df = pd.read_csv('classb.csv')
 print('Sorted by average column')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
 print(df)
 print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "B3":
 df = pd.read_csv('classb.csv')
 print('Sorted scores')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

for i in xrange(0, scores.shape[1]):
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])
elif classanddatasorter == "C1":
 df = pd.read_csv('classc.csv')
 print('Sorted by name1')
 df.sort('name1')
 print(df)
elif classanddatasorter == "C2":
 df = pd.read_csv('classc.csv')
 print('Sorted by average column')
 df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
 print(df)
 print(df[['name1', 'name2', 'average']].sort('average'))
elif classanddatasorter == "C3":
 df = pd.read_csv('classc.csv')
 print('Sorted scores')
 scores = df[['score1', 'score2', 'score3']].values
 scores.sort(axis=1)

 for i in xrange(0, scores.shape[1]):
     column_name = 'rank{}'.format(i)
     df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']]) 

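1 answer:

Answer 0 (score: 0)

Parsing and exploring

Suppose we have a CSV file like this (comma-separated, with no spaces after the commas; otherwise you need to pass the appropriate CSV format options)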

scores.csv

name1,name2,score1,score2,score3
Atticus,Finch,9,8,10
Jem,Finch,5,7,6
Jean Louise,Finch,3,2,4

We read the CSV file

df = pd.read_csv('scores.csv')

Now df is:

         name1  name2  score1  score2  score3
0      Atticus  Finch       9       8      10
1          Jem  Finch       5       7       6
2  Jean Louise  Finch       3       2       4

df.columns is:

Index([u'name1', u'name2', u'score1', u'score2', u'score3'], dtype='object')
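
Looking at df.columns right after reading is also a quick way to check that the delimiter matches the file. The sketch below uses an in-memory, semicolon-separated sample (both the data and the choice of sep=';' are assumptions for illustration, not the asker's actual files) to show how a mismatched delimiter collapses the header into a single column, which is one way to end up with a KeyError such as "['score1' 'score2' 'score3'] not in index".

```python
import io
import pandas as pd

# Hypothetical sample: the same kind of data, but separated by semicolons.
raw = (
    "name1;name2;score1;score2;score3\n"
    "Atticus;Finch;9;8;10\n"
    "Jem;Finch;5;7;6\n"
)

# With the default comma delimiter, the whole header becomes one column name.
wrong = pd.read_csv(io.StringIO(raw))
wrong_columns = list(wrong.columns)

# Passing the delimiter the file really uses gives the five expected columns.
right = pd.read_csv(io.StringIO(raw), sep=';')
right_columns = list(right.columns)

print(wrong_columns)
print(right_columns)
```

If the printed column list does not contain score1, score2 and score3 as separate entries, selecting them with df[['score1', 'score2', 'score3']] cannot work.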

You can see that df has columns but no column_name attribute, hence the error you got:

  

AttributeError: 'DataFrame' object has no attribute 'column_name'
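
In other words, column_name in tutorials is a placeholder for a real column label. A minimal sketch of the difference, using a small hand-built frame in place of the CSV (the frame contents are taken from the sample data above):

```python
import pandas as pd

df = pd.DataFrame({
    'name1': ['Atticus', 'Jem', 'Jean Louise'],
    'name2': ['Finch', 'Finch', 'Finch'],
})

# Bracket access with the real column label works for any column name.
first_names = df['name1']

# Attribute access only works when the attribute matches a column label.
same_first_names = df.name1

# No column is called 'column_name', so the attribute does not exist.
has_placeholder = hasattr(df, 'column_name')

print(list(first_names), has_placeholder)
```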

Sorting

Now let's sort alphabetically

df.sort_values('name1')

The result is:

         name1  name2  score1  score2  score3
0      Atticus  Finch       9       8      10
2  Jean Louise  Finch       3       2       4
1          Jem  Finch       5       7       6
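
One detail worth keeping in mind: the DataFrame.sort method used in the question's code was removed from later pandas releases in favor of sort_values, and the sort returns a new frame rather than changing df. A minimal sketch, assuming a pandas version that provides sort_values and using a hand-built copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'name1': ['Atticus', 'Jem', 'Jean Louise'],
    'name2': ['Finch'] * 3,
    'score1': [9, 5, 3],
    'score2': [8, 7, 2],
    'score3': [10, 6, 4],
})

# sort_values returns a new DataFrame and leaves df itself unchanged.
by_name = df.sort_values('name1')
print(by_name)

# Optional: renumber the rows 0..n-1 after sorting.
by_name_reset = by_name.reset_index(drop=True)
```

So a bare df.sort_values('name1') followed by print(df) still shows the original order; the sorted result has to be printed or assigned.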

You want the average, so let's add a column

df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

df now has a new column that you can sort on!

         name1  name2  score1  score2  score3  average
0      Atticus  Finch       9       8      10        9
1          Jem  Finch       5       7       6        6
2  Jean Louise  Finch       3       2       4        3

If you only want to see the average, sorted from highest to lowest

df[['name1', 'name2', 'average']].sort_values('average', ascending=False)


         name1  name2  average
0      Atticus  Finch        9
1          Jem  Finch        6
2  Jean Louise  Finch        3
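
Put together, the average step might look like the sketch below. The rows are deliberately entered out of order so the effect of the sort is visible; ascending=False is an assumption based on the highest-first output the question asks for.

```python
import pandas as pd

# Sample data, intentionally not in average order.
df = pd.DataFrame({
    'name1': ['Jean Louise', 'Atticus', 'Jem'],
    'name2': ['Finch'] * 3,
    'score1': [3, 9, 5],
    'score2': [2, 8, 7],
    'score3': [4, 10, 6],
})

# Row-wise mean of the three score columns.
df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)

# Keep only the display columns and put the highest average first.
by_average = df[['name1', 'name2', 'average']].sort_values('average', ascending=False)
print(by_average)
```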

The last sort you want, of the scores themselves, is a bit trickier because the data isn't tidy/normalized, but here's an attempt

scores = df[['score1', 'score2', 'score3']].values

scores now looks like this

array([[ 9,  8, 10],
       [ 5,  7,  6],
       [ 3,  2,  4]])

We sort each row of the scores array

scores.sort(axis=1)

array([[ 8,  9, 10],
       [ 5,  6,  7],
       [ 2,  3,  4]])

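These are the sorted scores you want, so we put them into our df. We have to do this for each score column, so we can use scores.shape[1], which is the number of columns in that 2D array

for i in range(0, scores.shape[1]):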
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

Now our df looks like this

         name1  name2  score1  score2  score3  rank0  rank1  rank2
0      Atticus  Finch       9       8      10      8      9     10
1          Jem  Finch       5       7       6      5      6      7
2  Jean Louise  Finch       3       2       4      2      3      4

To get the display you want

df[['name1', 'name2', 'rank2', 'rank1', 'rank0']]


         name1  name2  rank2  rank1  rank0
0      Atticus  Finch     10      9      8
1          Jem  Finch      7      6      5
2  Jean Louise  Finch      4      3      2
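
An alternative sketch of the same step, assuming NumPy is available: np.sort returns a sorted copy instead of sorting in place, and reversing the columns gives highest-to-lowest directly. The column labels highest, middle and lowest are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name1': ['Atticus', 'Jem', 'Jean Louise'],
    'name2': ['Finch'] * 3,
    'score1': [9, 5, 3],
    'score2': [8, 7, 2],
    'score3': [10, 6, 4],
})
score_columns = ['score1', 'score2', 'score3']

# np.sort sorts each row ascending and returns a copy; [:, ::-1] flips it to descending.
ranked = np.sort(df[score_columns].to_numpy(), axis=1)[:, ::-1]

# Hypothetical labels for the sorted positions.
ranked_df = pd.DataFrame(ranked, columns=['highest', 'middle', 'lowest'], index=df.index)

# Put the names next to the sorted scores for display.
result = pd.concat([df[['name1', 'name2']], ranked_df], axis=1)
print(result)
```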

Tidy data

You can read this PDF paper to learn more about tidy data.

Basically, a lot of operations would be easier if, for example, your data looked like this

name, test, score
bob, 1, 10
bob, 2, 9

instead of

name, score1, score2
bob, 10, 9
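
As a rough illustration of why the long layout helps, the sketch below reshapes the sample data with DataFrame.melt (assuming a pandas version that provides it) and then computes the average and the per-student score ordering with ordinary group and sort operations. The column names test and score are chosen to match the tidy example above.

```python
import pandas as pd

wide = pd.DataFrame({
    'name1': ['Atticus', 'Jem', 'Jean Louise'],
    'name2': ['Finch'] * 3,
    'score1': [9, 5, 3],
    'score2': [8, 7, 2],
    'score3': [10, 6, 4],
})

# One row per (student, test) pair instead of one column per test.
tidy = wide.melt(
    id_vars=['name1', 'name2'],
    value_vars=['score1', 'score2', 'score3'],
    var_name='test',
    value_name='score',
)

# The average per student is now a plain group-by.
averages = tidy.groupby('name1')['score'].mean()

# Each student's scores from highest to lowest is a plain two-key sort.
ordered = tidy.sort_values(['name1', 'score'], ascending=[True, False])

print(averages)
print(ordered)
```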

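Python script

import pandas as pd

# Assumes scores.csv is comma-separated with a header row
df = pd.read_csv('scores.csv')

print('Original Data')
print(df)

print('Sorted by name1')
# sort_values returns a new DataFrame, so print the result directly
print(df.sort_values('name1'))

print('Sorted by average column')
df['average'] = df[['score1', 'score2', 'score3']].mean(axis=1)
print(df)
print(df[['name1', 'name2', 'average']].sort_values('average', ascending=False))

print('Sorted scores')
# Copy the score values so that sorting does not touch df
scores = df[['score1', 'score2', 'score3']].values.copy()
scores.sort(axis=1)  # ascending within each row

# rank0 is each student's lowest score, rank2 the highest
for i in range(0, scores.shape[1]):
    column_name = 'rank{}'.format(i)
    df[column_name] = scores[:, i]

print(df[['name1', 'name2', 'rank2', 'rank1', 'rank0']])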

Instead of print(), you can also save the resulting data frame to another .csv with .to_csv('score_sorted_avg.csv')
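
To tie this back to the menu in the question, here is one possible sketch that replaces the nine branches with a single helper and then saves the result. The helper name sorted_report, the labels highest, middle and lowest, and the temporary output directory are all assumptions made for this example; the method codes follow the question's menu (1 average, 2 alphabetical, 3 highest to lowest).

```python
import os
import tempfile

import numpy as np
import pandas as pd

SCORE_COLUMNS = ['score1', 'score2', 'score3']


def sorted_report(df, method):
    """Hypothetical helper: '1' = by average, '2' = alphabetical, '3' = scores high to low."""
    if method == '1':
        out = df[['name1', 'name2']].copy()
        out['average'] = df[SCORE_COLUMNS].mean(axis=1)
        return out.sort_values('average', ascending=False)
    if method == '2':
        return df.sort_values('name1')
    if method == '3':
        out = df[['name1', 'name2']].copy()
        # Sort each row ascending, then reverse the columns for descending order.
        ranked = np.sort(df[SCORE_COLUMNS].to_numpy(), axis=1)[:, ::-1]
        for i, label in enumerate(['highest', 'middle', 'lowest']):
            out[label] = ranked[:, i]
        return out
    raise ValueError('unknown method: {!r}'.format(method))


# Stand-in for pd.read_csv('classa.csv'); the real file would be chosen from the class letter.
df = pd.DataFrame({
    'name1': ['Atticus', 'Jem', 'Jean Louise'],
    'name2': ['Finch'] * 3,
    'score1': [9, 5, 3],
    'score2': [8, 7, 2],
    'score3': [10, 6, 4],
})

report = sorted_report(df, '1')

# index=False keeps the row numbers out of the saved file.
out_path = os.path.join(tempfile.mkdtemp(), 'score_sorted_avg.csv')
report.to_csv(out_path, index=False)

round_trip = pd.read_csv(out_path)
print(round_trip)
```

With a helper like this, an input such as "A1" can be split into the class letter (to pick the file) and the method digit (to pick the sort), so each combination does not need its own branch.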