Searching a dataset in Python

Asked: 2012-03-21 19:06:20

Tags: dataset python-3.x

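I was given this as a homework problem and I don't know how I should approach it.

First, I was given a dataset that lists employees' names, addresses, email addresses, and so on, for about 50 employees.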

  

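You are asked to write an application that provides information about staff members. Your program should prompt the user for a search criterion. Any staff member who matches the search criterion should be printed to the screen in the following format:
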

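Position Designation Room and Extension Name and Email Address
  (columns are tab-separated)

Matching information............

You must modify the dataset in order to process it. You may choose to keep it in a separate file, although this is not required. Your program should satisfy the following constraints:
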
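  • You should compare every column in the dataset against the search criterion.
  • Comparisons should be case-insensitive.
  • All output should be in title case, except for email addresses.
  • If matches are found, the result rows should be printed with the columns all lined up.
  • If there are no matches, a message should be printed, without the header row.
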

You should hand in (1) your program, and (2) a paragraph explaining how you went about processing the dataset.

You should also run these test cases on your application:

  • Search for 'brenda'
  • Search for all clerical staff.
  • Search for 'BredNa'
  • Find Dr Carr's position
  • Which office is Neil located in?

So, first of all, how should I read this dataset? Should I read it in as a text file, or build a tuple, a dictionary, etc.?


staff = [['prof.liam maguire','head of school','academic','MS127','75605','lguire@ulster.ac.uk'],
 ['prof. martin McGinnity','director of intelligent systems research centre','academic','MS112','75616','tinnity@ulster.ac.uk'],
 ['dr laxmidhar Behera','reader','academic','MS107','75276','lra@ulster.ac.uk'],
 ['dr girijesh Prasad','professor','academic','MS137','75645','gad@ulster.ac.uk'],
 ['dr kevin Curran','senior lecturer','academic','MS130','75565','krran@ulster.ac.uk'],
 ['mr aiden McCaughey','Senior Lecturer','academic','MG126','75131','aughey@ulster.ac.uk'],
 ['dr tom Lunney','postgraduate courses co-ordinator (Senior Lecturer)','academic','MG121D','75388','tfney@ulster.ac.uk'],
 ['dr heather Sayers','undergraduate courses co-ordinator (Senior Lecturer)','academic','MG121C','75148','hmyers@ulster.ac.uk'],
 ['dr liam Mc Daid','senior lecturer','academic','MS016','75452','ljid@ulster.ac.uk'], 
['mr derek Woods','senior lecturer','academic','MS134','75380','dnoods@ulster.ac.uk'],
 ['dr ammar Belatreche','lecturer','academic','MS104','75185','aatreche@ulster.ac.uk'],
 ['mr michael Callaghan','lecturer','academic','MS132','75771','mjllaghan@ulster.ac.uk'],
 ['dr sonya Coleman','lecturer','academic','MS133','75030','saeman@ulster.ac.uk'],
 ['dr joan Condell','lecturer','academic','MS131','75024','jdell@ulster.ac.uk'],
 ['dr damien Coyle','lecturer','academic','MS103','75170','dhle@ulster.ac.uk'],
 ['mr martin Doherty','lecturer','academic','MG121A','75552','merty@ulster.ac.uk'],
 ['dr jim Harkin','lecturer','academic','MS108','75128','jgrkin@ulster.ac.uk'],
 ['dr yuhua Li','lecturer','academic','MS106','75528','yi@ulster.ac.uk'],
 ['dr sandra Moffett','lecturer','academic','MS015','75381','soffett@ulster.ac.uk'],
 ['mrs mairin Nicell','lecturer','academic','MG127','75007','micell@ulster.ac.uk'],
 ['mrs maeve Paris','lecturer','academic','MG040','75212','m@ulster.ac.uk'],
 ['dr jose Santos','lecturer','academic','MG035','75034','jantos@ulster.ac.uk'],
 ['dr nH. Siddique','lecturer','academic','MG037','75340','nhique@ulster.ac.uk'],
 ['dr zumao Weng','lecturer','academic','MG050','75358','zmng@ulster.ac.uk'],
 ['dr shane Wilson','lecturer','academic','MG038','75527','s.on@ulster.ac.uk'],
 ['dr caitriona carr','technical Services Engineer','computing and Technical Support','MG121B','75003','crr@ulster.ac.uk'],
 ['mr neil McDonnell','technical Services Supervisor','computing and Technical Support','MS030 / MF143','75360','ndonnell@ulster.ac.uk'],
 ['mr paddy McDonough','technical Services Engineer','computing and Technical Support','MS034','75322','p.ugh@ulster.ac.uk'],
 ['mr bernard McGarry','network Assistant','computing and Technical Support','MG132','75644','bgrry@ulster.ac.uk'],
 ['mr stephen Friel','secretary','clerical staff','MG048','75148','siel@ulster.ac.uk'],
 ['ms emma McLaughlin','secretary','clerical staff','MG048','75153','eughlin1@ulster.ac.uk'],
 ['mrs. brenda Plummer','secretary','clerical staff','MS126','75605','blmmer@ulster.ac.uk'],
 ['miss paula Sheerin','secretary','clerical staff','MS111','75616','perin@ulster.ac.uk'],
 ['mrs michelle Stewart','secretary','clerical staff','MG048','75382','mwart@ulster.ac.uk']]


matches = []

criterion = input("please enter search criterion: ")
criterion = criterion.lower()

for person in staff:
    for characteristic in person:
        # Lowercase the field as well, so the comparison ignores case.
        if criterion in characteristic.lower():
            matches.append(person)
            break  # one matching field is enough; avoid adding a person twice

if len(matches) == 0:
    print("No Match")
else:
    print("POSITION |||DESIGNATION ||| EXT & ROOM NO||| NAME & EMAIL")
    for i in matches:
        # The email address (i[5]) is printed as-is, not title-cased.
        print(i[1].title(), ': ', i[2].title(), ':', i[3].upper() + i[4], ':', i[0].title(), i[5])

This is what I have come up with so far, and it seems to work. Would you make any improvements?

3 Answers:

Answer 0 (score: 1)

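Thank you for being honest and telling us this is a homework problem. StackOverflow discourages giving direct answers to homework questions, but we can guide you toward the right answer.

About "modify the dataset in order to process it": this implies the data is not currently in a consistent format. The first thing to do is look at the data you have been given and decide on the best representation for it.

I suggest a columnar, tab-delimited data file. This is easy to create in Microsoft Excel by putting the data in a spreadsheet and saving it as text. (Excel will complain that you will lose everything that makes it a spreadsheet rather than a text file, but that's fine: a text file is what you want.) Save the updated file.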

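What Excel has produced is called a tab-delimited text file: a two-dimensional grid of data (shaped like the spreadsheet), with one row of data per line (to restate, a linebreak symbol separates the rows of data, and a text editor interprets it as a command to start writing on a new line) and a tab character (written as \t in an escaped Python string, but really a single character of its own) separating the cells within each row. This is also known as tab-separated values, or TSV. Closely related is comma-separated values, or CSV, which is another option in Excel. CSV can also stand for character-separated values, a general term for any text file that represents a grid of data using some character (',' for comma-separated, '\t' for tab-separated) to separate the values within each record.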
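To make the tab-separated layout concrete, here is a small sketch. The sample line is built from one row of the data in the question (with the email column left out for brevity), and the variable names are only illustrative.

```python
# One record as it might appear in a tab-separated file: cells are
# separated by a tab character, and the record ends with a line break.
line = "Mrs. Brenda Plummer\tSecretary\tClerical Staff\tMS126\t75605\n"

# "\t" takes two characters to write in source code, but it is a single
# character in the resulting string.
tab_length = len("\t")

# Strip the trailing line break, then split on tabs to get the cells.
fields = line.rstrip("\n").split("\t")

print(tab_length)
print(fields)
```

Splitting by hand like this is fine for understanding the format, but it does not handle quoted cells, which is one reason to prefer a library for real files.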

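CSV is a very common file format, so Python comes ready to help. Python has a library, csv, designed to read these files for you. If you use Excel's text format, you will need to tell it that the dialect is excel-tab, because that is how Excel writes tab-delimited files.

You need to construct a csv.reader to read your formatted data file. Use the order of the columns to make sense of the list you get back as you read one row at a time: the order of the columns is the same as the order of the items in each row, so use that information to index into the list correctly and find each field.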
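A minimal sketch of that reading step might look like the following. It assumes the column order name, position, designation, room, extension, email, and it uses an in-memory io.StringIO as a stand-in for the saved file so the example runs on its own; the file name mentioned in the comment is hypothetical.

```python
import csv
import io

# Stand-in for a file saved from Excel as tab-delimited text. With a real
# file you would use open("staff.txt", newline="") instead ("staff.txt" is
# a hypothetical name).
sample = io.StringIO(
    "Mrs. Brenda Plummer\tSecretary\tClerical Staff\tMS126\t75605\tbl.plummer@ulster.ac.uk\n"
    "Mr Neil McDonnell\tTechnical Services Supervisor\tComputing and Technical Support\t"
    "MS030 / MF143\t75360\tn.mcdonnell@ulster.ac.uk\n"
)

# Assumed column order; naming the positions avoids magic numbers later.
NAME, POSITION, DESIGNATION, ROOM, EXTENSION, EMAIL = range(6)

records = []
for row in csv.reader(sample, dialect="excel-tab"):
    records.append(row)  # each row is a list of strings, one per column

print(records[1][NAME], "is in room", records[1][ROOM])
```

Naming the column positions once keeps the indexing readable when you later print or search individual fields.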

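Once you have read a row, what do you want to do with it?

You have a choice of storage formats within your program:

  • Save each record into a list (giving you a list of lists, since each record is itself a list). Once it is loaded, whenever you want to search you iterate over the whole list of lists and use an equality test to find matches. This can be done with a list comprehension, which is almost certainly what your teacher is looking for.
  • Alternatively, create a dict for each column of the file and store every record in each dict: each dict maps that column's value (the key) to the record. There is a catch! A dict can store only one record per key, yet you will certainly have different people with the same designation (several professors, several clerical staff, and so on), and there is no way to be sure that no two people share a name. Your index therefore has to store a list of records under each key, not just a single record.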
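The first option can be sketched roughly as below. The three records are copied from the data in this thread, and find_matches is a hypothetical helper name, not something the assignment prescribes.

```python
# A few records in the assumed column order:
# name, position, designation, room, extension, email.
records = [
    ["Mr Stephen Friel", "Secretary", "Clerical Staff", "MG048", "75148", "s.friel@ulster.ac.uk"],
    ["Mrs. Brenda Plummer", "Secretary", "Clerical Staff", "MS126", "75605", "bl.plummer@ulster.ac.uk"],
    ["Dr Yuhua Li", "Lecturer", "Academic", "MS106", "75528", "y.li@ulster.ac.uk"],
]


def find_matches(records, criterion):
    """Return every record with a field equal to criterion, ignoring case."""
    wanted = criterion.lower()
    return [
        record for record in records
        if any(field.lower() == wanted for field in record)
    ]


clerical = find_matches(records, "clerical staff")
print([record[0] for record in clerical])
```

Because this uses an equality test, a partial term such as 'brenda' finds nothing; replacing `field.lower() == wanted` with `wanted in field.lower()` would turn it into a substring search.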

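For repeated queries the second approach is much faster, because you organize all the records up front for fast lookup. The first, however, is easier to implement and more likely to be what your teacher expects. I suggest implementing the first, understanding it, and then implementing the second if you have time.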
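If you do get to the second approach, one possible shape for it is sketched here. It uses collections.defaultdict so that each key holds a list of records, which addresses the duplicate-key catch; build_index and lookup are hypothetical names, and the sample records come from the data in this thread.

```python
from collections import defaultdict

# Assumed column order: name, position, designation, room, extension, email.
records = [
    ["Mr Stephen Friel", "Secretary", "Clerical Staff", "MG048", "75148", "s.friel@ulster.ac.uk"],
    ["Mrs. Brenda Plummer", "Secretary", "Clerical Staff", "MS126", "75605", "bl.plummer@ulster.ac.uk"],
    ["Dr Yuhua Li", "Lecturer", "Academic", "MS106", "75528", "y.li@ulster.ac.uk"],
]


def build_index(records, column):
    """Map each lowercased value in one column to the list of records holding it."""
    index = defaultdict(list)
    for record in records:
        index[record[column].lower()].append(record)
    return index


# One index per column, built once up front.
indexes = [build_index(records, column) for column in range(6)]


def lookup(indexes, criterion):
    """Collect records whose value in any column equals criterion, ignoring case."""
    wanted = criterion.lower()
    found = []
    for index in indexes:
        # .get() avoids creating empty entries for keys that are not present.
        for record in index.get(wanted, []):
            if record not in found:
                found.append(record)
    return found


print([record[0] for record in lookup(indexes, "Secretary")])
```

Each lookup is now a handful of dictionary accesses instead of a scan over every record, at the cost of only supporting exact (case-insensitive) matches on whole fields.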

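Of course, the user interface for all of this is up to you, but this should put you well on the way to implementing the core of the program. Good luck.

Answer 1 (score: 1)

I would do it like this: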

staff_details = [["Prof. Liam Maguire","Head of School","Academic","MS127","75605","lp.maguire@ulster.ac.uk"],
                 ["Prof. Martin McGinnity","Director of Intelligent Systems Research Centre","Academic","MS112","75616","tm.mcginnity@ulster.ac.uk"],
                 ["Dr Laxmidhar Behera","Reader","Academic","MS107","75276", "l.behera@ulster.ac.uk"],
                 ["Dr  Girijesh Prasad","Professor","Academic","MS137","75645","g.prasad@ulster.ac.uk"],
                 ["Dr  Kevin Curran","Senior Lecturer","Academic","MS130","75565","kj.curran@ulster.ac.uk"],
                 ["Mr Aiden McCaughey","Senior Lecturer","Academic","MG126","75131","a.mccaughey@ulster.ac.uk"],
                 ["Dr Tom Lunney","Postgraduate Courses’ Co-ordinator (Senior Lecturer) ","Academic","MG121D","75388","tf.lunney@ulster.ac.uk"],
                 ["Dr Heather Sayers","Undergraduate Courses’ Co-ordinator (Senior Lecturer) ","Academic","MG121C","75148","hm.sayers@ulster.ac.uk"],
                 ["Dr  Liam Mc Daid","Senior Lecturer","Academic","MS016","75452","lj.mcdaid@ulster.ac.uk"],
                 ["Mr Derek Woods","Senior Lecturer","Academic","MS134","75380","dn.woods@ulster.ac.uk"],
                 ["Dr Ammar Belatreche","Lecturer","Academic","MS104","75185","a.belatreche@ulster.ac.uk"],
                 ["Mr Michael Callaghan","Lecturer","Academic","MS132","75771","mj.callaghan@ulster.ac.uk"],
                 ["Dr  Sonya Coleman","Lecturer","Academic","MS133","75030","sa.coleman@ulster.ac.uk"],
                 ["Dr  Joan Condell","Lecturer","Academic","MS131","75024","j.condell@ulster.ac.uk"],
                 ["Dr Damien Coyle","Lecturer","Academic","MS103","75170","dh.coyle@ulster.ac.uk"],
                 ["Mr Martin Doherty","Lecturer","Academic","MG121A","75552","m.doherty@ulster.ac.uk"],
                 ["Dr  Jim Harkin","Lecturer","Academic","MS108","75128","jg.harkin@ulster.ac.uk"],
                 ["Dr Yuhua Li","Lecturer","Academic","MS106","75528","y.li@ulster.ac.uk"],
                 ["Dr  Sandra Moffett","Lecturer","Academic","MS015","75381","sm.moffett@ulster.ac.uk"],
                 ["Mrs Mairin Nicell","Lecturer","Academic","MG127","75007","ma.nicell@ulster.ac.uk"],
                 ["Mrs Maeve Paris","Lecturer","Academic","MG040","75212","m.paris@ulster.ac.uk"],
                 ["Dr Jose Santos","Lecturer","Academic","MG035","75034","ja.santos@ulster.ac.uk"],
                 ["Dr  NH. Siddique","Lecturer","Academic","MG037","75340","nh.siddique@ulster.ac.uk"],
                 ["Dr  Zumao Weng","Lecturer","Academic","MG050 ","75358","zm.weng@ulster.ac.uk"],
                 ["Dr  Shane Wilson","Lecturer","Academic","MG038","75527","s.wilson@ulster.ac.uk"],
                 ["Dr Caitriona Carr","Technical Services Engineer","Computing and Technical Support","MG121B","75003","c.carr@ulster.ac.uk"],
                 ["Mr Neil McDonnell","Technical Services Supervisor","Computing and Technical Support","MS030 / MF143","75360", "n.mcdonnell@ulster.ac.uk"],
                 ["Mr Paddy McDonough","Technical Services Engineer","Computing and Technical Support","MS034","75322","p.mcdonough@ulster.ac.uk"],
                 ["Mr Bernard McGarry","Network Assistant","Computing and Technical Support","MG132","75644","bg.mcgarry@ulster.ac.uk"],
                 ["Mr Stephen Friel","Secretary","Clerical Staff","MG048","75148","s.friel@ulster.ac.uk"],
                 ["Ms Emma McLaughlin","Secretary","Clerical Staff","MG048","75153","e.mclaughlin1@ulster.ac.uk"],
                 ["Mrs. Brenda Plummer","Secretary","Clerical Staff","MS126","75605","bl.plummer@ulster.ac.uk"],
                 ["Miss Paula Sheerin","Secretary","Clerical Staff","MS111","75616","p.sheerin@ulster.ac.uk"],
                 ["Mrs Michelle Stewart","Secretary","Clerical Staff","MG048","75382","m.stewart@ulster.ac.uk"]]

search_result = []

search_input = input("Please enter a search criterion: ")
# Lowercase the input once so the comparison below ignores case.
search_term = search_input.lower()

for person in staff_details:
    for characteristic in person:
        if search_term in characteristic.lower():
            search_result.append(person)
            break  # stop at the first matching field so a person is added only once

if len(search_result) == 0:
    print("No staff members match your search criterion of ->", search_input)
else:
    print("We have a match!")
    print("{0:<30} {1:<40} {2:<40} {3:<50}".format("Position:", "Designation:", "Room and Extension:", "Name and Email:"))
    print("-" * 160)

    for align in search_result:
        print("{0:<30} {1:<40} {2:<40} {3:<50}".format(align[1], align[2], align[3] + ", Ext:" + align[4], align[0] + "(" + align[5] + ")"))

I hope this helps!

Answer 2 (score: 0)

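I assume you received the dataset as a plain text file (or as copyable text in an email, etc.). In that case you have several options:

  1. Create a text file in which each line stores the information about one employee in a fixed format: "name", "position", and so on. To search, you then scan the file and print the matching lines, then repeat the matched part.

  2. Store the information in memory using Python data types, for example a list of dictionaries with keys such as "name", "position", and so on. Searching then becomes a little more complicated (only a little, really), but you will be able to format the output any way you like. First, though, you need to populate the list by reading the text file (or, if you are desperate, by hard-coding it manually).

  3. You can combine these approaches by building dictionaries only from the matching lines of the file.

  4. You can use a real database engine such as MySQL, but that really is overkill for this assignment.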
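As a rough illustration of option 2, the sketch below builds a list of dictionaries from tab-separated lines and searches every value case-insensitively. The key names, the sample lines (taken from the data in this thread and held in a list rather than read from a file), and the helpers search and format_row are all assumptions made for the example.

```python
# Assumed keys, in the same order as the columns on each line.
FIELDS = ["name", "position", "designation", "room", "extension", "email"]

# Stand-in for the lines of a text file, one employee per line.
lines = [
    "dr caitriona carr\ttechnical services engineer\tcomputing and technical support\tMG121B\t75003\tc.carr@ulster.ac.uk",
    "mr neil mcdonnell\ttechnical services supervisor\tcomputing and technical support\tMS030 / MF143\t75360\tn.mcdonnell@ulster.ac.uk",
    "mrs. brenda plummer\tsecretary\tclerical staff\tMS126\t75605\tbl.plummer@ulster.ac.uk",
]

# One dictionary per employee, keyed by field name.
staff = [dict(zip(FIELDS, line.split("\t"))) for line in lines]


def search(staff, criterion):
    """Return the people with any value containing criterion, ignoring case."""
    wanted = criterion.lower()
    return [
        person for person in staff
        if any(wanted in value.lower() for value in person.values())
    ]


def format_row(person):
    """Build one tab-separated output row; the email is left as-is."""
    return "\t".join([
        person["position"].title(),
        person["designation"].title(),
        person["room"].upper() + " " + person["extension"],
        person["name"].title() + " " + person["email"],
    ])


for person in search(staff, "CARR"):
    print(format_row(person))
```

Accessing fields by name rather than by position is the main payoff here: the formatting code stays readable even if the column order in the file changes.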