Data structure for working with rows and columns

Time: 2015-10-24 19:11:13

Tags: python data-structures

I have data in tabular form in Python:

Name  Sport   Score  
John  Golf    100
Jill  Rugby   55
John  Hockey  100
Bob   Golf    45

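How can I structure this table in Python so that the items are easy to sort or group? For example, I might want to see the names of all the people that played Golf, or of everyone who scored 100 on any sport. Or just all of the data for John.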

5 answers:

Answer 0 (score: 1)

A pandas DataFrame would be the way to go:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jill', 'John', 'Bob'], 
                   'Sport' : ['Golf', 'Rugby', 'Hockey', 'Golf'],
                   'Score': [100, 55, 100, 45]})

# the names of people that played Golf

df[df['Sport'] == 'Golf']['Name'].unique()
>> ['John' 'Bob']

# all of the people that scored 100 on any sport.

df[df['Score'] == 100]['Name'].unique()
>> ['John']

# all of the data for just John.
df[df['Name'] == 'John']
>>    Name  Score   Sport
   0  John    100    Golf
   2  John    100  Hockey

Answer 1 (score: 1)

map and filter, together with namedtuple and lambda, can be used for this task.

from collections import namedtuple

# Create a named tuple to store the rows
Row = namedtuple('Row', ('name', 'sport', 'score'))

data = '''Name  Sport   Score  
          John  Golf    100
          Jill  Rugby   55
          John  Hockey  100
          Bob   Golf    45'''

# Read the data, skip the first line
lines = data.splitlines()[1:]
rows = []
for line in lines:
    name, sport, score = line.strip().split()
    rows.append(Row(name, sport, int(score)))

# People that played Golf
# (list() makes filter/map return lists on Python 3 as well as Python 2)
golf_filter = lambda row: row.sport == 'Golf'
golf_players = list(filter(golf_filter, rows))

# People that scored 100 on any sport
score_filter = lambda row: row.score == 100
scorers = list(filter(score_filter, rows))

# People named John
john_filter = lambda row: row.name == 'John'
john_data = list(filter(john_filter, rows))

# If you want a specific column then you can map the data
# Names of golf players
get_name = lambda row: row.name
golf_players_names = list(map(get_name, golf_players))

Results:

>>> golf_players
[Row(name='John', sport='Golf', score=100),
 Row(name='Bob', sport='Golf', score=45)]

>>> john_data
[Row(name='John', sport='Golf', score=100),
 Row(name='John', sport='Hockey', score=100)]

>>> scorers
[Row(name='John', sport='Golf', score=100),
 Row(name='John', sport='Hockey', score=100)]

>>> golf_players_names
['John', 'Bob']
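
The question also asks about sorting and grouping, which the filters above do not cover. A minimal sketch using only the standard library, assuming the same Row layout; the names by_score, by_sport and golf_names are made up for illustration:

```python
from collections import defaultdict, namedtuple
from operator import attrgetter

Row = namedtuple('Row', ('name', 'sport', 'score'))

rows = [
    Row('John', 'Golf', 100),
    Row('Jill', 'Rugby', 55),
    Row('John', 'Hockey', 100),
    Row('Bob', 'Golf', 45),
]

# Sort by score, highest first. sorted() is stable, so rows with
# equal scores keep their original relative order.
by_score = sorted(rows, key=attrgetter('score'), reverse=True)

# Group by sport: each sport maps to the list of rows for that sport.
by_sport = defaultdict(list)
for row in rows:
    by_sport[row.sport].append(row)

# Names of everyone in the 'Golf' group
golf_names = [row.name for row in by_sport['Golf']]
```

A defaultdict is used here instead of itertools.groupby because groupby only merges adjacent items and would need the rows sorted by sport first.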

Answer 2 (score: 1)

How about this?

yourDS={"name":["John","Jill","John","Bob"],
    "sport":["Golf","Rugby","Hockey","Golf"],
    "score":[100,55,100,45]
}

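This keeps the fields of each entry related to one another, because lists are ordered: the values at the same index belong to the same row.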

To avoid the effect of duplicate elements in a list, first create a new set from the list.

For the kind of query you want, you can do something like this.

x = 100  # the score to look for (example value)
for index, value in enumerate(yourDS["score"]):
    if value == x:
        print(yourDS["name"][index])

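It is better to collect the results in a list and then turn that list into a set, to handle cases such as one person scoring x in two different games.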
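
A minimal sketch of that list-then-set step, assuming the same yourDS layout; target, matches and unique_matches are illustrative names, and 100 is only an example value standing in for x:

```python
yourDS = {
    "name": ["John", "Jill", "John", "Bob"],
    "sport": ["Golf", "Rugby", "Hockey", "Golf"],
    "score": [100, 55, 100, 45],
}

target = 100  # example score to look for; stands in for x

# Collect every matching name in a list first.
matches = []
for index, value in enumerate(yourDS["score"]):
    if value == target:
        matches.append(yourDS["name"][index])

# John scored 100 in two sports, so he appears twice in the list;
# converting to a set keeps each name once.
unique_matches = set(matches)
```

Note that a set does not preserve order, so this suits "who matched" questions rather than ranked output.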

Answer 3 (score: -1)

You can create a list of lists. Each row is one list inside the outer list.

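# Each inner list is one row: [name, sport, score]
lst1 = [['John', 'Golf', 100], ['Jill', 'Rugby', 55],
        ['John', 'Hockey', 100], ['Bob', 'Golf', 45]]

# Keep only the rows whose score (index 2) is 100
lst100 = []
for lst in lst1:
    if lst[2] == 100:
        lst100.append(lst)
print(lst100)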
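
The same kind of filter can be written more compactly as a list comprehension. This is only a sketch; the constants NAME, SPORT and SCORE are hypothetical names for the column positions, introduced to avoid bare indexes such as lst[2]:

```python
lst1 = [['John', 'Golf', 100], ['Jill', 'Rugby', 55],
        ['John', 'Hockey', 100], ['Bob', 'Golf', 45]]

# Hypothetical constants naming the column positions
NAME, SPORT, SCORE = 0, 1, 2

# Rows with a score of 100, as a list comprehension
lst100 = [row for row in lst1 if row[SCORE] == 100]

# Names of everyone who played Golf
golf_names = [row[NAME] for row in lst1 if row[SPORT] == 'Golf']
```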

Answer 4 (score: -1)

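If you want to retrieve information based on the data, I would go with SQL. It is well suited to answering questions like these:

"... to see the names of all the people that played Golf ..."

"... all of the people that scored 100 on any sport ..."

"... all of the data for just John."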

The most popular database language today is SQL, and Python in fact has built-in support for it through the sqlite3 module.
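
As a rough sketch of what that could look like for the three questions, using an in-memory SQLite database; the table name results and its column names are assumptions made for this example:

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE results (name TEXT, sport TEXT, score INTEGER)')
conn.executemany(
    'INSERT INTO results VALUES (?, ?, ?)',
    [('John', 'Golf', 100), ('Jill', 'Rugby', 55),
     ('John', 'Hockey', 100), ('Bob', 'Golf', 45)],
)

# Names of all the people that played Golf, in insertion order
golfers = [name for (name,) in conn.execute(
    'SELECT name FROM results WHERE sport = ? ORDER BY rowid', ('Golf',))]

# Everyone who scored 100 on any sport, each name once
top_scorers = [name for (name,) in conn.execute(
    'SELECT DISTINCT name FROM results WHERE score = ?', (100,))]

# All of the data for just John
john_rows = conn.execute(
    'SELECT name, sport, score FROM results WHERE name = ? ORDER BY rowid',
    ('John',)).fetchall()

conn.close()
```

The ? placeholders let sqlite3 handle quoting of the values, which is safer than building the query string by hand.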

Learning SQL, though not a monumental task, is beyond the scope of this answer. To learn it, I'd recommend Codecademy, Code School, or SQLZOO (all of them are interactive).

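Alternatively, if you just want to read the data in and write it back out without caring what it actually means, consider the built-in csv module.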
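
A minimal read-and-write sketch with csv, assuming the table has been saved as comma-separated text (the sample in the question is whitespace-separated, so this is an assumption); io.StringIO stands in for real files here:

```python
import csv
import io

# Stand-in for a real file opened with open(path, newline='')
text = ("Name,Sport,Score\n"
        "John,Golf,100\n"
        "Jill,Rugby,55\n"
        "John,Hockey,100\n"
        "Bob,Golf,45\n")

# DictReader uses the header row as keys; every value is read as a string.
rows = list(csv.DictReader(io.StringIO(text)))

# Write the rows back out unchanged.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['Name', 'Sport', 'Score'],
                        lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
```

Because csv does not interpret the values, the scores stay strings such as '100'; convert them with int() if you need to compare or sort numerically.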