Question

I have this dataset:

         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9

I want to compute the sum of the 5 games for each player.

Here is my code:

import csv
f=open('Games','rb')
f=csv.reader(f,delimiter=';')
lst=list(f)
lst
import numpy as np
myarray = np.asarray(lst)
x=myarray[1,1:] #First player
y=np.sum(x)

I get the error "cannot perform reduce with flexible type". I'm really new to this and need your help.

Thanks

Answer 1

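The complication with numpy is that you have two sources of errors (and two sets of documentation to read): Python itself and numpy.

I believe your problem is that you are working with a so-called structured (numpy) array.

Consider the following example: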

>>> import numpy as np
>>> a = np.array([(1,2), (4,5)],  dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a.sum()
TypeError: cannot perform reduce with flexible type

Now, if I first select the data I want to use:

>>> import numpy as np
>>> a = np.array([(1,2), (4,5)],  dtype=[('Game 1', '<f8'), ('Game 2', '<f8')])
>>> a["Game 1"].sum()
5.0

which is what I wanted.
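The code in the question builds its array from csv rows, and csv.reader always yields strings, so np.asarray produces a string-dtype array rather than a numeric one (string dtypes are also among numpy's "flexible" types). A minimal sketch, assuming the rows look like csv.reader output for the table in the question (typed in by hand here instead of read from the 'Games' file), converts the numeric block with astype(int) before summing:

```python
import numpy as np

# Rows as csv.reader would return them: every cell is a string.
# (Hand-typed here; in the question they come from the 'Games' file.)
lst = [
    ["", "Game1", "Game2", "Game3", "Game4", "Game5"],
    ["Player1", "2", "6", "5", "2", "2"],
    ["Player2", "6", "4", "1", "8", "4"],
    ["Player3", "8", "3", "2", "1", "5"],
    ["Player4", "4", "9", "4", "7", "9"],
]

myarray = np.asarray(lst)
print(myarray.dtype.kind)  # 'U': a unicode string array, not a numeric one

# Drop the header row and the player-name column, then convert to integers.
scores = myarray[1:, 1:].astype(int)

player_totals = scores.sum(axis=1)  # one total per player (row)
game_totals = scores.sum(axis=0)    # one total per game (column)
print(player_totals)  # [17 23 19 33]
print(game_totals)    # [20 22 12 18 20]
```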

Perhaps you could consider using pandas (a Python library), or switching to R.

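Personal opinion

Even though numpy is certainly a powerful library, I still avoid using it for data science and other "activities" where the program is designed around "flexible" data types. Personally, I use numpy when I need something fast and maintainable (it makes it easy to write "future-proof code") but don't have time to write a C program.

As for pandas, it is convenient for us "Python hackers" because it is "R's data structures implemented in Python", while R is (obviously) an entirely different language. I personally use R, because I think pandas is evolving quickly, which makes writing "future-proof code" hard.

As suggested in the comments (by @jorijnsmit, I believe), there is no need to pull in a large dependency such as pandas for a "simple" case. The minimal example below is compatible with Python 2 and 3 and uses "typical" Python tricks to handle the data problem.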

import csv

## Data-file
data = \
'''
       , Game1, Game2,   Game3,   Game4,   Game5
Player1,  2,    6,       5,       2,     2
Player2,  6,      4 ,      1,       8,      4
Player3,  8,     3 ,      2,    1,     5
Player4,  4,  9 ,   4,     7,    9
'''

# Write data to file
with open('data.csv', 'w') as FILE:
    FILE.write(data)

print("Raw data:")
print(data)

# 1) Read the data file in text mode (csv.reader needs text, not bytes, on
#    Python 3) and strip the spaces around each item; the result is a list of rows.
with open('data.csv', 'r') as FILE:
    raw = [[item.strip() for item in line]
           for line in csv.reader(FILE, delimiter=',') if line]

print("Data after Read:")
print(raw)

# 2) Convert numerical data to integers ("float" would also work)
for (i, line) in enumerate(raw[1:], 1):
    for (j, item) in enumerate(line[1:], 1):
        raw[i][j] = int(item)

print("Data after conversion:")
print(raw)

# 3) Use the data...
print("Use the data")
for i in range(1, len(raw)):        # one row per player
    print("Sum for Player %d: %d" % (i, sum(raw[i][1:])))

columns = list(zip(*raw))           # transpose the rows into columns
for i in range(1, len(raw[0])):     # one column per game
    print("Total points in Game %d: %d" % (i, sum(columns[i][1:])))

The output is:

Raw data:

       , Game1, Game2,   Game3,   Game4,   Game5
Player1,  2,    6,       5,       2,     2
Player2,  6,      4 ,      1,       8,      4
Player3,  8,     3 ,      2,    1,     5
Player4,  4,  9 ,   4,     7,    9

Data after Read:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', '2', '6', '5', '2', '2'], ['Player2', '6', '4', '1', '8', '4'], ['Player3', '8', '3', '2', '1', '5'], ['Player4', '4', '9', '4', '7', '9']]
Data after conversion:
[['', 'Game1', 'Game2', 'Game3', 'Game4', 'Game5'], ['Player1', 2, 6, 5, 2, 2], ['Player2', 6, 4, 1, 8, 4], ['Player3', 8, 3, 2, 1, 5], ['Player4', 4, 9, 4, 7, 9]]
Use the data
Sum for Player 1: 17
Sum for Player 2: 23
Sum for Player 3: 19
Sum for Player 4: 33
Total points in Game 1: 20
Total points in Game 2: 22
Total points in Game 3: 12
Total points in Game 4: 18
Total points in Game 5: 20

Answer 2

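You can still use structured arrays as long as you are familiar with dtypes. Since your dataset is very small, the following serves as an example of combining numpy with list comprehensions when your dtypes are uniform but named: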

import numpy as np

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype= dt)

nms = a.dtype.names
by_col = [(i, a[i].sum()) for i in nms if a[i].dtype.kind in ('i', 'f')]
by_col
[('Game1', 20), ('Game2', 22), ('Game3', 12), ('Game4', 18), ('Game5', 20)]

by_row = [("player {}".format(i), sum(a[i])) for i in range(a.shape[0])]
by_row
[('player 0', 17), ('player 1', 23), ('player 2', 19), ('player 3', 33)]

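In this example, getting each sum separately by writing out every column name would be a real pain. That is where the `... a[i] for i in nms` bit is useful, since the list of names is retrieved by `nms = a.dtype.names`. Because you are doing a "sum", you want to restrict the summation to integer and float types only, hence the `a[i].dtype.kind` part.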
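To see why the kind filter matters, here is a sketch with a hypothetical text field, 'name', added to the structured dtype (it is not part of the original data) and a float game column: the comprehension keeps only integer ('i') and float ('f') fields, so the string column is skipped instead of breaking the sum.

```python
import numpy as np

# Hypothetical dtype: a text 'name' field plus an int and a float game field.
dt = [('name', 'U10'), ('Game1', '<i4'), ('Game2', '<f8')]
a = np.array([('Player1', 2, 6.0),
              ('Player2', 6, 4.0)], dtype=dt)

nms = a.dtype.names
# dtype.kind is 'U' for the name field and 'i' / 'f' for the scores,
# so only the numeric columns are summed.
by_col = [(i, a[i].sum()) for i in nms if a[i].dtype.kind in ('i', 'f')]
print(by_col)
```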

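Summing by row is just as easy, but you will notice that I did not use that syntax; I used a slightly different one to avoid the error message: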

a[0].sum()  # massive failure
....snip out huge error stuff...
TypeError: cannot perform reduce with flexible type
# whereas, this works....
sum(a[0])   # use list/tuple summation

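Perhaps "flexible" data types don't live up to their name. So if this is how your data comes in, you can still work with structured arrays and reorganize them: simply reformat the data to suit your purposes by slicing and changing dtypes. For example, since your fields all have the same dtype and your dataset isn't huge, there are many ways to convert to a simple (unstructured) array.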

b = np.array([list(a[i]) for i in range(a.shape[0])])
b
array([[2, 6, 5, 2, 2],
       [6, 4, 1, 8, 4],
       [8, 3, 2, 1, 5],
       [4, 9, 4, 7, 9]])

b.sum(axis=0)
array([20, 22, 12, 18, 20])

b.sum(axis=1)
array([17, 23, 19, 33])
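Another of those "many ways" is numpy's recfunctions helper structured_to_unstructured (available in numpy 1.16 and later). A sketch that rebuilds the same array `a` so it runs on its own:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)

# Turns the 5 named fields into a plain (4, 5) integer array.
b = rfn.structured_to_unstructured(a)
print(b.shape)        # (4, 5)
print(b.sum(axis=0))  # [20 22 12 18 20]
print(b.sum(axis=1))  # [17 23 19 33]
```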

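So you have many options when working with structured arrays, and depending on whether you need pure Python, numpy, pandas, or a hybrid, you should become familiar with all of them.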

Addendum

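As a shortcut, I neglected to mention taking "views" of arrays that are structured in nature but have a uniform dtype. For the example above, a simple way to produce a plain array for row or column calculations is as follows (the earlier approach made a copy of the array, but that isn't necessary):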

b = a.view(np.int32).reshape(len(a), -1)
b
array([[2, 6, 5, 2, 2],
       [6, 4, 1, 8, 4],
       [8, 3, 2, 1, 5],
       [4, 9, 4, 7, 9]])
b.dtype
dtype('int32')

b.sum(axis=0)
array([20, 22, 12, 18, 20])

b.sum(axis=1)
array([17, 23, 19, 33])
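Because `a.view(...)` reinterprets the same buffer instead of copying it, writes through `b` show up in `a`. A quick check, assuming (as in the example) that every field has the same int32 dtype and the records have no padding, which is what makes the int32 view valid:

```python
import numpy as np

dt = [('Game1', '<i4'), ('Game2', '<i4'), ('Game3', '<i4'),
      ('Game4', '<i4'), ('Game5', '<i4')]
a = np.array([(2, 6, 5, 2, 2),
              (6, 4, 1, 8, 4),
              (8, 3, 2, 1, 5),
              (4, 9, 4, 7, 9)], dtype=dt)

b = a.view(np.int32).reshape(len(a), -1)
print(np.shares_memory(a, b))  # True: no data was copied

b[0, 0] = 100                  # edit Player1/Game1 through the plain view...
print(a[0]['Game1'])           # ...and the structured array sees 100
```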

Answer 3

You don't need numpy at all; just do this:

import csv
from collections import OrderedDict

with open('games') as f:
    reader = csv.reader(f, delimiter=';')
    data = list(reader)

sums = OrderedDict()
for row in data[1:]:
    player, games = row[0], row[1:]
    sums[player] = sum(map(int, games))
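To try this without the games file, the sketch below feeds csv.reader from an in-memory io.StringIO. The semicolon layout with an empty first header cell is an assumption about the file, since the question doesn't show its raw contents. On Python 3.7+ a plain dict also keeps insertion order, so OrderedDict only matters on older versions.

```python
import csv
import io
from collections import OrderedDict

# Assumed file layout: ';'-separated, empty first cell in the header row.
text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

data = list(csv.reader(io.StringIO(text), delimiter=';'))

sums = OrderedDict()
for row in data[1:]:                     # skip the header row
    player, games = row[0], row[1:]
    sums[player] = sum(map(int, games))  # csv yields strings, so convert first

for player, total in sums.items():
    print(player, total)
```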

Answer 4

Consider using the pandas module:

import pandas as pd

df = pd.read_csv('/path/to.file.csv', sep=';')

The resulting DataFrame:

In [196]: df
Out[196]:
         Game1  Game2  Game3  Game4  Game5
Player1      2      6      5      2      2
Player2      6      4      1      8      4
Player3      8      3      2      1      5
Player4      4      9      4      7      9

and:

In [197]: df.sum(axis=1)
Out[197]:
Player1    17
Player2    23
Player3    19
Player4    33
dtype: int64

In [198]: df.sum(1).values
Out[198]: array([17, 23, 19, 33], dtype=int64)
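A self-contained version of the same idea, reading from an in-memory string instead of a file path. It assumes the same semicolon layout as the file sketched for the csv answer and passes `index_col=0` so the player names become the row index; that argument is an addition here, not part of the call above.

```python
import io

import pandas as pd

# Assumed file layout: ';'-separated, empty first cell in the header row.
text = """;Game1;Game2;Game3;Game4;Game5
Player1;2;6;5;2;2
Player2;6;4;1;8;4
Player3;8;3;2;1;5
Player4;4;9;4;7;9
"""

df = pd.read_csv(io.StringIO(text), sep=';', index_col=0)

per_player = df.sum(axis=1)  # row sums: total points per player
per_game = df.sum(axis=0)    # column sums: total points per game
print(per_player)
print(per_game)
```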
