Conditionally extracting numbers from a Python list

Time: 2017-01-19 21:19:38

Tags: python numpy

I have a list of numbers like this:

20
40
45
60
80

I want to be able to say, for example, that the average distance between the numbers < 50 is 12.5.

import numpy as np
from sys import argv
script, pos_file, output = argv
positions = []
with open(pos_file) as f:
    for x in f:
        assert x.strip().split()
        positions.append(x)

position_list= []

for x in positions:
    if x < 50:
        position_list.append(x)

print np.mean[position_list]

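This doesn't work. I think it's because when I print the positions list I get 20,40,45,60,80, so it isn't treating the numbers as individual numbers and therefore can't test x < 50. What am I doing wrong?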

Edit: it looks like the data actually consists of lines like this:

467,1977,3751,4013,5752,6406,6446,7362,7585,8285,8624,8741,9143,9304,11879,13197,13460,14401,14785,15117,22264,23714,24294,24534,26053,26959,27714,29462,35342,36538,36612,37031,39093,42281,42967,43945

2 answers:

Answer 0 (score: 1)

Your code has a few problems:

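  • you never convert the values you read to int or float, so they stay strings;
  • you use np.mean[..] instead of np.mean(..); np.mean is a function and is not subscriptable.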
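
To illustrate the first point, here is a minimal sketch with the sample values hard-coded as the strings a file loop would yield. The names raw_lines, values and below_50 are only for illustration. On Python 3, comparing a str with an int raises TypeError; on Python 2 it does not raise, but the result is not a numeric comparison either.

```python
# Lines as they come out of `for x in f`: strings with a trailing newline.
raw_lines = ["20\n", "40\n", "45\n", "60\n", "80\n"]

try:
    unconverted = [x for x in raw_lines if x < 50]
    comparison_raised = False
except TypeError:
    # Python 3 refuses to order a str against an int.
    comparison_raised = True

# int() ignores surrounding whitespace, so the newline is harmless.
values = [int(x) for x in raw_lines]
below_50 = [v for v in values if v < 50]

print(below_50)
print(float(sum(below_50)) / len(below_50))
```

Once the values are real integers, the x < 50 test behaves as expected.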

A solution is:

import numpy as np
from sys import argv
script, pos_file, output = argv
positions = []
with open(pos_file) as f:
    for x in f:
        assert x.strip().split()
        positions.append(int(x))

position_list= [x for x in positions if x < 50]

print(np.mean(position_list))

Edit

However, based on your comment, it looks like you have a comma-separated list:

import numpy as np
from sys import argv
script, pos_file, output = argv
positions = []
with open(pos_file) as f:
    for x in f:
        positions += (int(i) for i in x.strip().split(','))

position_list= [x for x in positions if x < 50]

print(np.mean(position_list))

Or:

import numpy as np
from sys import argv
script, pos_file, output = argv
positions = []
with open(pos_file) as f:
    for x in f:
        for i in x.strip().split(','):
            positions.append(int(i))

position_list= [x for x in positions if x < 50]

print(np.mean(position_list))

You can also, as @Jean-FrançoisFabre says, simply divide the sum by the number of items:

from sys import argv
script, pos_file, output = argv
positions = []
with open(pos_file) as f:
    for x in f:
        for i in x.strip().split(','):
            positions.append(int(i))

position_list= [x for x in positions if x < 50]

print(float(sum(position_list)) / len(position_list))

In that case you don't have to import numpy.
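
One thing the sum/len version does not handle is the case where no number passes the filter: dividing by len of an empty list raises ZeroDivisionError. A minimal sketch of a guard, where mean_or_none is a hypothetical helper name and the sample list is hard-coded rather than read from pos_file:

```python
def mean_or_none(values):
    """Return the arithmetic mean of `values`, or None if it is empty."""
    if not values:
        return None
    # float() keeps the division exact on Python 2 as well.
    return float(sum(values)) / len(values)


positions = [20, 40, 45, 60, 80]
print(mean_or_none([x for x in positions if x < 50]))
print(mean_or_none([x for x in positions if x < 10]))
```

Returning None is only one possible choice; raising a clearer error would work just as well.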

Answer 1 (score: 0)

Your code has a few errors, which the other answer points out, but I felt I should rewrite it for you in a cleaner way:

with open(pos_file) as f:
    positions = [int(x) for line in f for x in line.strip().split(',') if int(x) < 50]

print(sum(positions)/len(positions))
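  • you don't need numpy to compute a mean; it isn't rocket science
  • the assert statement is useless: if a line is empty, split() with no argument returns an empty list, which is not a problem for a list comprehension
  • the added double loop makes it possible to read several integers located on the same line
  • no memory is wasted storing numbers when you only want to keep the lowest ones
  • this uses your feedback on one of the answers to establish that the list is comma-separated. Now I realize that the csv module could be used.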
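
A caveat on the empty-line point: "".split() returns [], but "".split(',') returns [''], and int('') raises ValueError. The sketch below skips empty fields and keeps only a running total and count instead of a list. The function name mean_below and the in-memory sample lines are assumptions for illustration, not part of the original code.

```python
def mean_below(lines, threshold):
    """Average the comma-separated integers in `lines` that are < threshold."""
    total = 0
    count = 0
    for line in lines:
        for token in line.strip().split(','):
            if not token.strip():
                continue  # blank line or empty field: nothing to convert
            value = int(token)
            if value < threshold:
                total += value
                count += 1
    return float(total) / count if count else None


print("".split())
print("".split(','))

# Any iterable of lines works here, including an open file object.
sample = ["20,40\n", "\n", "45,60,80\n"]
print(mean_below(sample, 50))
```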

So here is the csv solution:

import csv
with open(pos_file) as f:
    cr = csv.reader(f)
    positions = [int(x) for row in cr for x in row if int(x) < 50]

print(sum(positions)/len(positions))
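
Note that the figure of 12.5 in the question matches the mean gap between consecutive values below 50 (20, 40, 45 give gaps of 20 and 5), not the mean of the values themselves, which is 35. If the gap is what is wanted, a minimal sketch could look like this, assuming the values should be compared in sorted order; the variable names are illustrative, and np.diff would produce the same differences.

```python
positions = [20, 40, 45, 60, 80]

# Keep the values under the threshold, sorted so gaps are between neighbours.
below_50 = sorted(x for x in positions if x < 50)

# Pair each value with its successor and take the difference.
gaps = [b - a for a, b in zip(below_50, below_50[1:])]

# Fewer than two values means there is no gap to average.
mean_gap = float(sum(gaps)) / len(gaps) if gaps else None

print(gaps)
print(mean_gap)
```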