Remove all non-numeric array values

Time: 2019-03-31 18:37:44

Tags: python arrays python-3.x string pandas

I have an array I want to clean, with entries like this:

arr = (['1140.0', '-600.0', '-700.6', '5700.45', '(~par)', '(-6.0', '690.6', ....., 
'-----', '5750.65', '#', '-850.0'])

I want to strip all non-numeric values from the array while preserving the order of the remaining entries, to get output like this:

arr_clean = (['1140.0', '-600.0', '-700.6', '5700.45', '690.6', '5750.65', '-850.0'])

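Some of the values are negative, so I can't simply check whether the first character of an element is a non-digit. Some entries also contain digits but still need to be removed, such as the value (-6.0

I first converted to a DataFrame to see whether I could apply pd.to_numeric(df[col]) and clean it that way, but that means switching from an array to a DataFrame and back again (the array has about 800,000 entries, and I want the final output to be an array).

Is there a simple way to do this?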

2 answers:

Answer 0: (score: 0)

float('(-6.0') raises a ValueError because the string is not a number. Relying on this behavior is perfectly Pythonic (duck typing):

arr = (['1140.0', '-600.0', '-700.6', '5700.45', '(~par)', '(-6.0', '690.6', '...',
'-----', '5750.65', '#', '-850.0'])

arr_clean = list()

for elm in arr:
    try:
        float(elm)
        print("could     convert string to float:", elm)
        arr_clean.append(elm)
    except ValueError as e:
        print(e)

print(arr_clean)

This outputs:

could     convert string to float: 1140.0
could     convert string to float: -600.0
could     convert string to float: -700.6
could     convert string to float: 5700.45
could not convert string to float: '(~par)'
could not convert string to float: '(-6.0'
could     convert string to float: 690.6
could not convert string to float: '...'
could not convert string to float: '-----'
could     convert string to float: 5750.65
could not convert string to float: '#'
could     convert string to float: -850.0
['1140.0', '-600.0', '-700.6', '5700.45', '690.6', '5750.65', '-850.0']
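
For an array with hundreds of thousands of entries, printing every element is mostly noise. A minimal sketch of the same try/except idea, wrapped in a helper and used in a list comprehension, might look like the following. The helper name is_number is made up for this illustration and is not part of the answer. One caveat: float() also accepts strings such as 'nan', 'inf', and '1e5', so the sketch additionally rejects non-finite values, on the assumption that they should count as non-numeric here.

```python
import math

def is_number(text):
    """Return True if text parses as a finite float (hypothetical helper)."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return False
    # Assumption: 'nan' and 'inf' should be treated as non-numeric.
    return math.isfinite(value)

# Sample data from the question, plus 'nan' to show the extra check.
arr = ['1140.0', '-600.0', '-700.6', '5700.45', '(~par)', '(-6.0', '690.6',
       '-----', '5750.65', '#', '-850.0', 'nan']

# The list comprehension keeps the original strings in their original order.
arr_clean = [elm for elm in arr if is_number(elm)]
print(arr_clean)
```

On this sample, arr_clean holds the same seven strings as the output above. If 'nan' or 'inf' are legitimate values in your data, drop the math.isfinite check and return True instead.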

Answer 1: (score: 0)

If I can assume your array is in a DataFrame, you can use pd.to_numeric with errors='coerce', followed by DataFrame.dropna:

import pandas as pd

# Example dataframe which was provided
data = {'Array':['1140.0', '-600.0', '-700.6', '5700.45', '(~par)', '(-6.0', '690.6', '.....', '-----', '5750.65', '#', '-850.0']}

df = pd.DataFrame(data)
print(df)
      Array
0    1140.0
1    -600.0
2    -700.6
3   5700.45
4    (~par)
5     (-6.0
6     690.6
7     .....
8     -----
9   5750.65
10        #
11   -850.0

Apply pd.to_numeric:

pd.to_numeric(df.Array, errors='coerce').dropna()

0     1140.00
1     -600.00
2     -700.60
3     5700.45
6      690.60
9     5750.65
11    -850.00
Name: Array, dtype: float64
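
Since the question asks for an array as the final output, here is a rough sketch of how the same coercion could be used as a boolean mask on a NumPy array, without building a full DataFrame. It assumes the data is a one-dimensional NumPy array of strings; the variable names nums, mask, and arr_floats are illustrative only. It has not been timed on an array of about 800,000 entries, so treat any performance benefit as something to measure rather than assume.

```python
import numpy as np
import pandas as pd

# Sample data from the answer above, held in a NumPy array of strings.
arr = np.array(['1140.0', '-600.0', '-700.6', '5700.45', '(~par)', '(-6.0',
                '690.6', '.....', '-----', '5750.65', '#', '-850.0'])

# Entries that cannot be parsed become NaN; the result is a float Series.
nums = pd.to_numeric(pd.Series(arr), errors='coerce')

# Boolean mask marking the entries that parsed successfully.
mask = nums.notna().to_numpy()

arr_clean = arr[mask]                # original strings, original order
arr_floats = nums.to_numpy()[mask]   # the same entries as floats

print(arr_clean)
print(arr_floats)
```

Note that this mask also drops entries that parse to NaN, such as the string 'nan', because they are indistinguishable from coerced failures after the conversion.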