Question

I have a dataset that looks like this:

  time               a_id      b_id        c_id     d_id  probability
  2015-01-02         237       9712        54       38  [0.194255020142]
  2015-01-02         131        481        60       42   [0.23631604522]
  2015-01-02         277       8842        57       46  [0.176149934661]
  2015-01-02         124       3664        95       48  [0.158623758706]

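Currently, the 'probability' column has dtype object. I want to convert it to int so that I can perform some mathematical operations on it. I used the following code:

 df_total['probability'] = df_total['probability'].astype(int)

But it raised an error: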

ValueError: setting an array element with a sequence.

I built the probability column from a numpy array by taking a subset of it and converting the result to a list. The code looks like this:

probability = probs[:,1:]
probability = probability.tolist()

What I get is a list whose elements are each enclosed in brackets. I don't understand why.

How can I fix this?

Answer 1

It looks like each value in your 'probability' column is currently a list containing a single element?

Try something like:

def to_integer(row):
    # The 0th element of the list is the actual float.
    prob = row['probability'][0]
    return int(prob)

# Apply the function row by row (axis=1) and overwrite the column.
df_total['probability'] = df_total.apply(to_integer, axis=1)
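
As for why the elements come out wrapped in brackets: assuming probs is a 2-D numpy array (the question does not show it, so the values below are made up), slicing with 1: keeps the column axis, and tolist() then yields one single-element list per row. Indexing with a plain integer drops that axis. A minimal sketch:

```python
import numpy as np

# Hypothetical stand-in for `probs`; the real array is not shown in the question.
probs = np.array([[0.81, 0.19],
                  [0.76, 0.24]])

# A slice (1:) keeps the 2-D shape, so each row becomes a 1-element list.
nested = probs[:, 1:].tolist()

# An integer index (1) drops the column axis, so each row becomes a plain float.
flat = probs[:, 1].tolist()

print(nested)  # [[0.19], [0.24]]
print(flat)    # [0.19, 0.24]
```

If the column is built from the flat version, it should hold plain floats and the [0] indexing above would no longer be needed.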

Answer 2

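Given that the probabilities are currently in decimal form, converting them to int will produce zeros (for example, int(.99) gives 0). In this example, I assume you want the integer value 99. To extract the single value from each list: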

df['probability'] = [int(100 * i[0]) if i else None for i in df.probability]

The else None part is there in case any values are missing. Without it, trying to index i[0] on None would raise an error.
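
Here is a small sketch of how the list comprehension above behaves, using made-up rows shaped like the ones in the question plus one missing value. It also shows a variant that keeps the float unchanged, which may be closer to what you want if the goal is arithmetic on the probabilities themselves rather than on percentages.

```python
import pandas as pd

# Made-up rows mirroring the question's layout, plus one missing value.
df = pd.DataFrame({'probability': [[0.194255020142], [0.23631604522], None]})

# Variant: keep the float as-is (no truncation to an integer).
as_float = [i[0] if i else None for i in df.probability]

# Scale to a whole-number percentage, as in the line above.
as_percent = [int(100 * i[0]) if i else None for i in df.probability]

print(as_float)    # [0.194255020142, 0.23631604522, None]
print(as_percent)  # [19, 23, None]
```

Note that int() truncates rather than rounds, so 0.194... becomes 19 here.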
