Question

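I want to split a letter and a number in a function, return both values, and assign them to variables with tuple unpacking (destructuring assignment), like this: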

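import re

def split_string(s):
    # NaN values in the column are floats, so only real strings are searched
    if isinstance(s, str):
        # One letter followed by 1-3 digits, e.g. "C85" -> ("C", "85")
        match = re.search(r"([A-Za-z])(\d{1,3})", s)
        if match is not None:
            # group(0) is the whole match; the captured parts are groups 1 and 2
            return match.group(1), match.group(2)
    return None, None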

The function returns the desired result, for example:

0      (None, None)
1           (C, 85)
2      (None, None)
3          (C, 123)
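For reference, a small check of how Python's re module numbers groups: group(0) is always the entire match, and parenthesized capture groups are numbered from 1. So for a letter/number pair the two pieces are group(1) and group(2), or groups() for both at once. The sample string "C85" is only an illustration of the format shown above.

```python
import re

# Illustrative sample value in the "letter + digits" format from the question
m = re.search(r"([A-Za-z])(\d{1,3})", "C85")

print(m.group(0))  # whole match: C85
print(m.group(1))  # first capture group: C
print(m.group(2))  # second capture group: 85
print(m.groups())  # all capture groups as a tuple: ('C', '85')
```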

But if I try to assign the result, I get a ValueError (data is a pandas DataFrame read from a CSV, and data.strings is a column of strings and NaNs):

a, b = data.strings.apply(split_string)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-108-40dc67dc859d> in <module>()
      6     return None, None
      7 
----> 8 a, b = data.strings.apply(split_string)

ValueError: too many values to unpack (expected 2)

However, this works fine:

def test(x, y):
    return x, y

a, b = test(1, 2)

What am I missing here? I'd really like to be able to process a whole column and assign the return values in a single line. Thanks!

Answer 1

Define a sample DataFrame with the strings column described in the question.

>>> data = pd.DataFrame({'strings': ['the', 'test', 'data', np.nan, 'end']})

>>> a = data.strings.apply(split_string)
>>> a
0    (None, None)
1    (None, None)
2    (None, None)
3    (None, None)
4    (None, None)

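If you want to create the two new columns in one line, you can use zip. zip(*...) transposes the per-row (letter, number) pairs into one tuple of letters and one tuple of numbers.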

>>> a, b = zip(*data.strings.apply(split_string))
>>> a
(None, None, None, None, None)
>>> b
(None, None, None, None, None)
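The sample data above contains no letter/number pairs, so every value is None. Here is a minimal sketch of the same zip(*...) idea on data that does contain matches. The sample values are illustrative, and this split_string skips NaN with isinstance(s, str) and returns group(1)/group(2) so that it yields the letter and the digits.

```python
import re

import numpy as np
import pandas as pd


def split_string(s):
    # Return (letter, digits) for strings like "C85"; (None, None) otherwise
    if isinstance(s, str):
        match = re.search(r"([A-Za-z])(\d{1,3})", s)
        if match is not None:
            return match.group(1), match.group(2)
    return None, None


# Illustrative sample column mixing matches, a non-match and NaN
data = pd.DataFrame({"strings": [np.nan, "C85", "xyz", "C123"]})

pairs = data.strings.apply(split_string)  # a Series of 2-tuples, one per row

# zip(*pairs) transposes the row-wise pairs into one tuple per position
a, b = zip(*pairs)
print(a)  # (None, 'C', None, 'C')
print(b)  # (None, '85', None, '123')
```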

We can assign them directly to data as new columns, also in one line.

>>> data['a'], data['b'] = zip(*data.strings.apply(split_string))
>>> data
  strings     a     b
0     the  None  None
1    test  None  None
2    data  None  None
3     NaN  None  None
4     end  None  None
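Not part of the original answer: when the split is purely a regex match, pandas' vectorized Series.str.extract can build both columns at once without a Python-level function. This is a sketch assuming the same "one letter followed by 1-3 digits" pattern; the named groups a and b become the column names, and missing or non-matching rows come back as NaN rather than None.

```python
import numpy as np
import pandas as pd

# Illustrative sample column, same shape as the one in the question
data = pd.DataFrame({"strings": [np.nan, "C85", "xyz", "C123"]})

# Named groups become column names in the DataFrame returned by str.extract;
# rows that are NaN or do not match get NaN in both columns
parts = data.strings.str.extract(r"(?P<a>[A-Za-z])(?P<b>\d{1,3})")

# Attach the two new columns to the original frame (aligned on the index)
data = data.join(parts)
print(data)
```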

Answer 2

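apply returns a Series (or a DataFrame), not a tuple. Unpacking a, b = series iterates over the Series' values, one per row, so it only succeeds when the Series happens to have exactly two rows; otherwise you get "too many values to unpack".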

http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.apply.html
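To make this concrete: a, b = ... iterates over whatever is on the right-hand side, and a Series yields its values, one per row. A 4-row Series of pairs therefore yields four tuples, not two columns. A small sketch with an illustrative 4-row Series like the output in the question:

```python
import pandas as pd

# Illustrative 4-row Series of pairs, like the output shown in the question
s = pd.Series([(None, None), ("C", "85"), (None, None), ("C", "123")])

# Unpacking iterates over the Series, yielding one element per row
error = None
try:
    a, b = s
except ValueError as exc:
    error = exc  # the same "too many values to unpack" error as in the question
    print(exc)

# With exactly two rows the unpacking succeeds, but a and b are rows, not columns
a, b = s.iloc[:2]
print(a)  # (None, None)
print(b)  # ('C', '85')
```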

Can't assign function return values with destructuring assignment
