Question

I'm accessing pandas DataFrame rows, so I get pandas Series objects. My parsing routines accept namedtuples. Is it possible to convert a pandas Series to a namedtuple?

Answer 1

A generic function that converts any Series to a namedtuple:

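from collections import namedtuple
import pandas as pd

def namedtuple_me(s, name='S'):
    # Build a namedtuple class from the Series labels, then fill it with the values
    return namedtuple(name, s.index)(*s)

namedtuple_me(pd.Series([1, 2, 3], list('abc')))
# S(a=1, b=2, c=3)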
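One caveat worth illustrating: namedtuple field names must be valid Python identifiers, so a Series whose labels contain spaces, keywords, or integers (for example a default RangeIndex) makes the function above raise a ValueError. A minimal sketch, assuming you are happy with positional fallback names, passes the real `rename=True` parameter of `collections.namedtuple`; the helper name `namedtuple_me_safe` and the sample labels are hypothetical.

```python
from collections import namedtuple

import pandas as pd


def namedtuple_me_safe(s, name='S'):
    """Hypothetical variant of namedtuple_me that tolerates awkward labels."""
    # rename=True replaces invalid identifiers, keywords and duplicates
    # with positional names such as _1, _2, ...
    return namedtuple(name, s.index, rename=True)(*s)


# 'my col' is not an identifier and 'class' is a keyword
row = pd.Series([1, 2, 3], index=['a', 'my col', 'class'])
print(namedtuple_me_safe(row))  # S(a=1, _1=2, _2=3)
```

The trade-off is that renamed fields are only reachable by position or by the generated `_N` name, so cleaning the labels first may be preferable when you control them.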

Credit to @juanpa.arrivillaga for an improved implementation:

import functools
from collections import namedtuple

@functools.lru_cache(maxsize=None)  # add memoization to increase speed
def _get_class(fieldnames, name):
    """Create a new namedtuple class."""
    return namedtuple(name, fieldnames)

def namedtuple_me(series, name='S'):
    """Convert the series to a namedtuple."""
    klass = _get_class(tuple(series.index), name)
    return klass._make(series)

Answer 2

You can just use df.itertuples to do what you're doing:

In [5]: df
Out[5]:
     c0    c1    c2    c3    c4    c5    c6    c7    c8    c9
0   8.0   2.0   1.0   4.0   4.0   3.0   1.0  19.0   5.0   9.0
1   7.0   7.0   0.0   4.0  14.0   7.0   9.0   0.0   0.0   9.0
2  19.0  10.0   6.0  13.0  12.0  11.0   8.0   4.0  11.0  13.0
3  14.0   0.0  16.0  19.0   3.0   8.0   8.0   9.0  17.0  13.0
4  18.0  16.0  10.0   8.0  15.0   9.0  18.0   9.0   5.0  10.0
5  15.0   7.0  16.0   3.0  18.0  14.0   3.0   6.0   0.0   9.0
6  14.0  14.0  18.0   4.0   4.0   0.0   8.0  15.0   8.0  12.0
7  19.0  16.0  15.0  16.0   1.0  12.0  14.0   1.0  10.0  15.0
8   8.0  17.0  10.0  18.0   7.0  13.0  13.0  12.0   6.0  11.0
9  15.0  13.0  13.0  17.0   2.0   0.0   6.0  10.0   5.0   5.0

In [6]: rows = df.itertuples(name='Row')

In [7]: r0 = next(rows)

In [8]: r0
Out[8]: Row(Index=0, c0=8.0, c1=2.0, c2=1.0, c3=4.0, c4=4.0, c5=3.0, c6=1.0, c7=19.0, c8=5.0, c9=9.0)

In [9]: r0.c0
Out[9]: 8.0
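Two `itertuples` options are worth knowing here, shown in a small sketch on a toy frame: `index=False` drops the leading `Index` field, and, per the pandas documentation, column names that are not valid identifiers are renamed to positional names.

```python
import pandas as pd

# Toy frame; 'my col' is deliberately not a valid Python identifier
df = pd.DataFrame({'c0': [8.0, 7.0], 'my col': [2.0, 7.0]})

# index=False omits the Index field; the invalid name becomes a positional _1
for row in df.itertuples(index=False, name='Row'):
    print(row)
# Row(c0=8.0, _1=2.0)
# Row(c0=7.0, _1=7.0)
```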

Otherwise, you have to do it yourself, for example:

In [10]: from collections import namedtuple

In [11]: df.columns
Out[11]: Index(['c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9'], dtype='object')

In [12]: Row = namedtuple('Row', df.columns)

In [13]: df.iloc[0]
Out[13]:
c0     8.0
c1     2.0
c2     1.0
c3     4.0
c4     4.0
c5     3.0
c6     1.0
c7    19.0
c8     5.0
c9     9.0
Name: 0, dtype: float64

In [14]: Row(*df.iloc[0])
Out[14]: Row(c0=8.0, c1=2.0, c2=1.0, c3=4.0, c4=4.0, c5=3.0, c6=1.0, c7=19.0, c8=5.0, c9=9.0)

Note that this version has no Index field...
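If you do want the index label, a minimal sketch (on a smaller made-up frame) is to prepend an `Index` field yourself, which mirrors the layout `df.itertuples()` produces:

```python
from collections import namedtuple

import pandas as pd

df = pd.DataFrame({'c0': [8.0, 7.0], 'c1': [2.0, 7.0]})

# Prepend an 'Index' field so the tuple matches what itertuples() yields
Row = namedtuple('Row', ['Index', *df.columns])
r0 = Row(df.index[0], *df.iloc[0])

print(r0)                                     # Row(Index=0, c0=8.0, c1=2.0)
print(r0 == next(df.itertuples(name='Row')))  # True
```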

Answer 3

Another way to handle this: if you already have a pandas Series in hand and are using it as the input to a function, just unpack the Series as-is.

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['John', 'Sally'], 'date': ['2020-01-01', '2020-02-01'], 'value': ['A', 'B']})
>>> df
    name        date value
0   John  2020-01-01     A
1  Sally  2020-02-01     B
>>> row = df.iloc[0]
>>> type(row)
<class 'pandas.core.series.Series'>
>>> print({**row})  # unpacks as a dictionary
{'name': 'John', 'date': '2020-01-01', 'value': 'A'}
>>> myfunc(**row)   # ergo, unpacks as keyword args (myfunc: any function taking name, date, value)

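This works because a pandas Series already behaves like a mapping: it has keys() (the index labels) and label lookup, which is all that ** unpacking needs. In that sense it is namedtuple-like, although df.itertuples returns real namedtuples rather than Series.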
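The same mapping behaviour also answers the original question directly, since a namedtuple class accepts keyword arguments. A short sketch, where `parse` is a hypothetical stand-in for `myfunc` and `Record` is an illustrative class name:

```python
from collections import namedtuple

import pandas as pd

df = pd.DataFrame({'name': ['John', 'Sally'],
                   'date': ['2020-01-01', '2020-02-01'],
                   'value': ['A', 'B']})
row = df.iloc[0]


def parse(name, date, value):  # hypothetical stand-in for myfunc
    return f"{name}:{date}:{value}"


print(parse(**row))  # John:2020-01-01:A

# ** unpacking also fills a namedtuple by field name
Record = namedtuple('Record', row.index)
print(Record(**row))  # Record(name='John', date='2020-01-01', value='A')
```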

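In any case, for the problem I was solving, I was taking specific rows of the DataFrame rather than iterating over the whole thing, so I didn't need to go the route of converting to a namedtuple.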

Pandas Series as namedtuple
