Question

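I have a Pandas DataFrame whose columns are labelled with Python tuples.

These column-label tuples can contain None.

When I try to add a column to the DataFrame using either of the methods below, the None in the label tuple is implicitly converted to numpy.nan.

Method 1 - Using the dataframe[ NewColumn ] = ... syntax

Adding the column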

>>> import pandas
>>> df = pandas.DataFrame()
>>> column_label = ( 'foo', None )
>>> df[column_label] = [ 1, 2, 3 ]
>>> df
   (foo, nan)
0           1
1           2
2           3
>>> 
>>> df.columns
Index([(u'foo', nan)], dtype='object')
                ^^^
           Desired to be None

Method 2 - Using pandas.DataFrame.insert

Adding the column

>>> import pandas
>>> df = pandas.DataFrame()
>>> df.insert( 0, ( 'foo', None ), [ 1, 2, 3 ] )
>>> df
   (foo, nan)
0           1
1           2
2           3
>>> df.columns
Index([(u'foo', nan)], dtype='object')
                ^^^
             Desired to be None

So, what is going on here?

Is there a way to add a column whose label is a tuple containing None to an existing DataFrame using the DataFrame[] or DataFrame.insert syntax?

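(Oddly, if a tuple column label containing None is passed directly to the DataFrame constructor, or if the columns attribute is set explicitly with tuples containing None, the None is preserved. For example:

df = pandas.DataFrame( [ 1, 2, 3 ], columns=[ ( 'foo', None )] )

gives a DataFrame whose column is ( 'foo', None ), not ( 'foo', nan ).

Likewise, doing: df.columns = [ ( 'foo', None ), ... ]

sets the first column label to ( 'foo', None ).)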

Answer 1

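DataFrame columns and rows are different. DataFrame columns are accessed by header name, so using None in a header may not make sense without more context; see how the 'foo' column is accessed below. There is also an optional index. If the index is omitted, it defaults to consecutive integers.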

import pandas
headers = ['foo', 'Nada']
foo = [(1,'uno'), (2,'dos'), (3, 'tres')]
indices = ['a', 'b', 'c']
df = pandas.DataFrame(foo, columns=headers, index=indices)
#     foo  Nada
# a    1   uno
# b    2   dos
# c    3  tres

df['foo'] # only foo column of DataFrame (indices are also shown)
# a    1
# b    2
# c    3
# Name: foo, dtype: int64

df.loc['b'] # the row at b
# foo       2
# Nada    dos
# Name: b, dtype: object

df.iloc[0] # the row at integer location 0
# foo       1
# Nada    uno
# Name: a, dtype: object

bar = ['one', 'two', 'three']
df['bar'] = bar # add a new column
#    foo  Nada    bar
# a    1   uno    one
# b    2   dos    two
# c    3  tres  three

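Headers that are tuples containing None can cause errors and are hard to work with in Pandas. One approach might be to serialize the tuple into a string, or a tuple of strings, for use as the header, along the lines described in "if/else in Python's list comprehension?". The labels can then be deserialized from the headers later if needed.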

column_label = ( 'foo', None )
headers = ['' if x is None else x for x in column_label] # serialize into strings
df = pandas.DataFrame(foo, columns=headers)
#    foo
# 0    1   uno
# 1    2   dos
# 2    3  tres
column_labels_were = tuple([x if x else None for x in df.columns]) # deserialize from strings; '' is falsy, so it maps back to None
# ('foo', None)
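
One caveat with using '' as the stand-in: the `x if x else None` test also turns any other falsy label part (a genuine empty string, or 0) into None on the way back. A minimal sketch of the same idea with a dedicated placeholder is below. The names `NONE_PLACEHOLDER`, `encode_label` and `decode_label` are made up for illustration and are not part of Pandas; the sketch only assumes the placeholder string never occurs in a real label.

```python
# Hypothetical placeholder (an assumption): pick any string that cannot
# appear as a real part of a column label.
NONE_PLACEHOLDER = "<None>"


def encode_label(label):
    """Replace every None in a tuple label with the placeholder string."""
    return tuple(NONE_PLACEHOLDER if part is None else part for part in label)


def decode_label(label):
    """Invert encode_label: turn the placeholder back into None."""
    return tuple(None if part == NONE_PLACEHOLDER else part for part in label)


original = ("foo", None)
encoded = encode_label(original)   # use this as the column header
decoded = decode_label(encoded)    # recover the original label later

print(encoded)  # ('foo', '<None>')
print(decoded)  # ('foo', None)
```

Because the check is against one specific string rather than truthiness, labels such as ('', 0) should survive the round trip unchanged.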
