Adding a column to a pandas DataFrame fills it with NaN

Time: 2014-12-24 10:56:57

Tags: python pandas

I have this pandas DataFrame:

          SourceDomain                           1  2         3
0  www.theguardian.com     profile.theguardian.com  1  Directed
1  www.theguardian.com  membership.theguardian.com  2  Directed
2  www.theguardian.com   subscribe.theguardian.com  3  Directed
3  www.theguardian.com            www.google.co.uk  4  Directed
4  www.theguardian.com        jobs.theguardian.com  5  Directed

I want to add a new column from a pandas Series that is created like this:

Weights = Weights.value_counts()

However, when I try to add the new column with edgesFile[4] = Weights, it is filled with NaN instead of the values:

          SourceDomain                           1  2         3   4
0  www.theguardian.com     profile.theguardian.com  1  Directed NaN
1  www.theguardian.com  membership.theguardian.com  2  Directed NaN
2  www.theguardian.com   subscribe.theguardian.com  3  Directed NaN
3  www.theguardian.com            www.google.co.uk  4  Directed NaN
4  www.theguardian.com        jobs.theguardian.com  5  Directed NaN

How can I add the new column so that it keeps the values? Thanks!

Dani

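2 answers:

Answer 0 (score: 1)

You are getting NaN because the index of Weights does not match the index of edgesFile. When you assign a Series to a column, pandas aligns it on the index labels, and any row label that is missing from the Series gets NaN. If you want pandas to ignore Weights.index and simply paste the values in order, pass the underlying NumPy array instead: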

edgesFile[4] = Weights.values
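
One caveat with .values: it only works when the array has exactly as many elements as the DataFrame has rows. The question does not show how long Weights is, so the following is only a minimal sketch with made-up data (the names df, counts and the column names are illustrative) of what happens when the lengths differ:

```python
import pandas as pd

# Hypothetical stand-ins for edgesFile and Weights (not the asker's data).
df = pd.DataFrame({"target": ["a", "b", "a", "c"]})
counts = df["target"].value_counts()  # 3 distinct values -> 3 entries

try:
    # 3 values cannot fill 4 rows, so pandas refuses the assignment.
    df["weight"] = counts.values
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)
```

Since value_counts() returns one entry per distinct value, its result is often shorter than the frame it was computed from.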

Here is an example showing the difference between assigning the Series and assigning its values:

In [14]: df = pd.DataFrame(np.arange(4)*10, index=list('ABCD'))

In [15]: df
Out[15]: 
    0
A   0
B  10
C  20
D  30

In [16]: s = pd.Series(np.arange(4), index=list('CDEF'))

In [17]: s
Out[17]: 
C    0
D    1
E    2
F    3
dtype: int64

Here we see pandas aligning on the index:

In [18]: df[4] = s

In [19]: df
Out[19]: 
    0   4
A   0 NaN
B  10 NaN
C  20   0
D  30   1

Here pandas simply pastes the values of s into the column:

In [20]: df[4] = s.values

In [21]: df
Out[21]: 
    0  4
A   0  0
B  10  1
C  20  2
D  30  3
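
If Weights was produced by calling value_counts() on one of the columns of edgesFile (an assumption, since the question does not show where Weights comes from), then pasting .values in order would attach counts to the wrong rows: value_counts() sorts by frequency and is indexed by the counted values, not by row number. In that case, looking up each row's value in the counts with map may be closer to what is wanted. A sketch with hypothetical data and column names:

```python
import pandas as pd

# Hypothetical data; the column names are illustrative, not from the question.
edges = pd.DataFrame({
    "source": ["x.com", "x.com", "y.com", "x.com"],
    "target": ["a.com", "b.com", "a.com", "a.com"],
})

# value_counts() returns a Series indexed by the distinct target values.
counts = edges["target"].value_counts()

# map() looks up each row's target in that index, so every row gets its own count.
edges["weight"] = edges["target"].map(counts)
print(edges)
```

Here every row whose target is a.com receives 3 and the b.com row receives 1, regardless of row order.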

Answer 1 (score: 0)

Here is a small example for your problem.

You can add a new column to an existing DataFrame by assigning to a column name:

>>> import pandas as pd
>>> from pandas import DataFrame, Series
>>> df = DataFrame([[1,2,3],[4,5,6]], columns = ['A', 'B', 'C'])
>>> df
   A  B  C
0  1  2  3
1  4  5  6

>>> s = Series([7,8])
>>> s
0    7
1    8
dtype: int64

>>> df['D']=s
>>> df
   A  B  C  D
0  1  2  3  7
1  4  5  6  8
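
Note that this assignment works only because df and s share the same default integer index (0, 1). A small sketch, using a made-up index for illustration, of how the same assignment behaves when the labels differ, and how .values (as in the other answer) sidesteps it:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

# Same values as before, but labelled 'x' and 'y' instead of 0 and 1.
s = pd.Series([7, 8], index=["x", "y"])

df["D"] = s          # no label in common with df.index -> all NaN
df["E"] = s.values   # positional: ignores the labels of s
print(df)
```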

Alternatively, you can build a DataFrame from the Series and then use concat:

>>> df = DataFrame([[1,2,3],[4,5,6]])
>>> df
   0  1  2
0  1  2  3
1  4  5  6

>>> s = DataFrame(Series([7,8]), columns=['4']) # if you don't provide a column name, the default name will be 0
>>> s
   4
0  7
1  8

>>> df = pd.concat([df,s], axis=1)
>>> df
   0  1  2  4
0  1  2  3  7
1  4  5  6  8
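
Like column assignment, pd.concat with axis=1 aligns on the index, so it has the same pitfall when the labels differ. A sketch, with an illustrative index and column name that are not from the question, that drops the old labels before concatenating:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
# Illustrative Series whose labels (10, 11) do not match df's (0, 1).
s = pd.Series([7, 8], index=[10, 11], name="D")

# The default outer join keeps the union of labels, giving 4 rows with NaN gaps.
misaligned = pd.concat([df, s], axis=1)

# Dropping the old labels first lines the rows up by position.
aligned = pd.concat([df, s.reset_index(drop=True)], axis=1)
print(misaligned.shape, aligned.shape)
```

Resetting the index is only appropriate when the rows really are in matching order; otherwise aligning on a shared key is the safer route.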

Hope this helps.