Setting column values fails in pandas 0.22

Time: 2018-04-24 06:42:54

Tags: python pandas

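I ran into a weird problem when setting a DataFrame column to a Series. I am reading a large CSV file in chunks, processing each chunk to obtain a Series, and then adding that Series to the chunk as a new column. The code is as follows: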

import pandas as pd
from pandas import Series
import math
chunksize=1
index=1
def radec2thetaphi(ra, dec):
    """
    convert equatorial ra, dec in degrees
    to polar theta, phi in radians
    """
    return math.pi/2 - math.radians(dec), math.radians(ra)

def process(chunk,index):
    df = chunk
    hpxSeries= add_position(df.loc[:, 'ra'],df.loc[:, 'dec'],10)
    df['hpx'] = hpxSeries
    # df.assign(hpx=hpxSeries)
    df.to_csv('dataWithHpx'+str(index)+'.csv',
          index=False,
          header=True,
          mode='a',  # append data to csv file
          chunksize=chunksize)


def add_position(ras, decs, max_norder):
   """
   add the HEALPix bin containing the (ra, dec) position
   """
   from healpy import pixelfunc
   ipixs= []
   for index in range(len(ras)):
       theta, phi = radec2thetaphi(ras.iloc[index],decs.iloc[index])
       # theta, phi = radec2thetaphi(ras[index],decs[index])

       ipix = pixelfunc.ang2pix(2 ** max_norder, theta, phi, nest=True)
       ipixs.append(ipix)
   return Series(ipixs)

reader = pd.read_table('test.csv',sep=',',chunksize=chunksize)
for chunk in reader:
    process(chunk,index)
    index += 1

The test data is as follows:

objID,ra,dec,raErr,decErr,b,l,htmID,u,g,r,i,z,type
1237656565430026857,261.5740599080846,74.38571382566604,0.01726714203661016,0.020599799518891203,31.743540985047822,105.7795198731553,14618393532272,23.195066,21.844196,20.568333,19.947378,19.467426,6
1237656565430026858,260.9271613912779,74.33926515751666,0.10718089928452412,0.10687692767691707,31.92390256089231,105.77282959387213,14618284549848,23.023682,23.20101,23.082199,21.470964000000002,21.859362,3
1237656565430026859,260.87187786155675,74.33533882669094,0.09668760551492658,0.08869229349592739,31.9393315422538,105.77241962727824,14618284778785,23.964920000000003,22.745125,21.974653,21.001870999999998,20.722258,3
1237656565430026860,260.87006613036715,74.3364524380365,0.17006192913048626,0.1631736869113577,31.939547749705497,105.77383010182928,14618284785373,24.094654000000002,22.723902,21.233057000000002,20.435284,20.2101,3

My problem: with pandas 0.18 the code runs perfectly, but with version 0.22 the 'hpx' column is set to NaN in every chunk except the first, and I can't figure out why.

1 answer:

Answer 0: (score: 0)

In the statement below

 df['hpx'] = hpxSeries 

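you are assigning a Series whose index is 0 to the df column 'hpx'. Assigning a Series to a column aligns on index labels, so df['hpx'] is filled only for a row labelled 0, which exists only in the first chunk; rows with any other label get NaN. Instead of assigning 'hpxSeries' directly, use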

df['hpx'] = hpxSeries.values
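
The alignment behaviour can be reproduced without the CSV file or healpy. The following is only a minimal sketch: the small DataFrame and the values 111 and 222 are made up for illustration, and the row labels 2 and 3 are an assumption standing in for a later chunk whose index continues from the previous one.

```python
import pandas as pd

# Hypothetical stand-in for a later chunk: its row labels are 2 and 3,
# not 0 and 1 (assumed here to mimic an index that continues across chunks).
chunk = pd.DataFrame({"ra": [260.87, 260.87], "dec": [74.33, 74.34]},
                     index=[2, 3])

# A freshly built Series gets the default labels 0 and 1.
hpx = pd.Series([111, 222])  # made-up pixel values

# Assigning the Series aligns on labels: nothing matches, so every row is NaN.
aligned = chunk.copy()
aligned["hpx"] = hpx

# Assigning the underlying array ignores labels and fills by position.
positional = chunk.copy()
positional["hpx"] = hpx.values

# Giving the Series the chunk's own index also works, because labels match.
same_index = chunk.copy()
same_index["hpx"] = pd.Series([111, 222], index=chunk.index)

print(aligned)
print(positional)
print(same_index)
```

Under the same assumption, another option would be to have add_position return Series(ipixs, index=ras.index), so the result carries the chunk's labels and can be assigned directly.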