Unwanted arrays in output, pandas

Time: 2015-08-09 04:34:00

Tags: python arrays pandas

I have the following code, which does almost exactly what I want:

def stateCountAsList(filepath,state):

    import pandas as pd 
    pd.set_option('display.width',200)

    import numpy as np 

    dataFrame = pd.read_csv(filepath,header=0,sep='\t')
    df = dataFrame.iloc[0:638,:]

    dfState = df[df['State']== state]
    yearList = range(1999,2012)
    countsList =[]

    for year in yearList: #for every year in the range 
        if year in dfState['Year'].tolist(): #if the year is in the list of years for the selected state 
            value = dfState[(dfState.Year == year)]
            countsList.append(value.Count.values) 
        else: 
            countsList.append(np.nan) # np.nan is a plain float; it has no .values attribute
    print countsList 
    return countsList

stateCountAsList('United States Cancer Statistics, 1999-2011 Incidencet.txt' ,'California')

The problem is that my output should be a list, but I get arrays everywhere:

[array([ 5561.]), array([ 5588.]), array([ 6059.]), array([ 6043.]), array([ 5958.]), array([ 6566.]), array([ 7160.]), array([ 6780.]), array([ 7327.]), array([ 7585.]), array([ 7483.]), array([ 7635.]), array([ 7735.])]

How do I get rid of the arrays in the output?

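2 Answers:

Answer 0: (score: 1)

A pandas DataFrame stores its data in NumPy arrays, which is why the word array appears in your output. If you want a plain Python list instead of a NumPy array, you can call tolist() on it: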

# untested
for year in yearList: # for every year in the range
    if year in dfState['Year'].tolist(): # if the year is in the list of years for the selected state
        value = dfState[(dfState.Year == year)]
        countsList.append(value.Count.values.tolist()) # tolist() turns the one-element array into a one-element list
    else:
        countsList.append(np.nan) # np.nan is a plain float, so there is no .values to convert
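One thing to watch with tolist(): appending its result still nests each count inside its own one-element list. The sketch below illustrates the difference between append and extend on a small, hypothetical stand-in for dfState (the frame and the names df_state, counts_nested and counts_flat are made up for illustration; the numbers are taken from the output in the question).

```python
import pandas as pd

# Hypothetical miniature of the filtered frame from the question.
df_state = pd.DataFrame({
    "Year": [1999, 2000, 2001],
    "Count": [5561.0, 5588.0, 6059.0],
})

counts_nested = []
counts_flat = []
for year in [1999, 2000, 2001]:
    row = df_state[df_state.Year == year]
    as_list = row.Count.values.tolist()  # plain Python list, e.g. [5561.0]
    counts_nested.append(as_list)        # adds the list itself: a list of lists
    counts_flat.extend(as_list)          # adds the list's items: a flat list

print(counts_nested)  # [[5561.0], [5588.0], [6059.0]]
print(counts_flat)    # [5561.0, 5588.0, 6059.0]
```

If a flat list of numbers is the goal, extend (or appending a single element) is probably the closer fit.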

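Answer 1: (score: 0)

array is a data structure provided by NumPy, a scientific computing library for Python. Items can be retrieved from arrays and from lists in much the same way, for example by index.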

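Since value.Count.values returns an array containing a single item, you can append that item directly to countsList. np.nan is already a plain float with no .values attribute, so it can be appended as is:

countsList.append(value.Count.values[0])
...
countsList.append(np.nan)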

Source: http://docs.scipy.org/doc/numpy/reference/arrays.html
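Putting the indexing approach together, a minimal runnable sketch might look like the following. The function name state_counts and the small df_state frame are hypothetical, and the sketch assumes each year appears at most once for the selected state. Wrapping the element in float() is an optional extra that was not part of the answer above: values[0] is a NumPy scalar, and newer NumPy versions print such scalars differently from plain Python floats.

```python
import math

import numpy as np
import pandas as pd


def state_counts(df_state, years):
    """Return one count per year as a flat list, with NaN for missing years."""
    present = set(df_state["Year"].tolist())
    counts = []
    for year in years:
        if year in present:
            row = df_state[df_state.Year == year]
            # values[0] picks the single element out of the one-element array;
            # float() converts the NumPy scalar to a plain Python float.
            counts.append(float(row.Count.values[0]))
        else:
            counts.append(np.nan)  # np.nan is already a plain float
    return counts


# Hypothetical data: the year 2000 is deliberately missing.
df_state = pd.DataFrame({"Year": [1999, 2001], "Count": [5561.0, 6059.0]})
result = state_counts(df_state, range(1999, 2002))
print(result)  # [5561.0, nan, 6059.0]
```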