Pandas DataFrame incorrectly includes the data type in the output

Time: 2015-03-15 20:22:21

Tags: python pandas

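In an IPython Notebook (see below) I am doing some data wrangling to extract the data I need from CSV files. I do this by building new pandas DataFrames, and I have run into a problem I have never seen before: every new entry added to the dictionary ends with the data type.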

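import glob
import pandas as pd

df_files = glob.glob('/Users/snplabadmin...')

# one list per regressor; every run appends one entry to each list
all_regressors = {'participant': [], 'sooner': [], 'safer': [], 'later': [], 'risky': []}
#output = {}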

for df_file in df_files:
    df = pd.read_csv(df_file)

    participant = df['participant'][0]

    #make sure all response keys are coded as strings
    df['choice_key.keys'] = df['choice_key.keys'].astype(str)  # convert every item in df['choice_key.keys'] to a string

    #create new column of coded responses
    df['resp'] = 0  # Initialize to 0 (good for misses, too)
    # use .loc so the assignment writes to df itself rather than to a temporary copy
    df.loc[df['choice_key.keys'] == '1', 'resp'] = 1
    df.loc[df['choice_key.keys'] == '1.0', 'resp'] = 1  # Left == Sooner/Safer
    df.loc[df['choice_key.keys'] == '2', 'resp'] = 2
    df.loc[df['choice_key.keys'] == '2.0', 'resp'] = 2  # Right == Later/Riskier

    #create runs
    run_1 = df[0:36]
    run_2 = df[38:73]
    run_3 = df[74:110]
    run_4 = df[111:147]
    run_5 = df[148:184]
    run_6 = df[185:221]
    runs = [run_1, run_2, run_3, run_4, run_5, run_6]

    #define counter for loop
    counter = 1

    for run in runs:
        run_numb = participant + str(counter)
        print(run_numb)
        delays = run[run['delay0_prob1'] == 0] # separate delay trials into dataframe
        probs = run[run['delay0_prob1'] == 1] # separate prob trials into dataframe 

        #parse responses from delay and prob dataframes
        delays_sooner = delays[delays['resp'] == 1]
        #print delays_sooner['ddpd']
        delays_later = delays[delays['resp'] == 2]
        probs_safer = probs[probs['resp'] == 1]
        probs_risky = probs[probs['resp'] == 2]

        sooner = delays_sooner['ResponseTime']
        safer = probs_safer['ResponseTime']
        later = delays_later['ResponseTime']
        risky = probs_risky['ResponseTime']

        all_regressors['sooner'].append(delays_sooner['ResponseTime'])
        all_regressors['safer'].append(probs_safer['ResponseTime'])
        all_regressors['later'].append(delays_later['ResponseTime'])
        all_regressors['risky'].append(probs_risky['ResponseTime'])
        all_regressors['participant'].append(run_numb)

        counter = counter +1

The lists in 'all_regressors' should contain only numbers, but instead I see:

tdcs_208p1
8     180.00
13     90.00
15      0.25
26     30.00
27     90.00
Name: ddpd, dtype: float64
tdcs_208p2
71    30
Name: ddpd, dtype: float64
tdcs_208p3
Series([], name: ddpd, dtype: float64)
tdcs_208p4
111    180
124    180
127      7
138     90
146    180
Name: ddpd, dtype: float64
tdcs_208p5
153     90
156    180
179     90
Name: ddpd, dtype: float64
tdcs_208p6
210    1
Name: ddpd, dtype: float64

Any ideas about why I am getting this extra output and how to get rid of it? All I want is the numbers!

Thanks!

1 Answer:

Answer 0 (score: 0)

Simply appending .values fixed the problem, thanks Ed!

all_regressors['sooner'].append(delays_sooner['ResponseTime'].values)
all_regressors['safer'].append(probs_safer['ResponseTime'].values)
all_regressors['later'].append(delays_later['ResponseTime'].values)
all_regressors['risky'].append(probs_risky['ResponseTime'].values)
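
To illustrate why this works: selecting a column from a filtered DataFrame returns a pandas Series, and a Series prints with its row index, its name, and its dtype. The sketch below is only an illustration, not the asker's data: the DataFrame `run` and its values are made up, and `.tolist()` is shown as an alternative for when a plain Python list is preferred over a NumPy array.

```python
import pandas as pd

# Hypothetical stand-in for one run (values invented for illustration only)
run = pd.DataFrame(
    {"resp": [1, 2, 1, 0], "ResponseTime": [0.8, 1.2, 0.5, 2.0]},
    index=[8, 13, 15, 26],
)

# Boolean filtering followed by column selection yields a Series
sooner = run[run["resp"] == 1]["ResponseTime"]

as_series = sooner          # keeps the index, the name and the dtype
as_array = sooner.values    # NumPy array holding only the values
as_list = sooner.tolist()   # plain Python list of floats

print(as_series)  # shows the index column plus a "Name: ..., dtype: ..." footer
print(as_array)   # numbers only
print(as_list)    # numbers only
```

Newer pandas versions also offer `Series.to_numpy()` as an alternative to `.values`. If one flat list per regressor is wanted rather than one array per run, `all_regressors['sooner'].extend(sooner.tolist())` is one possible variation.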