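In an IPython Notebook (see below), I'm doing some data massaging to extract the data I need from a set of CSV files. I do this by creating new pandas DataFrames, and I've run into a problem I've never seen before: the data type is included at the end of every new entry in my dictionary.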
import glob
import pandas as pd

df_files = glob.glob('/Users/snplabadmin...')
all_regressors = {'participant': [], 'sooner': [], 'safer': [], 'later': [], 'risky': []}
#output = {}
for df_file in df_files:
    df = pd.read_csv(df_file)
    participant = df['participant'][0]
    # make sure all response keys are coded as strings
    df['choice_key.keys'] = df['choice_key.keys'].astype(str)  # convert every item in df['choice_key.keys'] to a string
    # create new column of coded responses
    df['resp'] = 0  # initialize to 0 (good for misses, too)
    # .loc avoids chained assignment (df['resp'][mask] = ...), which may silently fail to write to df
    df.loc[df['choice_key.keys'] == '1', 'resp'] = 1
    df.loc[df['choice_key.keys'] == '1.0', 'resp'] = 1  # Left == Sooner/Safer
    df.loc[df['choice_key.keys'] == '2', 'resp'] = 2
    df.loc[df['choice_key.keys'] == '2.0', 'resp'] = 2  # Right == Later/Riskier
    # create runs
    run_1 = df[0:36]
    run_2 = df[38:73]
    run_3 = df[74:110]
    run_4 = df[111:147]
    run_5 = df[148:184]
    run_6 = df[185:221]
    runs = [run_1, run_2, run_3, run_4, run_5, run_6]
    # number the runs starting from 1
    for counter, run in enumerate(runs, start=1):
        run_numb = participant + str(counter)
        print(run_numb)
        delays = run[run['delay0_prob1'] == 0]  # separate delay trials into dataframe
        probs = run[run['delay0_prob1'] == 1]  # separate prob trials into dataframe
        # parse responses from delay and prob dataframes
        delays_sooner = delays[delays['resp'] == 1]
        # print(delays_sooner['ddpd'])
        delays_later = delays[delays['resp'] == 2]
        probs_safer = probs[probs['resp'] == 1]
        probs_risky = probs[probs['resp'] == 2]
        sooner = delays_sooner['ResponseTime']
        safer = probs_safer['ResponseTime']
        later = delays_later['ResponseTime']
        risky = probs_risky['ResponseTime']
        all_regressors['sooner'].append(delays_sooner['ResponseTime'])
        all_regressors['safer'].append(probs_safer['ResponseTime'])
        all_regressors['later'].append(delays_later['ResponseTime'])
        all_regressors['risky'].append(probs_risky['ResponseTime'])
        all_regressors['participant'].append(run_numb)
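The four separate assignments that build the resp column can also be written as a single lookup. Below is a minimal sketch. It assumes that read_csv loads choice_key.keys as floats and stores a missed response as NaN. The toy DataFrame is hypothetical data, not the original CSV, and resp_codes is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the choice_key.keys column as read_csv might load it:
# numeric keys come in as floats, and a missed response is NaN
df = pd.DataFrame({'choice_key.keys': [1.0, 2.0, float('nan'), 2.0, 1.0]})

# Convert every key to a string, as the original loop intends
keys = df['choice_key.keys'].astype(str)

# Look up each key in one step; keys not in the table (e.g. a miss) become NaN, then 0
resp_codes = {'1': 1, '1.0': 1, '2': 2, '2.0': 2}  # Left == Sooner/Safer, Right == Later/Riskier
df['resp'] = keys.map(resp_codes).fillna(0).astype(int)

print(df['resp'].tolist())  # [1, 2, 0, 2, 1]
```

The mapping table keeps both spellings of each key ('1' and '1.0') in one place. It also avoids writing through chained indexing.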
The lists in all_regressors should contain only numbers, but instead I see:
tdcs_208p1
8 180.00
13 90.00
15 0.25
26 30.00
27 90.00
Name: ddpd, dtype: float64
tdcs_208p2
71 30
Name: ddpd, dtype: float64
tdcs_208p3
Series([], name: ddpd, dtype: float64)
tdcs_208p4
111 180
124 180
127 7
138 90
146 180
Name: ddpd, dtype: float64
tdcs_208p5
153 90
156 180
179 90
Name: ddpd, dtype: float64
tdcs_208p6
210 1
Name: ddpd, dtype: float64
Any idea why I'm getting this extra output and how to get rid of it? All I want is the numbers!
Thanks!
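A likely explanation is that list.append stores the whole pandas Series, not just its numbers. When a Series is printed, its display includes the row index labels, the Name line and the dtype line. The sketch below illustrates this with hypothetical response times. The values, index labels and series name are made up for illustration.

```python
import pandas as pd

# Hypothetical response times for one run, keeping the original row labels
rt = pd.Series([180.0, 90.0, 0.25], index=[8, 13, 15], name='ResponseTime')

stored = []
stored.append(rt)  # list.append stores the Series object itself, not just its values

# The stored item still carries its index labels, name and dtype,
# which is what shows up when the stored entries are printed
print(type(stored[0]).__name__)  # Series
print(stored[0])
```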
Answer:
Simply adding .values fixed the problem, thanks Ed! .values returns a bare NumPy array of the numbers, without the index, name and dtype that come along with a Series:
    all_regressors['sooner'].append(delays_sooner['ResponseTime'].values)
    all_regressors['safer'].append(probs_safer['ResponseTime'].values)
    all_regressors['later'].append(delays_later['ResponseTime'].values)
    all_regressors['risky'].append(probs_risky['ResponseTime'].values)
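You may prefer plain Python lists over NumPy arrays, for example to print or serialize the dictionary. In that case, Series.tolist() is an alternative to .values. Newer pandas versions also offer .to_numpy() as the recommended spelling of .values. Below is a minimal sketch with hypothetical data. The runs dictionary is a made-up stand-in for the per-run Series built in the question's loop.

```python
import pandas as pd

# Hypothetical 'sooner' response times for two runs; the second run had no such trials
runs = {
    'tdcs_208p1': pd.Series([180.0, 90.0], index=[8, 13], name='ResponseTime'),
    'tdcs_208p2': pd.Series([], dtype='float64', name='ResponseTime'),
}

all_regressors = {'participant': [], 'sooner': []}
for run_numb, rt in runs.items():
    all_regressors['participant'].append(run_numb)
    # .values would give a NumPy array; .tolist() converts further to a plain list of floats
    all_regressors['sooner'].append(rt.tolist())

print(all_regressors)
# {'participant': ['tdcs_208p1', 'tdcs_208p2'], 'sooner': [[180.0, 90.0], []]}
```

An empty run simply contributes an empty list. This keeps the participant and regressor lists aligned by position.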