The title of this question may not be quite right...
So, suppose I have the following input.csv:
Division,id,name
1,3870,name1
1,4537,name2
1,5690,name3
I need to do some processing based on the id of each row. A sample call looks like this:
>>> get_data(3870)
[{"matchId": 42, "comment": "Awesome match"}, {"matchId": 43, "comment": "StackOverflow is quite good"}]
My goal is to output a CSV that joins the first file with the related data retrieved through get_data:
Division,id,name,matchId,comment
1,3870,name1,42,Awesome match
1,3870,name1,43,StackOverflow is quite good
1,4537,name2,90,Random value
1,4537,name2,91,Still a random value
1,5690,name3,10,Guess what it is
1,5690,name3,11,A random value
However, for some reason, the integer data gets converted to float somewhere along the way:
Division,id,name,matchId,comment
1.0,3870.0,name1,42.0,Awesome match
1.0,3870.0,name1,43.0,StackOverflow is quite good
1.0,4537.0,name2,90.0,Random value
1.0,4537.0,name2,91.0,Still a random value
1.0,5690.0,name3,10.0,Guess what it is
1.0,5690.0,name3,11.0,A random value
Here is a short version of my code. I think I am missing something...
import pandas as pd

input_df = pd.read_csv(INPUT_FILE)
output_df = pd.DataFrame()
for index, row in input_df.iterrows():
    matches = get_data(row)
    rdict = dict(row)
    for m in matches:
        m.update(rdict)
        output_df = output_df.append(m, ignore_index=True)
# FIXME: this was an attempt to solve the problem
output_df["id"] = output_df["id"].astype(int)
output_df["matchId"] = output_df["matchId"].astype(int)
output_df.to_csv(OUTPUT_FILE, index=False)
How can I convert every float column to integer?
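For reference, here is a minimal sketch of one way to sidestep the conversion rather than undo it: collect the merged rows in a plain list and build the DataFrame once at the end. This also avoids DataFrame.append, which was removed in pandas 2.0. The get_data stub, its fake data, and the inline CSV text are assumptions made only so the example runs on its own; the real get_data is not shown in the question.

```python
import io

import pandas as pd


def get_data(row_id):
    """Hypothetical stand-in for the real get_data, returning made-up matches."""
    fake = {
        3870: [
            {"matchId": 42, "comment": "Awesome match"},
            {"matchId": 43, "comment": "StackOverflow is quite good"},
        ],
        4537: [{"matchId": 90, "comment": "Random value"}],
    }
    return fake.get(row_id, [])


# Inline stand-in for input.csv
input_df = pd.read_csv(io.StringIO("Division,id,name\n1,3870,name1\n1,4537,name2\n"))

rows = []
for record in input_df.to_dict("records"):
    for match in get_data(record["id"]):
        # Merge the input row with one match into a new dict
        rows.append({**record, **match})

# Build the frame in one step so integer columns are inferred as integers
output_df = pd.DataFrame(rows, columns=["Division", "id", "name", "matchId", "comment"])
csv_text = output_df.to_csv(index=False)
print(csv_text)
```

Because every value in the id and matchId columns is an integer when the frame is constructed, no astype call should be needed afterwards in this sketch.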
Answer 0 (score: 1)
The first solution is to pass the parameter float_format='%.0f' to to_csv:
print(output_df.to_csv(index=False, float_format='%.0f'))
Division,comment,id,matchId,name
1,StackOverflow is quite good,3870,43,name1
1,StackOverflow is quite good,4537,43,name2
1,StackOverflow is quite good,5690,43,name3
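A small self-contained sketch of the same idea, using an illustrative frame (the values are made up to resemble the question's data). Note that float_format only affects the text that is written: the DataFrame keeps its float64 columns, and '%.0f' would also round any column that holds genuinely fractional values.

```python
import pandas as pd

# Illustrative frame with integer-valued float columns
df = pd.DataFrame({
    "Division": [1.0, 1.0],
    "id": [3870.0, 4537.0],
    "matchId": [42.0, 90.0],
    "name": ["name1", "name2"],
})

# float_format is applied to every float column when the CSV text is produced
csv_text = df.to_csv(index=False, float_format="%.0f")
print(csv_text)
```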
A second possible solution is to apply a function, convert_to_int, to every column instead of calling astype on individual columns:
print(output_df)
   Division                      comment    id  matchId   name
0         1  StackOverflow is quite good  3870       43  name1
1         1  StackOverflow is quite good  4537       43  name2
2         1  StackOverflow is quite good  5690       43  name3
print(output_df.dtypes)
Division    float64
comment      object
id          float64
matchId     float64
name         object
dtype: object
def convert_to_int(x):
    try:
        return x.astype(int)
    except (ValueError, TypeError):
        # Columns that cannot be cast (e.g. strings) are returned unchanged
        return x

output_df = output_df.apply(convert_to_int)
print(output_df)
   Division                      comment    id  matchId   name
0         1  StackOverflow is quite good  3870       43  name1
1         1  StackOverflow is quite good  4537       43  name2
2         1  StackOverflow is quite good  5690       43  name3
print(output_df.dtypes)
Division     int32
comment     object
id           int32
matchId      int32
name        object
dtype: object
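One caveat with casting every column is that a float column with real fractional values would be silently truncated. The sketch below is a more conservative variant, not part of the original answer: it only converts float columns whose values are all whole numbers and contain no NaN. The function name floats_to_int and the sample frame are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def floats_to_int(df):
    """Return a copy in which whole-number float columns are cast to int64."""
    out = df.copy()
    for col in out.columns:
        values = out[col]
        if not is_float_dtype(values):
            continue  # leave strings, ints, etc. untouched
        # Convert only when there are no NaN values and nothing has a fractional part
        if values.notna().all() and (values % 1 == 0).all():
            out[col] = values.astype("int64")
    return out


# Made-up sample: "ratio" has a fractional value and should stay float
sample = pd.DataFrame({
    "Division": [1.0, 1.0],
    "id": [3870.0, 4537.0],
    "ratio": [0.5, 2.0],
    "name": ["name1", "name2"],
})
converted = floats_to_int(sample)
print(converted.dtypes)
```

Here Division and id become integer columns, while ratio stays float64 and name stays object.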