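I have two columns in a Pandas df that I want to manipulate. First, I want to remove non-numeric values such as " High" from the "score" column and convert the remaining values to int (all data was entered as strings). Next, I want to sum "score" for each unique "measure_id". How can I perform these two operations?
The df is: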
nationwide_measures = pd.read_sql_query("""select state,
measure_id,
measure_name,
score
from timely_and_effective_care___hospital;""", conn)
My failed attempt was:
nationwide_measures1 = nationwide_measures.to_numeric(nationwide_measures{:,'score'}, errors='coerce')
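Answer 0 (score: 0)
You can select all rows of nationwide_measures whose score value is numeric (I assume the scores are stored as strings), convert them to int, and then use groupby to sum the scores by measure_id.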
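# .str.strip() removes stray spaces (as in " High") so that isalpha() can flag the value
nationwide_measures1 = nationwide_measures[nationwide_measures['score'].str.strip().str.isalpha() != True].copy()
# .copy() above avoids a SettingWithCopyWarning when the column is overwritten here
nationwide_measures1['score'] = pd.to_numeric(nationwide_measures1['score'])
score_sum = nationwide_measures1.groupby('measure_id')['score'].sum()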
Hope this helps. Update: if you want sum, mean, min, max and std, you can use .agg, i.e.
# String names select the built-in aggregations; no numpy import is needed
score_sum = nationwide_measures1.groupby('measure_id')['score'].agg(['sum', 'min', 'max', 'mean', 'std'])
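A possible alternative to filtering with string tests is to let the conversion itself flag the bad values. The sketch below uses pd.to_numeric with errors='coerce', which is a top-level pandas function rather than a DataFrame method (one reason the attempt in the question fails). The sample rows are invented for illustration and stand in for the SQL query result; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample standing in for the SQL query result; all values are strings.
nationwide_measures = pd.DataFrame({
    "measure_id": ["ED_1b", "ED_1b", "OP_18b", "OP_18b", "EDV"],
    "score": ["280", "310", "140", "Not Available", " High"],
})

# Non-numeric strings become NaN instead of raising an error.
scores = pd.to_numeric(nationwide_measures["score"], errors="coerce")

# Keep only the rows that converted, then store the scores as int.
nationwide_measures1 = nationwide_measures[scores.notna()].copy()
nationwide_measures1["score"] = scores[scores.notna()].astype(int)

# Sum of score per measure_id, plus a few other statistics.
score_sum = nationwide_measures1.groupby("measure_id")["score"].sum()
score_stats = nationwide_measures1.groupby("measure_id")["score"].agg(
    ["sum", "min", "max", "mean", "std"]
)
print(score_sum)
```

Note that astype(int) assumes the valid scores are whole numbers; if decimal scores can occur, it is probably safer to leave the column as float.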
Answer 1 (score: 0)
One way to drop the rows with non-numeric score values is:
nationwide_measures1 = nationwide_measures[nationwide_measures['score'].astype(str).str.isdigit()]
I found it here: Pandas select only numeric or integer field from dataframe
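One caveat with the str.isdigit() filter is that it only accepts strings made up entirely of digits, so decimal or negative scores would be dropped as well. The small comparison below uses made-up values (an assumption, not data from the question) to show where it differs from a pd.to_numeric check.

```python
import pandas as pd

# Hypothetical score strings covering a few edge cases.
score = pd.Series(["95", " High", "12.5", "-3"])

# str.isdigit() is True only when every character is a digit.
digit_mask = score.astype(str).str.isdigit()

# to_numeric also parses signs and decimals; failures become NaN.
numeric_mask = pd.to_numeric(score, errors="coerce").notna()

print(pd.DataFrame({"score": score, "isdigit": digit_mask, "to_numeric": numeric_mask}))
```

If the scores are known to be non-negative integers, the isdigit() version is enough; otherwise the to_numeric mask is likely the safer choice.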