Pandas query - select tuples by unique column value and aggregate

Time: 2017-07-17 15:39:20

Tags: python sqlite pandas

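I have two columns in a Pandas df that I want to manipulate. First, I want to remove non-numeric values such as "High" from the "score" column and convert the remaining values to int (all of the data was entered as strings). Next, I want to sum "score" for each unique "measure_id". How can I perform these two operations?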

The df is:

nationwide_measures = pd.read_sql_query("""select state,
          measure_id,
          measure_name,
          score
from timely_and_effective_care___hospital;""", conn)

My failed attempt was:

 nationwide_measures1 = nationwide_measures.to_numeric(nationwide_measures{:,'score'}, errors='coerce')

2 answers:

Answer 0: (score: 0)

You can select all rows of nationwide_measures whose score is numeric. I assume the values are stored as strings, so convert them to int, then use groupby to sum the scores by measure_id.

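# Keep rows whose score is not purely alphabetic; .copy() avoids SettingWithCopyWarning
nationwide_measures1 = nationwide_measures[nationwide_measures['score'].str.isalpha() != True].copy()
# Convert the remaining string scores to numbers
nationwide_measures1['score'] = pd.to_numeric(nationwide_measures1['score'])
# Sum the scores for each measure_id
score_sum = nationwide_measures1.groupby('measure_id')['score'].sum()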
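
The same steps can also be sketched with `pd.to_numeric(..., errors='coerce')`, which is what the failed attempt in the question appears to be reaching for. This is a minimal sketch and not necessarily what the answer's author intended. The sample rows and measure ids below are made up for illustration and stand in for the result of the SQL query.

```python
import pandas as pd

# Hypothetical sample data standing in for the SQL query result
nationwide_measures = pd.DataFrame({
    "measure_id": ["ED_1b", "ED_1b", "OP_18b", "OP_18b", "EDV"],
    "score": ["280", "300", "150", "High", "Not Available"],
})

# Strings that cannot be parsed as numbers become NaN instead of raising
scores = pd.to_numeric(nationwide_measures["score"], errors="coerce")

# Replace the column, drop the rows that failed to parse, then cast to int
cleaned = (
    nationwide_measures.assign(score=scores)
    .dropna(subset=["score"])
    .astype({"score": int})
)

# Sum the scores for each unique measure_id
score_sum = cleaned.groupby("measure_id")["score"].sum()
print(score_sum)
```

One difference from the filter above: `errors='coerce'` also discards mixed values such as "Not Available", which `str.isalpha()` does not flag because of the space.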

Hope this helps.

Update: if you want sum, mean, min, max and std, you can use .agg, i.e.:

# pd.np is deprecated and absent from recent pandas releases; string names select the built-in aggregations
score_sum = nationwide_measures1.groupby('measure_id')['score'].agg(['sum', 'min', 'max', 'mean', 'std'])
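
As a small illustration of the `.agg` step, the sketch below computes the same statistics with named aggregation, so the output columns get explicit labels. It assumes pandas 0.25 or later. The sample data and the output column names (`total`, `lowest`, `highest`, `average`, `spread`) are hypothetical.

```python
import pandas as pd

# Hypothetical, already-cleaned data with integer scores
cleaned = pd.DataFrame({
    "measure_id": ["ED_1b", "ED_1b", "OP_18b", "OP_18b"],
    "score": [280, 300, 150, 170],
})

# Each keyword becomes an output column; the value names the aggregation
stats = cleaned.groupby("measure_id")["score"].agg(
    total="sum",
    lowest="min",
    highest="max",
    average="mean",
    spread="std",  # sample standard deviation (ddof=1)
)
print(stats)
```

The result is a DataFrame indexed by measure_id with one column per statistic, so a single value can be read with, for example, `stats.loc["ED_1b", "total"]`.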

Answer 1: (score: 0)

The answer for removing the tuples that have non-numeric score values is:

nationwide_measures1 = nationwide_measures[nationwide_measures['score'].astype(str).str.isdigit()]
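
One thing to keep in mind with this filter: `str.isdigit()` only accepts strings made up entirely of digits, so negative or decimal values would be dropped as well. That is probably fine for integer scores, but the sketch below, on made-up sample values, shows how it differs from a `pd.to_numeric` based mask.

```python
import pandas as pd

# Hypothetical sample values, including a negative, a decimal and a missing entry
samples = pd.Series(["42", "-7", "3.5", "High", None])

# True only for strings consisting solely of digits
is_digit = samples.astype(str).str.isdigit()

# True for anything that parses as a number
is_numeric = pd.to_numeric(samples, errors="coerce").notna()

print(pd.DataFrame({"value": samples, "isdigit": is_digit, "to_numeric": is_numeric}))
```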

I found it here: Pandas select only numeric or integer field from dataframe