Question

I need to multiply columns in pandas without rounding errors (keeping the totals the same).

I have a dataframe (called combined_df) that looks like this:

| areaid | districtid | percent | home | job |
|  89012 | 55         | 1.0     | 70   | 20  |
| 123048 | 442        | 0.984496| 100  | 10  |
| 123048 | 34536      | 0.015504| 100  | 10  |

areaid
  - a smaller area within the city
  - e.g. in area 123048: 100 residents, 10 workers

districtid
  - a larger district within the city
  - e.g. areaid 123048 lies in two districts, 442 and 34536

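I need to calculate how many people live and work in each district (the results should be integers). We can assume that people are evenly distributed within each area, so it is just a matter of multiplying the percent column by the home/job columns and then grouping by the districtid column.

What I did: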

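import numpy as np
import pandas as pd

def count_people(percent, people):
    # round each row's share of people to a whole number, independently of the other rows
    return np.around(percent * people)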

result = pd.DataFrame()
result['districtid'] = combined_df['districtid']
result['area_district_home'] = count_people(combined_df['percent'], combined_df['home'])
result['area_district_job'] = count_people(combined_df['percent'], combined_df['job'])
# total residents:
total_home = sum(result.groupby('districtid')['area_district_home'].sum())

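However, if I add up all the residents, the total does not equal the total number of residents over all areaids. I think this is due to rounding error. The error is very small (17 people out of a population of 19 million).

Is there a way to calculate the residents and workers per district more precisely? At this point I am not sure why there is a rounding error at all, because if 0.984496 * 100 is rounded to 98, then 0.015504 * 100 should be rounded to 2, and the sums should match.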
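The 98 + 2 case does work out, but independent rounding is not guaranteed to preserve the total in general. The sketch below uses two made-up areas (they are not in combined_df) to show how a total can drift: a three-way split, and a two-way split that lands on exact halves.

```python
import numpy as np
import pandas as pd

# Hypothetical area: 100 residents spread evenly over three districts.
three_way = pd.DataFrame({
    "areaid": [1, 1, 1],
    "districtid": [10, 20, 30],
    "percent": [1 / 3, 1 / 3, 1 / 3],
    "home": [100, 100, 100],
})
# Each share is 33.33..., so every row rounds down to 33.
three_way_total = np.around(three_way["percent"] * three_way["home"]).sum()
print(three_way_total)  # 99.0, one resident is lost

# Hypothetical area: 5 residents split 50/50 over two districts.
# np.around rounds exact halves to the nearest even value, so 2.5 becomes 2.0.
halves_total = np.around(np.array([0.5, 0.5]) * 5).sum()
print(halves_total)  # 4.0, again one resident is lost
```

A handful of areas like these among millions of residents would be enough to explain a discrepancy of a few people.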

Answer 1

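Python has a built-in round() function that takes two arguments, n and ndigits, and returns n rounded to ndigits decimal places. The ndigits argument defaults to None, so omitting it rounds the number to the nearest integer. As you have seen, round() may not work as expected: each value is rounded on its own, and exact ties go to the nearest even number (round(2.5) is 2), so rounded parts do not always add up to the rounded whole.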
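One way to keep the totals intact is to allocate the integers per areaid instead of rounding each row independently. The following is a minimal sketch of a largest-remainder allocation, not a built-in pandas feature: every share is floored, and the leftover people in each area go to the rows with the largest fractional parts. The helper name allocate_integers and its parameters are made up for this illustration, and the sample data simply repeats the table from the question.

```python
import numpy as np
import pandas as pd


def allocate_integers(df, value_col, share_col="percent", group_col="areaid"):
    """Split df[value_col] over the rows of each group as integers whose
    per-group sum equals the rounded sum of the exact shares."""
    exact = df[share_col] * df[value_col]
    floors = np.floor(exact)
    remainders = exact - floors
    # People still to hand out in each group after flooring every share.
    shortfall = (exact.groupby(df[group_col]).transform("sum").round()
                 - floors.groupby(df[group_col]).transform("sum"))
    # Rank rows inside each group by fractional part (1 = largest remainder).
    rank = remainders.groupby(df[group_col]).rank(method="first", ascending=False)
    # The top `shortfall` rows of each group receive one extra person.
    return (floors + (rank <= shortfall)).astype(int)


combined_df = pd.DataFrame({
    "areaid": [89012, 123048, 123048],
    "districtid": [55, 442, 34536],
    "percent": [1.0, 0.984496, 0.015504],
    "home": [70, 100, 100],
    "job": [20, 10, 10],
})

combined_df["area_district_home"] = allocate_integers(combined_df, "home")
combined_df["area_district_job"] = allocate_integers(combined_df, "job")

result = combined_df.groupby("districtid")[
    ["area_district_home", "area_district_job"]
].sum()
print(result)
```

With this approach the per-district numbers are still integers, and the totals over all districts match the totals over all areas (170 residents and 30 workers for the sample data). Ties between equal remainders are broken by row order here, which is an arbitrary choice.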
