Question

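What I'm trying to do

I want to report weekly rejection rates for multiple users. I use a for loop to go through the monthly data sets and get the numbers for each user. The final data frame, rates, should look something like this: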

The end product, rates

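Description

I have an initial data frame (numbers) that contains only the ACCEPT, REJECT, and REVIEW counts, to which I added these rows and columns:

Rows: Grand Total, Rejection Rate
Columns: Grand Total

Here is what numbers looks like: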

|---|--------|--------|--------|--------|-------------|
|   | Week 1 | Week 2 | Week 3 | Week 4 | Grand Total | 
|---|--------|--------|--------|--------|-------------|
| 0 |  994   |  699   |  529   |   877  |     3099    | 
|---|--------|--------|--------|--------|-------------|
| 1 |   27   |   7    |    8   |   13   |      55     |
|---|--------|--------|--------|--------|-------------|
| 2 |  100   |   86   |   64   |   107  |      357    |
|---|--------|--------|--------|--------|-------------|
| 3 |  1121  |  792   |  601   |  997   |    3511     |
|---|--------|--------|--------|--------|-------------|

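The index represents the following values:

0 - ACCEPT
1 - REJECT
2 - REVIEW
3 - TOTAL (ACCEPT + REJECT + REVIEW)

I wrote two predefined functions:

get_decline_rates(df): gets the decline rate for each week in the numbers data frame.
copy(empty_df, data): transfers all the data to a new data frame with "double" headers (for reporting purposes).

Here is my code, where I add the rows and columns to numbers and then reformat it: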

# Adding "Grand Total" column and rows
totals = numbers.sum(axis=0) # column sum
numbers = numbers.append(totals, ignore_index=True)
grand_total = numbers.sum(axis=1) # row sum
numbers.insert(len(numbers.columns), "Grand Total", grand_total)

# Adding "Rejection Rate" and re-indexing numbers
decline_rates = get_decline_rates(numbers)
numbers = numbers.append(decline_rates, ignore_index=True)
numbers.index = ["ACCEPT","REJECT","REVIEW","Grand Total","Rejection Rate"]

# Creating a new df with report format requirements 
final = pd.DataFrame(0, columns=numbers.columns, index=["User A"]+list(numbers.index))
final.ix["User A",:] = final.columns

# Copying data from numbers to newly formatted df
copy(final,numbers) 

# Append final df of this user to the final dataframe
rates = rates.append(final)
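
Readers on a newer pandas may find that DataFrame.append and .ix are no longer available (they were removed in pandas 2.0 and 1.0 respectively). The following is a minimal sketch of the "Grand Total" steps using pd.concat instead. It rebuilds numbers from the table in the description and is only an illustration, not the asker's code:

```python
import pandas as pd

# Weekly counts copied from the table above; rows 0-2 are ACCEPT, REJECT, REVIEW.
numbers = pd.DataFrame(
    {
        "Week 1": [994, 27, 100],
        "Week 2": [699, 7, 86],
        "Week 3": [529, 8, 64],
        "Week 4": [877, 13, 107],
    }
)

# Column sums become a new row (stands in for numbers.append(totals, ignore_index=True)).
totals = numbers.sum(axis=0)
numbers = pd.concat([numbers, totals.to_frame().T], ignore_index=True)

# Row sums become the "Grand Total" column.
numbers["Grand Total"] = numbers.sum(axis=1)

print(numbers)
```

Label-based access that used .ix can usually be written with .loc, and position-based access with .iloc.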

I'm using Python 3.5.2 and Pandas 0.19.2. In case it helps, here is the initial data set for reference:

Data format

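I resampled on the date column to get the data by week.

What's going wrong

Here's the funny part: the code runs fine and I get all the required information in rates. However, I see this warning message:

RuntimeWarning: invalid value encountered in longlong_scalars

If I break the code up and run it line by line, this message doesn't appear. Even the message itself looks strange (what does longlong_scalars even mean?). Does anyone know what this warning means and what is causing it?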

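Update

I just ran a similar script that takes exactly the same input and produces similar output (except that I get daily rejection rates instead of weekly ones). I get the same runtime warning, except that it gives more information:

RuntimeWarning: invalid value encountered in longlong_scalars

rej_rate = str(int(round((col.ix[1]/col.ix[3])*100))) + "%"

I suspect something must be going wrong when I compute the decline rates with my predefined function get_decline_rates(df). Could it be due to the dtype of the values? All the columns of the input df numbers are int64.

Here is the code for my predefined function (the input, numbers, can be found under Description):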

# Description: Get rejection rates for all weeks.
# Parameters: Pandas Dataframe with ACCEPT, REJECT, REVIEW count by week.
# Output: Pandas Series with rejection rates for all days in input df.
def get_decline_rates(df):
    decline_rates = []
    for i in range(len(df.columns)):
        col = df.ix[:,i]

        try:
            rej_rate = str(int(round((col[1]/col[3])*100))) + "%"
        except ValueError:
            rej_rate = "0%"

        decline_rates.append(rej_rate)

    return pd.Series(decline_rates, index=df.columns)
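
If the warning does come from a period whose TOTAL is zero (an assumption; the question does not show such a column), one way to avoid it is to skip the division for those columns. Below is a hedged sketch of a vectorized variant. The name get_decline_rates_guarded is hypothetical, and it assumes row 1 holds REJECT and row 3 holds TOTAL, as in numbers:

```python
import pandas as pd


def get_decline_rates_guarded(df):
    """Return REJECT / TOTAL per column as percentage strings.

    Hypothetical variant of get_decline_rates: assumes row 1 holds REJECT
    and row 3 holds TOTAL, as in the numbers frame from the question.
    """
    rejects = df.iloc[1].astype(float)
    totals = df.iloc[3].astype(float)

    # Replace zero totals with NaN so that 0/0 is never attempted.
    ratios = rejects / totals.where(totals != 0)

    # Columns without data fall back to 0, matching the original "0%" default.
    percents = (ratios * 100).round().fillna(0).astype(int)
    return percents.astype(str) + "%"


# Example: the second column has no records at all.
example = pd.DataFrame({"Week 1": [994, 27, 100, 1121], "Week 2": [0, 0, 0, 0]})
print(get_decline_rates_guarded(example))
```

Like the original function, this returns a Series indexed by the column names, so it should be usable in the same place.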

Answer 1

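I had the same RuntimeWarning and, after looking into the data, found it was caused by a division by null or zero. I didn't have time to look into your example, but you could check id = 0, or some other records, where a null or zero division could occur.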
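
A small sketch that reproduces this kind of warning in isolation, assuming both the REJECT count and the TOTAL are zero for some period (the values here are made up for illustration). Dividing two NumPy int64 scalars that are both zero gives NaN and emits a RuntimeWarning. The suffix of the message names the scalar type involved and differs by platform and NumPy version, so it may not read "longlong_scalars" on every setup:

```python
import warnings

import numpy as np

reject = np.int64(0)  # REJECT count for a period with no records (made-up value)
total = np.int64(0)   # TOTAL for the same period (made-up value)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ratio = reject / total  # 0/0 on NumPy integer scalars

print(ratio)                         # nan
print(caught[0].category.__name__)   # RuntimeWarning
print(caught[0].message)             # wording depends on the NumPy version

# NaN cannot be converted to int, so the ValueError branch runs.
try:
    rej_rate = str(int(round(ratio * 100))) + "%"
except ValueError:
    rej_rate = "0%"

print(rej_rate)  # 0%
```

This would also be consistent with the code in the question running to completion: the NaN result raises ValueError inside the try block, the except branch substitutes "0%", and only the warning remains visible.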
