I am trying to get the output shown below from this df. The df is built from a Django query that is converted to a DataFrame:
messages = Message.objects.all()
df = pd.DataFrame.from_records(messages.values())
+---+-----------------+------------+---------------------+
| | date_time | error_desc | text |
+---+-----------------+------------+---------------------+
| 0 | 3/31/2019 12:35 | Error msg | Hello there |
| 1 | 3/31/2019 12:35 | | Nothing really here |
| 2 | 4/1/2019 12:35 | Error msg | What if I told you |
| 3 | 4/1/2019 12:35 | | Yes |
| 4 | 4/1/2019 12:35 | Error Msg | Maybe |
| 5 | 4/2/2019 12:35 | | Sure I could |
| 6 | 4/2/2019 12:35 | | Hello again |
+---+-----------------+------------+---------------------+
Output:
+-----------+-------------+--------+-----------------------------+--------------+
| date | Total count | Errors | Greeting (start with hello) | errors/total |
+-----------+-------------+--------+-----------------------------+--------------+
| 3/31/2019 | 2 | 1 | 1 | 50% |
| 4/1/2019 | 3 | 2 | 0 | 66.67% |
| 4/2/2019 | 2 | 0 | 1 | 0% |
+-----------+-------------+--------+-----------------------------+--------------+
I can get partway there with the code below, but it feels roundabout. I flag each row "Yes"/"No" depending on whether it meets a condition, and then group.
df['date'] = df['date_time'].dt.date
df['greeting'] = np.where(df["text"].str.lower().str.startswith('hello'), "Yes", "No")
df['error'] = np.where(df["error_desc"].notnull(), "Yes", "No")
(df.set_index("date")
   .groupby(level="date")
   .apply(lambda g: g.apply(pd.value_counts))
   .unstack(level=1)
   .fillna(0))
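This produces the counts, but they end up spread across multiple "Yes"/"No" columns.
I could do some further manipulation from this point, but is there a more efficient way to compute the output I want?
Answer 0: (score: 0)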
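You can use a lambda that computes several columns at once:
result = df.groupby('date').apply(lambda x:
    pd.Series({'total_count': len(x),
               'error_count': (x['error'] == 'Yes').sum(),
               'hello_count': (x['greeting'] == 'Yes').sum()}))
To compute the ratio, work on the grouped result (the original df has no error_count column):
result['errors/total'] = result['error_count'] / result['total_count']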
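As a possible alternative to the "Yes"/"No" flags, the same counts can be sketched with boolean columns and named aggregation, since summing booleans counts the True values. The sample frame below is typed in by hand from the question's table, and it assumes empty error_desc cells are None; the names summary, is_error and is_greeting are illustrative only.

```python
import pandas as pd

# Sample data copied from the question's table.
# Assumption: an empty error_desc cell is None (not an empty string).
df = pd.DataFrame({
    "date_time": ["3/31/2019 12:35", "3/31/2019 12:35", "4/1/2019 12:35",
                  "4/1/2019 12:35", "4/1/2019 12:35", "4/2/2019 12:35",
                  "4/2/2019 12:35"],
    "error_desc": ["Error msg", None, "Error msg", None, "Error Msg", None, None],
    "text": ["Hello there", "Nothing really here", "What if I told you",
             "Yes", "Maybe", "Sure I could", "Hello again"],
})
df["date_time"] = pd.to_datetime(df["date_time"], format="%m/%d/%Y %H:%M")

summary = (
    df.assign(
        date=df["date_time"].dt.date,
        # Boolean flags: True where the row has an error / starts with "hello".
        is_error=df["error_desc"].notna(),
        is_greeting=df["text"].str.lower().str.startswith("hello"),
    )
    .groupby("date")
    .agg(
        total_count=("text", "size"),      # rows per date
        error_count=("is_error", "sum"),   # True values per date
        hello_count=("is_greeting", "sum"),
    )
)
# Error share per date, as a percentage rounded to two decimals.
summary["errors/total"] = (summary["error_count"] / summary["total_count"] * 100).round(2)
print(summary)
```

If the Django field stores empty strings rather than NULL, the is_error flag would need something like df["error_desc"].fillna("").ne("") instead of notna().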
Answer 1: (score: 0)
Here is what I tried; it gives the answer you want:
df['date_time'] = pd.to_datetime(df['date_time']).dt.date
df1=pd.DataFrame()
df1['total count'] = df['date_time'].groupby(df['date_time']).count()
df1['errors'] = df['error_desc'].groupby(df['date_time']).count()
df1['Greeting'] = df['text'].groupby(df['date_time']).apply(lambda x: x[x.str.lower().str.startswith('hello')].count())
df1['errors/total'] = round(df1['errors']/df1['total count']*100,2)
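The rounded column above holds floats, while the desired output shows strings such as 66.67%. If that exact display matters, one way to sketch the conversion is shown below; the ratios Series is a stand-in for df1['errors/total'] with the values from the question, and formatted is a hypothetical name.

```python
import pandas as pd

# Stand-in for df1['errors/total'] using the values from the question's output.
ratios = pd.Series([50.0, 66.67, 0.0], name="errors/total")

# The "g" format drops trailing zeros, so 50.0 becomes "50" and 66.67 stays "66.67".
formatted = ratios.map(lambda v: f"{v:g}%")
print(formatted.tolist())
```

Note that the formatted column is text, so it is best created last, after any further arithmetic on the numeric values.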