Question

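I am very new to pandas. I have a DataFrame with a datetime column and a column containing text strings (headlines). Each headline is its own row.

I need to plot the dates on the x-axis, and the y-axis should show how many headlines occur on each date.

For example, one date may have 3 headlines.

What is the simplest way to do this? I have no idea how to go about it. Maybe add another column with the value "1" for every row? If so, how would you do that?

Please point me in any direction that might help!

Thanks!

I have tried plotting the count on the y-axis but keep getting errors. I also tried creating a variable that counts the rows, but that does not return anything useful either.

I tried adding a column with the headline count: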

df_data['headline_count'] = df_data['headlines'].count

Then I tried the groupby method:

df_data['count'] = df.groupby('headlines')['headlines'].transform('count')

When I use groupby, I get an error:

KeyError: 'headlines'

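The output should simply be a plot whose y-axis shows how many times each date is repeated in the rows of the DataFrame (which indicates multiple headlines), and whose x-axis shows the dates on which the observations occurred.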

Answer 1

Use Series.value_counts on the date column together with Series.sort_index, or use GroupBy.size, to get a Series holding the number of rows per date.

Finally, plot that Series with Series.plot.
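The approach named in this answer might look roughly like the sketch below. The sample data and the column names "date" and "headlines" are assumptions made for illustration, and the helper plot_counts is hypothetical; adapt them to the real DataFrame.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-03-01", "2018-03-01", "2018-03-02",
                            "2018-03-01", "2018-03-02", "2018-03-03"]),
    "headlines": ["a", "b", "c", "d", "e", "f"],
})

# Option 1: count how often each date occurs, then order the result by date.
counts = df["date"].value_counts().sort_index()

# Option 2: group the rows by date and take the size of each group.
counts_alt = df.groupby("date").size()

print(counts)


def plot_counts(s):
    """Draw the dates on the x-axis and the number of headlines on the y-axis."""
    import matplotlib.pyplot as plt  # imported here so the counting works without matplotlib

    ax = s.plot(kind="bar")
    ax.set_xlabel("Date")
    ax.set_ylabel("Number of headlines")
    plt.show()
```

Calling plot_counts(counts) draws the chart. If the datetime column also carries a time of day, counting on df["date"].dt.normalize() instead should group the rows per calendar day.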

Answer 2

Try this:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

A = pd.DataFrame(columns=["Date", "Headlines"], data=[["01/03/2018","Cricket"],["01/03/2018","Football"],
                                                    ["02/03/2018","Football"],["01/03/2018","Football"],
                                                    ["02/03/2018","Cricket"],["02/03/2018","Cricket"]] )

Your data looks like this:

print (A)

         Date Headlines
0  01/03/2018   Cricket
1  01/03/2018  Football
2  02/03/2018  Football
3  01/03/2018  Football
4  02/03/2018   Cricket
5  02/03/2018   Cricket

Now group it with a groupby operation:

data = A.groupby(["Date","Headlines"]).size()
print(data)

Date        Headlines
01/03/2018  Cricket      1
            Football     2
02/03/2018  Cricket      2
            Football     1
dtype: int64

You can now plot it with the following code:

# set width of bar
barWidth = 0.25

# set height of bar
bars1 = data.loc[(data.index.get_level_values('Headlines') =="Cricket")].values
bars2 = data.loc[(data.index.get_level_values('Headlines') =="Football")].values


# Set position of bar on X axis
r1 = np.arange(len(bars1))
r2 = [x + barWidth for x in r1]

# Make the plot
plt.bar(r1, bars1, color='#7f6d5f', width=barWidth, edgecolor='white', label='Cricket')
plt.bar(r2, bars2, color='#557f2d', width=barWidth, edgecolor='white', label='Football')

# Put the x-ticks in the middle of each group of bars (half a bar width to the right of the first bar's center)
plt.xticks([r + barWidth / 2 for r in range(len(bars1))], data.index.get_level_values('Date').unique())

# Create legend & Show graphic
plt.legend()
plt.xlabel("Date")
plt.ylabel("Count")
plt.show()
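As a possible shortcut to the manual bar positioning above, the grouped counts can be reshaped so that each headline becomes its own column, after which pandas can draw the grouped bars itself. This is only a sketch built on the same sample data; the names table and plot_table are introduced here for illustration.

```python
import pandas as pd

A = pd.DataFrame(columns=["Date", "Headlines"],
                 data=[["01/03/2018", "Cricket"], ["01/03/2018", "Football"],
                       ["02/03/2018", "Football"], ["01/03/2018", "Football"],
                       ["02/03/2018", "Cricket"], ["02/03/2018", "Cricket"]])

# Count the rows per (Date, Headlines) pair, as in the answer above.
data = A.groupby(["Date", "Headlines"]).size()

# Move the Headlines level into columns; pairs that never occur become 0.
table = data.unstack(fill_value=0)
print(table)


def plot_table(t):
    """Draw one group of bars per date, with one bar per headline."""
    import matplotlib.pyplot as plt  # imported here so the reshaping works without matplotlib

    ax = t.plot(kind="bar")
    ax.set_xlabel("Date")
    ax.set_ylabel("Count")
    plt.show()
```

Calling plot_table(table) should give a chart comparable to the one built by hand, with the legend taken from the column names.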

Hope this helps!

Answer 3

Have you tried this:

df2 = df_data.groupby(['headlines']).count()

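You should save the result in a new DataFrame (df2) rather than in another column, because the result of groupby will not have the same dimensions as the original DataFrame.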
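To illustrate the point about dimensions, here is a small sketch. It groups by a date column rather than by headlines, on the assumption that counts per date are what the question needs; the data and the column names are made up. A KeyError such as the one in the question usually means that the label is not among the DataFrame's columns, so printing the column names first can help.

```python
import pandas as pd

# Hypothetical data; the column names "date" and "headlines" are assumptions.
df_data = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-01", "2018-03-02"],
    "headlines": ["Cricket", "Football", "Football"],
})

# Check the exact column labels before grouping on them.
print(df_data.columns.tolist())

# The grouped result has one row per date (2 rows here, not 3),
# so it goes into a new DataFrame instead of a column of df_data.
df2 = df_data.groupby("date").size().reset_index(name="count")
print(df2)

# transform, by contrast, returns one value per original row,
# so its result can be assigned back as a column.
df_data["count"] = df_data.groupby("date")["headlines"].transform("count")
print(df_data)
```

For plotting, the one-row-per-date form in df2 is the convenient one.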

Plotting the number of occurrences per date
