Question

I want to convert a pandas DataFrame into a numpy array that keeps the groupby labels. I have to group with a regex, so keeping those labels is important.

My data is in this format:

start_date,is_member 

2014-04-15 00:01,1
2014-04-15 00:01,1
2014-04-15 01:01,1
2014-04-15 01:01,1
2014-04-15 02:02,1
2014-04-15 03:05,1

I tried:

import pandas as pd

df = pd.read_csv(filename, header=0)
df = df.groupby(df.start_date.str.extract(r"^(.*?):", expand=False))[['start_date']].count()[['start_date']]
print(df)

The DataFrame output is:

start_date               
2014-04-15 00           2
2014-04-15 01           2
2014-04-15 02           1
2014-04-15 03           1
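To reproduce this without the original file, here is a minimal sketch that reads the sample rows above from an in-memory string. Using `io.StringIO` in place of `filename` is an assumption made only so the snippet runs on its own; the regex is written as a raw string so the colon needs no escape.

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO stands in for the CSV file (assumption).
csv_text = """start_date,is_member
2014-04-15 00:01,1
2014-04-15 00:01,1
2014-04-15 01:01,1
2014-04-15 01:01,1
2014-04-15 02:02,1
2014-04-15 03:05,1
"""

df = pd.read_csv(io.StringIO(csv_text), header=0)

# The non-greedy group captures everything before the first colon: "YYYY-MM-DD HH".
hour_key = df.start_date.str.extract(r"^(.*?):", expand=False)

# Count rows per hour; the labels end up in the index, not in a column.
counts = df.groupby(hour_key)[["start_date"]].count()
print(counts)
```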

I tried converting it to a numpy array with:

numpy_array = df.values

The numpy array contains only the count values:

[[2]
 [2]
 [1]
 [1]]
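This happens because groupby stores the group labels in the index, and `DataFrame.values` returns only the column data. A small sketch with a hand-built frame shaped like the grouped result (the frame is illustrative, not read from the real data file):

```python
import pandas as pd

# Illustrative frame shaped like the grouped result: labels in the index, counts in a column.
grouped = pd.DataFrame(
    {"start_date": [2, 2, 1, 1]},
    index=pd.Index(
        ["2014-04-15 00", "2014-04-15 01", "2014-04-15 02", "2014-04-15 03"],
        name="start_date",
    ),
)

# .values covers only the columns, so the index labels are not included.
print(grouped.values.shape)  # (4, 1)

# The group labels are still there, but they have to be read from the index.
print(grouped.index.to_numpy())
```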

I want start_date included as a column:

[[2014-04-15 00 2]
 [2014-04-15 01 2]
 [2014-04-15 02 1]
 [2014-04-15 03 1]]

Answer 1

I believe you need to turn the index into a column with DataFrame.reset_index:

# simplified code
df = df.groupby(df.start_date.str.extract(r"^(.*?):", expand=False))['start_date'].count()

numpy_array = df.rename_axis('index').reset_index().values
print(numpy_array)
[['2014-04-15 00' 2]
 ['2014-04-15 01' 2]
 ['2014-04-15 02' 1]
 ['2014-04-15 03' 1]]
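A self-contained sketch of the same approach, assuming the sample rows from the question are read through `io.StringIO` instead of a file, and using `to_numpy()` (pandas 0.24+) in place of `.values`:

```python
import io

import pandas as pd

# Sample rows from the question; io.StringIO replaces the real CSV file (assumption).
csv_text = """start_date,is_member
2014-04-15 00:01,1
2014-04-15 00:01,1
2014-04-15 01:01,1
2014-04-15 01:01,1
2014-04-15 02:02,1
2014-04-15 03:05,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Group by the "YYYY-MM-DD HH" prefix and count rows per group (result is a Series).
counts = df.groupby(df.start_date.str.extract(r"^(.*?):", expand=False))["start_date"].count()

# Rename the index, move it into a column, then convert both columns to one array.
numpy_array = counts.rename_axis("index").reset_index().to_numpy()
print(numpy_array)
```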

Or, for pandas 0.24+, use:

numpy_array = df.rename_axis('index').reset_index().to_numpy()
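The `rename_axis('index')` call is needed because the counted Series and its index are both named `start_date`, so a plain `reset_index()` would have to create two columns with that name and raises a `ValueError`. Below is a sketch of that collision and of one alternative, `Series.reset_index(name=...)`, which renames the value column instead. The Series is hand-built to mirror the grouped result, and the column name `count` is an arbitrary choice, not something from the original answer.

```python
import pandas as pd

# Hand-built Series mirroring the grouped counts: index and values both named "start_date".
counts = pd.Series(
    [2, 2, 1, 1],
    index=pd.Index(
        ["2014-04-15 00", "2014-04-15 01", "2014-04-15 02", "2014-04-15 03"],
        name="start_date",
    ),
    name="start_date",
)

# A plain reset_index() would need two columns called "start_date", so it fails.
collision_error = None
try:
    counts.reset_index()
except ValueError as exc:
    collision_error = exc
print("plain reset_index() raised:", collision_error)

# Renaming the value column avoids the clash and keeps "start_date" for the labels.
table = counts.reset_index(name="count")  # "count" is a hypothetical column name
print(table.columns.tolist())  # ['start_date', 'count']
numpy_array = table.to_numpy()
```

This keeps the original label column name, which can be handy if the intermediate DataFrame is used for anything else before conversion.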

Pandas DataFrame groupby: include the labels in a numpy array
