I want to output a Pandas groupby of a dataframe to CSV. I've tried various StackOverflow solutions, but they haven't worked.
Python 3.6.1, Pandas 0.20.1
The groupby result looks like this:
id month year count
week
0 9066 82 32142 895
1 7679 84 30112 749
2 8368 126 42187 872
3 11038 102 34165 976
4 8815 117 34122 767
5 10979 163 50225 1252
6 8726 142 38159 996
7 5568 63 26143 582
I want a CSV that looks like this:
week count
0 895
1 749
2 872
3 976
4 767
5 1252
6 996
7 582
Current code:
week_grouped = df.groupby('week')
week_grouped.sum() #At this point you have the groupby result
week_grouped.to_csv('week_grouped.csv') #Can't do this - .to_csv is not a df function.
SO solutions I've read:
output groupby to csv file pandas
week_grouped.drop_duplicates().to_csv('week_grouped.csv')
Result: AttributeError: Cannot access callable attribute 'drop_duplicates' of 'DataFrameGroupBy' objects, try using the 'apply' method
Python pandas - writing groupby output to file
week_grouped.reset_index().to_csv('week_grouped.csv')
Result: AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method
Answer 0 (score: 6)
Try doing this:
week_grouped = df.groupby('week')
week_grouped.sum().reset_index().to_csv('week_grouped.csv')
That writes the entire dataframe to the file. If you only want those two columns,
week_grouped = df.groupby('week')
week_grouped.sum().reset_index()[['week', 'count']].to_csv('week_grouped.csv')
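One detail worth checking: after reset_index(), to_csv still writes the new 0..n row index as an extra unnamed first column unless index=False is passed. Below is a minimal sketch of the two-column output, assuming a small made-up df in place of the asker's data (the sample values are hypothetical). It relies on to_csv returning the CSV text when no path is given, so nothing is written to disk.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df.
df = pd.DataFrame({
    "week": [0, 0, 1, 1, 2],
    "id": [10, 11, 12, 13, 14],
    "count": [3, 4, 5, 6, 7],
})

# Sum per week, turn the 'week' index back into a column,
# then keep only the two columns wanted in the CSV.
weekly = df.groupby("week").sum().reset_index()[["week", "count"]]

# With no path, to_csv returns the CSV as a string.
# index=False leaves out the 0..n row index added by reset_index().
csv_text = weekly.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a filename as the first argument, for example weekly.to_csv('week_grouped.csv', index=False).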
Here is a line-by-line explanation of your original code:
# This creates a "groupby" object (not a dataframe object)
# and you store it in the week_grouped variable.
week_grouped = df.groupby('week')
# This instructs pandas to sum up all the numeric type columns in each
# group. This returns a dataframe where each row is the sum of the
# group's numeric columns. You're not storing this dataframe in your
# example.
week_grouped.sum()
# Here you're calling the to_csv method on a groupby object... but
# that object type doesn't have that method. Dataframes have that method.
# So we should store the previous line's result (a dataframe) into a variable
# and then call its to_csv method.
week_grouped.to_csv('week_grouped.csv')
# Like this:
summed_weeks = week_grouped.sum()
summed_weeks.to_csv('...')
# Or with less typing simply
week_grouped.sum().to_csv('...')
Answer 1 (score: 3)
Try changing your second line to
week_grouped = week_grouped.sum()
and re-running all three lines.
If you run week_grouped.sum() in its own Jupyter notebook cell, you'll see that the statement sends its result to the cell's output rather than assigning it back to week_grouped. Some pandas methods have an inplace=True argument (for example, df.sort_values(by=col_name, inplace=True)), but sum does not.
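The point about sum returning a result rather than changing the groupby object can be checked directly. This is a small sketch with hypothetical sample data; the variable name summed is chosen for illustration only.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"week": [0, 0, 1], "count": [1, 2, 3]})
week_grouped = df.groupby("week")

# sum() returns a new DataFrame; week_grouped itself is left unchanged.
summed = week_grouped.sum()

print(type(week_grouped).__name__)  # the groupby object's class name
print(type(summed).__name__)        # the aggregated result's class name
```

Only the object held in summed is a DataFrame, so that is the one to call to_csv on.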
Edit: does each week number appear only once in your CSV? If so, a simpler solution that does not use groupby is possible.
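If every week really does appear exactly once, no aggregation is needed and selecting the two columns is enough. The following is a minimal sketch under that assumption, with hypothetical sample data; it is one possible approach and not necessarily the one this answer had in mind.

```python
import pandas as pd

# Hypothetical data in which every week appears exactly once.
df = pd.DataFrame({
    "week": [0, 1, 2],
    "id": [9066, 7679, 8368],
    "count": [895, 749, 872],
})

# No groupby: pick the two columns and skip the row index.
csv_text = df[["week", "count"]].to_csv(index=False)
print(csv_text)
```

If a week can occur more than once, this would emit one row per occurrence, so the groupby-and-sum route is still needed in that case.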
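Answer 2 (score: 1)
Pandas groupby can produce a lot of information (count, mean, std, ...). If you want to save all of it in a single CSV file, you first need to convert it to a regular dataframe: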
import pandas as pd
...
...
MyGroupDataFrame = MyDataFrame.groupby('id')
pd.DataFrame(MyGroupDataFrame.describe()).to_csv("myTSVFile.tsv", sep='\t', encoding='utf-8')
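When only a few statistics are needed, agg with a list of function names gives a flat table that is easy to write out. This is a sketch with hypothetical data and column names, shown as an alternative to describe() rather than as what the answer above does.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "id": ["a", "a", "b", "b"],
    "count": [1, 3, 10, 20],
})

# One row per id, one column per requested statistic.
stats = df.groupby("id")["count"].agg(["count", "mean", "std"])

# Tab-separated text; pass a filename as the first argument to write a file.
tsv_text = stats.to_csv(sep="\t")
print(tsv_text)
```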
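Answer 3 (score: 0)
I don't think groupby is necessary here; you can just drop the columns you don't want.
df = df.drop(['month','year'], axis=1)  # axis=1 (not axis==1) drops columns
df = df.reset_index()  # reset_index returns a new dataframe, so assign it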
df.to_csv('Your path')
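Answer 4 (score: 0)
groupby returns key/value pairs, where the key is the group's identifier and the value is the group itself, i.e. the subset of the original df that matches the key.
In your example, week_grouped = df.groupby('week') is a set of groups (a pandas.core.groupby.DataFrameGroupBy object), which you can explore in detail as follows: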
for k, gr in week_grouped:
    # do your stuff instead of print
    print(k)
    print(type(gr)) # This will output <class 'pandas.core.frame.DataFrame'>
    print(gr)
    # You can save each 'gr' in a csv as follows
    gr.to_csv('{}.csv'.format(k))
Alternatively, you can compute an aggregate function on the grouped object:
result = week_grouped.sum()
# This will be already one row per key and its aggregation result
result.to_csv('result.csv')
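In your example, you need to assign the function's result to a variable, because by default pandas methods such as sum return a new object rather than modifying the original in place.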
some_variable = week_grouped.sum()
some_variable.to_csv('week_grouped.csv') # This will work
Essentially, result.csv and week_grouped.csv are the same.