I have a very large CSV file that I managed to sort by the ID column, but I can't compute the average of the column values for each ID.
88741,42.84286022,16.41829224,1
88797,42.78081536,16.40743455,1
88797,42.78081536,16.21153455,1
88823,42.51512511,16.43304948,2
88885,42.88204193,16.12412548,2
87227,42.88204193,16.64223948,3
and so on...
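I need a new CSV without the SchoolCode column, containing the average Lat and Long for each cluster. Also, the numbers should be the same. I tried pandas, but it throws the error shown further down.
The output should look like this: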
Lat,Long,Cluster
<average_lat_forCluster1>,<average_long_forCluster1>,1
<average_lat_forCluster2>,<average_long_forCluster2>,2
<average_lat_forCluster3>,<average_long_forCluster3>,3
and so on...
My code:
import pandas as pd
df = pd.read_csv('SortedCluster.csv', names=[
'SchoolCode', 'Lat', 'Long', 'Cluster'])
df2 = df.groupby('Cluster')['Lat','Long'].mean()
df2.to_csv('AverageOutput.csv')
Error:
Traceback (most recent call last):
  File "averager.py", line 6, in <module>
    df2 = df.groupby('Cluster')['Lat','Long'].mean()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 1306, in mean
    return self._cython_agg_general('mean', **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 3974, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/groupby/groupby.py", line 4046, in _cython_agg_blocks
    raise DataError('No numeric types to aggregate')
pandas.core.base.DataError: No numeric types to aggregate
Answer 0 (score: 0)
I think you first need to convert the values to numeric, if necessary:
df[['Lat','Long']] = df[['Lat','Long']].apply(pd.to_numeric, errors='coerce')
Then aggregate by group with mean.
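For illustration, here is a minimal end-to-end sketch of those two steps. It is not a definitive fix: the inline sample data, the 'unknown' entry (an assumed example of a value that stops a column from being read as numeric), and the column reordering are assumptions added so the snippet runs on its own. Only the to_numeric call and the groupby on Cluster come from the question and answer.

```python
import io
import pandas as pd

# Hypothetical sample standing in for SortedCluster.csv. The 'unknown'
# entry is an assumed stray value that makes the Lat column non-numeric.
sample = io.StringIO(
    "88741,42.84286022,16.41829224,1\n"
    "88797,42.78081536,16.40743455,1\n"
    "88797,unknown,16.21153455,1\n"
    "88823,42.51512511,16.43304948,2\n"
    "88885,42.88204193,16.12412548,2\n"
)
df = pd.read_csv(sample, names=['SchoolCode', 'Lat', 'Long', 'Cluster'])

# Inspect the column types; a non-numeric Lat or Long is the situation
# in which the conversion below is necessary.
print(df.dtypes)

# Convert to numbers; values that cannot be parsed become NaN.
cols = ['Lat', 'Long']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Average per cluster (NaN values are skipped), then move Cluster from
# the index back to a column and order columns as Lat, Long, Cluster.
df2 = df.groupby('Cluster')[cols].mean().reset_index()[['Lat', 'Long', 'Cluster']]

# With no path, to_csv returns the CSV text; passing a file name such as
# 'AverageOutput.csv' would write it to disk instead.
csv_text = df2.to_csv(index=False)
print(csv_text)
```

Selecting the columns with a list, [['Lat', 'Long']], avoids the tuple-style ['Lat','Long'] indexing on a groupby, which newer pandas versions no longer accept. Because errors='coerce' silently turns bad values into NaN, it may be worth counting them with df[cols].isna().sum() before trusting the averages.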