Here is my code:
import pandas as pd
import numpy as np
# read dataframe
df = pd.read_csv("loc-brightkite_totalCheckins.txt", usecols=["location_id", "user"], delim_whitespace=True, names=["user", "check_in_time", "latitude", "longitude", "location_id"])
# remove duplicates (regarding location and user)
df = df.drop_duplicates(subset=["user", "location_id"])
#group by the locations, make each a series of users, count users
distinct_location_users = df.groupby('location_id')['user'].agg(lambda user_series: len(user_series))
# print top 10 locations
top_10 = distinct_location_users.order().tail(11)
print top_10
top_10.plot(kind="bar")
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-7-5c9c8115e794> in <module>()
6
7 # remove duplicates (regarding location and user)
----> 8 df = df.drop_duplicates(subset=["user", "location_id"])
9
10 #group by the locations, make each a series of users, count users
TypeError: drop_duplicates() got an unexpected keyword argument 'subset'
Answer 0 (score: 5)
As you can see here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.drop_duplicates.html
"subset" is not an accepted keyword argument of the "drop_duplicates" method.
I think you can use "cols" instead of "subset".
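
A minimal sketch of that suggestion, assuming the situation is what the traceback implies: the installed pandas expects `cols`, while newer releases accept `subset`. The helper name `drop_duplicate_pairs` and the sample data are made up for illustration and are not part of the original code.

```python
import pandas as pd


def drop_duplicate_pairs(frame, columns):
    """Drop rows that repeat the same values in `columns` (hypothetical helper)."""
    try:
        # Newer pandas releases accept the `subset` keyword.
        return frame.drop_duplicates(subset=columns)
    except TypeError:
        # Older releases, like the one raising the error above, expect `cols`.
        return frame.drop_duplicates(cols=columns)


# Tiny stand-in for the check-in data: user 0 visits location "a" twice.
df = pd.DataFrame({
    "user": [0, 0, 1, 1],
    "location_id": ["a", "a", "a", "b"],
})

deduped = drop_duplicate_pairs(df, ["user", "location_id"])
print(deduped)
```

If the pandas version is known in advance, calling `drop_duplicates` directly with the matching keyword is simpler than the try/except fallback.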
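Answer 1 (score: 1)
You are calling drop_duplicates incorrectly. Check which arguments pandas' drop_duplicates actually accepts.
A quick search for pandas drop_duplicates turns up the documentation for one of the two drop_duplicates methods in pandas (the other belongs to the Series class).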
DataFrame.drop_duplicates(cols=None, take_last=False, inplace=False)
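
The signature can also be checked against the locally installed pandas rather than the online docs, which may describe a different version. This is only a sketch and assumes Python 3, where `inspect.signature` is available; on Python 2, `help(pd.DataFrame.drop_duplicates)` shows the same information. The variable names are illustrative.

```python
import inspect

import pandas as pd

# The installed version decides which keywords drop_duplicates accepts.
print(pd.__version__)

# List the parameter names of the method as it exists in this installation.
params = inspect.signature(pd.DataFrame.drop_duplicates).parameters
print(list(params))

# True when this installation accepts `subset`, False when it does not.
accepts_subset = "subset" in params
print(accepts_subset)
```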