I have multiple CSV files that contain columns such as:
match_id, start_time, win, leaguename, team, opposing, min
2992096687, 1486840800, True, Captains Draft, 3729377, 2642171, 1453382256
2992217489, 1486845476, True, Captains Draft, 3729377, 2642171, 1453382256
2659805546, 1474478411, False,The BTS, 55 , 2642171, 1454281287
2760844196, 1478440750, True, ESL One 2016, 1883502, 2642171, 1459782261
...and so on
I want to concatenate all the CSVs, sort them by 'min' while grouping them by 'leaguename', and remove duplicate matches.
I tried to do this with the following code:
import pandas as pd
af = pd.read_csv('af.csv',keep_default_na=False,na_values=[""])
dc = pd.read_csv('dc.csv',keep_default_na=False,na_values=[""])
eg = pd.read_csv('eg.csv',keep_default_na=False,na_values=[""])
ehome = pd.read_csv('ehome.csv',keep_default_na=False,na_values=[""])
fnatic = pd.read_csv('fnatic.csv',keep_default_na=False,na_values=[""])
ig = pd.read_csv('ig.csv',keep_default_na=False,na_values=[""])
lgd = pd.read_csv('lgd.csv',keep_default_na=False,na_values=[""])
liquid= pd.read_csv('liquid.csv',keep_default_na=False,na_values=[""])
mvp = pd.read_csv('mvp.csv',keep_default_na=False,na_values=[""])
newbee = pd.read_csv('newbee.csv',keep_default_na=False,na_values=[""])
og = pd.read_csv('og.csv',keep_default_na=False,na_values=[""])
secret = pd.read_csv('secret.csv',keep_default_na=False,na_values=[""])
vp = pd.read_csv('vp.csv',keep_default_na=False,na_values=[""])
wings = pd.read_csv('wings.csv',keep_default_na=False,na_values=[""])
df = pd.concat([af, dc, eg, ehome, fnatic, ig, lgd, liquid, mvp, newbee, og, secret, vp, wings],axis=0).drop_duplicates()
df2 = df.sort_values("min").groupby("leaguename", as_index=False)
df2.to_csv('out.csv')
This returns:
AttributeError: Cannot access callable attribute 'to_csv' of 'DataFrameGroupBy' objects, try using the 'apply' method
How can I fix this?
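For context on the error: `groupby` returns a lazy `DataFrameGroupBy` object rather than a DataFrame, so it has no `to_csv`. One possible way to get the grouped ordering, as a minimal sketch, is to sort on both columns instead. The small frame below is a hypothetical stand-in for the concatenated data, using values from the sample rows, and it keeps only three of the columns.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame (subset of the sample rows).
df = pd.DataFrame({
    "match_id": [2992096687, 2992217489, 2659805546, 2760844196],
    "leaguename": ["Captains Draft", "Captains Draft", "The BTS", "ESL One 2016"],
    "min": [1453382256, 1453382256, 1454281287, 1459782261],
})

# Sorting on both keys keeps each league's rows together, ordered by 'min'
# within the league. The result is a plain DataFrame, so to_csv is available.
df2 = df.sort_values(["leaguename", "min"]).reset_index(drop=True)

# With no path, to_csv returns the CSV text; pass 'out.csv' to write a file.
csv_text = df2.to_csv(index=False)
```

This assumes "grouping" only means keeping rows of the same league adjacent in the output; if a per-league aggregate is wanted instead, an aggregation on the groupby object would be needed.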
Edit: I tried using
df2 = df.apply(pd.DataFrame.sort_values, 'min').groupby("leaguename", as_index=False)
and it returns another error:
ValueError: No axis named min for object type <class 'pandas.core.frame.DataFrame'>
Then I tried
df2 = df.apply(pd.DataFrame.sort_values, 'min',axis=0).groupby("leaguename", as_index=False)
which returns
TypeError: apply() got multiple values for argument 'axis'
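Both of these errors seem to come from how `DataFrame.apply` reads its arguments: its second positional parameter is `axis`, so `'min'` is taken as an axis name in the first attempt and collides with the explicit `axis=0` in the second. A minimal sketch of calling `sort_values` directly instead, on a hypothetical two-column frame built from the sample values:

```python
import pandas as pd

# Hypothetical stand-in frame; only the columns needed for the sort.
df = pd.DataFrame({
    "leaguename": ["ESL One 2016", "Captains Draft", "The BTS"],
    "min": [1459782261, 1453382256, 1454281287],
})

# sort_values is a DataFrame method; the column name goes in 'by'.
sorted_df = df.sort_values(by="min")

# The original row labels are kept, which shows how the rows were reordered.
original_labels = list(sorted_df.index)
```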
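I also have a small question about duplicates. How can I remove duplicates that are not exact duplicates (after merging the CSVs)? For example: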
2992217489, 1486845476, True, Captains Draft, 3729377, 2642171, 1453382256
2992217489, 1486845476, False,Captains Draft, 2642171, 3729377, 1453382256
In the data above, rows 1 and 2 are duplicates because they describe the same match (same match_id), but 'team', 'opposing' and 'win' hold different values.
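One approach that might fit here, as a sketch, is to pass `subset` to `drop_duplicates` so that only `match_id` decides whether two rows count as duplicates. The frame below reproduces the two example rows; keeping the first occurrence (`keep="first"`) is an assumption about which side of the match should survive.

```python
import pandas as pd

# The two example rows: same match_id, mirrored team/opposing/win.
df = pd.DataFrame({
    "match_id": [2992217489, 2992217489],
    "start_time": [1486845476, 1486845476],
    "win": [True, False],
    "leaguename": ["Captains Draft", "Captains Draft"],
    "team": [3729377, 2642171],
    "opposing": [2642171, 3729377],
    "min": [1453382256, 1453382256],
})

# Compare rows on match_id only; the first row seen for each match is kept.
deduped = df.drop_duplicates(subset="match_id", keep="first")
```

Which row is kept then depends on the order of the rows after concatenation, so sorting before dropping duplicates would make the choice explicit.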