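and I am trying to shift the rows that have NaNs in the first two columns to the left, so that the values to their right fill those columns. This is what I am currently trying:
(Note: the match dataframe was downloaded from the following link: https://www.kaggle.com/hugomathien/soccer)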
#original dataframe
<class 'pandas.core.frame.DataFrame'>
Int64Index: 21374 entries, 145 to 25978
Data columns (total 47 columns):
id 21374 non-null int64
country_id 21374 non-null int64
league_id 21374 non-null int64
season 21374 non-null object
stage 21374 non-null int64
date 21374 non-null object
match_api_id 21374 non-null int64
home_team_api_id 21374 non-null int64
away_team_api_id 21374 non-null int64
home_team_goal 21374 non-null int64
away_team_goal 21374 non-null int64
goal 13325 non-null object
shoton 13325 non-null object
shotoff 13325 non-null object
foulcommit 13325 non-null object
card 13325 non-null object
cross 13325 non-null object
corner 13325 non-null object
possession 13325 non-null object
BSA 11856 non-null float64
Home Team 21374 non-null object
Away Team 21374 non-null object
League 21374 non-null object
Country 21374 non-null object
home_player_1 21374 non-null object
home_player_2 21374 non-null object
home_player_3 21374 non-null object
home_player_4 21374 non-null object
home_player_5 21374 non-null object
home_player_6 21374 non-null object
home_player_7 21374 non-null object
home_player_8 21374 non-null object
home_player_9 21374 non-null object
home_player_10 21374 non-null object
home_player_11 21374 non-null object
away_player_1 21374 non-null object
away_player_2 21374 non-null object
away_player_3 21374 non-null object
away_player_4 21374 non-null object
away_player_5 21374 non-null object
away_player_6 21374 non-null object
away_player_7 21374 non-null object
away_player_8 21374 non-null object
away_player_9 21374 non-null object
away_player_10 21374 non-null object
away_player_11 21374 non-null object
winner 21374 non-null object
dtypes: float64(1), int64(9), object(37)
memory usage: 7.8+ MB
Creating the dataframe:
columns = match.columns[match.columns.get_loc('home_player_1'):match.columns.get_loc('away_player_1')+1].values
columns = list(columns)
player_appearences = match.groupby(columns[0]).size().reset_index()
player_appearences.rename(columns = {0:"Count_{}".format(player_appearences.columns[0][len(player_appearences.columns[0])-1])}, inplace = True, errors='raise')
player_appearences
for i in range(1,12):
    player_appearences2 = match.groupby(columns[i]).size().reset_index()
    player_appearences2
    player_appearences2.rename(columns = {0:"Count_{}".format(player_appearences2.columns[0][len(player_appearences2.columns[0])-1])}, inplace = True, errors='raise')
    player_appearences = player_appearences.merge(right = player_appearences2,how="outer",left_on ="{}".format(player_appearences.columns[0]),right_on = "{}".format(player_appearences2.columns[0]))
    player_appearences
# overwrite NaNs in the first column with the names in the current [i] player column
# select rows where the first two columns are both NaN
mask = player_appearences["home_player_1"].isna() & player_appearences["Count_1"].isna()
player_appearences.loc[mask, ["home_player_1", "Count_1"]] = player_appearences.loc[mask, ["home_player_2", "Count_2"]]
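When I then print player_appearences, the dataframe is unchanged. I am not sure whether the assignment does nothing or whether it is operating on a copy of the original dataframe. Can anyone tell me why this approach does not work, or suggest a better way to do it?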
Answer 0 (score: 1)
Use DataFrame.rename; after that you only need DataFrame.stack (dropna=True by default) + DataFrame.unstack:
df = (df.rename(columns={'home_player_2': 'home_player_1',
                         'Count_2': 'Count_1'})
        .stack()
        .unstack()
        .reindex(columns=df.columns[:2]))
print(df)
home_player_1 Count_1
0 Aaron 1
1 Adam 2
2 Ziggy 3
3 Zoltan 4
Or use DataFrame.shift together with DataFrame.where:
df.where(df.notna(),df.shift(-1,axis = 1)).iloc[:,:2]
home_player_1 Count_1
0 Aaron 1.0
1 Adam 2.0
2 Ziggy 3.0
3 Zoltan 4.0
Details:
print(df.where(df.notna(),df.shift(-1,axis = 1)))
home_player_1 Count_1 home_player_2 Count_2
0 Aaron 1.0 NaN NaN
1 Adam 2.0 NaN NaN
2 Ziggy 3.0 Ziggy 3.0
3 Zoltan 4.0 Zoltan 4.0
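Whether shift(-1, axis=1) drops each value into the matching column can depend on the pandas version and on how the column dtypes are laid out, so a variant that names the source columns explicitly may be more predictable. The following is only a minimal sketch under that assumption: the sample frame is reconstructed from the printed outputs above, and `fallback`, `first_pair` and `result` are illustrative names that do not appear in the original post.

```python
import pandas as pd

# Sample frame reconstructed from the printed outputs (an assumption).
df = pd.DataFrame(
    [["Aaron", 1, None, None],
     ["Adam", 2, None, None],
     [None, None, "Ziggy", 3],
     [None, None, "Zoltan", 4]],
    columns=["home_player_1", "Count_1", "home_player_2", "Count_2"],
)

# Relabel the second pair so that it lines up with the first pair by name.
fallback = df[["home_player_2", "Count_2"]].rename(
    columns={"home_player_2": "home_player_1", "Count_2": "Count_1"}
)

# Keep existing values; where they are missing, take the relabelled fallback.
first_pair = df[["home_player_1", "Count_1"]]
result = first_pair.where(first_pair.notna(), fallback)
print(result)
```

Because `where` aligns its `other` argument on labels, the rename is what makes the second pair usable as a replacement for the first; no column shifting is involved.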
Answer 1 (score: 1)
You can use shift(-1, axis=1) to shift the columns, and df[df.home_player_1.isna() & df.Count_1.isna()] to select the rows to affect. The shifted rows then have to be written back into the dataframe.
import pandas as pd

df = pd.DataFrame([['Aaron', 1, None, None],
                   ['Adam', 2, None, None],
                   [None, None, 'Ziggy', 3],
                   [None, None, 'Zoltan', 4]],
                  columns=['home_player_1', 'Count_1', 'home_player_2', 'Count_2'])
home_player_1 Count_1 home_player_2 Count_2
Aaron 1.0 None NaN
Adam 2.0 None NaN
None NaN Ziggy 3.0
None NaN Zoltan 4.0
mask = df.home_player_1.isna() & df.Count_1.isna()
df[mask] = df[mask].shift(-1, axis=1)
home_player_1 Count_1 home_player_2 Count_2
Aaron 1.0 None NaN
Adam 2.0 None NaN
Ziggy 3.0 NaN NaN
Zoltan 4.0 NaN NaN
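As for why the .loc assignment in the question appears to do nothing: when the right-hand side of a .loc assignment is a DataFrame, pandas aligns it with the target on row and column labels. The target columns are home_player_1/Count_1 while the source columns are home_player_2/Count_2, so no labels match and NaN is written over cells that were already NaN. One way around this, sketched below under the assumption that the sample frame above is representative, is to assign one column at a time so that only the row index is aligned. The `pairs` mapping is a hypothetical helper introduced for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [["Aaron", 1, None, None],
     ["Adam", 2, None, None],
     [None, None, "Ziggy", 3],
     [None, None, "Zoltan", 4]],
    columns=["home_player_1", "Count_1", "home_player_2", "Count_2"],
)

# Rows where both columns of the first pair are missing.
mask = df["home_player_1"].isna() & df["Count_1"].isna()

# Hypothetical target -> source mapping for this example.
pairs = {"home_player_1": "home_player_2", "Count_1": "Count_2"}

for target, source in pairs.items():
    # A Series carries no column labels, so only the row index is aligned.
    df.loc[mask, target] = df.loc[mask, source]
    # Clear the source cells, since their values have moved left.
    df.loc[mask, source] = float("nan")

print(df)
```

With more than two player columns, the same loop could be repeated for each later pair, filling the first pair only where it is still missing.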