我有一个带有“ Innings”列的df,其中包含以下数据:
Innings
[{'InningID': 297503, 'GameID': 47547, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 3}, {'InningID': 297504, 'GameID': 47547, 'InningNumber': 2, 'AwayTeamRuns': 2, 'HomeTeamRuns': 2}, {'InningID': 297505, 'GameID': 47547, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297506, 'GameID': 47547, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297507, 'GameID': 47547, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297508, 'GameID': 47547, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297509, 'GameID': 47547, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297510, 'GameID': 47547, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297512, 'GameID': 47547, 'InningNumber': 9, 'AwayTeamRuns': 1, 'HomeTeamRuns': None}]
[{'InningID': 297511, 'GameID': 47546, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297513, 'GameID': 47546, 'InningNumber': 2, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297514, 'GameID': 47546, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297515, 'GameID': 47546, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297516, 'GameID': 47546, 'InningNumber': 5, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297517, 'GameID': 47546, 'InningNumber': 6, 'AwayTeamRuns': 1, 'HomeTeamRuns': 3}, {'InningID': 297518, 'GameID': 47546, 'InningNumber': 7, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297519, 'GameID': 47546, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297520, 'GameID': 47546, 'InningNumber': 9, 'AwayTeamRuns': 1, 'HomeTeamRuns': 2}]
[{'InningID': 297521, 'GameID': 50022, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297522, 'GameID': 50022, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297523, 'GameID': 50022, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297524, 'GameID': 50022, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297525, 'GameID': 50022, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297526, 'GameID': 50022, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297527, 'GameID': 50022, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297528, 'GameID': 50022, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297529, 'GameID': 50022, 'InningNumber': 9, 'AwayTeamRuns': 3, 'HomeTeamRuns': 1}]
[{'InningID': 297530, 'GameID': 47556, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297532, 'GameID': 47556, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297534, 'GameID': 47556, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297536, 'GameID': 47556, 'InningNumber': 4, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297542, 'GameID': 47556, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297545, 'GameID': 47556, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297549, 'GameID': 47556, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297554, 'GameID': 47556, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297558, 'GameID': 47556, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
[{'InningID': 297531, 'GameID': 47557, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297533, 'GameID': 47557, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297535, 'GameID': 47557, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297537, 'GameID': 47557, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297543, 'GameID': 47557, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297546, 'GameID': 47557, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297551, 'GameID': 47557, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 6}, {'InningID': 297560, 'GameID': 47557, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297566, 'GameID': 47557, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
[{'InningID': 297539, 'GameID': 47549, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297540, 'GameID': 47549, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297544, 'GameID': 47549, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297548, 'GameID': 47549, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297553, 'GameID': 47549, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 5}, {'InningID': 297557, 'GameID': 47549, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297561, 'GameID': 47549, 'InningNumber': 7, 'AwayTeamRuns': 3, 'HomeTeamRuns': 0}, {'InningID': 297573, 'GameID': 47549, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297579, 'GameID': 47549, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
[{'InningID': 297538, 'GameID': 47558, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297541, 'GameID': 47558, 'InningNumber': 2, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297547, 'GameID': 47558, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297552, 'GameID': 47558, 'InningNumber': 4, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297556, 'GameID': 47558, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 5}, {'InningID': 297562, 'GameID': 47558, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297568, 'GameID': 47558, 'InningNumber': 7, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297577, 'GameID': 47558, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297584, 'GameID': 47558, 'InningNumber': 9, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}]
[{'InningID': 297550, 'GameID': 47548, 'InningNumber': 1, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297555, 'GameID': 47548, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297559, 'GameID': 47548, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}, {'InningID': 297567, 'GameID': 47548, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297569, 'GameID': 47548, 'InningNumber': 5, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297578, 'GameID': 47548, 'InningNumber': 6, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297583, 'GameID': 47548, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297587, 'GameID': 47548, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297591, 'GameID': 47548, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297594, 'GameID': 47548, 'InningNumber': 10, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297597, 'GameID': 47548, 'InningNumber': 11, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}]
[]
[{'InningID': 297564, 'GameID': 47553, 'InningNumber': 1, 'AwayTeamRuns': 2, 'HomeTeamRuns': 0}, {'InningID': 297572, 'GameID': 47553, 'InningNumber': 2, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297574, 'GameID': 47553, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297580, 'GameID': 47553, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297586, 'GameID': 47553, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297590, 'GameID': 47553, 'InningNumber': 6, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297596, 'GameID': 47553, 'InningNumber': 7, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297599, 'GameID': 47553, 'InningNumber': 8, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297603, 'GameID': 47553, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': 2}]
[{'InningID': 297563, 'GameID': 47555, 'InningNumber': 1, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297570, 'GameID': 47555, 'InningNumber': 2, 'AwayTeamRuns': 0, 'HomeTeamRuns': 1}, {'InningID': 297575, 'GameID': 47555, 'InningNumber': 3, 'AwayTeamRuns': 0, 'HomeTeamRuns': 5}, {'InningID': 297582, 'GameID': 47555, 'InningNumber': 4, 'AwayTeamRuns': 0, 'HomeTeamRuns': 3}, {'InningID': 297588, 'GameID': 47555, 'InningNumber': 5, 'AwayTeamRuns': 0, 'HomeTeamRuns': 3}, {'InningID': 297593, 'GameID': 47555, 'InningNumber': 6, 'AwayTeamRuns': 0, 'HomeTeamRuns': 0}, {'InningID': 297595, 'GameID': 47555, 'InningNumber': 7, 'AwayTeamRuns': 1, 'HomeTeamRuns': 0}, {'InningID': 297598, 'GameID': 47555, 'InningNumber': 8, 'AwayTeamRuns': 1, 'HomeTeamRuns': 2}, {'InningID': 297601, 'GameID': 47555, 'InningNumber': 9, 'AwayTeamRuns': 0, 'HomeTeamRuns': None}]
我需要将一局列拆分成许多列(行数可能与行数不同...)。 我尝试了爆炸功能,但由于字段数量随机而无法进行拆分。
最好是创建新的列,例如:
InningID1 : xxx, InningNumber1: x, AwayTeamsRuns1: x, HomeTeamRuns1: x, InningID2 : xxx, InningNumber2: x, AwayTeamsRuns2: x, HomeTeamRuns2: x etc...
我不在乎GameId,因为我已经在df的同一行中包含了此信息。
有什么建议吗?
非常感谢
杰弗里
答案 0 :(得分:1)
您可以尝试此操作,首先清理数据,将'替换为',然后将None转换为“ None”,然后遍历DataFrame,将json字符串转换为dict,使用json_normalize
来转换dict到DataFrame并最终concat
:
df['Innings'] = df['Innings'].str.replace("\'", '"')
df['Innings'] = df['Innings'].str.replace("None", '"None"')
dfs = []
for i in range(0, df.shape[0]):
j_str = df.loc[i]['Innings']
x = json.loads(j_str)
dfs.append(pd.json_normalize(x))
df = pd.concat(dfs).reset_index(drop=['index'])
print(df)
输出:
InningID GameID InningNumber AwayTeamRuns HomeTeamRuns
0 297503 47547 1 0 3
1 297504 47547 2 2 2
2 297505 47547 3 0 2
3 297506 47547 4 0 0
4 297507 47547 5 0 0
.. ... ... ... ... ...
87 297588 47555 5 0 3
88 297593 47555 6 0 0
89 297595 47555 7 1 0
90 297598 47555 8 1 2
91 297601 47555 9 0 None
答案 1 :(得分:0)
这似乎是使用DictVectorizer
import pandas as pd
from sklearn.feature_extraction import DictVectorizer
vectorizer = DictVectorizer()
dicts = df.explode("Innings")["Innings"]
data = vectorizer.fit_transform(dicts).todense()
innings_df = pd.DataFrame(data, columns=vectorizer.feature_names_)
缺少键会自动填充零,请确保这是您想要的行为。
现在,您可以根据需要将innings_df
与原始数据帧(df
)合并/合并