Question

Apologies if my question is unclear; it was hard to sum up in a short title.

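I have two DataFrames of different lengths and structures. One is a simple single-column df keyed by year that holds that year's World Series champion. The other is a much larger df with year and team as a composite index and the various stat categories as columns. I want to add an extra column called "winner" to the larger df, holding 1 if that team was that year's champion and 0 otherwise. How can I do this in Python? I could do it by hand in Excel before importing into pandas, but that would take forever.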
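To make the goal concrete, here is a minimal sketch on made-up data, not the real scraped tables. It assumes the winners frame has a single column (called `Tm` here, a hypothetical name) indexed by year, and that team labels are spelled the same way in both frames; if one table uses full names and the other abbreviations, a mapping step would be needed first.

```python
import pandas as pd

# Toy stand-in for the large stats frame, with a (Year, Tm) composite index.
stats = pd.DataFrame(
    {"HR": [199, 185, 238, 221]},  # made-up numbers
    index=pd.MultiIndex.from_tuples(
        [(2016, "CHC"), (2016, "CLE"), (2017, "HOU"), (2017, "LAD")],
        names=["Year", "Tm"],
    ),
)

# Toy stand-in for the winners frame: one column, indexed by year.
winners = pd.DataFrame(
    {"Tm": ["CHC", "HOU"]},
    index=pd.Index([2016, 2017], name="Year"),
)

# Build the (year, team) pairs that identify each champion.
champion_keys = list(zip(winners.index, winners["Tm"]))

# Flag rows of stats whose composite index is one of those pairs.
stats["winner"] = stats.index.isin(champion_keys).astype(int)
print(stats)
```

The check is vectorized, so no explicit loop over either frame is needed.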

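Below is what I had in mind, but I am not sure it will work. I wanted to ask for help before running it, because scraping the data into the DataFrames and merging them takes a very long time.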
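Since the slow part is the scraping, one option is to save the scraped frame to disk once and reload it while experimenting with the winner column. The sketch below is only an illustration: `load_or_build` and `build_stats` are hypothetical helper names, and the tiny frame stands in for the real scrape.

```python
import os
import tempfile

import pandas as pd


def load_or_build(cache_path, build):
    """Return the cached frame if it exists, else build it once and cache it."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    frame = build()
    frame.to_pickle(cache_path)
    return frame


calls = []  # records how many times the slow step actually ran


def build_stats():
    # Stand-in for the slow scraping loop (hypothetical data).
    calls.append(1)
    return pd.DataFrame(
        {"Year": [2016, 2017], "Tm": ["CHC", "HOU"], "HR": [199, 238]}
    ).set_index(["Year", "Tm"])


cache_path = os.path.join(tempfile.mkdtemp(), "stats.pkl")
first = load_or_build(cache_path, build_stats)   # builds and writes the cache
second = load_or_build(cache_path, build_stats)  # reads the cache instead
```

A pickle file keeps the composite index intact, which a plain CSV round trip would not do without extra arguments.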

This is the code that collects the stats I have so far. The winners table I

import os
import time
import pandas as pd


dfs = pd.read_html("https://en.wikipedia.org/wiki/List_of_World_Series_champions")
winners = dfs[1].iloc[66:,1:2]


# One league page per season, 1969 through 2017.
urls = ['https://www.baseball-reference.com/leagues/MLB/' + str(i) + '.shtml'
        for i in range(1969, 2018)]

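clean_tables = []
for url in urls:
    # Parse only the table whose HTML id is 'table'.
    tables = pd.read_html(url, attrs={'id': 'table'})
    for table in tables:
        # Keep the first 30 rows and index them by (Year, Tm).
        # Chaining avoids calling set_index in place on a slice.
        table = table.iloc[:30, :].set_index(['Year', 'Tm'])
        clean_tables.append(table)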

stats = pd.concat(clean_tables)


winners = pd.read_csv("World_Series_winners.csv")

This is what I want to do:

# Intended logic only; this loop does not pair rows across the two frames.
for i, j in stats, winners:
    if i.index == j.index and i.index[1] == j:
        i['winner'] = 1
    else:
        i['winner'] = 0

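That is, loop through two DataFrames of different lengths and flag the rows where the index and value of one match the composite index of the other.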
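The matching described above can also be written as a left join rather than a loop, which copes with the two frames having different lengths. This is a sketch under assumptions: the winners frame is taken to have `Year` and `Tm` columns (hypothetical names), exactly one champion per year, and team labels that match those in the stats index.

```python
import pandas as pd

# Toy stats frame with a (Year, Tm) composite index; 2018 has no winner listed.
stats = pd.DataFrame(
    {"HR": [199, 185, 238, 221, 208]},  # made-up numbers
    index=pd.MultiIndex.from_tuples(
        [(2016, "CHC"), (2016, "CLE"), (2017, "HOU"), (2017, "LAD"), (2018, "BOS")],
        names=["Year", "Tm"],
    ),
)

# Toy winners frame: shorter than stats, one row per year.
winners = pd.DataFrame({"Year": [2016, 2017], "Tm": ["CHC", "HOU"]})

flagged = (
    stats.reset_index()                        # turn the index levels into columns
    .merge(winners.assign(winner=1),           # mark every champion row with 1
           on=["Year", "Tm"], how="left")      # keep every stats row
    .fillna({"winner": 0})                     # unmatched rows are not champions
    .astype({"winner": int})
    .set_index(["Year", "Tm"])                 # restore the composite index
)
print(flagged)
```

Rows with no match in the winners frame, including whole years that are missing from it, end up with 0.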

0 answers: