Question

I am trying to create 2 separate DataFrames from the code below:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

s = []
q = []
for i in range(len(df)):
    if df.loc[i,"Interest"] in sport:
        s.append(df.loc[i,"Name"])
        s.append(df.loc[i,"Interest"])
        df_length = len(s)
        sportdf.loc[df_length] = s
        print(df)
    else:
        q.append(df.loc[i,"Name"])
        q.append(df.loc[i,"Interest"])
        df_length = len(q)
        #sciencedf.loc[df_length] = q

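The expected output for sportdf is a single row containing 'tom' and 'volleyball', while sciencedf should contain the rows 'nick', 'chemistry' and 'juli', 'physics'.

With the code above, sportdf is created successfully, but sciencedf is not, because the list q ends up as ['nick', 'chemistry', 'juli', 'physics'], which is too long for a two-column row. I could split it some other way and then append, but I feel I am making this 100 times harder than it needs to be. To summarise: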

for every row in df:
    if the cell of the 'Interest' column is in the sport tuple:
        add the row to the sportdf
    if it is not (elif):
        add the row to the sciencedf

Answer 1

The pandas isin function is the solution: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html

The following code should help:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# just two lines of isin condition
sciencedf = df.loc[df['Interest'].isin(science)]
sportdf = df.loc[df['Interest'].isin(sport)]

print(sciencedf)
print(sportdf)

Output:

   Name   Interest
1  nick  chemistry
2  juli    physics
 
   Name    Interest
0   tom  volleyball
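
If every row that is not a sport should land in sciencedf (the else branch in the question's summary), one option is to compute the isin mask once and invert it with ~. This is a minimal sketch reusing the question's data; the name is_sport is only illustrative:

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# Boolean Series: True where the interest is one of the sports
is_sport = df['Interest'].isin(sport)

# Rows where the mask is True
sportdf = df[is_sport]
# ~ inverts the mask, so every remaining row ends up here (the "else" case)
sciencedf = df[~is_sport]

print(sportdf)
print(sciencedf)
```

With this approach the science tuple is not needed at all, and each row of df ends up in exactly one of the two frames.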

Answer 2

Using your data:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]

You can then use list comprehensions with an if clause to build the data each DataFrame needs:

  
sportdata = [ [name, interest] for name, interest in data if interest in sport]
sciencedata = [ [name, interest] for name, interest in data if interest in science]

After that, you can build each DataFrame as usual:

sportdf = pd.DataFrame(sportdata, columns = ['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns = ['Name', 'Interest'])
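
If you would rather keep an explicit loop, as in the question, the same idea can be written in a single pass. The loop in the question fails because s and q are flat lists that keep growing across iterations, so by the second science row q holds four values for a two-column frame. A rough sketch that appends one small list per row instead (sport_rows and science_rows are illustrative names, not from the question):

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

sport_rows = []
science_rows = []

# itertuples(index=False) yields one (Name, Interest) tuple per row
for name, interest in df.itertuples(index=False):
    row = [name, interest]  # a fresh two-item list for each row
    if interest in sport:
        sport_rows.append(row)
    else:
        science_rows.append(row)

# Build each DataFrame once, after the loop
sportdf = pd.DataFrame(sport_rows, columns=df.columns)
sciencedf = pd.DataFrame(science_rows, columns=df.columns)
```

Collecting the rows first and constructing each DataFrame once also avoids growing a DataFrame row by row with .loc inside the loop.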

Answer 3

You can use the .query method. The following solution was tested with Python 3.7. I think this solution is easier to understand.

import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# Only two lines
sportdf = df.query(f"Interest == {sport}")
sciencedf = df.query(f"Interest == {science}")

print(sportdf)
print(sciencedf)

Output:

   Name    Interest
0   tom  volleyball

   Name   Interest
1  nick  chemistry
2  juli    physics
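
As a variation on the same approach, query can refer to a local Python variable with the @ prefix, which avoids pasting the tuple into the expression through an f-string. This is a small sketch using the same data as above, not a replacement for the tested code:

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# @sport / @science look up the local variables of those names
sportdf = df.query("Interest in @sport")
sciencedf = df.query("Interest in @science")

print(sportdf)
print(sciencedf)
```

Because the values are passed as variables instead of being formatted into the query string, this form keeps working if an interest name contains a quote character.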
