Create new DataFrames based on a condition

Time: 2020-10-16 01:03:29

Tags: python pandas dataframe

I'm trying to create 2 separate DataFrames from the code below:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

s = []
q = []
for i in range(len(df)):
    if df.loc[i,"Interest"] in sport:
        s.append(df.loc[i,"Name"])
        s.append(df.loc[i,"Interest"])
        df_length = len(s)
        sportdf.loc[df_length] = s
        print(df)
    else:
        q.append(df.loc[i,"Name"])
        q.append(df.loc[i,"Interest"])
        df_length = len(q)
        #sciencedf.loc[df_length] = q 

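The expected output is that sportdf contains a single row, 'tom' and 'volleyball', while sciencedf contains two rows: 'nick', 'chemistry' and 'juli', 'physics'.

With the code above, however, sportdf is created successfully but sciencedf is not, because the list q ends up as ['nick', 'chemistry', 'juli', 'physics'], which has four values for a two-column row. I could split it some other way and then append, but I feel I'm making this 100 times harder than it needs to be. To summarize: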

for every row in df:
    if the cell of the 'Interest' column is in the sport tuple:
        add the row to the sportdf
    if it is not (elif):
        add the row to the sciencedf
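
The pseudocode above can be written fairly directly. The following is a minimal sketch, not the only way to do it: it assumes the fix is to build a fresh two-element row on every iteration instead of appending to shared lists, and the names `sport_rows` and `science_rows` are illustrative.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

sport_rows = []    # hypothetical helper lists, one inner list per row
science_rows = []
for i in range(len(df)):
    # Build a new row each time so earlier rows do not accumulate in it
    row = [df.loc[i, 'Name'], df.loc[i, 'Interest']]
    if df.loc[i, 'Interest'] in sport:
        sport_rows.append(row)
    else:
        science_rows.append(row)

# Construct each DataFrame once, after the loop
sportdf = pd.DataFrame(sport_rows, columns=['Name', 'Interest'])
sciencedf = pd.DataFrame(science_rows, columns=['Name', 'Interest'])
```

Building the DataFrames once after the loop also avoids growing them row by row with `.loc`, which tends to be slow on larger data.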

3 answers:

Answer 0 (score: 1)

The pandas isin function is the solution: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html

The following code should help:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# just two lines of isin condition
sciencedf = df.loc[df['Interest'].isin(science)]
sportdf = df.loc[df['Interest'].isin(sport)]

print(sciencedf)
print(sportdf)

Output:

   Name   Interest
1  nick  chemistry
2  juli    physics

   Name    Interest
0   tom  volleyball
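
The question's pseudocode sends every non-sport row to the second DataFrame (an if/else), rather than testing against the science tuple. A small sketch of that variant, assuming a single boolean mask and its negation with `~`; the names `is_sport` and `otherdf` are illustrative, and `reset_index(drop=True)` is optional if you prefer to keep the original row labels:

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# One boolean Series: True where Interest is one of the sports
is_sport = df['Interest'].isin(sport)

sportdf = df[is_sport].reset_index(drop=True)
otherdf = df[~is_sport].reset_index(drop=True)  # everything that is not a sport
```

With this approach a row whose interest appears in neither tuple still lands in `otherdf`, which may or may not be what you want.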

Answer 1 (score: 0)

Using your data:

import pandas as pd

sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 

You can then use list comprehensions with an if clause to build the data each DataFrame needs:

  
sportdata = [ [name, interest] for name, interest in data if interest in sport]
sciencedata = [ [name, interest] for name, interest in data if interest in science]
 

After that, you can build each DataFrame as usual:

sportdf = pd.DataFrame(sportdata, columns = ['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns = ['Name', 'Interest'])
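
If only the DataFrame is available and not the raw `data` list, the same comprehension can iterate over the rows of `df`. This is a sketch under that assumption, using `DataFrame.itertuples(index=False)` so that each row unpacks into a name and an interest:

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
df = pd.DataFrame(
    [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']],
    columns=['Name', 'Interest'],
)

# Each tuple yielded by itertuples(index=False) is (Name, Interest)
sportdata = [[name, interest]
             for name, interest in df.itertuples(index=False)
             if interest in sport]
sciencedata = [[name, interest]
               for name, interest in df.itertuples(index=False)
               if interest in science]

sportdf = pd.DataFrame(sportdata, columns=['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns=['Name', 'Interest'])
```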

Answer 2 (score: 0)

You can use the .query method. The following solution was tested with Python 3.7. I find this solution easier to understand.

import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')

sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])

data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']] 
  
df = pd.DataFrame(data, columns = ['Name', 'Interest'])

# Only two lines
sportdf = df.query(f"Interest == {sport}")
sciencedf = df.query(f"Interest == {science}")

print(sportdf)
print(sciencedf)

Output:

   Name    Interest
0   tom  volleyball

   Name   Interest
1  nick  chemistry
2  juli    physics
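
The f-string version works because the tuple's repr is pasted into the query text. As an alternative sketch, `query` can also refer to a local Python variable with the `@` prefix and test membership with `in`, which avoids formatting values into the expression string:

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# @sport and @science are resolved from the surrounding Python scope
sportdf = df.query("Interest in @sport")
sciencedf = df.query("Interest in @science")

# "not in" gives the if/else split from the question's pseudocode
not_sportdf = df.query("Interest not in @sport")
```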