I am trying to create 2 separate DataFrames from the code below:
import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns = ['Name', 'Interest'])
s = []
q = []
for i in range(len(df)):
    if df.loc[i,"Interest"] in sport:
        s.append(df.loc[i,"Name"])
        s.append(df.loc[i,"Interest"])
        df_length = len(s)
        sportdf.loc[df_length] = s
        print(df)
    else:
        q.append(df.loc[i,"Name"])
        q.append(df.loc[i,"Interest"])
        df_length = len(q)
        #sciencedf.loc[df_length] = q
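The expected output for the sportdf DataFrame is a single row, 'tom' and 'volleyball', while sciencedf should contain two rows: 'nick', 'chemistry' and 'juli', 'physics'.
With the code above, sportdf is created successfully, but sciencedf is not, because the list q keeps growing across iterations and ends up as ['nick', 'chemistry', 'juli', 'physics'], which no longer fits a two-column row. I could split it some other way and then append, but I feel I am making this 100 times harder than it needs to be. To summarize: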
for every row in df:
    if the cell of the 'Interest' column is in the sport tuple:
        add the row to the sportdf
    otherwise (else):
        add the row to the sciencedf
Answer 0 (score: 1)
The pandas isin function is the solution: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html
The following code should help:
import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns = ['Name', 'Interest'])
# just two lines of isin condition
sciencedf = df.loc[df['Interest'].isin(science)]
sportdf = df.loc[df['Interest'].isin(sport)]
print(sciencedf)
print(sportdf)
Output:
   Name   Interest
1  nick  chemistry
2  juli    physics
   Name    Interest
0   tom  volleyball
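The pseudocode in the question has an "else" branch, meaning that anything that is not a sport goes to sciencedf. One way to mirror that with isin is to build the boolean mask once and negate it with ~. This is a minimal sketch rather than the only way to do it; the name is_sport and the reset_index(drop=True) calls (which renumber each result from 0) are additions for illustration and are not part of the answer above.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# Boolean Series: True where Interest is one of the sport values
is_sport = df['Interest'].isin(sport)

# Rows where the mask is True
sportdf = df[is_sport].reset_index(drop=True)
# ~ negates the mask, so every remaining row lands here (the "else" branch)
sciencedf = df[~is_sport].reset_index(drop=True)

print(sportdf)
print(sciencedf)
```

Note that with this variant a row whose interest is in neither tuple would still end up in sciencedf, which matches the if/else in the question but differs from filtering on isin(science).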
Answer 1 (score: 0)
Using your data:
import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
Then you can use list comprehensions with an if clause to build the data each DataFrame needs:
sportdata = [ [name, interest] for name, interest in data if interest in sport]
sciencedata = [ [name, interest] for name, interest in data if interest in science]
After that, you can build each DataFrame as usual:
sportdf = pd.DataFrame(sportdata, columns = ['Name', 'Interest'])
sciencedf = pd.DataFrame(sciencedata, columns = ['Name', 'Interest'])
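The same idea can also be written as a single pass over data that follows the if/else from the question. The loop in the question fails because it keeps appending to one flat list; collecting each row as its own [name, interest] pair avoids that. This is only a sketch, and sport_rows, science_rows and target are illustrative names that do not appear in the original code.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]

sport_rows, science_rows = [], []
for name, interest in data:
    # Pick the destination list, then append the whole row as one item
    target = sport_rows if interest in sport else science_rows
    target.append([name, interest])

# Build each DataFrame once, after all rows have been collected
sportdf = pd.DataFrame(sport_rows, columns=['Name', 'Interest'])
sciencedf = pd.DataFrame(science_rows, columns=['Name', 'Interest'])

print(sportdf)
print(sciencedf)
```

Building the DataFrames once from complete lists is generally simpler than growing them row by row with .loc inside the loop.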
Answer 2 (score: 0)
You can use the .query method. The following solution was tested with Python 3.7. I find this solution easier to understand.
import pandas as pd
sport = ('basketball','volleyball','football')
science = ('biology','chemistry','physics')
sportdf = pd.DataFrame(columns = ['Name','Interest'])
sciencedf = pd.DataFrame(columns = ['Name','Interest'])
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns = ['Name', 'Interest'])
# Only two lines
sportdf = df.query(f"Interest == {sport}")
sciencedf = df.query(f"Interest == {science}")
print(sportdf)
print(sciencedf)
Output:
   Name    Interest
0   tom  volleyball
   Name   Interest
1  nick  chemistry
2  juli    physics
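As a variation on the f-string approach, .query can also refer to Python variables directly with the @ prefix, which avoids pasting the tuple's text into the expression. The sketch below assumes sport and science are variables visible where query is called, and it uses the in operator to state the membership test explicitly.

```python
import pandas as pd

sport = ('basketball', 'volleyball', 'football')
science = ('biology', 'chemistry', 'physics')
data = [['tom', 'volleyball'], ['nick', 'chemistry'], ['juli', 'physics']]
df = pd.DataFrame(data, columns=['Name', 'Interest'])

# @name looks up a Python variable from the calling scope
sportdf = df.query("Interest in @sport")
sciencedf = df.query("Interest in @science")

print(sportdf)
print(sciencedf)
```

Like the original answer, this keeps the original row labels (0 for tom, 1 and 2 for nick and juli).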