I have a dataframe df1:
Questions                            Purpose
what is scientific name of <input>   scientific name
what is english name of <input>      english name
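For a reproducible example, df1 can be built as follows. This is a minimal sketch; the column names and values are taken from the table above.

```python
import pandas as pd

# Template questions; <input> is the placeholder to be filled in later.
df1 = pd.DataFrame({
    'Questions': ['what is scientific name of <input>',
                  'what is english name of <input>'],
    'Purpose': ['scientific name', 'english name'],
})
```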
I have two lists, as follows:
name1 = ['salt','water','sugar']
name2 = ['sodium chloride','dihydrogen monoxide','sucrose']
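I want to create a new dataframe by replacing <input> with values from the lists, depending on the Purpose. If the Purpose is english name, replace <input> with the values from name2; otherwise, replace <input> with the values from name1.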
Expected output dataframe:
Questions                                         Purpose
what is scientific name of salt                   scientific name
what is scientific name of water                  scientific name
what is scientific name of sugar                  scientific name
what is english name of sodium chloride           english name
what is english name of dihydrogen monoxide       english name
what is english name of sucrose                   english name
My attempt:
import pandas as pd

questions = []
purposes = []
for i, row in df1.iterrows():
    # Pick the list of names that matches this row's Purpose
    if row['Purpose'] == 'scientific name':
        for name in name1:
            ques = row['Questions'].replace('<input>', name)
            questions.append(ques)
            purposes.append(row['Purpose'])
    else:
        for name in name2:
            ques = row['Questions'].replace('<input>', name)
            questions.append(ques)
            purposes.append(row['Purpose'])
df = pd.DataFrame({'Questions': questions, 'Purpose': purposes})
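The code above produces the expected output, but it is too slow because my original dataframe contains many questions. (I also have more than two purposes, but I am sticking to two for now.)
I am looking for a more efficient solution that gets rid of the for loops.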
Answer 0 (score: 2)
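One way to do this is to iterate over Questions with a list comprehension and replace <input> with the corresponding name. To repeat each Question once for every entry in its matching list in names, you can use itertools.cycle: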
from itertools import cycle

# names must be listed in the same order as the rows of df1
names = [name1, name2]
new = [[i.replace('<input>', j), purpose]
       for row, purpose, name in zip(df1.Questions, df1.Purpose, names)
       for i, j in zip(cycle([row]), name)]
pd.DataFrame(new, columns=df1.columns)
   Questions                                         Purpose
0  what is scientific name of salt                   scientific name
1  what is scientific name of water                  scientific name
2  what is scientific name of sugar                  scientific name
3  what is english name of sodium chloride           english name
4  what is english name of dihydrogen monoxide       english name
5  what is english name of sucrose                   english name
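Since the question mentions more than two purposes, the same idea can also be keyed on the Purpose value instead of on row order. The following is only a sketch of one possible variant, not part of the answer above: names_by_purpose, the temporary name column, and out are hypothetical names, and it assumes pandas 1.1 or later for the ignore_index argument of explode.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Questions': ['what is scientific name of <input>',
                  'what is english name of <input>'],
    'Purpose': ['scientific name', 'english name'],
})
name1 = ['salt', 'water', 'sugar']
name2 = ['sodium chloride', 'dihydrogen monoxide', 'sucrose']

# Hypothetical lookup table: which list of names fills in each Purpose.
names_by_purpose = {'scientific name': name1, 'english name': name2}

out = (
    df1.assign(name=df1['Purpose'].map(names_by_purpose))  # one list per row
       .explode('name', ignore_index=True)                 # one row per name
)
# Fill the placeholder row by row, then drop the helper column.
out['Questions'] = [q.replace('<input>', n)
                    for q, n in zip(out['Questions'], out['name'])]
out = out.drop(columns='name')
print(out)
```

Adding another purpose then only requires another entry in the dictionary. The string replacement still runs in a Python-level comprehension, but the row expansion itself is handled by pandas rather than by iterrows.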
Answer 1 (score: 1)
I have done something similar with pd.concat(); you could try:
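names = name1 + name2
# Repeat each template row once per name, then concatenate in a single call
df_new = pd.concat(
    [df1.loc[df1.Purpose.eq('scientific name')]] * len(name1)
    + [df1.loc[df1.Purpose.eq('english name')]] * len(name2),
    ignore_index=True)
# Fill in the placeholder row by row; .loc[row, column] avoids chained assignment
for e, i in enumerate(names):
    df_new.loc[e, 'Questions'] = df_new.loc[e, 'Questions'].replace('<input>', i)
print(df_new)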
   Questions                                         Purpose
0  what is scientific name of salt                   scientific name
1  what is scientific name of water                  scientific name
2  what is scientific name of sugar                  scientific name
3  what is english name of sodium chloride           english name
4  what is english name of dihydrogen monoxide       english name
5  what is english name of sucrose                   english name
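The repeat-then-fill idea above can also be written with Index.repeat, which avoids building a list of dataframes. This is a hedged sketch rather than the answerer's code; it assumes the rows of df1 are in the same order as the lists (scientific name first, then english name).

```python
import pandas as pd

df1 = pd.DataFrame({
    'Questions': ['what is scientific name of <input>',
                  'what is english name of <input>'],
    'Purpose': ['scientific name', 'english name'],
})
name1 = ['salt', 'water', 'sugar']
name2 = ['sodium chloride', 'dihydrogen monoxide', 'sucrose']

names = name1 + name2
# Repeat the first row len(name1) times and the second row len(name2) times.
# Assumption: df1 rows are ordered to match name1, then name2.
df_new = df1.loc[df1.index.repeat([len(name1), len(name2)])].reset_index(drop=True)
# Pair each repeated template with its name and fill in the placeholder.
df_new['Questions'] = [q.replace('<input>', n)
                       for q, n in zip(df_new['Questions'], names)]
print(df_new)
```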