I have a CSV file that contains the following data:
NAME | AGE | COLLEGE | BRANCH | Qualification
-------------------------------------------------------
sai | 21 | FG | CSE | B.Tech
Kiran | 22 | FG | EEE | M.Tech
Anil | 21 | FG | CSE | B.Tech
Ram | 22 | KL | EEE | B.Tech
The code I used to create the CSV file:
import pandas as pd

Name = ['sai', 'Kiran', 'Anil', 'Ramj']
Age = [21, 22, 21, 22]
college = ['FG', 'FG', 'FG', 'KL']
branch = ['CSE', 'EEE', 'CSE', 'EEE']
Qualification = ['B.Tech', 'M.Tech', 'B.Tech', 'B.Tech']
# Named `data` rather than `dict` so the built-in type is not shadowed.
data = {'NAME': Name, 'AGE': Age, 'COLLEGE': college, 'BRANCH': branch,
        'Qualification': Qualification}
df = pd.DataFrame(data)
df.to_csv('TESTINGFILE.csv', index=False)
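The following steps need to be performed:
Step 1:
Based on a condition, I need to create a duplicate row.
Condition: COLLEGE = FG and BRANCH = CSE
If the condition is met, a duplicate of the row should be created with its BRANCH set to ECE.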
NAME | AGE | COLLEGE | BRANCH | Qualification
-------------------------------------------------------
sai | 21 | FG | CSE | B.Tech
sai | 21 | FG | ECE | B.Tech
Kiran | 22 | FG | EEE | M.Tech
Anil | 21 | FG | CSE | B.Tech
Anil | 21 | FG | ECE | B.Tech
Ram | 22 | KL | EEE | B.Tech
Step 2:
Now, with the same condition (COLLEGE = FG and BRANCH = CSE), if it is met, change the branch from CSE to IT.
Final expected output:
NAME | AGE | COLLEGE | BRANCH | Qualification
-------------------------------------------------------
sai | 21 | FG | IT | B.Tech
sai | 21 | FG | ECE | B.Tech
Kiran | 22 | FG | EEE | M.Tech
Anil | 21 | FG | IT | B.Tech
Anil | 21 | FG | ECE | B.Tech
Ram | 22 | KL | EEE | B.Tech
Can someone help me write the code for this using pandas?
Thanks for your help!
Answer 0: (score: 1)
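First create a mask from the condition and use it to replace the matching values, then duplicate the matching rows with concat, set their new value with DataFrame.assign, and finally restore the row order with DataFrame.sort_index: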
mask = (df.COLLEGE == 'FG') & (df.BRANCH == 'CSE')
df.loc[mask, 'BRANCH'] = 'IT'
df = pd.concat([df, df[mask].assign(BRANCH='ECE')]).sort_index().reset_index(drop=True)
print (df)
NAME AGE COLLEGE BRANCH Qualification
0 sai 21 FG IT B.Tech
1 sai 21 FG ECE B.Tech
2 Kiran 22 FG EEE M.Tech
3 Anil 21 FG IT B.Tech
4 Anil 21 FG ECE B.Tech
5 Ramj 22 KL EEE B.Tech
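One caveat with the snippet above: sort_index defaults to kind='quicksort', which pandas does not document as a stable sort, so the relative order of the two rows that share an index label is not strictly guaranteed. Below is a minimal sketch that requests a stable sort instead. The function name split_branch and its parameters are hypothetical, and the sample data is rebuilt from the question so the block runs on its own:

```python
import pandas as pd


def split_branch(df, new_branch='IT', extra_branch='ECE'):
    """Rename FG/CSE rows to new_branch and add a copy of each with extra_branch.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    mask = (df['COLLEGE'] == 'FG') & (df['BRANCH'] == 'CSE')
    # Copies of the matching rows, carrying the extra branch name.
    extra = df[mask].assign(BRANCH=extra_branch)
    # Work on a copy so the caller's frame is left untouched.
    renamed = df.copy()
    renamed.loc[mask, 'BRANCH'] = new_branch
    combined = pd.concat([renamed, extra])
    # mergesort is stable, so each renamed row stays ahead of its duplicate.
    return combined.sort_index(kind='mergesort').reset_index(drop=True)


# Sample data copied from the question.
sample = pd.DataFrame({
    'NAME': ['sai', 'Kiran', 'Anil', 'Ramj'],
    'AGE': [21, 22, 21, 22],
    'COLLEGE': ['FG', 'FG', 'FG', 'KL'],
    'BRANCH': ['CSE', 'EEE', 'CSE', 'EEE'],
    'Qualification': ['B.Tech', 'M.Tech', 'B.Tech', 'B.Tech'],
})
result = split_branch(sample)
```

Calling print(result) should show each IT row directly above its ECE duplicate, and sample itself keeps its original CSE values.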
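Answer 1: (score: 1)
You can do the following:
1. First create a subset by filtering
2. Change the value to ECE
3. Concatenate the dataframes together
4. Use np.where to conditionally change the value to IT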
import numpy as np

# .copy() avoids a SettingWithCopyWarning when the subset is modified
df_dup = df[(df.COLLEGE == 'FG') & (df.BRANCH == 'CSE')].copy()
df_dup['BRANCH'] = 'ECE'
df = pd.concat([df, df_dup])
# Match the original CSE rows (not the new ECE duplicates) and rename them to IT
df['BRANCH'] = np.where((df.COLLEGE == 'FG') & (df.BRANCH == 'CSE'), 'IT', df.BRANCH)
df = df.sort_index().reset_index(drop=True)
print(df)
NAME AGE COLLEGE BRANCH Qualification
0 sai 21 FG IT B.Tech
1 sai 21 FG ECE B.Tech
2 Kiran 22 FG EEE M.Tech
3 Anil 21 FG IT B.Tech
4 Anil 21 FG ECE B.Tech
5 Ramj 22 KL EEE B.Tech
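Since the question starts from a CSV file rather than an in-memory frame, here is a rough end-to-end sketch that reads the file, applies the two steps in the order the question lists them, and writes the result back out. It uses a temporary directory so that it is self-contained; the output file name TESTINGFILE_out.csv and the variable names are assumptions, not something from the question:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Sample data copied from the question, written out so there is a file to read.
source = pd.DataFrame({
    'NAME': ['sai', 'Kiran', 'Anil', 'Ramj'],
    'AGE': [21, 22, 21, 22],
    'COLLEGE': ['FG', 'FG', 'FG', 'KL'],
    'BRANCH': ['CSE', 'EEE', 'CSE', 'EEE'],
    'Qualification': ['B.Tech', 'M.Tech', 'B.Tech', 'B.Tech'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, 'TESTINGFILE.csv')
    out_path = os.path.join(tmp_dir, 'TESTINGFILE_out.csv')  # hypothetical name
    source.to_csv(in_path, index=False)

    df = pd.read_csv(in_path)

    # Step 1: duplicate the FG/CSE rows, with BRANCH set to ECE on the copies.
    is_fg_cse = (df['COLLEGE'] == 'FG') & (df['BRANCH'] == 'CSE')
    df = pd.concat([df, df[is_fg_cse].assign(BRANCH='ECE')])

    # Step 2: recompute the condition on the longer frame and rename CSE to IT.
    is_fg_cse = (df['COLLEGE'] == 'FG') & (df['BRANCH'] == 'CSE')
    df['BRANCH'] = np.where(is_fg_cse, 'IT', df['BRANCH'])

    # A stable sort keeps each original row ahead of its duplicate.
    df = df.sort_index(kind='mergesort').reset_index(drop=True)

    df.to_csv(out_path, index=False)
    result = pd.read_csv(out_path)
```

In real use the temporary directory would be replaced by the actual paths. Writing to a separate output file instead of overwriting the input is a cautious default here, not a requirement.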