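So I have two fairly large Excel files that I have converted into two dataframes (df for this week and df2 for the previous week). 128 rows are the same in both dataframes. I created a new variable using: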
onlyWon = df.loc[df['Sales stage'] == "Won"]
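After that, I tried to create a new dataframe containing only the rows of df2 whose Sales No matches one in the onlyWon dataframe. For example, if I do this for a single item, the code would be:
df2.loc[df2['Sales No'] == "B3M-RB-03"]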
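This works for a single value, but when I try to loop over the onlyWon dataframe and append the data to a new dataframe, I run into errors.
An example of how I would like it to work: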
DF2:
+------------------+----------+-------------+-----------+
| Customer | Sales No | Sales Stage | Deal Size |
+------------------+----------+-------------+-----------+
| Stackoverflow | A1 | Identified | 100 |
| Guido van Rossum | B2 | Lost | 1000 |
+------------------+----------+-------------+-----------+
OnlyWon:
+---------------+----------+-------------+-----------+
| Customer | Sales No | Sales Stage | Deal Size |
+---------------+----------+-------------+-----------+
| Stackoverflow | A1 | WON | 100 |
+---------------+----------+-------------+-----------+
New dataframe:
+---------------+----------+-------------+-----------+
| Customer | Sales No | Sales Stage | Deal Size |
+---------------+----------+-------------+-----------+
| Stackoverflow | A1 | Identified | 100 |
+---------------+----------+-------------+-----------+
What I have tried
Declaring a new empty dataframe (df3) with all the same headers but no rows.
Creating a list of all the Sales No values:
onlyWonSales = []
for salesNo in onlyWon['Sales No']:
    onlyWonSales.append(salesNo)
Then looping over the list and appending to the new dataframe:
for item in onlyWonSales:
    df3 = df3.append(df2.loc[df2['Sales No'] == item])
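This adds a lot of duplicates and does not work, even though it raises no errors (the onlyWonSales list has about 1000 entries, while df3 ends up with about 4000 rows).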
Answer 0 (score: 1)
Like this:
In [150]: new = pd.merge(df2, onlyWon, on=['Sales No'], suffixes=('', '_y'))
In [153]: new.drop(list(new.filter(regex='_y$')), axis=1, inplace=True)
In [154]: new
Out[154]:
        Customer  Sales No  Sales Stage  Deal Size
0  Stackoverflow        A1   Identified        100
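As a possible alternative to the merge, here is a minimal sketch using a boolean mask built with Series.isin. It keeps each df2 row at most once, even if a Sales No appears several times in onlyWon, which may be what inflated df3 in the loop version. The two small dataframes are made up from the example tables in the question and are not the real Excel data.

```python
import pandas as pd

# Toy data modelled on the example tables in the question (assumed values).
df2 = pd.DataFrame({
    "Customer": ["Stackoverflow", "Guido van Rossum"],
    "Sales No": ["A1", "B2"],
    "Sales Stage": ["Identified", "Lost"],
    "Deal Size": [100, 1000],
})
onlyWon = pd.DataFrame({
    "Customer": ["Stackoverflow"],
    "Sales No": ["A1"],
    "Sales Stage": ["WON"],
    "Deal Size": [100],
})

# Build a boolean mask: True where df2's 'Sales No' occurs in onlyWon.
mask = df2["Sales No"].isin(onlyWon["Sales No"])

# Select the matching rows of df2; no loop or append is needed.
new = df2[mask]
print(new)
```

Because the result is selected straight from df2, it keeps df2's own Sales Stage and Deal Size values, and there are no suffixed columns to drop afterwards.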
Answer 1 (score: 0)
Leave ... and then do a query with onlyWon.
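This answer seems to point at DataFrame.query, though its details did not survive in the text. One way it could look, purely as a sketch: collect the Sales No values into a list and reference that list from the query string. The name won_sales and the toy data are assumptions for illustration, and backtick quoting of a column name that contains a space assumes a reasonably recent pandas version.

```python
import pandas as pd

# Toy data modelled on the example tables in the question (assumed values).
df2 = pd.DataFrame({
    "Customer": ["Stackoverflow", "Guido van Rossum"],
    "Sales No": ["A1", "B2"],
    "Sales Stage": ["Identified", "Lost"],
    "Deal Size": [100, 1000],
})
onlyWon = pd.DataFrame({
    "Customer": ["Stackoverflow"],
    "Sales No": ["A1"],
    "Sales Stage": ["WON"],
    "Deal Size": [100],
})

# Hypothetical helper name: the distinct sales numbers that were won.
won_sales = onlyWon["Sales No"].unique().tolist()

# Backticks quote a column name containing a space;
# the @ prefix refers to a Python variable in the calling scope.
result = df2.query("`Sales No` in @won_sales")
print(result)
```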