I have a DataFrame like this:
import pandas as pd

df_test = pd.DataFrame({
    "ID": [912665, 455378, 938724, 557830],
    "Company Name": ["112 ", "112 ", "SSS", "SSS"],
    "Date": ['2018-09-02 00:00:00', '2019-02-27 00:00:00',
             '2019-05-05 00:00:00', '2018-03-21 00:00:00'],
    "Type": ['Type1', 'Type2', 'Type1', 'Type2'],
    "ngroup": [0, 0, 1, 1],
})
df_test
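Within each 'ngroup' (0, 1, ...) I need to compare rows by date (and, if needed, by any other column as well).
In this example I refer to groups 0 and 1 as ngroup. Each group contains exactly two rows. The company type is stored in the Type column, e.g. Type1 and Type2. I need to check whether the Type1 date is greater than the Type2 date. If it is, I want to say, for example, "Type 1 joined first"; if not, "Type 2 joined first".
After that, I also want to add the result to my initial DataFrame as a new column, status.
UPD: So my expected result looks like this: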
df_test_expected_result = pd.DataFrame({
    "ID": [912665, 455378, 938724, 557830],
    "Company Name": ["112 ", "112 ", "SSS", "SSS"],
    "Date": ['2018-09-02 00:00:00', '2019-02-27 00:00:00',
             '2019-05-05 00:00:00', '2018-03-21 00:00:00'],
    "Type": ['Type1', 'Type2', 'Type1', 'Type2'],
    "ngroup": [0, 0, 1, 1],
    "expected_result": ["Type 1 joined first", "Type 1 joined first",
                        "Type 2 joined first", "Type 2 joined first"],
})
df_test_expected_result
What is the best way to achieve this result?
Answer 0 (score: 2)
IIUC, we need a boolean comparison that is tested within each group.
Edit: I just saw your expected output. We can apply your first condition, then forward-fill and back-fill within each group.
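import numpy as np

# Parse the dates so the comparison is chronological
df_test["Date"] = pd.to_datetime(df_test["Date"])
# Earliest date in each ngroup, broadcast back to every row
bool_comp = df_test.groupby(["ngroup"])["Date"].transform("min")
df_test["res"] = np.where(
    df_test["Date"] <= bool_comp,
    df_test["Type"] + " Joined First",
    df_test["Type"] + " Joined Later",
)
print(df_test)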
       ID Company Name       Date   Type  ngroup                 res
0  912665         112  2018-09-02  Type1       0  Type1 Joined First
1  455378         112  2019-02-27  Type2       0  Type2 Joined Later
2  938724          SSS 2019-05-05  Type1       1  Type1 Joined Later
3  557830          SSS 2018-03-21  Type2       1  Type2 Joined First
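The output above labels each row individually, whereas the expected result in the question repeats one label for the whole group. A minimal sketch of the forward-fill and back-fill idea mentioned in the edit could look like this. It assumes the earliest date in a group means "joined first" (which is what the expected output shows), and the column name status is taken from the question; the helper names earliest and label are only illustrative.

```python
import pandas as pd

df_test = pd.DataFrame({
    "ID": [912665, 455378, 938724, 557830],
    "Company Name": ["112 ", "112 ", "SSS", "SSS"],
    "Date": ["2018-09-02 00:00:00", "2019-02-27 00:00:00",
             "2019-05-05 00:00:00", "2018-03-21 00:00:00"],
    "Type": ["Type1", "Type2", "Type1", "Type2"],
    "ngroup": [0, 0, 1, 1],
})
df_test["Date"] = pd.to_datetime(df_test["Date"])

# Earliest date per ngroup, aligned with the original rows
earliest = df_test.groupby("ngroup")["Date"].transform("min")

# Keep the label only on the earliest row of each group; other rows become NaN
label = (df_test["Type"] + " joined first").where(df_test["Date"] == earliest)

# Spread that single label to every row of its group
df_test["status"] = label.groupby(df_test["ngroup"]).transform(
    lambda s: s.ffill().bfill()
)
print(df_test[["ngroup", "Type", "Date", "status"]])
```

Note that the labels come out as "Type1 joined first" rather than "Type 1 joined first", because they are built directly from the Type column. If two rows in a group share the earliest date, this sketch simply keeps whichever label the fill reaches first, so ties would need their own rule.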