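How do I fill NaN values in a column with the minimum of their group? See df and df2 below. For category '2' in column 'A', I would like to get min(20, 15)... please help :)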
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
                   "B": [np.nan, 10, np.nan, 20, 15, np.nan, np.nan, np.nan, np.nan, 40, np.nan]})
In[1]: df
Out[1]:
    A     B
0   1   NaN
1   1  10.0
2   2   NaN
3   2  20.0
4   2  15.0
5   3   NaN
6   3   NaN
7   3   NaN
8   3   NaN
9   4  40.0
10  4   NaN
Answer 0 (score: 3)
If you want to replace all values with the min of each group, use GroupBy.transform:
df['B'] = df.groupby('A')['B'].transform('min')
print(df)
    A     B
0   1  10.0
1   1  10.0
2   2  15.0
3   2  15.0
4   2  15.0
5   3   NaN
6   3   NaN
7   3   NaN
8   3   NaN
9   4  40.0
10  4  40.0
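To see why transform is the right tool here, it can help to compare it with a plain groupby aggregation. The snippet below is only a minimal sketch on the question's df; the names per_group, broadcast and mapped are mine, not from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
                   "B": [np.nan, 10, np.nan, 20, 15, np.nan, np.nan, np.nan, np.nan, 40, np.nan]})

# Plain aggregation: one minimum per group, indexed by the values of A.
per_group = df.groupby("A")["B"].min()

# transform: the same minima, repeated so the result lines up with df's rows.
broadcast = df.groupby("A")["B"].transform("min")

# Mapping the per-group result back through column A gives the same values.
mapped = df["A"].map(per_group)

print(per_group)
print(broadcast)
```

Because the transformed Series shares df's index, it can be assigned straight back to a column or passed to fillna without any merge step.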
If you only want to replace the NaN values with the min (starting again from the original df), add fillna or use a custom lambda function:
df['B'] = df.B.fillna(df.groupby('A')['B'].transform('min'))
Alternative:
df['B'] = df.groupby('A')['B'].transform(lambda x: x.fillna(x.min()))
print(df)
    A     B
0   1  10.0
1   1  10.0
2   2  15.0
3   2  20.0
4   2  15.0
5   3   NaN
6   3   NaN
7   3   NaN
8   3   NaN
9   4  40.0
10  4  40.0
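If you need this in more than one place, the fillna variant can be wrapped in a small helper that leaves the input untouched. This is just a sketch: fill_na_with_group_min is a hypothetical name of mine, not a pandas function, and it assumes the value column is numeric.

```python
import numpy as np
import pandas as pd


def fill_na_with_group_min(frame, group_col, value_col):
    """Return a copy of frame where NaN in value_col is replaced by the group minimum.

    Hypothetical helper for illustration; groups that contain only NaN stay NaN.
    """
    group_min = frame.groupby(group_col)[value_col].transform("min")
    return frame.assign(**{value_col: frame[value_col].fillna(group_min)})


df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
                   "B": [np.nan, 10, np.nan, 20, 15, np.nan, np.nan, np.nan, np.nan, 40, np.nan]})

filled = fill_na_with_group_min(df, "A", "B")

# The lambda variant from the answer, computed on the same original df for comparison.
via_lambda = df.groupby("A")["B"].transform(lambda x: x.fillna(x.min()))

print(filled)
```

Group 3 has no non-NaN value at all, so there is nothing to fill with and its rows remain NaN in both variants.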
Answer 1 (score: 2)
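As an experiment, I wanted to see whether I could do this with NumPy. It isn't perfect, because it doesn't handle negative values, or zeros for that matter. I could change it to do so, but this is a prototype.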
b = df.B.values
a = df.A.values
a_, u_ = pd.factorize(a)
_a = a_.max() - a_
maxb = np.nanmax(b)
basis_inc = a_ * maxb
basis_dec = _a * maxb
bnan = np.isnan(b)
bfill_zero = np.where(bnan, maxb + 1, b)
ffill_min = np.minimum.accumulate(bfill_zero + basis_dec) - basis_dec
bfill_min = np.minimum.accumulate((bfill_zero + basis_inc)[::-1])[::-1] - basis_inc
gmin = np.minimum(ffill_min, bfill_min)
df.assign(B=np.where(bnan & (gmin != maxb + 1), gmin, b))
    A     B
0   1  10.0
1   1  10.0
2   2  15.0
3   2  20.0
4   2  15.0
5   3   NaN
6   3   NaN
7   3   NaN
8   3   NaN
9   4  40.0
10  4  40.0
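For comparison, here is a different NumPy route that should not care about the sign of the values. It is not the accumulate trick from the prototype above; it is a sketch that uses np.fmin (which ignores NaN when the other operand is a number) together with the unbuffered ufunc.at method. The name group_min_fill is hypothetical.

```python
import numpy as np
import pandas as pd


def group_min_fill(keys, values):
    """Fill NaN in values with the minimum of the group given by keys (sketch)."""
    codes, uniques = pd.factorize(keys)       # integer group id for every row
    vals = np.asarray(values, dtype=float)
    group_min = np.full(len(uniques), np.nan)
    # Unbuffered in-place reduction: fmin(nan, x) == x, so NaN never wins.
    np.fmin.at(group_min, codes, vals)
    # Only NaN positions are replaced; all-NaN groups keep NaN.
    return np.where(np.isnan(vals), group_min[codes], vals)


df = pd.DataFrame({"A": [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
                   "B": [np.nan, 10, np.nan, 20, 15, np.nan, np.nan, np.nan, np.nan, 40, np.nan]})
result = df.assign(B=group_min_fill(df["A"], df["B"]))

# Small check with a negative value and a zero, the cases the prototype skips.
negatives = group_min_fill([1, 1, 2, 2], [np.nan, -5.0, 0.0, np.nan])

print(result)
print(negatives)
```

Unlike the prototype, this does not need the groups to be contiguous or the values to be positive, at the cost of relying on ufunc.at, which can be slow on very large arrays.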