Reading a tab-delimited file with pandas when some rows have multiple tabs

Time: 2019-01-09 18:42:55

Tags: pandas

I am trying to read a tab-delimited txt file with pandas. The file looks like this:

data file sample

14.38   14.21   0.8951  5.386   3.312   2.462   4.956   1
14.69   14.49   0.8799  5.563   3.259   3.586   5.219   1
14.11   14.12   0.8911  5.422   3.302   2.723           5        1

Some rows have extra tabs. If I use read_csv or read_fwf and specify sep='\t', I get a result like this:

d

0   15.26\t14.84\t0.871\t5.763\t3.312\t2.221\t5.22\t1
1   14.88\t14.57\t0.8811\t5.554\t3.333\t1.018\t4.9

Do you have any suggestions for which parameters I can specify to fix this? Thanks.

Solution:

Use pd.read_csv(filename, delim_whitespace=True)
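
As a minimal sketch of that solution, the snippet below parses rows modelled on the question's sample from an in-memory string. The sample text and the choice of header=None are assumptions made for illustration. It uses sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True; recent pandas releases (2.2 and later) deprecate delim_whitespace in favor of this spelling.

```python
import io

import pandas as pd

# Rows modelled on the question's sample (an assumption, not the real file);
# the third row has extra tabs before its last two values.
sample = (
    "14.38\t14.21\t0.8951\t5.386\t3.312\t2.462\t4.956\t1\n"
    "14.69\t14.49\t0.8799\t5.563\t3.259\t3.586\t5.219\t1\n"
    "14.11\t14.12\t0.8911\t5.422\t3.302\t2.723\t\t\t5\t\t1\n"
)

# sep=r"\s+" treats any run of spaces or tabs as a single delimiter.
# header=None because the sample has no header row.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)
print(df)
```

With this separator every row should parse into the same eight columns, regardless of how many tabs sit between the values.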

2 answers:

Answer 0: (score: 0)

If I use this code:

import pandas as pd
parsed_csv_txt = pd.read_csv("tabbed.txt", sep="\t")
print(parsed_csv_txt)

on this file:

a   b   c   d   e
14.69   2452    982 234 12
14.11   5435    234     12
16.63   1       12  66

I get:

       a     b      c      d   e
0  14.69  2452  982.0  234.0  12
1  14.11  5435  234.0    NaN  12
2  16.63     1    NaN   12.0  66

Is there anything wrong with the output we see here?

If you want a different output, such as:

       a     b    c    d     e
0  14.69  2452  982  234  12.0
1  14.11  5435  234   12   NaN
2  16.63     1   12   66   NaN

then treat any run of whitespace as a single delimiter, as with the delim_whitespace=True option from the solution above, instead of splitting on every tab.
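
The following sketch tries to reproduce both outputs shown in this answer side by side. It assumes tabbed.txt holds tab-separated values with doubled tabs in the last two rows, and it reads that text from an in-memory string rather than from disk. The variable names are illustrative only.

```python
import io

import pandas as pd

# Assumed contents of tabbed.txt: tab-separated, with a doubled tab
# (an empty field) in the last two rows.
tabbed = (
    "a\tb\tc\td\te\n"
    "14.69\t2452\t982\t234\t12\n"
    "14.11\t5435\t234\t\t12\n"
    "16.63\t1\t\t12\t66\n"
)

# Splitting on each tab keeps empty fields in place as NaN.
by_tab = pd.read_csv(io.StringIO(tabbed), sep="\t")

# Splitting on runs of whitespace collapses the doubled tabs, so later
# values shift left and the short rows are padded with NaN at the end.
by_whitespace = pd.read_csv(io.StringIO(tabbed), sep=r"\s+")

print(by_tab)
print(by_whitespace)
```

Which result is correct depends on the data: if a doubled tab marks a genuinely missing value, splitting on each tab preserves its position, while whitespace splitting silently moves the remaining values into the wrong columns.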

Note

For a longer discussion of variable amounts of whitespace between values, see: Can pandas handle variable-length whitespace as column delimiters

Answer 1: (score: 0)

pandas read_csv is very versatile; you can use it with delim_whitespace=True to handle a variable amount of whitespace.

df = pd.read_csv(filename, delim_whitespace=True)
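
One caveat with whitespace splitting is that it also splits on spaces inside a field. If that matters, a possible variation (not part of the answers above) is to collapse only runs of tabs with a regular-expression separator. The sample data here is hypothetical, and engine="python" is passed explicitly because regex separators other than \s+ are handled by the Python parsing engine.

```python
import io

import pandas as pd

# Hypothetical data: the first field contains a space, and the first row
# uses two tabs between its fields.
sample = "New York\t\t8.4\nLos Angeles\t3.9\n"

# sep=r"\t+" treats one or more consecutive tabs as a single delimiter
# while leaving spaces inside fields untouched.
df = pd.read_csv(io.StringIO(sample), sep=r"\t+", engine="python", header=None)
print(df)
```

As with whitespace splitting, this treats a doubled tab as one delimiter, so it is only appropriate when extra tabs are padding rather than markers for empty fields.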