Searching a DataFrame column for a matching string pattern in python pandas

Time: 2016-04-20 10:15:19

Tags: python regex string pandas

I have a dataframe like the one below:

    name         genre
    satya      |ACTION|DRAMA|IC|
    satya      |COMEDY|BIOPIC|SOCIAL|
    abc        |CLASSICAL|
    xyz        |ROMANCE|ACTION|DARMA|
    def        |DISCOVERY|SPORT|COMEDY|IC|
    ghj        |IC|

Now I want to query the dataframe so that I get rows 1, 5 and 6, i.e. I want to find |IC| either on its own or in any combination with other genres.

So far I can do an exact search using

    df[df['genre'] == '|ACTION|DRAMA|IC|']  # exact value, yields row 1

or a string-contains search using

    df[df['genre'].str.contains('IC')]  # yields rows 1, 2, 3, 5, 6
    # because BIOPIC contains IC, and so does CLASSICAL

but I don't want either of these.

So my requirement is to find the genres that contain |IC| (my string search for it fails because '|' is treated as the OR operator).

Can anyone suggest a regex or some other method? Thanks in advance.

2 answers:

Answer 0: (score: 4)

I think you can add \ to the regex to escape the |, because | without \ is interpreted as OR:

  

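'|'

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].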
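To make the quoted rule concrete, here is a minimal sketch using only the standard `re` module. The list `genres` and the three result names are illustrative and not part of the original answer. An unescaped '|IC|' is read as three alternatives, two of them empty, so it matches the empty string at the start of any input, while the escaped and character-class forms require the literal delimiters.

```python
import re

# Three of the genre strings from the question.
genres = ["|ACTION|DRAMA|IC|", "|COMEDY|BIOPIC|SOCIAL|", "|CLASSICAL|"]

# Unescaped: '' OR 'IC' OR ''. The empty first branch matches at
# position 0 of every string, so every row looks like a hit.
unescaped = [bool(re.search("|IC|", g)) for g in genres]

# Escaped: a literal '|', then 'IC', then a literal '|'.
escaped = [bool(re.search(r"\|IC\|", g)) for g in genres]

# Character class: [|] is another way to write a literal '|'.
char_class = [bool(re.search(r"[|]IC[|]", g)) for g in genres]

print(unescaped)   # [True, True, True]
print(escaped)     # [True, False, False]
print(char_class)  # [True, False, False]
```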

    print(df['genre'].str.contains(r'\|IC\|'))
    0     True
    1    False
    2    False
    3    False
    4     True
    5     True
    Name: genre, dtype: bool

    print(df[df['genre'].str.contains(r'\|IC\|')])
        name                        genre
    0  satya            |ACTION|DRAMA|IC|
    4    def  |DISCOVERY|SPORT|COMEDY|IC|
    5    ghj                         |IC|
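As a self-contained sketch of two alternatives to the backslash escape, the block below rebuilds the question's sample data and filters it in two ways: with the character class `[|]` mentioned in the quoted documentation, and with `regex=False`, which makes `str.contains` treat the pattern as a plain substring. The names `mask_class` and `mask_literal` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|BIOPIC|SOCIAL|",
        "|CLASSICAL|",
        "|ROMANCE|ACTION|DARMA|",
        "|DISCOVERY|SPORT|COMEDY|IC|",
        "|IC|",
    ],
})

# Option 1: a character class instead of a backslash escape.
mask_class = df["genre"].str.contains(r"[|]IC[|]")

# Option 2: regex=False disables regex parsing, so '|' is just a character.
mask_literal = df["genre"].str.contains("|IC|", regex=False)

print(df[mask_literal])
```

Both masks select the rows at index 0, 4 and 5. When the search term is a fixed string, `regex=False` is arguably the simpler choice because nothing needs escaping.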

Answer 1: (score: 0)

Perhaps a construct like this:

    # df is your DataFrame; this needs "import re"
    df[df['columnName'].str.contains(re.compile('regex_pattern'))]
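One way to fill in the pattern without escaping by hand is `re.escape`. The helper `rows_with_genre` below is hypothetical and not part of either answer. It is a sketch that assumes every genre is wrapped in '|' delimiters as in the question, and it passes the escaped pattern to `str.contains` as a plain string rather than a compiled object.

```python
import re

import pandas as pd


def rows_with_genre(frame, genre, column="genre"):
    """Return the rows whose `column` contains `genre` as a whole |-delimited token."""
    # re.escape backslash-escapes regex metacharacters such as '|',
    # so the delimiters are matched literally.
    pattern = re.escape("|" + genre + "|")
    return frame[frame[column].str.contains(pattern)]


# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["satya", "satya", "abc", "xyz", "def", "ghj"],
    "genre": [
        "|ACTION|DRAMA|IC|",
        "|COMEDY|BIOPIC|SOCIAL|",
        "|CLASSICAL|",
        "|ROMANCE|ACTION|DARMA|",
        "|DISCOVERY|SPORT|COMEDY|IC|",
        "|IC|",
    ],
})

result = rows_with_genre(df, "IC")
print(result)
```

The same helper works for any other genre, for example `rows_with_genre(df, "ACTION")`.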