Alter numbers in pandas using regex

Time: 2019-08-18 21:56:11

Tags: python regex pandas text replace

Background

I have the following df

import pandas as pd
df = pd.DataFrame({'Text' : ['But the here is \nBase ID: 666666    \nDate is Here 123456 ', 
                                   '999998 For \nBase ID: 123456    \nDate  there', 
                                   'So so \nBase ID: 939393    \nDate hey the 123455 ',],
                      'ID': [1,2,3],
                       'P_ID': ['A','B','C'],

                     })

Output

    ID  P_ID    Text
0   1   A   But the here is \nBase ID: 666666 \nDate is Here 123456
1   2   B   999998 For \nBase ID: 123456 \nDate there
2   3   C   So so \nBase ID: 939393 \nDate hey the 123455

Tried

I have tried the following to **BLOCK** the 6 digits between \nBase ID: and \nDate

df['New_Text'] = df['Text'].str.replace('ID:(.+?)','ID:**BLOCK**')

And then I get

  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK**666666 \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK**123456 \nDate there
2               So so \nBase ID:**BLOCK**939393 \nDate hey the 123455

But I don't get quite what I want

Desired Output

  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK** \nDate there
2               So so \nBase ID:**BLOCK** \nDate hey the 123455

Question

How do I tweak the str.replace('ID:(.+?)','ID:**BLOCK**') part of my code to get my desired output?

3 Answers:

Answer 0: (score: 1)

For a detailed breakdown of the regex pattern used, see here

Answer 1: (score: 1)

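Try matching the digits explicitly instead of using a lazy group.

A lazy regexp such as (.+?) matches the shortest possible string, which in your case is a single character (the space after ID:), so the digits are left in place.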
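To make the point about lazy matching concrete, here is a minimal sketch using Python's standard re module, which follows the same regex syntax that pandas str.replace uses when regex=True. The sample string is copied from the question; the variable names are illustrative only, and the explicit pattern is just one possible fix.

```python
import re

text = 'But the here is \nBase ID: 666666    \nDate is Here 123456 '

# Lazy group: `.+?` must match at least one character and stops as soon
# as it can, so it consumes only the space right after "ID:".
lazy = re.search(r'ID:(.+?)', text)
lazy_match = lazy.group(1)

# Replacing with the lazy pattern therefore leaves the digits in place.
lazy_result = re.sub(r'ID:(.+?)', 'ID:**BLOCK**', text)

# Matching the whitespace and the digits explicitly removes the number.
explicit_result = re.sub(r'ID:\s*\d+', 'ID:**BLOCK**', text)

print(repr(lazy_match))
print(repr(lazy_result))
print(repr(explicit_result))
```

The second print shows the same ID:**BLOCK**666666 result the question reports, while the third shows the number replaced.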

Answer 2: (score: 1)

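You can try the code below to get the desired output. Note that regex=True is required on recent pandas versions, where str.replace treats the pattern as a literal string by default.

df['New_Text'] = df['Text'].str.replace(r'ID:\s+[0-9]+', 'ID:**BLOCK**', regex=True)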

Output:

0    But the here is \nBase ID:**BLOCK**    \nDate is Here 123456 
1    999998 For \nBase ID:**BLOCK**    \nDate  there              
2    So so \nBase ID:**BLOCK**    \nDate hey the 123455           

Regex breakdown:

'\s+' - matches one or more whitespace characters

'[0-9]+' - matches one or more digits
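The breakdown above can be checked against all three rows from the question. This is a minimal sketch using the standard re module rather than pandas, so it runs without extra dependencies; block_base_id is a hypothetical helper name introduced here for illustration.

```python
import re

# Sample rows copied from the question's DataFrame.
rows = [
    'But the here is \nBase ID: 666666    \nDate is Here 123456 ',
    '999998 For \nBase ID: 123456    \nDate  there',
    'So so \nBase ID: 939393    \nDate hey the 123455 ',
]

# "ID:" anchors the match, \s+ consumes the space(s) after it,
# and [0-9]+ consumes the digits that follow.
PATTERN = re.compile(r'ID:\s+[0-9]+')


def block_base_id(text):
    """Hypothetical helper: replace the number after 'ID:' with **BLOCK**."""
    return PATTERN.sub('ID:**BLOCK**', text)


blocked = [block_base_id(row) for row in rows]
for row in blocked:
    print(repr(row))
```

Because the pattern is anchored on ID:, the other six-digit numbers in the text (999998 at the start of the second row, 123456 at the end of the first) are left untouched.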