Question

I have a large DataFrame (about 15 million rows across 7 columns) and I want to replace some values that are not in the right form.

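I tried looping over the whole DataFrame, but taking a full day to change these values is far too long. I also tried a regex, but I could not find a way to do the replacement when the string does not match the regex, for example because of an uppercase letter.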

My DataFrame column looks like this:

1 : L8_P1_Local
2 : L8
3 : L8_P1_Local
4 : L8
5 : poste2
6 : poste6
7 : poste2
8 : Poste 2
9 : poste_6

Edit: sometimes poste2 and poste6 are written differently, for example Poste 2, poste_2 or Poste_2. Would this regex match all of them? [pP]oste[\s]*[_]*[0-9]
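One way to answer that is to test the pattern directly with Python's `re` module. The snippet below is a minimal sketch: the list of variants comes from the question, while the negative examples (including the all-caps `POSTE2`) are assumptions added for illustration.

```python
import re

# The pattern from the question, written without stray spaces.
pattern = re.compile(r"[pP]oste[\s]*[_]*[0-9]")

# Spellings mentioned in the question.
variants = ["poste2", "poste6", "Poste 2", "poste_2", "Poste_2", "poste_6"]
matches = {v: bool(pattern.fullmatch(v)) for v in variants}

# Values that should be left alone (POSTE2 is a hypothetical extra case).
others = ["L8", "L8_P1_Local", "POSTE2"]
non_matches = {v: bool(pattern.fullmatch(v)) for v in others}

print(matches)
print(non_matches)
```

Only the first letter is allowed to vary in case, so a fully uppercase spelling would need `re.IGNORECASE` or a wider character class.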

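What I want is to put L8 in front of poste2 or poste6 on every row where they appear, giving for example L8_poste6. I already have the 'L8' string in a variable named numligne.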

Edit: since the answer is in the comments of the accepted answer, I am putting it here.

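# '\\1' re-inserts the text captured by the group in parentheses.
text = numligne + '_\\1'
# regex=True is required on recent pandas versions, where str.replace is literal by default.
dataframe['row'] = dataframe['row'].str.replace('([pP]oste[ _]*[0-9])', text, regex=True)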
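For readers who want to try this, here is a small self-contained sketch of the same replacement. The sample rows are taken from the question; the DataFrame itself and the `result` name are made up for the example.

```python
import pandas as pd

numligne = "L8"  # line identifier, as in the question

# Hypothetical sample frame built from the values shown in the question.
dataframe = pd.DataFrame(
    {"row": ["L8_P1_Local", "L8", "poste2", "poste6", "Poste 2", "poste_6"]}
)

# Prefix every 'poste<digit>' variant with the line identifier.
text = numligne + r"_\1"
dataframe["row"] = dataframe["row"].str.replace(
    r"([pP]oste[ _]*[0-9])", text, regex=True
)

result = dataframe["row"].tolist()
print(result)
```

Note that the pattern is not anchored, so running the replacement a second time on the same column would add the prefix again.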

Answer 1

Use pd.Series.str.replace:

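import pandas as pd

s = pd.Series(["1 : L8_P1_Local",
               "2 : L8",
               "3 : L8_P1_Local",
               "4 : L8",
               "5 : poste2",
               "6 : poste6",
               "7 : poste2"])
# The pattern consumes the space before 'poste', so the replacement puts it back.
s.str.replace(' (poste[26])', ' L8_\\1', regex=True)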

Output:

0    1 : L8_P1_Local
1             2 : L8
2    3 : L8_P1_Local
3             4 : L8
4      5 : L8_poste2
5      6 : L8_poste6
6      7 : L8_poste2

There are several ways to apply this to the whole DataFrame, including the following (which may not be the fastest):

# Assumes every column of df holds strings.
for c in df:
    df[c] = df[c].str.replace(' (poste[26])', ' L8_\\1', regex=True)
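The `.str` accessor raises an error on numeric columns, so with mixed column types it may help to skip the non-string ones. The following is a sketch under that assumption: the frame and its column names (`id`, `a`, `b`) are invented for the example, and the replacement string keeps the leading space that the pattern consumes.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical frame with one numeric and two text columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "a": ["4 : L8", "5 : poste2", "6 : poste6"],
    "b": ["1 : L8_P1_Local", "7 : poste2", "2 : L8"],
})

for c in df.columns:
    # Only touch columns that hold strings; numeric columns are left as-is.
    if is_string_dtype(df[c]):
        df[c] = df[c].str.replace(r" (poste[26])", r" L8_\1", regex=True)

print(df)
```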

Answer 2

I assume the case of the text does not matter to you. Please check the following solution.

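import pandas as pd

s = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9],
                  'Text': ['L8_P1_Local', 'L8', 'L8_P1_Local', 'L8', 'poste2',
                           'poste6', 'poste2', 'Poste 2', 'poste_6']})

def match_it(s):
    # Normalise: lower-case everything and drop spaces ('Poste 2' -> 'poste2').
    s['Text'] = s['Text'].str.lower()
    s['Text'] = s['Text'].str.replace(' ', '', regex=False)
    # Prefix every row that mentions 'poste' (relies on the default 0..n-1 index).
    for i in range(len(s)):
        if 'poste' in s.loc[i, 'Text']:
            s.loc[i, 'Text'] = 'l8' + '_' + s.loc[i, 'Text']
    return s

match_it(s)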
Output:

   ID         Text
0   1  l8_p1_local
1   2           l8
2   3  l8_p1_local
3   4           l8
4   5    l8_poste2
5   6    l8_poste6
6   7    l8_poste2
7   8    l8_poste2
8   9   l8_poste_6
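With around 15 million rows, a Python-level loop over `.loc` is likely to be slow. A possible vectorised variant of the same idea is sketched below; it is not part of the original answer, and the names `normalized` and `has_poste` are chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Text": ["L8_P1_Local", "L8", "L8_P1_Local", "L8", "poste2",
             "poste6", "poste2", "Poste 2", "poste_6"],
})

# Same normalisation as above: lower-case and remove spaces.
normalized = df["Text"].str.lower().str.replace(" ", "", regex=False)

# Boolean mask of the rows that mention 'poste'.
has_poste = normalized.str.contains("poste", regex=False)

# Prefix only the masked rows; all other rows keep their normalised value.
df["Text"] = normalized.mask(has_poste, "l8_" + normalized)

print(df)
```

The mask is computed once for the whole column, so no per-row indexing is needed.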

Answer 3

If you want to add L8 wherever it is not already present, you can ask pandas to do that:

I assume you have a DataFrame (say df) with a column (say col) containing your sample data:

           col
0  L8_P1_Local
1           L8
2  L8_P1_Local
3           L8
4       poste2
5       poste6
6       poste2

You can do this:

# Rows whose value does not already start with 'L8' (str.match anchors at the start).
mask = ~df.col.str.match('L8')
# Prepend the prefix to those rows only.
df.loc[mask, 'col'] = 'L8_' + df.loc[mask, 'col']

This gives:

           col
0  L8_P1_Local
1           L8
2  L8_P1_Local
3           L8
4    L8_poste2
5    L8_poste6
6    L8_poste2

Replace values in a Pandas DataFrame with a regex if the value does not contain a string
