Question

I need help with regex and the regex functions! I have a CSV file delimited by ';' and need to replace "-" with "_". The data looks like this:

79             80;0;RueSaint_Hilaire;Locale;15-25;1;1             
80              81;0;RueSaint_Hilaire;Locale;5-10;5;5             
81                   82;0;RueTaillon;Locale;10-15;1;1             
82                   83;0;RueTanguay;Locale;10-15;2;2             
83                   84;0;RueTanguay;Locale;15-25;2;2             
84                    85;0;RueTanguay;Locale;5-10;3;3

For example, I need to replace 15-25 with 15_25.

So far, I have tried:

df.replace('-','_', inplace=True)

or this:

df_obj = df.select_dtypes(['object'])
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
df.replace('-','_', inplace=True)
print(df)

Neither worked. Can any regex or replace wizard shed some light on this small problem?

Thank you very much!

Answer 1

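By default, regex=False, so a string passed to replace only matches cells whose entire value is '-'. With your existing code, add regex=True alongside inplace=True so that '-' is treated as a pattern and replaced inside each string. See the documentation for replace.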

df.replace('-', '_',regex=True, inplace=True)
print(df)
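
To illustrate the difference, here is a minimal sketch on a made-up two-row frame. The column names `street` and `speed` and the sample values are assumptions for illustration, not taken from the question's file.

```python
import pandas as pd

# Hypothetical sample data; one cell is exactly "-" to show the default behaviour.
df = pd.DataFrame({
    "street": ["RueTaillon", "-"],
    "speed": ["10-15", "5-10"],
})

# regex=False (the default): only cells whose whole value equals "-" are replaced.
exact = df.replace("-", "_")

# regex=True: "-" is treated as a pattern and substituted inside every string.
pattern = df.replace("-", "_", regex=True)

print(exact)
print(pattern)
```

Only the second call changes "10-15" to "10_15"; the first one touches nothing but the cell that is exactly "-".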

Answer 2

This is the simplest implementation I can think of:

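path = "<PATH TO FILE>"  # placeholder: set this to the path of your CSV file

with open(path, 'r') as fileIn:
    data = fileIn.read()

print("\nOriginal data: \n", data)
data = data.replace('-', '_')  # replaces every hyphen in the file
print("Modified data: \n", data)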

This gives:

Original data:
80,0,RueSaint-Hilaire,Locale,15-25,1,1
81,0,RueSaint-Hilaire,Locale,10-May,5,5

Modified data:
80,0,RueSaint_Hilaire,Locale,15_25,1,1
81,0,RueSaint_Hilaire,Locale,10_May,5,5
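
Note that replacing on the raw text changes every hyphen in the file, including the ones in street names. If only one field should change, something along these lines with the standard csv module might work. This is a sketch, not a drop-in solution: the column index 4, the helper name `replace_in_column`, and the in-memory sample are assumptions based on the rows shown in the question.

```python
import csv
import io

SPEED_COLUMN = 4  # assumption: the "15-25" style field is the fifth column


def replace_in_column(text, column=SPEED_COLUMN, delimiter=";"):
    """Replace '-' with '_' in one column of delimiter-separated text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # Skip short or empty rows instead of raising IndexError.
        if len(row) > column:
            row[column] = row[column].replace("-", "_")
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample in the question's ';'-delimited layout.
sample = (
    "80;0;RueSaint-Hilaire;Locale;15-25;1;1\n"
    "81;0;RueTaillon;Locale;10-15;1;1\n"
)
result = replace_in_column(sample)
print(result)
```

The street name keeps its hyphen while the range field becomes 15_25.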

Answer 3

Usually, I would go with:

df['Col'] = df['Col'].str.replace('-', '_')
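
One caveat: the default value of the `regex` parameter of `Series.str.replace` has changed between pandas versions, so passing it explicitly avoids surprises. A minimal sketch, assuming a hypothetical column named `Col` with made-up values:

```python
import pandas as pd

# Hypothetical sample column; the values are illustrative only.
df = pd.DataFrame({"Col": ["15-25", "5-10", "Saint-Hilaire"]})

# Literal replacement: every hyphen becomes an underscore.
df["literal"] = df["Col"].str.replace("-", "_", regex=False)

# Pattern replacement: only hyphens that sit between two digits are changed.
df["digits_only"] = df["Col"].str.replace(r"(\d)-(\d)", r"\1_\2", regex=True)

print(df)
```

The `digits_only` variant leaves "Saint-Hilaire" untouched, which may be what you want if names must keep their hyphens.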

Answer 4

Here is the pandas FAQ: https://stackoverflow.com/tags/pandas/info

Apply a lambda to the DataFrame column, like this:

df['foo'] = df['foo'].apply(lambda x: x.replace('-', '_'))

Answer 5

If you specifically need to change only the "-" between digits, do the following:

import re

regex = r"(\d+)-(\d+)"

test_str = ("79             80;0;RueSaint_Hilaire;Locale;15-25;1;1         \n"
    "80              81;0;RueSaint_Hilaire;Locale;5-10;5;5         \n"
    "81                   82;0;RueTaillon;Locale;10-15;1;1         \n"
    "82                   83;0;RueTanguay;Locale;10-15;2;2         \n"
    "83                   84;0;RueTanguay;Locale;15-25;2;2         \n"
    "84                    85;0;RueTanguay;Locale;5-10;3;3  ")

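# Python's re module uses \1 and \2 for group references ("$1" would be inserted literally)
subst = r"\1_\2"

# count=0 replaces every match; pass a positive count to limit the number of replacements
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)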

You can check the regex here: https://regex101.com/r/DGrm7V/1
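
A note on replacement syntax: in Python's re module, group references in the replacement string are written `\1` or `\g<1>`, whereas `$1` belongs to other regex flavors and is inserted literally. The short sketch below compares the two and also shows a lookaround variant that needs no groups at all. The sample line is made up in the question's layout.

```python
import re

# Hypothetical sample line in the question's ';'-delimited layout.
line = "80;0;RueSaint-Hilaire;Locale;15-25;1;1"

# Capture groups with Python-style backreferences.
with_groups = re.sub(r"(\d+)-(\d+)", r"\g<1>_\g<2>", line)

# Lookbehind/lookahead: match only the hyphen that sits between two digits.
with_lookaround = re.sub(r"(?<=\d)-(?=\d)", "_", line)

# "$1" is not a group reference in Python, so it ends up in the output as-is.
dollar_style = re.sub(r"(\d+)-(\d+)", "$1_$2", line)

print(with_groups)
print(with_lookaround)
print(dollar_style)
```

Both of the first two calls turn 15-25 into 15_25 and leave the hyphen in the street name alone.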

Pandas DF: how to replace "-" with "_" in a ';'-delimited CSV file
