Question

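I'm having trouble removing digits, and parentheses together with their contents, from strings in Python. str.replace has been suggested. The difficulty is that the digits are not fixed: I know I need to remove any digit, but I don't know in advance which one it will be. The same goes for the parentheses: I only know that I need to remove the () and whatever is inside them, and the contents vary too. For example, suppose I have the following dataset: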

    import pandas as pd
    a = pd.Series({'Country':'China 1', 'Capital': 'Bei Jing'})
    b = pd.Series({'Country': 'United States (of American)', 'Capital': 'Washington'})
    c = pd.Series({'Country': 'United Kingdom (of Great Britain and Northern Ireland)', 'Capital': 'London'})
    d = pd.Series({'Country': 'France 2', 'Capital': 'Paris'})
    e = pd.DataFrame([a,b,c,d])

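Right now the values in the 'Country' column are 'China 1', 'United States (of American)', 'United Kingdom (of ...)' and 'France 2'. After the replace/removal, I want all digits and all parentheses with their contents gone, so that the 'Country' column holds 'China', 'United States', 'United Kingdom' and 'France'.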

Answer 1

You can use a regex with str.replace.

    series1.str.replace(r"^([a-zA-Z]+(?:\s+[a-zA-Z]+)*).*", r"\1", regex=True)

See the demo linked below. You can substitute your own Series and make other modifications as needed.

https://regex101.com/r/lIScpi/2
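To see what the pattern does outside pandas, here is a minimal sketch that applies it with Python's standard `re` module to the sample strings from the question. The helper name `keep_leading_words` is made up for illustration; the pattern itself is the one from this answer.

```python
import re

# Pattern from the answer: capture the leading run of ASCII-letter words
# separated by whitespace, then let `.*` swallow the rest of the string.
PATTERN = re.compile(r"^([a-zA-Z]+(?:\s+[a-zA-Z]+)*).*")


def keep_leading_words(text):
    """Hypothetical helper: replace the whole match with capture group 1."""
    return PATTERN.sub(r"\1", text)


samples = [
    "China 1",
    "United States (of American)",
    "United Kingdom (of Great Britain and Northern Ireland)",
    "France 2",
]
cleaned = [keep_leading_words(s) for s in samples]
print(cleaned)
# ['China', 'United States', 'United Kingdom', 'France']
```

Because the pattern keeps only ASCII letters and whitespace, it also cuts a name at the first hyphen, apostrophe, or accented letter; for example, `keep_leading_words("Guinea-Bissau")` returns `'Guinea'`. That is fine for the sample data but worth checking against a real list of countries.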

You can also modify the DataFrame directly.

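    import pandas as pd

    a = pd.Series({'Country': 'China 1', 'Capital': 'Bei Jing'})
    b = pd.Series({'Country': 'United States (of American)', 'Capital': 'Washington'})
    c = pd.Series({'Country': 'United Kingdom (of Great Britain and Northern Ireland)', 'Capital': 'London'})
    d = pd.Series({'Country': 'France 2', 'Capital': 'Paris'})
    e = pd.DataFrame([a, b, c, d])
    print(e)  # before the replace
    # Keep the leading run of space-separated words; drop everything after it.
    # regex=True is needed because newer pandas treats the pattern as a literal by default.
    e['Country'] = e['Country'].str.replace(r"^([a-zA-Z]+(?:\s+[a-zA-Z]+)*).*", r"\1", regex=True)
    print(e)  # after the replace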

Output before the replace:

      Capital                                            Country
0    Bei Jing                                            China 1
1  Washington                        United States (of American)
2      London  United Kingdom (of Great Britain and Northern ...
3       Paris                                           France 2

Output after the replace:

      Capital         Country
0    Bei Jing           China
1  Washington   United States
2      London  United Kingdom
3       Paris          France
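
The pattern above works by keeping the leading words rather than by deleting digits and parentheses. If you would rather delete exactly what the question describes, one possible alternative (not the approach used in this answer) is sketched below with the standard `re` module. The name `strip_digits_and_parens` is invented for the example, and the sketch assumes parentheses are never nested.

```python
import re

# Two alternatives, each preceded by optional whitespace:
#   \([^)]*\)  an opening parenthesis, anything up to the first ')', and that ')'
#   \d+        a run of digits
REMOVE = re.compile(r"\s*\([^)]*\)|\s*\d+")


def strip_digits_and_parens(text):
    """Hypothetical helper: delete digits and (...) groups, then trim whitespace."""
    return REMOVE.sub("", text).strip()


for name in ["China 1", "United States (of American)", "France 2", "Guinea-Bissau"]:
    print(repr(strip_digits_and_parens(name)))
# 'China'
# 'United States'
# 'France'
# 'Guinea-Bissau'
```

The same pattern should carry over to the DataFrame as `e['Country'].str.replace(r"\s*\([^)]*\)|\s*\d+", "", regex=True).str.strip()`.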
