Question

I have a DataFrame like the following:

column1    column2     column3    column4
6,546      543,254,32  (443,326)  (32,000)
4,554      432,885     (88,974)    77,332
n.a        -           5,332       -
...        ...        ...         ...

# this df stretches for over 500 rows, and all columns could potentially have 
# values within brackets, 'n.a', '-'

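My problem is replacing every value written as ( , ) with its negative number, for example turning (443,326) into -443326, i.e. removing the brackets and commas.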

I know I can run df.replace('n.a', numpy.nan, inplace=True), which replaces the value wherever a cell matches.

However, the same call with df.replace('(', numpy.nan, inplace=True) does not work.
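A likely reason: without regex=True, DataFrame.replace compares each cell's whole value against the target instead of searching inside the string, and no cell is exactly '('. A minimal sketch of the difference, assuming the cells are stored as strings (the sample frame below is made up from the question's data):

```python
import pandas as pd

# Small sample mirroring the question's data (assumed to be strings).
df = pd.DataFrame({"column3": ["(443,326)", "(88,974)", "5,332"]})

# Without regex=True, only cells that are exactly "(" would match,
# so nothing changes here.
unchanged = df.replace("(", "-")

# With regex=True the pattern is searched inside each string. The
# parenthesis is escaped because "(" opens a group in a regex.
partial = df.replace(r"\(", "-", regex=True)

print(unchanged)
print(partial)
```

The second call only handles the opening bracket; the closing bracket and the commas still need their own patterns.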

I tried solving the problem with a loop:

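for i in df.columns:  # columns is an attribute, not a method
    # regex=False treats '(' and ')' as literal characters
    df[i] = df[i].str.replace('(', '-', regex=False)
    df[i] = df[i].str.replace(')', '', regex=False)
    df[i] = df[i].str.replace(',', '', regex=False)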

This seems to work, but it gives me a warning message:

SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

How
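On the warning itself: SettingWithCopyWarning typically appears when df was created by selecting rows or columns from another DataFrame. The question does not show how df was built, so the following is only a sketch under that assumption; `raw` is a hypothetical parent frame, and the explicit .copy() is what makes the later assignments unambiguous.

```python
import pandas as pd

# Hypothetical parent frame; the question does not show where df comes from.
raw = pd.DataFrame({
    "column1": ["6,546", "4,554", "n.a"],
    "column3": ["(443,326)", "(88,974)", "5,332"],
    "note": ["a", "b", "c"],
})

# Taking an explicit copy makes df an independent object, so assigning
# to its columns no longer looks like writing into a slice of raw.
df = raw[["column1", "column3"]].copy()

for col in df.columns:
    df[col] = (
        df[col]
        .str.replace("(", "-", regex=False)
        .str.replace(")", "", regex=False)
        .str.replace(",", "", regex=False)
    )

print(df)
```

The parent frame `raw` is left untouched; only the copy is cleaned.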

Answer 1

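Here is a slightly different approach. Note that it strips everything except digits and dots, so bracketed values come out positive rather than negative: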

In [89]: df.replace(r'[^\d\.]+', '', regex=True).apply(pd.to_numeric, errors='coerce')
Out[89]:
   column1     column2  column3  column4
0   6546.0  54325432.0   443326  32000.0
1   4554.0    432885.0    88974  77332.0
2      NaN         NaN     5332      NaN
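If the sign matters, one possible variation (not part of the answer above) is to turn the brackets into a minus sign first and to keep '-' in the set of allowed characters. A sketch on made-up sample data taken from the question:

```python
import pandas as pd

# Sample rows from the question, stored as strings.
df = pd.DataFrame({
    "column1": ["6,546", "4,554", "n.a"],
    "column3": ["(443,326)", "(88,974)", "5,332"],
    "column4": ["(32,000)", "77,332", "-"],
})

cleaned = (
    df.replace(r"^\((.*)\)$", r"-\1", regex=True)  # (443,326) -> -443,326
      .replace(r"[^\d.\-]+", "", regex=True)       # drop commas, letters, etc.
      .apply(pd.to_numeric, errors="coerce")       # leftovers like "." or "-" become NaN
)

print(cleaned)
```

Here 'n.a' is reduced to '.' and a lone '-' stays as is; neither parses as a number, so errors='coerce' turns both into NaN.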

Answer 2

d = {
    ',': '',                   # drop thousands separators
    r'\(([\d,]+)\)': r'-\1',   # bracketed number -> negative number
    'n.a': 'nan',
    '^-$': 'nan',              # a lone dash means missing
}

df.replace(d, regex=True).astype(float)

   column1     column2   column3  column4
0   6546.0  54325432.0 -443326.0 -32000.0
1   4554.0    432885.0  -88974.0  77332.0
2      NaN         NaN    5332.0      NaN
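The same rules can also be spelled out as a plain cell-level function, which is a different technique from the regex dictionary above but may be easier to read and to unit-test. This is only a sketch: `parse_accounting` is a hypothetical helper name, and it assumes the only placeholders are 'n.a' and '-'.

```python
import math
import pandas as pd

def parse_accounting(cell):
    """Convert strings such as '(443,326)' or '6,546' to float; placeholders become NaN."""
    text = str(cell).strip().replace(",", "")
    if text in {"n.a", "-", ""}:
        return math.nan
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])  # brackets mark a negative value
    return float(text)

# Sample rows from the question, stored as strings.
df = pd.DataFrame({
    "column1": ["6,546", "4,554", "n.a"],
    "column2": ["543,254,32", "432,885", "-"],
    "column3": ["(443,326)", "(88,974)", "5,332"],
    "column4": ["(32,000)", "77,332", "-"],
})

# Apply the parser to every cell, column by column.
result = df.apply(lambda col: col.map(parse_accounting))
print(result)
```

An unexpected string raises a ValueError instead of silently becoming NaN, which can be useful for spotting stray formats in a 500-row frame.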

If you only want to handle the (stuff) case:

d = {
    r'\(([\d,]+)\)': r'-\1',
}

df.replace(d, regex=True)

  column1     column2   column3  column4
0   6,546  543,254,32  -443,326  -32,000
1   4,554     432,885   -88,974   77,332
2     n.a           -     5,332        -

Answer 3

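Chain the replacements. Note that .astype(float) only succeeds once commas and placeholders such as 'n.a' and '-' are gone from the column as well.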

df['column_name'] = (df.loc[:, 'column_name'].replace('[)]', '', regex=True)
                            .replace('[(]', '-', regex=True).astype(float))
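For a column that still contains commas or placeholders, the chain can be extended along these lines. This is a sketch rather than the answer's own code: it adds a comma-removal step and swaps .astype(float) for pd.to_numeric with errors='coerce', and the sample column is made up from the question's data.

```python
import pandas as pd

# Made-up single column using values from the question.
df = pd.DataFrame({"column_name": ["(443,326)", "(88,974)", "5,332", "n.a", "-"]})

df["column_name"] = pd.to_numeric(
    df["column_name"]
    .replace("[)]", "", regex=True)   # drop the closing bracket
    .replace("[(]", "-", regex=True)  # opening bracket becomes a minus sign
    .replace(",", "", regex=True),    # drop thousands separators
    errors="coerce",                  # 'n.a' and '-' become NaN
)

print(df)
```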

Python pandas: removing brackets from df values
