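Replace "." with NaN

Time: 2019-09-16 01:32:15

Tags: pandas

I am trying to replace the placeholder string "." in the Total_revenue column with NaN. This is the code used to create the df.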

import pandas as pd

raw_data = {'Rank': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
        'Company': ['Microsoft', 'Oracle', "IBM", 'SAP', 'Symantec', 'EMC', 'VMware', 'HP', 'Salesforce.com', 'Intuit'],
        'Company_HQ': ['USA', 'USA', 'USA', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'], 
        'Software_revenue': ['$62,014', '$29,881', '$29,286', '$18,777', '$6,138', '$5,844', '$5,520', '$5,082', '$4,820', '$4,324'], 
        'Total_revenue': ['93,456', '38,828', '92,793', '23,289', '6,615', ".", '6,035', '110,577', '5,274', '4,573'],
        'Percent_revenue_total': ['66.36%', '76.96%', '31.56%', '80.63%', '92.79%', '23.91%', '91.47%', '4.60%', '91.40%', '94.55%']}
df = pd.DataFrame(raw_data, columns = ['Rank', 'Company', 'Company_HQ', 'Software_revenue', 'Total_revenue', 'Percent_revenue_total'])
df

I tried using:

import numpy as np

df['Total_revenue'] = df['Total_revenue'].replace('.', np.nan, regex=True)
df

However, this replaces the entire column with NaN, not just the placeholder "." values.

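5 answers:

Answer 0 (score: 0)

You just need to set regex=False. With regex=True, the pattern you pass in is interpreted as a regular expression; setting it to False makes pandas treat the pattern as a literal string (which I think is what you want):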

import pandas as pd
raw_data = {'Rank': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
        'Company': ['Microsoft', 'Oracle', "IBM", 'SAP', 'Symantec', 'EMC', 'VMware', 'HP', 'Salesforce.com', 'Intuit'],
        'Company_HQ': ['USA', 'USA', 'USA', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'], 
        'Software_revenue': ['$62,014', '$29,881', '$29,286', '$18,777', '$6,138', '$5,844', '$5,520', '$5,082', '$4,820', '$4,324'], 
        'Total_revenue': ['93,456', '38,828', '92,793', '23,289', '6,615', ".", '6,035', '110,577', '5,274', '4,573'],
        'Percent_revenue_total': ['66.36%', '76.96%', '31.56%', '80.63%', '92.79%', '23.91%', '91.47%', '4.60%', '91.40%', '94.55%']}
df = pd.DataFrame(raw_data, columns = ['Rank', 'Company', 'Company_HQ', 'Software_revenue', 'Total_revenue', 'Percent_revenue_total'])

import numpy as np

df['Total_revenue'] = df['Total_revenue'].replace('.', np.nan, regex=False)
print(df)

Output:

   Rank         Company Company_HQ Software_revenue Total_revenue Percent_revenue_total
0     1       Microsoft        USA          $62,014        93,456                66.36%
1     2          Oracle        USA          $29,881        38,828                76.96%
2     3             IBM        USA          $29,286        92,793                31.56%
3     4             SAP    Germany          $18,777        23,289                80.63%
4     5        Symantec        USA           $6,138         6,615                92.79%
5     6             EMC        USA           $5,844           NaN                23.91%
6     7          VMware        USA           $5,520         6,035                91.47%
7     8              HP        USA           $5,082       110,577                 4.60%
8     9  Salesforce.com        USA           $4,820         5,274                91.40%
9    10          Intuit        USA           $4,324         4,573                94.55%

Answer 1 (score: 0)

I went one step further here and changed the column type to numeric, so you can also use it for calculations.
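
A minimal sketch of this idea, assuming the conversion is done with pd.to_numeric and errors='coerce' (this answer's exact code may differ). The three-row frame is a stand-in for the question's data. Note that errors='coerce' turns every non-numeric value into NaN, not only the "." placeholder.

```python
import pandas as pd

# Small stand-in for the question's Total_revenue column (values are strings).
df = pd.DataFrame({'Total_revenue': ['93,456', '.', '110,577']})

# Drop the thousands separators as literal text, then let to_numeric turn
# anything that is not a number (here the '.' placeholder) into NaN.
cleaned = df['Total_revenue'].str.replace(',', '', regex=False)
df['Total_revenue'] = pd.to_numeric(cleaned, errors='coerce')

# The column is now numeric, so aggregations work; sum() skips NaN by default.
total = df['Total_revenue'].sum()
print(df)
print(total)
```

After this step the column holds floats rather than strings, which is what makes arithmetic on it possible.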


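Answer 2 (score: 0)

"." is a special character in regular expressions: it matches any character. You need to escape it so that the regex treats it as a literal dot.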

df['Total_revenue'].replace(r'\.', np.nan, regex=True)

Out[52]:
0     93,456
1     38,828
2     92,793
3     23,289
4      6,615
5        NaN
6      6,035
7    110,577
8      5,274
9      4,573
Name: Total_revenue, dtype: object
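
To see why the unescaped pattern hit every row, here is a small sketch using only Python's re module on a few sample strings (the sample values are taken from the question; the variable names are illustrative). It also shows that an escaped dot still matches any value that merely contains a dot, so matching the whole value is the stricter test.

```python
import re

values = ['93,456', '.', '6,035']

# Unescaped '.' matches any single character, so every value matches.
any_char = [bool(re.search('.', v)) for v in values]

# Escaped, the pattern only matches a literal dot somewhere in the value.
literal_dot = [bool(re.search(r'\.', v)) for v in values]

# fullmatch requires the whole value to be exactly '.'.
exact = [bool(re.fullmatch(r'\.', v)) for v in ['.', '4.60%']]

print(any_char)     # [True, True, True]
print(literal_dot)  # [False, True, False]
print(exact)        # [True, False]
```

If other values in the column could contain a dot, an anchored pattern such as r'^\.$' is probably the safer choice when regex=True is used.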

In your case, you should use mask:

df['Total_revenue'].mask(df['Total_revenue'].eq('.'))

Out[58]:
0     93,456
1     38,828
2     92,793
3     23,289
4      6,615
5        NaN
6      6,035
7    110,577
8      5,274
9      4,573
Name: Total_revenue, dtype: object

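Answer 3 (score: 0)

I don't think replace is needed here, since the user wants to turn the whole "." value into NaN. Instead, this will also work: find the rows that contain "." and assign NaN to them.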

df.loc[df['Total_revenue']==".", 'Total_revenue'] = np.nan

Answer 4 (score: 0)

You can try the following to apply your requirement to the whole DataFrame:

df.replace('.', np.nan)

Or, if you want to apply it to a specific column only, use df['Total_revenue'] instead of df.

Here is the output:

   Rank         Company Company_HQ Software_revenue Total_revenue Percent_revenue_total
0     1       Microsoft        USA          $62,014        93,456                66.36%
1     2          Oracle        USA          $29,881        38,828                76.96%
2     3             IBM        USA          $29,286        92,793                31.56%
3     4             SAP    Germany          $18,777        23,289                80.63%
4     5        Symantec        USA           $6,138         6,615                92.79%
5     6             EMC        USA           $5,844           NaN                23.91%
6     7          VMware        USA           $5,520         6,035                91.47%
7     8              HP        USA           $5,082       110,577                 4.60%
8     9  Salesforce.com        USA           $4,820         5,274                91.40%
9    10          Intuit        USA           $4,324         4,573                94.55%
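
The reason this works across the whole frame is that replace does not use regular expressions unless regex=True is passed, so "." is compared against whole cell values. A short sketch on a two-row stand-in frame (not the full data from the question) shows that values which only contain a dot are left alone:

```python
import numpy as np
import pandas as pd

# Two-row stand-in with dots inside other values as well.
df = pd.DataFrame({
    'Company': ['EMC', 'Salesforce.com'],
    'Total_revenue': ['.', '5,274'],
    'Percent_revenue_total': ['23.91%', '91.40%'],
})

# Without regex=True, replace compares whole cell values, so only the cell
# that is exactly '.' becomes NaN.
result = df.replace('.', np.nan)
print(result)
```

Here 'Salesforce.com' and the percentage strings are unchanged, while the lone "." in Total_revenue becomes NaN.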