I am trying to replace the placeholder "." strings in the Total_revenue column with NaN. This is the code used to create the df.
import pandas as pd
raw_data = {'Rank': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Company': ['Microsoft', 'Oracle', "IBM", 'SAP', 'Symantec', 'EMC', 'VMware', 'HP', 'Salesforce.com', 'Intuit'],
'Company_HQ': ['USA', 'USA', 'USA', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
'Software_revenue': ['$62,014', '$29,881', '$29,286', '$18,777', '$6,138', '$5,844', '$5,520', '$5,082', '$4,820', '$4,324'],
'Total_revenue': ['93,456', '38,828', '92,793', '23,289', '6,615', ".", '6,035', '110,577', '5,274', '4,573'],
'Percent_revenue_total': ['66.36%', '76.96%', '31.56%', '80.63%', '92.79%', '23.91%', '91.47%', '4.60%', '91.40%', '94.55%']}
df = pd.DataFrame(raw_data, columns = ['Rank', 'Company', 'Company_HQ', 'Software_revenue', 'Total_revenue', 'Percent_revenue_total'])
df
I tried using:
import numpy as np
df['Total_revenue'] = df['Total_revenue'].replace('.', np.nan, regex=True)
df
However, this replaces the entire column with NaN, not just the placeholder "." values.
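Answer 0 (score: 0)
You just need to change it to regex=False. When you set it to True, the pattern you pass in is treated as a regular expression, and in a regular expression "." matches any character. Setting it to False makes pandas treat the pattern as a literal string (which is what I think you want):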
import pandas as pd
raw_data = {'Rank': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Company': ['Microsoft', 'Oracle', "IBM", 'SAP', 'Symantec', 'EMC', 'VMware', 'HP', 'Salesforce.com', 'Intuit'],
'Company_HQ': ['USA', 'USA', 'USA', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
'Software_revenue': ['$62,014', '$29,881', '$29,286', '$18,777', '$6,138', '$5,844', '$5,520', '$5,082', '$4,820', '$4,324'],
'Total_revenue': ['93,456', '38,828', '92,793', '23,289', '6,615', ".", '6,035', '110,577', '5,274', '4,573'],
'Percent_revenue_total': ['66.36%', '76.96%', '31.56%', '80.63%', '92.79%', '23.91%', '91.47%', '4.60%', '91.40%', '94.55%']}
df = pd.DataFrame(raw_data, columns = ['Rank', 'Company', 'Company_HQ', 'Software_revenue', 'Total_revenue', 'Percent_revenue_total'])
import numpy as np
df['Total_revenue'] = df['Total_revenue'].replace('.', np.nan, regex=False)
print(df)
Output:
   Rank         Company  Company_HQ  Software_revenue  Total_revenue  Percent_revenue_total
0     1       Microsoft         USA           $62,014         93,456                 66.36%
1     2          Oracle         USA           $29,881         38,828                 76.96%
2     3             IBM         USA           $29,286         92,793                 31.56%
3     4             SAP     Germany           $18,777         23,289                 80.63%
4     5        Symantec         USA            $6,138          6,615                 92.79%
5     6             EMC         USA            $5,844            NaN                 23.91%
6     7          VMware         USA            $5,520          6,035                 91.47%
7     8              HP         USA            $5,082        110,577                  4.60%
8     9  Salesforce.com         USA            $4,820          5,274                 91.40%
9    10          Intuit         USA            $4,324          4,573                 94.55%
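To make the difference between the two settings concrete, here is a minimal sketch on a three-value stand-in for the Total_revenue column (the short Series and the names as_regex and as_literal are only for illustration, they are not part of the original question). It assumes pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd

# Small stand-in for the Total_revenue column (illustrative values only).
s = pd.Series(['93,456', '.', '6,615'], dtype=object)

# regex=True: '.' is a pattern that matches any character, so every
# non-empty string matches and every value is replaced with NaN.
as_regex = s.replace('.', np.nan, regex=True)

# regex=False: '.' is compared as a literal string, so only the cell that
# is exactly '.' is replaced.
as_literal = s.replace('.', np.nan, regex=False)

print(as_regex.isna().tolist())    # [True, True, True]
print(as_literal.isna().tolist())  # [False, True, False]
```

Since regex=False is the default for replace, leaving the argument out should behave the same way as the literal version.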
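Answer 2 (score: 0)
"." is a special character in regular expressions that matches any character. You need to escape it so that the regex treats it as a literal dot (a raw string keeps Python from interpreting the backslash):
df['Total_revenue'].replace(r'\.', np.nan, regex=True)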
Out[52]:
0 93,456
1 38,828
2 92,793
3 23,289
4 6,615
5 NaN
6 6,035
7 110,577
8 5,274
9 4,573
Name: Total_revenue, dtype: object
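One caveat with the escaped pattern: when the replacement value is NaN, pandas replaces any cell in which the pattern is found anywhere, so an unanchored escaped dot would also hit values such as '66.36%'. That does not matter for Total_revenue, whose other values contain no dots, but the sketch below shows the difference on a made-up Series and uses an anchored pattern as one possible way around it. The sample values and variable names are assumptions for illustration.

```python
import re

import numpy as np
import pandas as pd

# Illustrative values: one contains a dot, one is exactly a dot, one has none.
s = pd.Series(['66.36%', '.', '6,615'], dtype=object)

# Escaped but unanchored: any string that contains a dot matches.
unanchored = s.replace(re.escape('.'), np.nan, regex=True)

# Anchored with ^ and $: only a string consisting of a single dot matches.
anchored = s.replace(r'^\.$', np.nan, regex=True)

print(unanchored.isna().tolist())  # [True, True, False]
print(anchored.isna().tolist())    # [False, True, False]
```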
In your case, you should use mask:
df['Total_revenue'].mask(df['Total_revenue'].eq('.'))
Out[58]:
0 93,456
1 38,828
2 92,793
3 23,289
4 6,615
5 NaN
6 6,035
7 110,577
8 5,274
9 4,573
Name: Total_revenue, dtype: object
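Answer 3 (score: 0)
I don't think replace is needed here, since the user wants to change the whole "." value to NaN. This will also work: find the rows that contain "." and assign NaN to them.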
df.loc[df['Total_revenue']==".", 'Total_revenue'] = np.nan
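As a rough comparison, the sketch below runs the literal replace, the mask approach and the loc assignment on a tiny made-up frame and checks that they mark the same cell as missing. The three-row frame and the via_* names are hypothetical. Note that replace and mask return a new Series, while the loc assignment modifies the frame in place, which is why it is applied to a copy here.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; only the Total_revenue column matters here.
df = pd.DataFrame({'Total_revenue': ['6,615', '.', '6,035']}, dtype=object)

# Literal replace: returns a new Series.
via_replace = df['Total_revenue'].replace('.', np.nan)

# mask: sets NaN wherever the condition is True, returns a new Series.
via_mask = df['Total_revenue'].mask(df['Total_revenue'].eq('.'))

# loc assignment: changes the frame in place, so work on a copy.
via_loc = df.copy()
via_loc.loc[via_loc['Total_revenue'] == '.', 'Total_revenue'] = np.nan

for result in (via_replace, via_mask, via_loc['Total_revenue']):
    print(result.isna().tolist())  # [False, True, False] each time
```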
Answer 4 (score: 0)
You can try the following to apply your requirement to the whole DataFrame:
df.replace('.', np.nan)
Or, if you want to apply it to a specific column only, use df['Total_revenue'] instead of df.
Here is the output:
   Rank         Company  Company_HQ  Software_revenue  Total_revenue  Percent_revenue_total
0     1       Microsoft         USA           $62,014         93,456                 66.36%
1     2          Oracle         USA           $29,881         38,828                 76.96%
2     3             IBM         USA           $29,286         92,793                 31.56%
3     4             SAP     Germany           $18,777         23,289                 80.63%
4     5        Symantec         USA            $6,138          6,615                 92.79%
5     6             EMC         USA            $5,844            NaN                 23.91%
6     7          VMware         USA            $5,520          6,035                 91.47%
7     8              HP         USA            $5,082        110,577                  4.60%
8     9  Salesforce.com         USA            $4,820          5,274                 91.40%
9    10          Intuit         USA            $4,324          4,573                 94.55%
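If you want to keep calling replace on the whole DataFrame but touch only one column, a nested dict of the form {column: {old: new}} is one option. The sketch below uses a made-up two-row frame in which the Company column also contains a "." (a hypothetical value, not from the original data) to show the difference between the two calls.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a '.' also appears in Company to show the difference.
df = pd.DataFrame(
    {'Company': ['EMC', '.'], 'Total_revenue': ['.', '6,035']},
    dtype=object,
)

# Replace '.' in every column.
everywhere = df.replace('.', np.nan)

# Replace '.' only in the Total_revenue column.
only_col = df.replace({'Total_revenue': {'.': np.nan}})

print(everywhere.isna().to_numpy().tolist())  # [[False, True], [True, False]]
print(only_col.isna().to_numpy().tolist())    # [[False, True], [False, False]]
```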