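I'm trying to reclassify a column of strings that represent how many times an app has been downloaded, because the data doesn't give the raw download count. I need to group the 20 distinct strings into 7 categories and store the result in a new column called "downloads".
I've tried adjusting the parentheses and brackets, and I was previously using np.apply incorrectly.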
import pandas as pd
from pandas import DataFrame
fear = pd.read_csv('googleplaystore.csv', encoding='latin')
n_ratings = {'Install':['0+', '1+', '5+', '10+', '50+', '100+', '500+', '1,000+', '5,000+',
'10,000+', '50,000+', '100,000+', '500,000+', '1,000,000+', '5,000,000+', '10,000,000+',
'50,000,000+', '100,000,000+', '500,000,000+', '1,000,000,000+']}
df = DataFrame(n_ratings, columns=['Install'])
df['downloads'] = df['Install'].apply(lambda x: '0-1k' if x.isin(['0+', '1+', '5+', '10+', '50+', '100+', '500+'])
df['downloads'] = df['Install'].apply(lambda x: '1k-100k' if x.isin(['1,000+', '5,000+', '10,000+', '50,000+']))
df['downloads'] = df['Install'].apply(lambda x: '100k-1M' if x.isin(['100,000+', '500,000+'])
df['downloads'] = df['Install'].apply(lambda x: '1M-10M' if x.isin(['1,000,000+', '5,000,000+'])
df['downloads'] = df['Install'].apply(lambda x: '10M-100M' if x.isin(['10,000,000+', '50,000,000+'])
df['downloads'] = df['Install'].apply(lambda x: '100M-1B' if x.isin(['100,000,000+', '500,000,000+'])
df['downloads'] = df['Install'].apply(lambda x: '> 1B' if x.isin(['1,000,000,000+'])
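For context, here is a minimal sketch (not part of the original post) of why each attempted line fails. Inside Series.apply the lambda receives one plain Python string at a time, which has no .isin method, and a conditional expression inside a lambda must have an else branch. Each line also overwrites the whole downloads column, so even a working version would keep only the last rule. The two-element Series is illustrative sample data.

```python
import pandas as pd

s = pd.Series(['0+', '1,000+'])

# Inside Series.apply, each x is a single Python str, not a Series,
# so it has no .isin method; the scalar membership test is `in`.
first = s.iloc[0]
print(type(first).__name__)         # str
print(hasattr(first, 'isin'))       # False
print(first in ['0+', '1+', '5+'])  # True

# A conditional expression inside a lambda needs an else branch.
small = s.apply(lambda x: '0-1k' if x in ['0+', '1+', '5+'] else 'other')
print(small.tolist())  # ['0-1k', 'other']
```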
Answer 0 (score: 1)
You don't need apply or if-else. Just use np.select, passing it a sequence of conditions and a matching list of choices, one choice per condition:
import numpy as np

conditions = (
    df['Install'].isin(['0+', '1+', '5+', '10+', '50+', '100+', '500+']),
    df['Install'].isin(['1,000+', '5,000+', '10,000+', '50,000+']),
    df['Install'].isin(['100,000+', '500,000+']),
    df['Install'].isin(['1,000,000+', '5,000,000+']),
    df['Install'].isin(['10,000,000+', '50,000,000+']),
    df['Install'].isin(['100,000,000+', '500,000,000+']),
    df['Install'].isin(['1,000,000,000+'])
)
choices = ['0-1k', '1k-100k', '100k-1M', '1M-10M', '10M-100M', '100M-1B', '> 1B']
df['downloads'] = np.select(conditions, choices, default='unknown')
print(df)
Install downloads
0 0+ 0-1k
1 1+ 0-1k
2 5+ 0-1k
3 10+ 0-1k
4 50+ 0-1k
5 100+ 0-1k
6 500+ 0-1k
7 1,000+ 1k-100k
8 5,000+ 1k-100k
9 10,000+ 1k-100k
10 50,000+ 1k-100k
11 100,000+ 100k-1M
12 500,000+ 100k-1M
13 1,000,000+ 1M-10M
14 5,000,000+ 1M-10M
15 10,000,000+ 10M-100M
16 50,000,000+ 10M-100M
17 100,000,000+ 100M-1B
18 500,000,000+ 100M-1B
19 1,000,000,000+ > 1B
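A self-contained sketch of how np.select resolves each row, using a trimmed-down set of the conditions above and a hypothetical unmatched value 'n/a'. Each row takes the choice paired with the first condition that is True for it; rows that match no condition get the default value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 'n/a' stands in for any value not covered by the conditions.
df = pd.DataFrame({'Install': ['5+', '50,000+', '1,000,000,000+', 'n/a']})

conditions = [
    df['Install'].isin(['0+', '1+', '5+', '10+', '50+', '100+', '500+']),
    df['Install'].isin(['1,000+', '5,000+', '10,000+', '50,000+']),
    df['Install'].isin(['1,000,000,000+']),
]
choices = ['0-1k', '1k-100k', '> 1B']

# For each row, np.select picks the choice of the first True condition;
# rows where every condition is False receive `default`.
df['downloads'] = np.select(conditions, choices, default='unknown')
print(df['downloads'].tolist())  # ['0-1k', '1k-100k', '> 1B', 'unknown']
```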
Answer 1 (score: 1)
If you really want to use apply, you can define a single function that checks all the conditions in one if/elif chain.
def classify(x):
    if x in ['0+', '1+', '5+', '10+', '50+', '100+', '500+']:
        return '0-1k'
    elif x in ['1,000+', '5,000+', '10,000+', '50,000+']:
        return '1k-100k'
    elif x in ['100,000+', '500,000+']:
        return '100k-1M'
    elif x in ['1,000,000+', '5,000,000+']:
        return '1M-10M'
    elif x in ['10,000,000+', '50,000,000+']:
        return '10M-100M'
    elif x in ['100,000,000+', '500,000,000+']:
        return '100M-1B'
    elif x in ['1,000,000,000+']:
        return '> 1B'
    else:
        return 'error'
df['Downloads'] = df['Install'].apply(classify)
Install Downloads
0 0+ 0-1k
1 1+ 0-1k
2 5+ 0-1k
3 10+ 0-1k
4 50+ 0-1k
5 100+ 0-1k
6 500+ 0-1k
7 1,000+ 1k-100k
8 5,000+ 1k-100k
9 10,000+ 1k-100k
10 50,000+ 1k-100k
11 100,000+ 100k-1M
12 500,000+ 100k-1M
13 1,000,000+ 1M-10M
14 5,000,000+ 1M-10M
15 10,000,000+ 10M-100M
16 50,000,000+ 10M-100M
17 100,000,000+ 100M-1B
18 500,000,000+ 100M-1B
19 1,000,000,000+ > 1B
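If the if/elif chain feels repetitive, one alternative (not taken from either answer) is to keep the groups in a dict, invert it into a raw-string-to-label lookup, and use Series.map. This is a sketch; the 'n/a' value is hypothetical and only exercises the fallback path. Series.map returns NaN for keys missing from the dict, and fillna('error') then plays the role of the else branch in classify.

```python
import pandas as pd

# One list of raw Install strings per category (same groups as above).
groups = {
    '0-1k': ['0+', '1+', '5+', '10+', '50+', '100+', '500+'],
    '1k-100k': ['1,000+', '5,000+', '10,000+', '50,000+'],
    '100k-1M': ['100,000+', '500,000+'],
    '1M-10M': ['1,000,000+', '5,000,000+'],
    '10M-100M': ['10,000,000+', '50,000,000+'],
    '100M-1B': ['100,000,000+', '500,000,000+'],
    '> 1B': ['1,000,000,000+'],
}

# Invert into a flat lookup table: raw string -> category label.
lookup = {raw: label for label, raws in groups.items() for raw in raws}

df = pd.DataFrame({'Install': ['0+', '500,000+', '1,000,000,000+', 'n/a']})
# Unknown keys map to NaN; fillna supplies the fallback label.
df['Downloads'] = df['Install'].map(lookup).fillna('error')
print(df['Downloads'].tolist())  # ['0-1k', '100k-1M', '> 1B', 'error']
```

Adding or moving a threshold then means editing one list in groups rather than one branch of the function.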