How do I fix an annoying syntax error when creating a new column with .apply()?

Time: 2019-03-29 22:23:36

Tags: python pandas numpy dataframe

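I am trying to reclassify a series of strings that represent how many times an app has been downloaded, because the data does not show the raw download counts. I need to group the 20 strings into 7 categories and put the result in a new column named "downloads".

I have tried editing the parentheses and brackets. Before that, I was incorrectly using np.apply.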

import pandas as pd
from pandas import DataFrame

fear = pd.read_csv('googleplaystore.csv', encoding='latin')

n_ratings = {'Install':['0+', '1+', '5+', '10+', '50+', '100+', '500+', '1,000+', '5,000+', 
             '10,000+', '50,000+', '100,000+', '500,000+', '1,000,000+', '5,000,000+', '10,000,000+', 
             '50,000,000+', '100,000,000+', '500,000,000+', '1,000,000,000+']}
df = DataFrame(n_ratings, columns=['Install'])  

df['downloads'] = df['Install'].apply(lambda x: '0-1k' if x.isin(['0+', '1+', '5+', '10+', '50+', '100+', '500+'])

df['downloads'] = df['Install'].apply(lambda x: '1k-100k' if x.isin(['1,000+', '5,000+', '10,000+', '50,000+']))

df['downloads'] = df['Install'].apply(lambda x: '100k-1M' if x.isin(['100,000+', '500,000+'])

df['downloads'] = df['Install'].apply(lambda x: '1M-10M' if x.isin(['1,000,000+', '5,000,000+'])

df['downloads'] = df['Install'].apply(lambda x: '10M-100M' if x.isin(['10,000,000+', '50,000,000+'])

df['downloads'] = df['Install'].apply(lambda x: '100M-1B' if x.isin(['100,000,000+', '500,000,000+'])

df['downloads'] = df['Install'].apply(lambda x: '> 1B' if x.isin(['1,000,000,000+'])

2 Answers:

Answer 0 (score: 1)

You don't need apply with if-else. Just use np.select: pass it your conditions, and pass the choices that correspond to those conditions.

import numpy as np

conditions = (
    df['Install'].isin(['0+', '1+', '5+', '10+', '50+', '100+', '500+']),
    df['Install'].isin(['1,000+', '5,000+', '10,000+', '50,000+']),
    df['Install'].isin(['100,000+', '500,000+']),
    df['Install'].isin(['1,000,000+', '5,000,000+']),
    df['Install'].isin(['10,000,000+', '50,000,000+']),
    df['Install'].isin(['100,000,000+', '500,000,000+']),
    df['Install'].isin(['1,000,000,000+'])
)

choices = ['0-1k', '1k-100k', '100k-1M', '1M-10M', '10M-100M', '100M-1B', '> 1B']

df['downloads'] = np.select(conditions, choices, default='unknown')

print(df)
           Install downloads
0               0+      0-1k
1               1+      0-1k
2               5+      0-1k
3              10+      0-1k
4              50+      0-1k
5             100+      0-1k
6             500+      0-1k
7           1,000+   1k-100k
8           5,000+   1k-100k
9          10,000+   1k-100k
10         50,000+   1k-100k
11        100,000+   100k-1M
12        500,000+   100k-1M
13      1,000,000+    1M-10M
14      5,000,000+    1M-10M
15     10,000,000+  10M-100M
16     50,000,000+  10M-100M
17    100,000,000+   100M-1B
18    500,000,000+   100M-1B
19  1,000,000,000+      > 1B
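
To make the role of `default` concrete, here is a minimal sketch on a made-up three-row frame. The name `sample` and the value 'Free' are assumptions for illustration only and do not come from the question's data. For each row, np.select takes the choice belonging to the first condition that is True; a row that matches no condition gets the default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 'Free' stands in for an unexpected value in the column.
sample = pd.DataFrame({'Install': ['10+', '5,000+', 'Free']})

# Only two of the seven buckets are listed to keep the sketch short.
conditions = [
    sample['Install'].isin(['0+', '1+', '5+', '10+', '50+', '100+', '500+']),
    sample['Install'].isin(['1,000+', '5,000+', '10,000+', '50,000+']),
]
choices = ['0-1k', '1k-100k']

# Rows that match none of the conditions fall back to the default label.
sample['downloads'] = np.select(conditions, choices, default='unknown')
print(sample['downloads'].tolist())  # ['0-1k', '1k-100k', 'unknown']
```

Counting the 'unknown' rows afterwards is a quick way to spot values in the real CSV that the seven buckets do not cover.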

Answer 1 (score: 1)

If you really want to use apply, you can simply define one function that checks all the conditions in a single block.

def classify(x):
    if x in ['0+', '1+', '5+', '10+', '50+', '100+', '500+']:
        return '0-1k'
    elif x in ['1,000+', '5,000+', '10,000+', '50,000+']:
        return '1k-100k'
    elif x in ['100,000+', '500,000+']:
        return '100k-1M' 
    elif x in ['1,000,000+', '5,000,000+']:
        return '1M-10M'
    elif x in ['10,000,000+', '50,000,000+']:
        return '10M-100M' 
    elif x in ['100,000,000+', '500,000,000+']:
        return '100M-1B'
    elif x in ['1,000,000,000+']:
        return '> 1B'
    else:
        return 'error'

df['Downloads'] = df['Install'].apply(classify)

           Install Downloads
0               0+      0-1k
1               1+      0-1k
2               5+      0-1k
3              10+      0-1k
4              50+      0-1k
5             100+      0-1k
6             500+      0-1k
7           1,000+   1k-100k
8           5,000+   1k-100k
9          10,000+   1k-100k
10         50,000+   1k-100k
11        100,000+   100k-1M
12        500,000+   100k-1M
13      1,000,000+    1M-10M
14      5,000,000+    1M-10M
15     10,000,000+  10M-100M
16     50,000,000+  10M-100M
17    100,000,000+   100M-1B
18    500,000,000+   100M-1B
19  1,000,000,000+      > 1B
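
As a side note on why the lambdas in the question fail, as far as I can tell: a Python conditional expression must have an else branch (`A if cond else B`), several of the lines are missing a closing parenthesis, and inside apply each `x` is a plain string, which has no `.isin` method, so `in` is the membership test to use. Each `df['downloads'] = ...` line would also overwrite the previous one. Below is a minimal sketch of a lambda that does parse, covering only two buckets; the four-row frame and the 'other' fallback label are placeholders of my own.

```python
import pandas as pd

# Small stand-in for the question's frame.
df = pd.DataFrame({'Install': ['0+', '500+', '1,000+', '1,000,000,000+']})

small = ['0+', '1+', '5+', '10+', '50+', '100+', '500+']
medium = ['1,000+', '5,000+', '10,000+', '50,000+']

# Every conditional expression needs an explicit else branch,
# and x is a str here, so membership is tested with `in`.
df['downloads'] = df['Install'].apply(
    lambda x: '0-1k' if x in small
    else '1k-100k' if x in medium
    else 'other'  # placeholder for the remaining buckets
)
print(df['downloads'].tolist())  # ['0-1k', '0-1k', '1k-100k', 'other']
```

With seven buckets the chained lambda gets hard to read, which is why a named function like classify above is usually the more comfortable choice.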