Question

How can I do this with Pandas and Python? I'm using Jupyter Notebook.

I have a dataframe like this:

    col
0   2017 / something
1   $5.91 (× 1)
2   Premium
3   2017 / anotherthing
4   $16.0 (× 1)
5   Business

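I want to split those values into their own columns as shown below, and then remove the parenthesized values and the dollar signs from the revenue column (so that "$5.91 (× 1)" becomes "5.91"):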

    col                    revenue        plan
0   2017 / something       $5.91 (× 1)    Premium
1   2017 / anotherthing    $16.0 (× 1)    Business

Answer 1

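import numpy as np
import pandas as pd

# df is the single-column frame from the question
out = (df[['col']]
       .replace(r'\s*\([^\)]*\)', '', regex=True)  # drop the " (× 1)" part
       .set_index(np.arange(len(df)) // 3)  # output row: 0, 0, 0, 1, 1, 1
       .set_index(np.arange(len(df)) % 3, append=True)['col']  # output column: 0, 1, 2, 0, 1, 2
       .unstack())  # move the inner level into columns 0, 1, 2
print(out)

Output:

                     0      1         2
0     2017 / something  $5.91   Premium
1  2017 / anotherthing  $16.0  Business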
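The pattern r'\s*\([^\)]*\)' used above (and again in Answer 3) matches optional whitespace followed by a parenthesized group. Since pandas' replace with regex=True uses Python regular-expression syntax, a plain re.sub on a few sample strings from the question is a quick way to see what it removes. This is only an illustration; the sample strings are copied from the question and the variable names are hypothetical.

```python
import re

# Pattern from the answers: optional whitespace, "(", anything except ")", then ")"
PAREN = r'\s*\([^\)]*\)'

samples = ['$5.91 (× 1)', '$16.0 (× 1)', 'Premium']
cleaned = [re.sub(PAREN, '', s) for s in samples]
print(cleaned)  # ['$5.91', '$16.0', 'Premium']

# Answer 3 additionally strips the leading dollar sign
stripped = [s.strip('$') for s in cleaned]
print(stripped)  # ['5.91', '16.0', 'Premium']
```

Strings without parentheses, such as 'Premium', pass through unchanged, which is why the pattern can be applied to the whole column.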

Answer 2

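Use pd.MultiIndex.from_arrays together with __divmod__.
We divide by 3 because we want 3 columns: the quotient gives each value's output row and the remainder gives its output column.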
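A small sketch of what np.arange(len(df)).__divmod__(3) produces, assuming the 6-row frame from the question (in general the length only needs to be a multiple of 3):

```python
import numpy as np
import pandas as pd

# divmod by 3 splits each row position into (output row, output column)
rows, cols = np.arange(6).__divmod__(3)
print(rows.tolist())  # [0, 0, 0, 1, 1, 1]
print(cols.tolist())  # [0, 1, 2, 0, 1, 2]

# the two arrays become the two levels of the MultiIndex that unstack() pivots on
idx = pd.MultiIndex.from_arrays([rows, cols])
print(idx.nlevels, len(idx))  # 2 6
```

unstack() then moves the inner level (0, 1, 2) into columns, which is why the result has columns 0, 1 and 2 before they are renamed.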

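import numpy as np
import pandas as pd

# divmod by 3: quotient -> output row, remainder -> output column
d = df.set_index(
    pd.MultiIndex.from_arrays(np.arange(len(df)).__divmod__(3))
).col.unstack().rename(columns={0: 'col', 1: 'revenue', 2: 'plan'})

# keep only the text between "$" and " (" in the revenue column
d.assign(revenue=d.revenue.str.extract(r'\$(.*) \(', expand=False))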

                   col revenue      plan
0     2017 / something    5.91   Premium
1  2017 / anotherthing    16.0  Business

Answer 3

Adapting parts of the other solutions into a cleaner one that produces the output the OP asked for:

import numpy as np
import pandas as pd

# make dataframe
df = pd.DataFrame(columns=['col'], data=['2017 / something', '$5.91 (× 1)', 'Premium', '2017 / anotherthing', '$16.0 (× 1)', 'Business'])

# break into 3 columns (per piRSquared's solution) and rename
df = df.set_index(
    pd.MultiIndex.from_arrays(np.arange(len(df)).__divmod__(3))
).col.unstack().rename(columns={0: 'col', 1: 'revenue', 2: 'plan'})

# strip the parenthesized values and the dollar signs
df['revenue'] = df['revenue'].replace(r'\s*\([^\)]*\)', '', regex=True).str.strip('$')
print(df)

Output:

                   col revenue      plan
0     2017 / something    5.91   Premium
1  2017 / anotherthing    16.0  Business
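Another option, not taken from any of the answers above, is to reshape the column's underlying array directly with NumPy's reshape(-1, 3) instead of building a MultiIndex, and to convert revenue to a float in the same pass. This is a sketch that assumes the values always come in complete groups of three in the order date, revenue, plan; the pattern r'\$([\d.]+)' is an assumption about the revenue format and only handles plain numbers like "5.91".

```python
import pandas as pd

df = pd.DataFrame({'col': ['2017 / something', '$5.91 (× 1)', 'Premium',
                           '2017 / anotherthing', '$16.0 (× 1)', 'Business']})

# Reshape the flat column into rows of 3 values (assumes len(df) is a multiple of 3)
out = pd.DataFrame(df['col'].to_numpy().reshape(-1, 3),
                   columns=['col', 'revenue', 'plan'])

# Keep only the number after "$" and store it as a float
out['revenue'] = out['revenue'].str.extract(r'\$([\d.]+)', expand=False).astype(float)
print(out)
```

Because reshape(-1, 3) raises a ValueError when the length is not divisible by 3, an incomplete trailing group fails loudly instead of being silently misaligned.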

Pandas df: split values from one column into their own columns
