Splitting a column with pandas and writing it to CSV

Time: 2019-02-12 14:46:24

Tags: python pandas csv

A simple question.

I have a CSV file with many columns. One column, named Cuisine, holds many values per row:

name,Cuisine
Real Talent Cafe,"Italian, American, Pizza, Mediterranean, European, Fusion"
Dogma,"International, Mediterranean, Barbecue, Spanish, Fusion"
Taberna El Callejon,"Mediterranean, European, Spanish"
Astor,"International, Mediterranean, European, Fusion"
La Gaditana Castellana,"Spanish, Seafood, International, Diner, Wine Bar"

From this CSV I want to create a new CSV with two columns: name and Cuisine (obtained by splitting the Cuisine column of the first CSV).

Here is the script I wrote. I select only the two columns I am interested in, name and Cuisine:

# -*- coding: utf-8 -*-
from itertools import chain
import numpy as np
import pandas as pd

df = pd.read_csv('res_madrid.csv', usecols=['name','Cuisine'])
items_count = df["Cuisine"].str.count(",") +1

pd.DataFrame({"name": np.repeat(df["name"], items_count),
    "Cuisine": list(chain.from_iterable(df["Cuisine"].str.split(",")))})

I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 471, in repeat
    return _wrapfunc(a, 'repeat', repeats, axis=axis)
  File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
  File "/usr/lib64/python3.6/site-packages/pandas/core/series.py", line 1157, in repeat
    new_index = self.index.repeat(repeats)
  File "/usr/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 862, in repeat
    return self._shallow_copy(self._values.repeat(repeats))
ValueError: count < 0

Note that if you test this by copying the data I shared, it works. The problem appears when I load the full CSV file, which contains more columns, and use the "usecols" parameter.

The expected result has one row per name and cuisine value, so that Real Talent Cafe appears once with Italian, once with American, and so on.

EDIT: The error occurs because some values in the Cuisine column are empty. How can I avoid this?

Thanks for your help :) Regards, Alexander
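To locate the rows behind the empty values mentioned in the edit, one option is to check where the item count comes out missing. The following is a minimal sketch, not the asker's code: the in-memory sample `csv_text` and the restaurant name "Empty Place" are made up for illustration and stand in for res_madrid.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for res_madrid.csv;
# the second row has an empty Cuisine field.
csv_text = '''name,Cuisine
Dogma,"International, Fusion"
Empty Place,
'''

df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "Cuisine"])

# .str.count propagates missing values, so the count is missing
# wherever Cuisine is empty; such a value is not a valid repeat count.
items_count = df["Cuisine"].str.count(",") + 1
missing = items_count.isna()

# Names of the rows that need to be dropped or filled before splitting.
print(df.loc[missing, "name"].tolist())
```

Any row flagged in `missing` has no usable repeat count, so it has to be dropped or filled before calling `np.repeat`.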

3 answers:

Answer 0: (score: 1)

data = pd.read_csv('path/to/file.txt')  # path to txt file

data

                     name                                            Cuisine
0        Real Talent Cafe  Italian, American, Pizza, Mediterranean, Europ...
1                   Dogma  International, Mediterranean, Barbecue, Spanis...
2     Taberna El Callejon                   Mediterranean, European, Spanish
3                   Astor     International, Mediterranean, European, Fusion
4  La Gaditana Castellana   Spanish, Seafood, International, Diner, Wine Bar

Use:

# Split each Cuisine string into a list, expand the list into columns, then stack the columns into rows
data = data.set_index('name')['Cuisine'].apply(lambda x: x.split(',')).apply(pd.Series).stack().reset_index().drop('level_1', axis=1)
data.columns = ['name', 'cuisine']

Output:

data.head()


               name         cuisine
0  Real Talent Cafe         Italian
1  Real Talent Cafe        American
2  Real Talent Cafe           Pizza
3  Real Talent Cafe   Mediterranean
4  Real Talent Cafe        European

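Answer 1: (score: 1)

How about:

# Build one Series per row: the cuisines form the index, the name is the repeated value
pd.concat([pd.Series(row['name'], row['Cuisine'].split(','))
                for index, row in df.iterrows()]).reset_index()

Then you only need to rename the columns.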

Answer 2: (score: 0)

If you want a solution that uses neither apply nor a list comprehension, you can do the following:

pd.DataFrame(df['Cuisine'].str.split(',').values.tolist(), index=df['name'])\
.stack().reset_index().drop('level_1', axis=1)
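
None of the snippets above deals with the empty Cuisine values mentioned in the question's edit. The following is a minimal sketch of one way to handle them, under two assumptions: rows without a cuisine can simply be dropped, and pandas 0.25 or later is available (that release added `DataFrame.explode`). The sample `csv_text` and the name "Empty Place" are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for res_madrid.csv;
# "Empty Place" has no Cuisine value.
csv_text = '''name,Cuisine
Dogma,"International, Mediterranean"
Empty Place,
Astor,"European, Fusion"
'''

df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "Cuisine"])

# Drop rows with no cuisine so the split only sees real strings.
clean = df.dropna(subset=["Cuisine"])

# Turn each Cuisine string into a list, then give every list
# element its own row; the name is repeated automatically.
result = (
    clean.assign(Cuisine=clean["Cuisine"].str.split(","))
         .explode("Cuisine")
         .reset_index(drop=True)
)

# Remove the space that follows each comma in the source data.
result["Cuisine"] = result["Cuisine"].str.strip()

print(result)
```

The result can then be written out with `result.to_csv('out.csv', index=False)`, where the file name is only a placeholder. To keep restaurants that have no cuisine instead of dropping them, `fillna('')` before the split would be an alternative to `dropna`.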