A simple question.
I have a CSV file with many columns. One of them, named Cuisine, holds several comma-separated values per row.
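From this CSV I want to create a new CSV with 2 columns: - name - Cuisine (one row per cuisine, obtained by splitting the Cuisine values of the first CSV)
Here is a sample of the data, limited to the two columns I'm interested in, name and Cuisine: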
name,Cuisine
Real Talent Cafe,"Italian, American, Pizza, Mediterranean, European, Fusion"
Dogma,"International, Mediterranean, Barbecue, Spanish, Fusion"
Taberna El Callejon,"Mediterranean, European, Spanish"
Astor,"International, Mediterranean, European, Fusion"
La Gaditana Castellana,"Spanish, Seafood, International, Diner, Wine Bar"
Here is the script I wrote:
# -*- coding: utf-8 -*-
from itertools import chain
import numpy as np
import pandas as pd
df = pd.read_csv('res_madrid.csv', usecols=['name','Cuisine'])
items_count = df["Cuisine"].str.count(",") +1
pd.DataFrame({"name": np.repeat(df["name"], items_count),
"Cuisine": list(chain.from_iterable(df["Cuisine"].str.split(",")))})
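Note that if you test it by copying the data I shared, it works... The problem appears when I load the full CSV file, which has more columns, and use the `usecols` parameter.
I get the following error: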
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 471, in repeat
return _wrapfunc(a, 'repeat', repeats, axis=axis)
File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
File "/usr/lib64/python3.6/site-packages/pandas/core/series.py", line 1157, in repeat
new_index = self.index.repeat(repeats)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 862, in repeat
return self._shallow_copy(self._values.repeat(repeats))
ValueError: count < 0
EDIT: The error happens because some values in the Cuisine column are empty. How can I avoid this?
Thanks for your help :) Regards, Alexander
Answer 0 (score: 1)
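A minimal sketch of the question's own `np.repeat` approach with the empty values handled. It assumes that dropping restaurants without any cuisine is acceptable. It also assumes that the error comes from the NaN that `str.count` returns for empty cells, which `np.repeat` cannot use as a repeat count. The inline CSV, its extra `rating` column, and the "Empty Place" row are hypothetical stand-ins for `res_madrid.csv`.

```python
from io import StringIO
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical sample: "Empty Place" has an empty Cuisine field, which
# read_csv parses as NaN. The extra "rating" column mimics a wider file.
csv_text = """name,Cuisine,rating
Taberna El Callejon,"Mediterranean, European, Spanish",4.5
Astor,"International, Mediterranean, European, Fusion",4.0
Empty Place,,3.0
"""
df = pd.read_csv(StringIO(csv_text), usecols=['name', 'Cuisine'])

# Drop rows with no Cuisine: str.count() yields NaN for them, and a NaN
# repeat count is what makes np.repeat fail.
df = df.dropna(subset=['Cuisine'])

# Number of cuisines per restaurant = number of commas + 1
items_count = df['Cuisine'].str.count(',') + 1

result = pd.DataFrame({
    # Repeat each name once per cuisine
    'name': np.repeat(df['name'].values, items_count.values),
    # Flatten the split lists and strip the space that follows each comma
    'Cuisine': [c.strip() for c in chain.from_iterable(df['Cuisine'].str.split(','))],
})

print(result.to_csv(index=False))
```

If restaurants without a cuisine should be kept instead, one option is `df['Cuisine'] = df['Cuisine'].fillna('')` in place of `dropna`. Each empty string then splits into a single empty item.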
data = pd.read_csv(path_to_txt_file)  # path to the text file holding the data
data
name Cuisine
0 Real Talent Cafe Italian, American, Pizza, Mediterranean, Europ...
1 Dogma International, Mediterranean, Barbecue, Spanis...
2 Taberna El Callejon Mediterranean, European, Spanish
3 Astor International, Mediterranean, European, Fusion
4 La Gaditana Castellana Spanish, Seafood, International, Diner, Wine Bar
Use:
data = data.set_index('name')['Cuisine'].apply(lambda x: [s.strip() for s in x.split(',')]).apply(pd.Series).stack().reset_index().drop('level_1', axis=1)
data.columns = ['name', 'cuisine']
Output:
data.head()
name cuisine
0 Real Talent Cafe Italian
1 Real Talent Cafe American
2 Real Talent Cafe Pizza
3 Real Talent Cafe Mediterranean
4 Real Talent Cafe European
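The chain above calls `x.split` on every value, so it fails on empty Cuisine cells, which pandas reads as NaN (a float). A minimal sketch of the same chain that first drops those rows, using hypothetical inline data. The explicit `dropna()` after `stack()` is an assumption added for safety, since newer pandas versions may keep the NaN padding that `apply(pd.Series)` introduces.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with one empty Cuisine value
csv_text = """name,Cuisine
Dogma,"International, Mediterranean, Barbecue, Spanish, Fusion"
Astor,"International, Mediterranean, European, Fusion"
No Cuisine Listed,
"""
data = pd.read_csv(StringIO(csv_text))

# Remove rows whose Cuisine is NaN before calling str methods on them
data = data.dropna(subset=['Cuisine'])

data = (
    data.set_index('name')['Cuisine']
    .apply(lambda x: [s.strip() for s in x.split(',')])  # list of cuisines per row
    .apply(pd.Series)      # one column per cuisine, NaN-padded for shorter lists
    .stack()               # wide -> long, indexed by (name, position)
    .dropna()              # discard any remaining NaN padding
    .reset_index()
    .drop('level_1', axis=1)  # drop the position level
)
data.columns = ['name', 'cuisine']
```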
Answer 1 (score: 1)
How about:
pd.concat([pd.Series(row['name'], row['Cuisine'].split(','))
           for index, row in df.iterrows()]).reset_index()
Then you just need to rename the columns.
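A sketch of the renaming step, under the assumption that empty Cuisine rows are dropped first, since `row['Cuisine'].split` fails on NaN. Each per-row Series stores the cuisines in its index and repeats the name as its values. `reset_index()` therefore yields one column named `index` holding the cuisines and one named `0` holding the names. The inline data and the `long_df` name are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with one empty Cuisine value
csv_text = """name,Cuisine
Taberna El Callejon,"Mediterranean, European, Spanish"
Empty Place,
La Gaditana Castellana,"Spanish, Seafood, International, Diner, Wine Bar"
"""
df = pd.read_csv(StringIO(csv_text)).dropna(subset=['Cuisine'])

# One Series per restaurant: index = cuisines, values = the repeated name
long_df = pd.concat(
    [pd.Series(row['name'], index=[c.strip() for c in row['Cuisine'].split(',')])
     for _, row in df.iterrows()]
).reset_index()

# After reset_index() the columns are 'index' (cuisines) and 0 (names)
long_df.columns = ['Cuisine', 'name']
long_df = long_df[['name', 'Cuisine']]  # put name first
```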
Answer 2 (score: 0)
If you want a solution without `apply` or a list comprehension, you can do the following:
pd.DataFrame(df['Cuisine'].str.split(',').values.tolist(), index=df['name'])\
    .stack().reset_index().drop('level_1', axis=1)
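On pandas 0.25 or later, `DataFrame.explode` performs this reshape directly, which avoids building the wide intermediate frame. A minimal sketch, assuming that rows with an empty Cuisine should be discarded. The inline data stands in for the real file, and the "Empty Place" row is hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with one empty Cuisine value
csv_text = """name,Cuisine
Real Talent Cafe,"Italian, American, Pizza, Mediterranean, European, Fusion"
Astor,"International, Mediterranean, European, Fusion"
Empty Place,
"""
df = pd.read_csv(StringIO(csv_text), usecols=['name', 'Cuisine'])

result = (
    df.dropna(subset=['Cuisine'])                             # skip empty cuisines
      .assign(Cuisine=lambda d: d['Cuisine'].str.split(','))  # string -> list
      .explode('Cuisine')                                     # one row per list item
      .assign(Cuisine=lambda d: d['Cuisine'].str.strip())     # trim spaces after commas
      .reset_index(drop=True)
)
```

`result.to_csv(...)` would then write the two-column file the question asks for.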