Changing the data structure of a pandas DataFrame

Time: 2016-09-27 14:10:23

Tags: pandas

I have this sample data...

import pandas as pd

from io import StringIO

stock_list="""EAN code, name, stock
, MONIN Syrups,
12345, Monin Mojito Mint Syrup 250 ml, 100
, BONNE MAMAN,
7890, Bonne Maman Strawberry Preserve 370g, 200
6543, Bonne Maman Raspberry 370g, 150"""

audit = pd.read_csv(StringIO(stock_list), sep="," )

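If the EAN code is NaN ("not a number"), the row is actually a product type rather than a product. So the name "MONIN Syrups" should be moved into a type column for the products that follow it, up to the next NaN. The final list should look like this...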

expected_list="""type,  EAN code,   name,   stock
MONIN Syrups,   12345,  Monin Mojito Mint Syrup 250 ml, 100
BONNE MAMAN,    7890,   Bonne Maman Strawberry Preserve 370g,   200
BONNE MAMAN,    6543,   Bonne Maman Raspberry 370g, 150"""

pd.read_csv(StringIO(expected_list), sep="," )

How can I take the current "stock_list" DataFrame and change it so that it looks like expected_list?

1 answer:

Answer 0: (score: 3)

Copy the name column to a new type column, set type to NaN on the rows that have an EAN code, and then ffill():

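import numpy as np
import pandas as pd

from io import StringIO

stock_list="""EAN code, name, stock
, MONIN Syrups,
12345, Monin Mojito Mint Syrup 250 ml, 100
, BONNE MAMAN,
7890, Bonne Maman Strawberry Preserve 370g, 200
6543, Bonne Maman Raspberry 370g, 150"""

# skipinitialspace=True drops the blank after each comma,
# so the columns are "name" and "stock" rather than " name" and " stock"
audit = pd.read_csv(StringIO(stock_list), sep=",", skipinitialspace=True)

# Start from a copy of the names; only the rows without an EAN code hold a real type
audit["type"] = audit["name"]

# mask is True for product rows (those that have an EAN code)
mask = ~audit["EAN code"].isnull()
audit.loc[mask, "type"] = np.nan
# Assign the filled column back instead of calling ffill(inplace=True) on it
audit["type"] = audit["type"].ffill()
# Keep only the product rows; the type rows have served their purpose
res = audit.loc[mask].reset_index(drop=True)
print(res)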
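
The result above has type as the last column and EAN code as a float. If the layout of expected_list matters, the same idea can be written a little more compactly. This is only a sketch, not part of the original answer: add_type_column is a hypothetical helper name, and it assumes every row that has an EAN code also has a stock value, so both columns can be cast back to integers once the type rows are dropped.

```python
from io import StringIO

import pandas as pd

stock_list = """EAN code, name, stock
, MONIN Syrups,
12345, Monin Mojito Mint Syrup 250 ml, 100
, BONNE MAMAN,
7890, Bonne Maman Strawberry Preserve 370g, 200
6543, Bonne Maman Raspberry 370g, 150"""


def add_type_column(df):
    """Hypothetical helper: turn rows without an EAN code into a 'type' column."""
    is_type_row = df["EAN code"].isna()
    # Keep the name only on type rows, then carry it down to the products below.
    types = df["name"].where(is_type_row).ffill()
    out = df.assign(type=types).loc[~is_type_row].reset_index(drop=True)
    # Assumption: no NaN is left in these columns, so they can be integers again.
    out = out.astype({"EAN code": "int64", "stock": "int64"})
    # Reorder the columns to match expected_list.
    return out[["type", "EAN code", "name", "stock"]]


audit = pd.read_csv(StringIO(stock_list), skipinitialspace=True)
result = add_type_column(audit)
print(result)
```

Series.where keeps values where the condition is True and puts NaN everywhere else, which replaces the copy-then-clear step with a single expression.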