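I think my code is close, but I've really messed it up. Basically, I have rows of sales data. One variable is the receipt number, and another is a string containing the units and products.
I want a new row for each individual product, keeping the receipt number that goes with it. For example, an entry for receipt A with products L and Q should become two separate entries: receipt A with L, and receipt A with Q. So I tried splitting the string into variables and creating a new entry under the same receipt identifier.
But by the time I was done, I had accidentally duplicated some entries. Any help with a) getting it to produce the correct output and b) making the code prettier/simpler? (I also don't need the old Description variable.)
Many thanks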
import pandas as pd
import numpy as np
df = pd.DataFrame({"Date": ["9/26/17 2:33 PM", "9/26/17 2:23 PM", "9/26/17 2:22 PM"], "Receipt number": ["1-1002","1-1001","1-1000"], "Description": ["1 x Capacino, 2 x Americano","1 x Americano","1 x Latte"]})
df
df2 = df['Description'].str.split(',').apply(pd.Series).stack().dropna()
df2.index = df2.index.droplevel(-1)
df2.name = 'Product'
df = df.join(df2)
df.join(df['Product'].str.split(' x ', n=1, expand=True).rename(columns={0:'Units', 1:'Product Name'}))
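A minimal sketch of why the last line duplicates rows, assuming pandas 1.x or later. After `df = df.join(df2)` the index label 0 appears twice, and the frame that `str.split(..., expand=True)` builds from `df['Product']` carries the same repeated label. `join` matches rows by label, not by position, so each of the two left rows labelled 0 pairs with each of the two right rows labelled 0 (2 x 2 = 4 rows for receipt 1-1002). The first steps are condensed here with `Series.explode`, which repeats the index the same way; `items`, `split` and `duplicated` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["9/26/17 2:33 PM", "9/26/17 2:23 PM", "9/26/17 2:22 PM"],
    "Receipt number": ["1-1002", "1-1001", "1-1000"],
    "Description": ["1 x Capacino, 2 x Americano", "1 x Americano", "1 x Latte"],
})

# One row per comma-separated item; explode repeats the original index -> [0, 0, 1, 2]
items = df["Description"].str.split(",").explode().rename("Product")
df = df.join(items)

# The split frame inherits the same non-unique index [0, 0, 1, 2]
split = df["Product"].str.split(" x ", n=1, expand=True)

# join aligns on labels: label 0 matches 2 x 2 ways, so 6 rows instead of 4
duplicated = df.join(split.rename(columns={0: "Units", 1: "Product Name"}))
print(len(duplicated))

# Assigning columns from a frame with an identical index copies values row for row
df["Units"] = split[0].str.strip().astype(int)  # strip() removes the space left after the comma
df["Product Name"] = split[1]
print(df)
```

Assigning the columns directly is safe here only because `split` was built from `df['Product']`, so its index is identical to `df`'s and pandas copies the values in order instead of matching labels.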
Answer (score: 1)
Well, you were close, except for the last line. Here is what I did, following your code:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Date": ["9/26/17 2:33 PM", "9/26/17 2:23 PM", "9/26/17 2:22 PM"], "Receipt number": ["1-1002","1-1001","1-1000"], "Description": ["1 x Capacino, 2 x Americano","1 x Americano","1 x Latte"]})
df2 = df['Description'].str.split(',').apply(pd.Series).stack().dropna()  # dropna() removes the NaN padding for single-item receipts
df2.index = df2.index.droplevel(-1)
df2.name = 'Product'
df = df.join(df2)
# Here is where I diverge
df['Units'] = df['Product'].apply(lambda x: int(x.split(' x ')[0]))
df['Product'] = df['Product'].apply(lambda x: x.split(' x ')[-1])
df = df.drop('Description', axis=1)
Result
Date Receipt number Product Units
0 9/26/17 2:33 PM 1-1002 Capacino 1
0 9/26/17 2:33 PM 1-1002 Americano 2
1 9/26/17 2:23 PM 1-1001 Americano 1
2 9/26/17 2:22 PM 1-1000 Latte 1
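For part (b), one more compact option (a sketch, not the answerer's code; it assumes pandas 1.1 or later for `explode(..., ignore_index=True)`): `DataFrame.explode` gives one row per item, and `str.extract` pulls the units and product name apart with a regular expression. The regex assumes every item has the form `<number> x <name>`; an item that doesn't match would come back as NaN and make `astype(int)` fail. `Item` and `parts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["9/26/17 2:33 PM", "9/26/17 2:23 PM", "9/26/17 2:22 PM"],
    "Receipt number": ["1-1002", "1-1001", "1-1000"],
    "Description": ["1 x Capacino, 2 x Americano", "1 x Americano", "1 x Latte"],
})

# One row per item; ignore_index=True gives a fresh, unique 0..n-1 index
items = (
    df.assign(Item=df["Description"].str.split(","))
    .explode("Item", ignore_index=True)
)

# Assumes "<number> x <name>"; strip() drops the space that follows each comma
parts = items["Item"].str.strip().str.extract(r"^(\d+) x (.+)$")

items["Units"] = parts[0].astype(int)
items["Product"] = parts[1]
result = items.drop(columns=["Description", "Item"])
print(result)
```

Because the exploded frame gets a unique index before any columns are added, there is no label-based join on a repeated index, which is what produced the duplicates in the question.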