re and pandas: reshaping a list

Time: 2017-07-29 16:35:53

Tags: python list pandas

I have a list of strings in the following format:

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

I reformat the list using re:

[in] test_set1 = [ re.split(r',', line, maxsplit=5) for line in testing_set]

[out] ["001","P01"]

How can I create a DataFrame whose index (transaction_id) is "001, 002, 003, 004", with the P values for each row listed in a column (product_id)?

1 answer:

Answer 0: (score: 0)

This can be done as follows:

import re
import pandas as pd

testing_set = ["001,P01","002,P01,P02","003,P01,P02,P09","004,P01,P03"]

test_set1 = [re.split(r',', line, maxsplit=1) for line in testing_set]
#change maxsplit to 1______________________^

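# Build the frame from the [id, products] pairs, then index it by transaction id.
df = pd.DataFrame(test_set1, columns=['transaction_id', 'product_id'])
df.set_index(['transaction_id'], inplace=True)
# Split the remaining comma-separated string into a list of product ids.
df['product_id'] = df['product_id'].apply(lambda row: row.split(','))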

It gives you a DataFrame like this:

                     product_id
transaction_id                 
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
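
As a possible alternative, the same frame can probably be built without re by using the pandas string methods. The sketch below is not part of the original answer: the names parts and long_df are made up for illustration, and the last step assumes pandas 0.25 or newer, where DataFrame.explode is available.

```python
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

# Split each string once, at the first comma:
# transaction id on the left, the remaining products on the right.
parts = pd.Series(testing_set).str.split(",", n=1, expand=True)
parts.columns = ["transaction_id", "product_id"]

df = parts.set_index("transaction_id")
# Turn the remaining comma-separated string into a list of product ids.
df["product_id"] = df["product_id"].str.split(",")

# Optional (assumes pandas >= 0.25): one row per (transaction, product)
# pair instead of one list per transaction.
long_df = df.explode("product_id")

print(df)
print(long_df)
```

Whether the list-per-row frame or the exploded one is the better fit depends on what you do next: the exploded form is usually easier to group or count on, while the list form keeps one row per transaction.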