I have a list of strings in this format:
testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]
I used re to reformat the list:
[in] test_set1 = [ re.split(r',', line, maxsplit=5) for line in testing_set]
[out] ["001","P01"]
How can I create a DataFrame whose index (transaction_id) is "001, 002, 003, 004", with the P values for each row listed in a column (product_id)?
Answer 0 (score: 0)
This can be done as follows:
import re
import pandas as pd

testing_set = ["001,P01","002,P01,P02","003,P01,P02,P09","004,P01,P03"]
# maxsplit changed to 1: split only at the first comma, so every row has exactly two fields
test_set1 = [re.split(r',', line, maxsplit=1) for line in testing_set]
df = pd.DataFrame(test_set1, columns=['transaction_id','product_id'])
df.set_index(['transaction_id'], inplace=True)
# split the remaining comma-separated string into a list of product ids
df['product_id'] = df['product_id'].apply(lambda row: row.split(','))
This gives you a DataFrame like this:
                     product_id
transaction_id
001                       [P01]
002                  [P01, P02]
003             [P01, P02, P09]
004                  [P01, P03]
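The same result can probably be reached without re, since the separator is a plain comma. The following is only a sketch of that alternative, not part of the original answer: the names rows and long_df are made up for illustration, and the final explode step assumes pandas 0.25 or later.

```python
import pandas as pd

testing_set = ["001,P01", "002,P01,P02", "003,P01,P02,P09", "004,P01,P03"]

# The first field is the transaction id; everything after it is a product id.
rows = []
for line in testing_set:
    transaction_id, *products = line.split(",")
    rows.append({"transaction_id": transaction_id, "product_id": products})

# One row per transaction, with the products kept as a list.
df = pd.DataFrame(rows).set_index("transaction_id")

# Optional (hypothetical follow-up): one row per (transaction, product) pair.
long_df = df.explode("product_id")
```

Keeping the products as lists matches the layout above, while the exploded form tends to be easier to group or count by product.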