I am trying to reshape my data using pandas.melt.
This is my txt file:
2017/11/14(Tue)
23:20 Aditya Laksana S. hahaha
23:20 Aditya Laksana S. [Sticker]
23:20 Veronika Xaveria [Sticker]
2017/12/14(Thu)
24:12 Veronika Xaveria xxxxxxxx
24:14 Aditya Laksana S. weeee
24:15 Aditya Laksana S. [Sticker]
I want the data to look like this:
2017/11/14(Tue) 23:20 Aditya Laksana S. hahaha
2017/11/14(Tue) 23:20 Aditya Laksana S. [Sticker]
2017/11/14(Tue) 23:20 Veronika Xaveria [Sticker]
2017/12/14(Thu) 24:12 Veronika Xaveria xxxxxxxx
2017/12/14(Thu) 24:14 Aditya Laksana S. weeee
2017/12/14(Thu) 24:15 Aditya Laksana S. [Sticker]
Answer 0 (score: 1)
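If I understand what you are looking for and what your current dataframe actually looks like, I think you could split your dataframe on the date rows and then update the first column. I don't think this is the most efficient solution, though, since it iterates through a list of dfs.
Assuming this df (I am also assuming it does not have a MultiIndex, since you did not say that it does):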
0 1
0 2017/11/14(Tue) NaN
1 23:20 Aditya Laksana S. hahaha
2 23:20 Aditya Laksana S. [Sticker]
3 23:20 Veronika Xaveria [Sticker]
4 2017/12/14(Thu) NaN
5 24:12:00 Veronika Xaveria xxxxxxxx
6 24:14:00 Aditya Laksana S. weeee
7 24:15:00 Aditya Laksana S. [Sticker]
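In case it helps to reproduce this, the frame above could be built by hand roughly as follows. This is only a sketch of my assumption about how your data was loaded: the column labels 0 and 1, the string dtype of the time column, and NaN in column 1 on the date rows are all guesses based on the printout, not something taken from your file.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the assumed frame: column 0 holds either a date
# header or a time, column 1 holds "sender message" (NaN on date rows).
df = pd.DataFrame({
    0: ["2017/11/14(Tue)", "23:20", "23:20", "23:20",
        "2017/12/14(Thu)", "24:12:00", "24:14:00", "24:15:00"],
    1: [np.nan,
        "Aditya Laksana S. hahaha",
        "Aditya Laksana S. [Sticker]",
        "Veronika Xaveria [Sticker]",
        np.nan,
        "Veronika Xaveria xxxxxxxx",
        "Aditya Laksana S. weeee",
        "Aditya Laksana S. [Sticker]"],
})
```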
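Then:
import pandas as pd

# find the index labels of the date rows, assuming they contain a weekday abbreviation
idx = list(df[df[0].str.contains('Mon|Tue|Wed|Thu|Fri|Sat|Sun')].index)
# collect the date strings found at those labels
values = list(df.loc[idx, 0])
# split your dataframe on idx (positional slices, so a default RangeIndex is assumed)
# this assumes that the first row contains a date
bounds = idx + [len(df)]
dfs = [df.iloc[bounds[i]:bounds[i + 1]] for i in range(len(idx))]
# prefix every row of each chunk with that chunk's date
df[0] = pd.concat([values[i] + ' ' + dfs[i][0] for i in range(len(dfs))])
# drop the date-only rows (column 1 is NaN there) and keep the result
df = df.dropna()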
0 1
1 2017/11/14(Tue) 23:20 Aditya Laksana S. hahaha
2 2017/11/14(Tue) 23:20 Aditya Laksana S. [Sticker]
3 2017/11/14(Tue) 23:20 Veronika Xaveria [Sticker]
5 2017/12/14(Thu) 24:12:00 Veronika Xaveria xxxxxxxx
6 2017/12/14(Thu) 24:14:00 Aditya Laksana S. weeee
7 2017/12/14(Thu) 24:15:00 Aditya Laksana S. [Sticker]
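As mentioned, looping over a list of dfs is probably not the most efficient route. A forward fill might do the same job without any splitting. The following is a minimal sketch, not a tested drop-in: it rebuilds the assumed frame by hand, and it identifies date rows by NaN in column 1, which is my assumption about your data rather than something your file guarantees.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the assumed frame (see the assumptions above).
df = pd.DataFrame({
    0: ["2017/11/14(Tue)", "23:20", "23:20", "23:20",
        "2017/12/14(Thu)", "24:12:00", "24:14:00", "24:15:00"],
    1: [np.nan,
        "Aditya Laksana S. hahaha",
        "Aditya Laksana S. [Sticker]",
        "Veronika Xaveria [Sticker]",
        np.nan,
        "Veronika Xaveria xxxxxxxx",
        "Aditya Laksana S. weeee",
        "Aditya Laksana S. [Sticker]"],
})

# Assumption: a row is a date header exactly when column 1 is empty.
is_date = df[1].isna()

# Keep column 0 only on the date rows, then copy each date downwards
# so every message row knows which date it belongs to.
dates = df[0].where(is_date).ffill()

# Drop the date headers and prefix the remaining times with their date.
out = df.loc[~is_date].copy()
out[0] = dates[~is_date] + " " + out[0]

print(out)
```

If the date rows cannot be recognised by an empty second column, the weekday pattern used with str.contains above could be used to build is_date instead.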