Question

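I have some CSV data that I want to reformat.

Apart from Data A and B, I want to reshape the data so that Data1-4 become the column names and Value 1-4 become the values.

I have millions of rows and I don't want to loop over them. I'm using a pandas DataFrame in Python.

Please suggest the best way to do this, since looping thousands of times would take a lot of time and I want the best-performing approach.

More sample data for what I'm trying to do: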

Answer 1

If the input is a Series with a 3-level MultiIndex, use Series.unstack:

print (type(s))
<class 'pandas.core.series.Series'>
print (s.index.nlevels)
3

df = s.unstack(fill_value=0)
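
As a rough illustration of this first case, here is a minimal sketch. The sample labels (`A1`, `B1`, `Data1`, ...), the values, and the level names are hypothetical, since the question's original data is not shown here:

```python
import pandas as pd

# Hypothetical sample: the first two index levels identify a row,
# the third level holds the labels that should become column names.
index = pd.MultiIndex.from_tuples(
    [
        ("A1", "B1", "Data1"),
        ("A1", "B1", "Data2"),
        ("A2", "B2", "Data1"),
        ("A2", "B2", "Data3"),
    ],
    names=["A", "B", "Data"],
)
s = pd.Series([10, 20, 30, 40], index=index)

# unstack() moves the last index level into the columns in one
# vectorized step; combinations that do not occur are filled with 0.
wide = s.unstack(fill_value=0)
print(wide)
```

No Python-level loop over the rows is involved, which is the point of the approach for inputs with millions of rows.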

Or, if the input is a 4-column DataFrame, first fill the missing values in the first 2 columns by forward filling, then use DataFrame.set_index followed by Series.unstack to reshape:

print (type(df))
<class 'pandas.core.frame.DataFrame'>
print (len(df.columns))
4

df.columns = ['Col1','Col2','Col3','Col4']
cols = ['Col1','Col2']
df[cols] = df[cols].ffill()
df = df.set_index(['Col1','Col2','Col3'])['Col4'].unstack(fill_value=0)
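
The same steps can be tried end to end on a small frame. This is only a sketch under assumptions: the sample values are made up, and it assumes the blanks in the first two columns were read as NaN (as `pandas.read_csv` does for empty fields):

```python
import numpy as np
import pandas as pd

# Hypothetical 4-column input: the first two columns are only filled
# on the first row of each group and are NaN on the rows below it.
df = pd.DataFrame(
    {
        0: ["A1", np.nan, "A2", np.nan],
        1: ["B1", np.nan, "B2", np.nan],
        2: ["Data1", "Data2", "Data1", "Data3"],
        3: [10, 20, 30, 40],
    }
)

df.columns = ["Col1", "Col2", "Col3", "Col4"]
cols = ["Col1", "Col2"]

# Forward fill propagates the last seen key down to the blank rows.
df[cols] = df[cols].ffill()

# Index by the two keys plus the label column, then pivot the labels
# into columns; missing combinations become 0.
wide = df.set_index(["Col1", "Col2", "Col3"])["Col4"].unstack(fill_value=0)

# Optional: turn the key levels back into ordinary columns.
flat = wide.reset_index()
print(flat)
```

Note that `unstack` raises a `ValueError` if the same (Col1, Col2, Col3) combination occurs more than once; in that situation an aggregating reshape such as `DataFrame.pivot_table` may be a better fit.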

Format CSV data using a Python DataFrame
