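I am working with Python 2.7 and pandas (version 0.18.1) DataFrames. I need to add multiple columns to a DataFrame, and I have tried the approaches below.
data is my DataFrame (more than 50 columns, about 800 MB in size).
My sample data looks like this: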
+---+---+-----+----+---+---------+---+----+----+---+----------+
| a | b | c   | d  | e | f       | g | h  | i  | j | discount |
+---+---+-----+----+---+---------+---+----+----+---+----------+
| 0 | 1 | 100 |    |   | 65497.6 |   |    |    |   | 0        |
| 0 | 1 |     |    |   | 73882.8 |   |    |    |   | 0        |
| 1 | 0 |     |    |   | 88588   |   | 22 |    |   | 0        |
| 0 | 1 |     |    |   | 106480  |   | 20 | 10 |   | 0        |
| 1 |   |     |    |   | 52500   |   |    |    |   | 0        |
| 0 |   | 20  | 10 |   | 22997.5 |   |    |    |   | 0        |
|   |   |     |    |   |         |   |    |    |   | 0        |
| 0 | 1 |     | 20 |   | 0       |   |    |    |   | 0        |
| 0 | 0 |     |    |   | 10520   |   |    |    |   | 0        |
+---+---+-----+----+---+---------+---+----+----+---+----------+
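To make the attempts below easier to reproduce, here is a minimal sketch that rebuilds the sample rows as a DataFrame. Blank cells are assumed to be NaN. The names in metricColumns (in particular metric_x and metric_y) are hypothetical placeholders, and dataColumns is assumed to be data.columns, since neither is defined in the post.

```python
import numpy as np
import pandas as pd

nan = np.nan  # blank cells in the table are assumed to be missing values

# Rows copied from the sample table, in the same order.
data = pd.DataFrame(
    [
        [0, 1, 100, nan, nan, 65497.6, nan, nan, nan, nan, 0],
        [0, 1, nan, nan, nan, 73882.8, nan, nan, nan, nan, 0],
        [1, 0, nan, nan, nan, 88588, nan, 22, nan, nan, 0],
        [0, 1, nan, nan, nan, 106480, nan, 20, 10, nan, 0],
        [1, nan, nan, nan, nan, 52500, nan, nan, nan, nan, 0],
        [0, nan, 20, 10, nan, 22997.5, nan, nan, nan, nan, 0],
        [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 0],
        [0, 1, nan, 20, nan, 0, nan, nan, nan, nan, 0],
        [0, 0, nan, nan, nan, 10520, nan, nan, nan, nan, 0],
    ],
    columns=list("abcdefghij") + ["discount"],
)

# Hypothetical metric names: three already exist in data, two do not.
metricColumns = ["a", "f", "discount", "metric_x", "metric_y"]

# Assumption: dataColumns in the snippets below refers to the existing columns.
dataColumns = data.columns

# Columns that the three methods would have to add.
missingColumns = [c for c in metricColumns if c not in dataColumns]
```

With these placeholder names, only metric_x and metric_y would be added by the methods below.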
Method 1:
# add each missing metric column to data, one at a time
for metricColumn in metricColumns:
    if metricColumn not in dataColumns:
        data[metricColumn] = 0
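Method 2:
# collect the missing metric columns in a separate, initially empty frame
df = pd.DataFrame()
for metricColumn in metricColumns:
    if metricColumn not in dataColumns:
        df[metricColumn] = 0
# then assign the collected columns to data
data[df.columns] = df
data[df.columns] = 0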
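Method 3:
# collect the names of the missing metric columns
appendColumns = []
for metricColumn in metricColumns:
    if metricColumn not in dataColumns:
        appendColumns.append(metricColumn)
# concatenate with an empty frame that carries only the new column names
data = pd.concat([data, pd.DataFrame(columns=appendColumns)])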
All three of these approaches take a lot of time and memory: roughly 10 minutes and 50 GB of RAM.
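For anyone who wants to check figures like these, the following is a minimal sketch of one way to time Method 1 and compare the frame size before and after. It runs on a synthetic frame whose size (100,000 rows by 50 columns) and column names are assumptions for illustration, not the real data. Note that DataFrame.memory_usage reports the size of the frame itself, not the peak memory of the Python process, so it would not capture temporary copies.

```python
import time

import numpy as np
import pandas as pd

# Synthetic stand-in for the real frame (assumed size, hypothetical column names).
data = pd.DataFrame(
    np.zeros((100000, 50)),
    columns=["col%d" % i for i in range(50)],
)
metricColumns = ["metric%d" % i for i in range(20)]  # hypothetical new columns
dataColumns = data.columns

# Size of the frame in bytes before adding anything.
bytes_before = data.memory_usage(index=True).sum()

start = time.time()
for metricColumn in metricColumns:  # same loop as Method 1
    if metricColumn not in dataColumns:
        data[metricColumn] = 0
elapsed = time.time() - start

# Size of the frame in bytes after the new columns exist.
bytes_after = data.memory_usage(index=True).sum()

print("elapsed: %.3f s" % elapsed)
print("frame size: %.1f MB -> %.1f MB" % (bytes_before / 1e6, bytes_after / 1e6))
```

Running the same harness around Methods 2 and 3 would give comparable numbers for each approach.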
If I am doing something wrong, please tell me, or help me see how this can be improved.