我使用大尺寸(48K行,最多数十列)DataFrames。在他们操纵的某个时刻,我需要对列值进行成对减法,我想知道是否有更有效的方法来做到这一点而不是我正在做的那个(见下文)。
我目前的代码:
# Matrix is the pandas DataFrame containing all the data
comparison_df = pandas.DataFrame(index=matrix.index)
combinations = itertools.product(group1, group2)
for observed, reference in combinations:
observed_data = matrix[observed]
reference_data = matrix[reference]
comparison = observed_data - reference_data
name = observed + "_" + reference
comparison_df[name] = comparison
由于数据可能很大(我在排列测试中也使用了这段代码),我很想知道它是否可以稍微优化一下。
编辑:根据要求,这是典型数据集的样本
ID A1 A2 A3 B1 B2 B3
Ku8QhfS0n_hIOABXuE 6.343 6.304 6.410 6.287 6.403 6.279
fqPEquJRRlSVSfL.8A 6.752 6.681 6.680 6.677 6.525 6.739
ckiehnugOno9d7vf1Q 6.297 6.248 6.524 6.382 6.316 6.453
x57Vw5B5Fbt5JUnQkI 6.268 6.451 6.379 6.371 6.458 6.333
如果“A”组为group1
和“B”group2
,则每个ID行的典型结果是,每列都有一对(例如,A1_B1,A2_B1) ,A3_B1 ...)对应于上面生成的配对,包含每行ID的减法。
答案 0 :(得分:2)
DataFrame
列itertools.combinations()
您可以使用itertools.combinations()
创建列组合,并根据这些对评估减法和新名称:
import pandas as pd
from cStringIO import StringIO
import itertools as iter
matrix = pd.read_csv(StringIO('''ID,A1,A2,A3,B1,B2,B3
Ku8QhfS0n_hIOABXuE,6.343,6.304,6.410,6.287,6.403,6.279
fqPEquJRRlSVSfL.8A,6.752,6.681,6.680,6.677,6.525,6.739
ckiehnugOno9d7vf1Q,6.297,6.248,6.524,6.382,6.316,6.453
x57Vw5B5Fbt5JUnQkI,6.268,6.451,6.379,6.371,6.458,6.333''')).set_index('ID')
print 'Original DataFrame:'
print matrix
print
# Create DataFrame to fill with combinations
comparison_df = pd.DataFrame(index=matrix.index)
# Create combinations of columns
for a, b in iter.combinations(matrix.columns, 2):
# Subtract column combinations
comparison_df['{}_{}'.format(a, b)] = matrix[a] - matrix[b]
print 'Combination DataFrame:'
print comparison_df
Original DataFrame:
A1 A2 A3 B1 B2 B3
ID
Ku8QhfS0n_hIOABXuE 6.343 6.304 6.410 6.287 6.403 6.279
fqPEquJRRlSVSfL.8A 6.752 6.681 6.680 6.677 6.525 6.739
ckiehnugOno9d7vf1Q 6.297 6.248 6.524 6.382 6.316 6.453
x57Vw5B5Fbt5JUnQkI 6.268 6.451 6.379 6.371 6.458 6.333
Combination DataFrame:
A1_A2 A1_A3 A1_B1 A1_B2 A1_B3 A2_A3 A2_B1 A2_B2 A2_B3 A3_B1 A3_B2 A3_B3 B1_B2 B1_B3 B2_B3
ID
Ku8QhfS0n_hIOABXuE 0.039 -0.067 0.056 -0.060 0.064 -0.106 0.017 -0.099 0.025 0.123 0.007 0.131 -0.116 0.008 0.124
fqPEquJRRlSVSfL.8A 0.071 0.072 0.075 0.227 0.013 0.001 0.004 0.156 -0.058 0.003 0.155 -0.059 0.152 -0.062 -0.214
ckiehnugOno9d7vf1Q 0.049 -0.227 -0.085 -0.019 -0.156 -0.276 -0.134 -0.068 -0.205 0.142 0.208 0.071 0.066 -0.071 -0.137
x57Vw5B5Fbt5JUnQkI -0.183 -0.111 -0.103 -0.190 -0.065 0.072 0.080 -0.007 0.118 0.008 -0.079 0.046 -0.087 0.038 0.125