Question

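I am working with a dataframe like this one, but larger and with more zones. I am trying to sum the Value columns in each row according to the zone names. The sum for R or C zones goes into the total column, and the sum for M zones goes into total1.

Input:

total and total1 are the desired output.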

ID  Zone1  CHC1  Value1  Zone2  CHC2  Value2  Zone3  CHC3  Value3  total  total1
1   R5B    100   10      C2     0     20      R10A   2     5       35     0
1   C2     95    20      M2-6   5     6       R5B    7     3       23     6
3   C2     40    4       C4     60    6       0      6     0       10     0
3   C1     100   8       0      0     0       0      100   0       8      0
5   M1-5   10    6       M2-6   86    15      0      0     0       0      21

Answer 1

You can use filter to select the Zone and Value columns as separate DataFrames:

z = df.filter(like='Zone')
v = df.filter(like='Value')

If you want to check for substrings, create boolean DataFrames with apply and str.contains:

m1 = z.apply(lambda x: x.str.contains('R|C'))
m2 = z.apply(lambda x: x.str.contains('M'))
# to check whole strings instead:
# m1 = z == 'R2'
# m2 = z.isin(['C1', 'C4'])

Finally, filter v with where using m1 and m2, and sum per row.
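The code for this last step is not shown here. The following is a minimal sketch of how it might look, assuming the zone cells are strings (including '0' for empty zones) and that the Zone and Value columns appear in the same order. The sample frame is rebuilt from the question's table; the astype(str) and na=False guards are added precautions, not part of the answer.

```python
import pandas as pd

# Sample frame rebuilt from the question's table (zone codes kept as strings).
df = pd.DataFrame({
    'ID':     [1, 1, 3, 3, 5],
    'Zone1':  ['R5B', 'C2', 'C2', 'C1', 'M1-5'],
    'CHC1':   [100, 95, 40, 100, 10],
    'Value1': [10, 20, 4, 8, 6],
    'Zone2':  ['C2', 'M2-6', 'C4', '0', 'M2-6'],
    'CHC2':   [0, 5, 60, 0, 86],
    'Value2': [20, 6, 6, 0, 15],
    'Zone3':  ['R10A', 'R5B', '0', '0', '0'],
    'CHC3':   [2, 7, 6, 100, 0],
    'Value3': [5, 3, 0, 0, 0],
})

z = df.filter(like='Zone')
v = df.filter(like='Value')

# Boolean masks per zone cell; the guards handle non-string or missing cells.
m1 = z.apply(lambda x: x.astype(str).str.contains('R|C', na=False))
m2 = z.apply(lambda x: x.astype(str).str.contains('M', na=False))

# to_numpy() drops the Zone* labels so each mask lines up with Value* by position.
# where() keeps matching values and sets the rest to NaN, which sum() skips.
df['total'] = v.where(m1.to_numpy()).sum(axis=1).astype(int)
df['total1'] = v.where(m2.to_numpy()).sum(axis=1).astype(int)

print(df[['ID', 'total', 'total1']])
```

Converting the masks to plain arrays matters because where would otherwise try to align the Zone column labels with the Value column labels, which do not match.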

Answer 2

Solution 1 (simple code, but slower and less flexible)

total = []
total1 = []

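# Assumes each zone code is immediately followed by its value in the row.
for i in range(df.shape[0]):
    temp = df.iloc[i].tolist()
    # Total: the value next to "R2", if present
    if "R2" in temp:
        total.append(temp[temp.index("R2")+1])
    else:
        total.append(0)
    # Total1: the values next to "C1" and "C4", only when both are present
    if ("C1" in temp) and ("C4" in temp):
        total1.append(temp[temp.index("C1")+1] + temp[temp.index("C4")+1])
    else:
        total1.append(0)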

df["Total"] = total
df["Total1"] = total1

Solution 2 (faster than Solution 1 and easier to customize, but may use a lot of memory)

# columns to use
cols = df.columns.tolist()
zones = [x for x in cols if x.startswith('Zone')]
vals = [x for x in cols if x.startswith('Value')]

# you can customize here
bucket1 = ['R2']
bucket2 = ['C1', 'C4']
thresh = 2 # "OR": 1, "AND": 2

original = df.copy()

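# bucket1 check: zero the column right after each Zone column
# (assumed to be its Value column) where the zone is not in bucket1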
for zone in zones:
    df.loc[~df[zone].isin(bucket1), cols[cols.index(zone)+1]] = 0

original['Total'] = df[vals].sum(axis=1)
df = original.copy()

# bucket2 check
for zone in zones:
    df.loc[~df[zone].isin(bucket2), cols[cols.index(zone)+1]] = 0

# count how many zones in each row fall in bucket2; rows below thresh get Total1 = 0
df['Check_Bucket'] = df[zones].stack().reset_index().groupby('level_0')[0].apply(list)
df['Check_Bucket'] = df['Check_Bucket'].apply(lambda x: len([y for y in x if y in bucket2]))
df['Total1'] = df[vals].sum(axis=1)
df.loc[df.Check_Bucket < thresh, 'Total1'] = 0
df.drop('Check_Bucket', axis=1, inplace=True)

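When I expanded the original dataframe to 100,000 rows, Solution 1 took 11.4 s ± 82.1 ms per loop, while Solution 2 took 3.53 s ± 29.8 ms per loop. The difference is that Solution 2 does not loop over the rows.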
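To see the row-loop versus column-loop difference on a smaller scale, here is a rough sketch. The frame, the helper names row_loop and column_loop, and the row count are all hypothetical, and only the 'R2' bucket is covered. It is not the benchmark used for the numbers above, and timings will vary by machine.

```python
import timeit
import pandas as pd

# Hypothetical frame in the Zone/Value layout that the answer's code assumes.
base = pd.DataFrame({
    'Zone1':  ['R2', 'C1', 'C4', 'M1'],
    'Value1': [1, 2, 3, 4],
    'Zone2':  ['C4', 'R2', 'C1', 'R2'],
    'Value2': [5, 6, 7, 8],
})
df = pd.concat([base] * 500, ignore_index=True)  # 2,000 rows; raise to scale up

def row_loop(frame):
    # One Python-level iteration per row, in the style of Solution 1.
    out = []
    for i in range(frame.shape[0]):
        temp = frame.iloc[i].tolist()
        out.append(temp[temp.index('R2') + 1] if 'R2' in temp else 0)
    return out

def column_loop(frame):
    # One iteration per Zone column; pandas handles all rows at once.
    total = pd.Series(0, index=frame.index)
    for zone, val in [('Zone1', 'Value1'), ('Zone2', 'Value2')]:
        total = total + frame[val].where(frame[zone].isin(['R2']), 0)
    return total.tolist()

# Both approaches agree on this data (at most one 'R2' per row).
assert row_loop(df) == column_loop(df)

print('row loop   :', timeit.timeit(lambda: row_loop(df), number=3))
print('column loop:', timeit.timeit(lambda: column_loop(df), number=3))
```

The column-wise version does a fixed number of pandas operations regardless of row count, which is why it tends to scale better as the frame grows.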

Sum by string name in pandas
