My df looks like this:
names  col1  col2  col3  total  total_col1  total_col2
bbb    1     1     0     2      DF1, DF2    DF1
ccc    1     0     0     1      DF1
zzz    0     1     1     2                  DF2
qqq    0     1     0     1                  DF1, Df2
rrr    0     0     1     1
I want to count the entries in each total_col# column and add another column, total_full, so that the output is:
names col1 col2 col3 total total_full total_col1 total_col2
bbb 1 1 0 2 5 2 1
ccc 1 0 0 1 2 1
zzz 0 1 1 2 3 1
qqq 0 1 0 1 3 2
rrr 0 0 1 1
So each total_col column holds the number of DF entries listed in it, and total_full adds those columns to the total column.
Is this possible with pandas?
Answer 0 (score: 0)
You can use:
# select the columns whose values will be replaced by counts
cols = df.columns[df.columns.str.startswith('total_')]
# split each cell on commas, take the length of the resulting list, write it back
df[cols] = df[cols].apply(lambda x: x.str.split(',').str.len())
# insert the new column right after the 'total' column
df.insert(df.columns.get_loc('total') + 1, 'total_full', df.filter(like='total').sum(axis=1))
print(df)
  names  col1  col2  col3  total  total_full  total_col1  total_col2
0   bbb     1     1     0      2         5.0         2.0         1.0
1   ccc     1     0     0      1         2.0         1.0         NaN
2   zzz     0     1     1      2         3.0         NaN         1.0
3   qqq     0     1     0      1         3.0         NaN         2.0
4   rrr     0     0     1      1         1.0         NaN         NaN
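The output above keeps NaN for empty cells, so the counts come out as floats. If integer counts are preferred, one possible variation is sketched below. It assumes that a missing cell should count as zero entries, which the question does not state, and it rebuilds the sample data with the cell placement used in the other answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; which total_col each value sits in
# is an assumption for rows with a single non-empty cell.
df = pd.DataFrame({
    'names': ['bbb', 'ccc', 'zzz', 'qqq', 'rrr'],
    'col1': [1, 1, 0, 0, 0],
    'col2': [1, 0, 1, 1, 0],
    'col3': [0, 0, 1, 0, 1],
    'total': [2, 1, 2, 1, 1],
    'total_col1': ['DF1, DF2', 'DF1', np.nan, np.nan, np.nan],
    'total_col2': ['DF1', np.nan, 'DF2', 'DF1, Df2', np.nan],
})

cols = df.columns[df.columns.str.startswith('total_')]
for col in cols:
    # Length of the comma-split list per cell; missing cells become 0 (assumption).
    df[col] = df[col].str.split(',').str.len().fillna(0).astype(int)

# Sum 'total' and the count columns, then place the result right after 'total'.
df.insert(df.columns.get_loc('total') + 1, 'total_full',
          df[['total', *cols]].sum(axis=1))
print(df)
```

With zeros in place of NaN, total_full stays an integer column as well.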
Answer 1 (score: 0)
You can use
totals = df.filter(regex=r'^total_col')
counts = (totals.stack().str.count(',')+1).unstack()
#    total_col1  total_col2
# 0         2.0         1.0
# 1         1.0         NaN
# 2         NaN         1.0
# 3         NaN         2.0
to count the number of strings in each cell of the total_col columns.
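The counting step relies on a cell with n commas holding n + 1 entries. A small sketch of that idea on a standalone Series follows; the helper name count_entries and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

def count_entries(cell_values: pd.Series) -> pd.Series:
    """Count comma-separated entries per cell; missing cells stay NaN."""
    # Note: an empty string has zero commas and would be counted as one entry.
    return cell_values.str.count(',') + 1

sample = pd.Series(['DF1, DF2', 'DF1', np.nan, 'DF1, DF2, DF3'])
result = count_entries(sample)
print(result)
```

Here the four cells give 2, 1, NaN and 3 respectively.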
To move the NaN values to the end of each row, you can use
counts_array = np.sort(counts.values, axis=1)
counts = pd.DataFrame(counts_array, columns=counts.columns, index=counts.index)
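As a quick check of what np.sort does along axis=1, here is a minimal sketch on a hand-written array (the values are illustrative only). NumPy places NaN last when sorting, and the remaining values in each row end up in ascending order, so their original left-to-right order is not preserved.

```python
import numpy as np

counts_demo = np.array([[2.0, 1.0],
                        [np.nan, 1.0],
                        [np.nan, np.nan]])

# Sort within each row: NaN goes to the end, other values ascend.
sorted_demo = np.sort(counts_demo, axis=1)
print(sorted_demo)
```

The first row becomes 1.0, 2.0 and the second becomes 1.0, NaN.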
A complete example:
import numpy as np
import pandas as pd
nan = np.nan
df = pd.DataFrame({'col1': [1, 1, 0, 0, 0],
                   'col2': [1, 0, 1, 1, 0],
                   'col3': [0, 0, 1, 0, 1],
                   'names': ['bbb', 'ccc', 'zzz', 'qqq', 'rrr'],
                   'total': [2, 1, 2, 1, 1],
                   'total_col1': ['DF1, DF2', 'DF1', nan, nan, nan],
                   'total_col2': ['DF1', nan, 'DF2', 'DF1, Df2', nan]})
totals = df.filter(regex=r'^total_col')
counts = (totals.stack().str.count(',')+1).unstack()
counts_array = np.sort(counts.values, axis=1)
counts = pd.DataFrame(counts_array, columns=counts.columns, index=counts.index)
df[totals.columns] = counts
df['total_full'] = df.filter(regex=r'^total').sum(axis=1)
print(df)
This yields:
   col1  col2  col3 names  total  total_col1  total_col2  total_full
0     1     1     0   bbb      2         1.0         2.0         5.0
1     1     0     0   ccc      1         1.0         NaN         2.0
2     0     1     1   zzz      2         1.0         NaN         3.0
3     0     1     0   qqq      1         2.0         NaN         3.0
4     0     0     1   rrr      1         NaN         NaN         1.0
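In this output the bbb row reads 1.0, 2.0, whereas the question expected 2, 1. If the left-to-right order of the counts should be kept while still pushing NaN to the right, one possible approach is a stable sort on the NaN mask instead of on the values. This is only a sketch; push_nan_right is a hypothetical helper name, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def push_nan_right(frame: pd.DataFrame) -> pd.DataFrame:
    """Shift missing values to the right of each row, keeping the order of the rest."""
    values = frame.to_numpy(dtype=float)
    # A stable sort on the NaN mask puts present values (False) before missing
    # ones (True) without reordering the values inside either group.
    order = np.argsort(np.isnan(values), axis=1, kind='stable')
    shifted = np.take_along_axis(values, order, axis=1)
    return pd.DataFrame(shifted, index=frame.index, columns=frame.columns)

counts_example = pd.DataFrame({'total_col1': [2.0, 1.0, np.nan, np.nan],
                               'total_col2': [1.0, np.nan, 1.0, 2.0]})
shifted_example = push_nan_right(counts_example)
print(shifted_example)
```

The result could be assigned back with df[totals.columns] = ... in the same way as the sorted counts in the script above.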