I get the following output from my grep command.
grep -r GFD . | cut -d: -f2
out_GFD_994 NSE_FO_BHP_1703 -9425 6800 361.45 11900 359.96 5100 0.34% 6137085.0 -15.36
out_GFD_994 NSE_FO_BHP_1704 15651 -6800 360.38 6800 362.04 13600 26.66% 7374430.0 21.22
out_GFD_994 NSE_FO_TLS_1703 -4996 2000 603.57 5000 602.68 3000 0.46% 4825900.0 -10.35
out_GFD_994 NSE_FO_TLS_1704 4480 -2000 605.71 3000 606.44 5000 29.62% 4849350.0 9.24
out_GFD_994 NSE_FO_MQG_1703 -11717 -20000 148.64 50000 148.64 70000 0.46% 17837250.0 -6.57
out_GFD_994 NSE_FO_MQG_1704 17213 20000 149.29 75000 149.39 55000 36.11% 19413500.0 8.87
out_GFD_Part2 NSE_FO_BHP_1703 -17597 -20000 0 0 39.25 20000 0.07% 785000.0 -224.17
out_GFD_Part2 NSE_FO_BHP_1704 14481 20000 39.6 20000 0 0 1.38% 792000.0 182.84
out_GFD_Part2 NSE_FO_TLS_1703 28312 1200 643.93 16800 645.52 15600 0.54% 20888220.0 13.55
out_GFD_Part2 NSE_FO_TLS_1704 -23813 -1200 647.91 16800 646.87 18000 34.11% 22528620.0 -10.57
out_GFD_Part2 NSE_FO_MQG_1703 -133456 8800 1029.33 25300 1025.86 16500 0.55% 42968915.0 -31.06
out_GFD_Part2 NSE_FO_MQG_1704 141534 -7700 1031.26 33000 1033.85 40700 49.62% 76109605.0 18.60
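I need to collapse/merge/unstack the rows (whichever term fits best) using the second column as the key, so that the output becomes the following (the sum of the corresponding values):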
out_GFD_994 out_GFD_Part2 NSE_FO_BHP_1703 -9425-17597 6800-20000 361.45 11900 359.96 5100 0.34% 6137085.0 -15.36
out_GFD_994 out_GFD_Part2 NSE_FO_BHP_1704 15651+14481 -6800+20000 360.38 6800 362.04 13600 26.66% 7374430.0 21.22
out_GFD_994 out_GFD_Part2 NSE_FO_TLS_1703 -4996+28312 2000+1200 603.57 5000 602.68 3000 0.46% 4825900.0 -10.35
out_GFD_994 out_GFD_Part2 NSE_FO_TLS_1704 4480-23813 -2000-1200 605.71 3000 606.44 5000 29.62% 4849350.0 9.24
out_GFD_994 out_GFD_Part2 NSE_FO_MQG_1703 -11717-133456 -20000+8800 148.64 50000 148.64 70000 0.46% 17837250.0 -6.57
out_GFD_994 out_GFD_Part2 NSE_FO_MQG_1704 17213+141534 20000-7700 149.29 75000 149.39 55000 36.11% 19413500.0 8.87
(The sums are written out only for the first two value columns in the expected output above.)
I can name the columns and load the data into a pandas DataFrame if that paves the way to a solution.
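As a minimal sketch of the loading step mentioned above, the whitespace-separated grep output could be read without a header row. The variable names raw and df are only illustrative, and the two sample lines are copied from the output shown earlier.

```python
import io
import pandas as pd

# Two sample lines copied from the grep output in the question.
raw = """\
out_GFD_994 NSE_FO_BHP_1703 -9425 6800 361.45 11900 359.96 5100 0.34% 6137085.0 -15.36
out_GFD_Part2 NSE_FO_BHP_1703 -17597 -20000 0 0 39.25 20000 0.07% 785000.0 -224.17
"""

# No header row; fields are separated by runs of whitespace,
# so the columns get the default integer names 0..10.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df.shape)
```

The percentage column (column 8) stays as text because of the trailing % sign, which is presumably why it has to be left out of any summing.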
Update 1:
I am now working with only 5 columns and have loaded them into a pandas DataFrame, as shown below:
>>> df
grep_string key val1 val2 val3
0 out_GFD_994 NSE_FO_BHP_1703 -9425 6800 361.45
1 out_GFD_994 NSE_FO_BHP_1704 15651 -6800 360.38
2 out_GFD_994 NSE_FO_TLS_1703 -4996 2000 603.57
3 out_GFD_994 NSE_FO_TLS_1704 4480 -2000 605.71
4 out_GFD_994 NSE_FO_MQG_1703 -11717 -20000 148.64
5 out_GFD_994 NSE_FO_MQG_1704 17213 20000 149.29
6 out_GFD_Part2 NSE_FO_BHP_1703 -17597 -20000 0.00
7 out_GFD_Part2 NSE_FO_BHP_1704 14481 20000 39.60
8 out_GFD_Part2 NSE_FO_TLS_1703 28312 1200 643.93
9 out_GFD_Part2 NSE_FO_TLS_1704 -23813 -1200 647.91
10 out_GFD_Part2 NSE_FO_MQG_1703 -133456 8800 1029.33
11 out_GFD_Part2 NSE_FO_MQG_1704 141534 -7700 1031.26
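How can I merge the rows (summing the values) using the key column?
Update 2:
I want to apply the summed column values to a log file that looks like this: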
NSE_FO_BHP_1703_MAXLONGPOS = 200000
NSE_FO_BHP_1703_MAXSHORTPOS = 200000
NSE_FO_BHP_1703_MAXLONGEXPOSURE = 250000
NSE_FO_BHP_1703_MAXSHORTEXPOSURE = 250000
NSE_FO_BHP_1704_MAXLONGPOS = 200000
NSE_FO_BHP_1704_MAXSHORTPOS = 200000
NSE_FO_BHP_1704_MAXLONGEXPOSURE = 250000
NSE_FO_BHP_1704_MAXSHORTEXPOSURE = 250000
NSE_FO_TLS_1703_MAXLONGPOS = 100000
NSE_FO_TLS_1703_MAXSHORTPOS = 100000
NSE_FO_TLS_1703_MAXLONGEXPOSURE = 200000
NSE_FO_TLS_1703_MAXSHORTEXPOSURE = 200000
NSE_FO_TLS_1704_MAXLONGPOS = 100000
NSE_FO_TLS_1704_MAXSHORTPOS = 100000
NSE_FO_TLS_1704_MAXLONGEXPOSURE = 200000
NSE_FO_TLS_1704_MAXSHORTEXPOSURE = 200000
NSE_FO_MQG_1703_MAXLONGPOS = 300000
NSE_FO_MQG_1703_MAXSHORTPOS = 300000
NSE_FO_MQG_1703_MAXLONGEXPOSURE = 400000
NSE_FO_MQG_1703_MAXSHORTEXPOSURE = 400000
NSE_FO_MQG_1704_MAXLONGPOS = 300000
NSE_FO_MQG_1704_MAXSHORTPOS = 300000
NSE_FO_MQG_1704_MAXLONGEXPOSURE = 400000
NSE_FO_MQG_1704_MAXSHORTEXPOSURE = 400000
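Can we take the summed values we get in df (say column d), map them to the entries above by matching the key substring, and add them to or subtract them from the values in the file? For example, suppose we get -13200 in column d and the file contains NSE_FO_BHP_1703_MAXLONGPOS = 200000. Then that value should change to 213200 and NSE_FO_BHP_1703_MAXSHORTPOS should change to 186800. Likewise, MAXLONGEXPOSURE and MAXSHORTEXPOSURE should change to 263200 and 236800.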
Answer 0 (score: 1)
You can use groupby with agg, passing it a dictionary created by a dict comprehension.
print (df)
0 1 2 3 4 5 6 \
0 out_GFD_994 NSE_FO_BHP_1703 -9425 6800 361.45 11900 359.96
1 out_GFD_994 NSE_FO_BHP_1704 15651 -6800 360.38 6800 362.04
2 out_GFD_994 NSE_FO_TLS_1703 -4996 2000 603.57 5000 602.68
3 out_GFD_994 NSE_FO_TLS_1704 4480 -2000 605.71 3000 606.44
4 out_GFD_994 NSE_FO_MQG_1703 -11717 -20000 148.64 50000 148.64
5 out_GFD_994 NSE_FO_MQG_1704 17213 20000 149.29 75000 149.39
6 out_GFD_Part2 NSE_FO_BHP_1703 -17597 -20000 0.00 0 39.25
7 out_GFD_Part2 NSE_FO_BHP_1704 14481 20000 39.60 20000 0.00
8 out_GFD_Part2 NSE_FO_TLS_1703 28312 1200 643.93 16800 645.52
9 out_GFD_Part2 NSE_FO_TLS_1704 -23813 -1200 647.91 16800 646.87
10 out_GFD_Part2 NSE_FO_MQG_1703 -133456 8800 1029.33 25300 1025.86
11 out_GFD_Part2 NSE_FO_MQG_1704 141534 -7700 1031.26 33000 1033.85
7 8 9 10
0 5100 0.34% 6137085.0 -15.36
1 13600 26.66% 7374430.0 21.22
2 3000 0.46% 4825900.0 -10.35
3 5000 29.62% 4849350.0 9.24
4 70000 0.46% 17837250.0 -6.57
5 55000 36.11% 19413500.0 8.87
6 20000 0.07% 785000.0 -224.17
7 0 1.38% 792000.0 182.84
8 15600 0.54% 20888220.0 13.55
9 18000 34.11% 22528620.0 -10.57
10 16500 0.55% 42968915.0 -31.06
11 40700 49.62% 76109605.0 18.60
Finally, create two new columns from the first column with split:
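import numpy as np  # needed for np.arange further down
# sum every column except the first (label), the second (key) and the ninth (percentage)
d = {x: 'sum' for x in df if x not in [0, 1, 8]}
# join the labels of the first column with '|' instead of summing them
d.update({0: '|'.join})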
print (d)
{0: <built-in method join of str object at 0x0000000001180AE8>, 2: 'sum',
3: 'sum', 4: 'sum', 5: 'sum', 6: 'sum', 7: 'sum', 9: 'sum', 10: 'sum'}
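# group by the key column (1) and aggregate with the dictionary above
df = df.groupby(1).agg(d).reset_index()
# split the joined labels into two new columns named -2 and -1 so they sort first
df[[-2, -1]] = df.pop(0).str.split('|', expand=True)
# change order of columns
df = df.sort_index(axis=1)
# reset column names to default (0, 1, ...)
df.columns = np.arange(len(df.columns))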
print (df)
0 1 2 3 4 5 \
0 out_GFD_994 out_GFD_Part2 NSE_FO_BHP_1703 -27022 -13200 361.45
1 out_GFD_994 out_GFD_Part2 NSE_FO_BHP_1704 30132 13200 399.98
2 out_GFD_994 out_GFD_Part2 NSE_FO_MQG_1703 -145173 -11200 1177.97
3 out_GFD_994 out_GFD_Part2 NSE_FO_MQG_1704 158747 12300 1180.55
4 out_GFD_994 out_GFD_Part2 NSE_FO_TLS_1703 23316 3200 1247.50
5 out_GFD_994 out_GFD_Part2 NSE_FO_TLS_1704 -19333 -3200 1253.62
6 7 8 9 10
0 11900 399.21 25100 6922085.0 -239.53
1 26800 362.04 13600 8166430.0 204.06
2 75300 1174.50 86500 60806165.0 -37.63
3 108000 1183.24 95700 95523105.0 27.47
4 21800 1248.20 18600 25714120.0 3.20
5 19800 1253.31 23000 27377970.0 -1.33
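For the 5-column frame from Update 1, the same idea can be sketched with an explicit dictionary instead of a comprehension. This is only an illustration on the BHP rows of that frame; the names df5 and summed are made up for the example.

```python
import pandas as pd

# A subset of the rows shown in Update 1 (the BHP keys only).
df5 = pd.DataFrame({
    "grep_string": ["out_GFD_994", "out_GFD_994", "out_GFD_Part2", "out_GFD_Part2"],
    "key": ["NSE_FO_BHP_1703", "NSE_FO_BHP_1704", "NSE_FO_BHP_1703", "NSE_FO_BHP_1704"],
    "val1": [-9425, 15651, -17597, 14481],
    "val2": [6800, -6800, -20000, 20000],
    "val3": [361.45, 360.38, 0.00, 39.60],
})

# Join the labels with '|' and sum the numeric columns, one row per key.
summed = (
    df5.groupby("key")
       .agg({"grep_string": "|".join, "val1": "sum", "val2": "sum", "val3": "sum"})
       .reset_index()
)
print(summed)
```

Spelling out the dictionary by hand is reasonable when there are only a few columns; the comprehension used in this answer scales better to the full 11-column output.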
# start again from the original 11-column df and give the columns letter names
df.columns = list('abcdefghijk')
print (df)
a b c d e f g \
0 out_GFD_994 NSE_FO_BHP_1703 -9425 6800 361.45 11900 359.96
1 out_GFD_994 NSE_FO_BHP_1704 15651 -6800 360.38 6800 362.04
2 out_GFD_994 NSE_FO_TLS_1703 -4996 2000 603.57 5000 602.68
3 out_GFD_994 NSE_FO_TLS_1704 4480 -2000 605.71 3000 606.44
4 out_GFD_994 NSE_FO_MQG_1703 -11717 -20000 148.64 50000 148.64
5 out_GFD_994 NSE_FO_MQG_1704 17213 20000 149.29 75000 149.39
6 out_GFD_Part2 NSE_FO_BHP_1703 -17597 -20000 0.00 0 39.25
7 out_GFD_Part2 NSE_FO_BHP_1704 14481 20000 39.60 20000 0.00
8 out_GFD_Part2 NSE_FO_TLS_1703 28312 1200 643.93 16800 645.52
9 out_GFD_Part2 NSE_FO_TLS_1704 -23813 -1200 647.91 16800 646.87
10 out_GFD_Part2 NSE_FO_MQG_1703 -133456 8800 1029.33 25300 1025.86
11 out_GFD_Part2 NSE_FO_MQG_1704 141534 -7700 1031.26 33000 1033.85
h i j k
0 5100 0.34% 6137085.0 -15.36
1 13600 26.66% 7374430.0 21.22
2 3000 0.46% 4825900.0 -10.35
3 5000 29.62% 4849350.0 9.24
4 70000 0.46% 17837250.0 -6.57
5 55000 36.11% 19413500.0 8.87
6 20000 0.07% 785000.0 -224.17
7 0 1.38% 792000.0 182.84
8 15600 0.54% 20888220.0 13.55
9 18000 34.11% 22528620.0 -10.57
10 16500 0.55% 42968915.0 -31.06
11 40700 49.62% 76109605.0 18.60
Solution with custom column names:
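import pandas as pd  # needed for pd.concat below
# sum every column except the label ('a'), the key ('b') and the percentage ('i')
d = {x: 'sum' for x in df if x not in ['a', 'b', 'i']}
# join the labels of the first column with '|' instead of summing them
d.update({'a': '|'.join})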
print (d)
{'e': 'sum', 'k': 'sum', 'a': <built-in method join of str object at 0x0000000001180AE8>,
'f': 'sum', 'd': 'sum', 'g': 'sum', 'j': 'sum', 'c': 'sum', 'h': 'sum'}
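# group by the key column and aggregate with the dictionary above
df = df.groupby('b').agg(d).reset_index()
# split the joined labels into separate columns
df1 = df.pop('a').str.split('|', expand=True)
# name the new label columns out_0, out_1, ...
df1.columns = ['out_' + str(x) for x in df1.columns]
# put the label columns in front of the aggregated columns
df = pd.concat([df1, df], axis=1)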
print (df)
out_0 out_1 b e k f \
0 out_GFD_994 out_GFD_Part2 NSE_FO_BHP_1703 361.45 -239.53 11900
1 out_GFD_994 out_GFD_Part2 NSE_FO_BHP_1704 399.98 204.06 26800
2 out_GFD_994 out_GFD_Part2 NSE_FO_MQG_1703 1177.97 -37.63 75300
3 out_GFD_994 out_GFD_Part2 NSE_FO_MQG_1704 1180.55 27.47 108000
4 out_GFD_994 out_GFD_Part2 NSE_FO_TLS_1703 1247.50 3.20 21800
5 out_GFD_994 out_GFD_Part2 NSE_FO_TLS_1704 1253.62 -1.33 19800
d g j c h
0 -13200 399.21 6922085.0 -27022 25100
1 13200 362.04 8166430.0 30132 13600
2 -11200 1174.50 60806165.0 -145173 86500
3 12300 1183.24 95523105.0 158747 95700
4 3200 1248.20 25714120.0 23316 18600
5 -3200 1253.31 27377970.0 -19333 23000
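The file update asked for in Update 2 is not covered by the code above. A rough sketch follows, assuming a rule inferred from the single worked example in the question: the summed d value is subtracted from the LONG limits and added to the SHORT limits. The names deltas, pattern and adjust are hypothetical, and the rule itself is an assumption that should be checked against the real requirements.

```python
import re

# Summed column-d values per key, taken from the result above.
deltas = {"NSE_FO_BHP_1703": -13200, "NSE_FO_BHP_1704": 13200}

# Lines in the format shown in Update 2.
lines = [
    "NSE_FO_BHP_1703_MAXLONGPOS = 200000",
    "NSE_FO_BHP_1703_MAXSHORTPOS = 200000",
    "NSE_FO_BHP_1703_MAXLONGEXPOSURE = 250000",
    "NSE_FO_BHP_1703_MAXSHORTEXPOSURE = 250000",
]

# Captures: key, LONG/SHORT, POS/EXPOSURE, current value.
pattern = re.compile(r"^(\w+)_MAX(LONG|SHORT)(POS|EXPOSURE)\s*=\s*(-?\d+)\s*$")

def adjust(line, deltas):
    """Return the line with its limit adjusted, or unchanged if it does not apply."""
    m = pattern.match(line)
    if not m or m.group(1) not in deltas:
        return line  # leave unrelated lines untouched
    key, side, kind, value = m.group(1), m.group(2), m.group(3), int(m.group(4))
    delta = deltas[key]
    # Assumed rule: LONG limits get the delta subtracted, SHORT limits get it added.
    new_value = value - delta if side == "LONG" else value + delta
    return f"{key}_MAX{side}{kind} = {new_value}"

updated = [adjust(line, deltas) for line in lines]
print("\n".join(updated))
```

With the aggregated frame from the custom-column solution, the deltas mapping could probably be built with dict(zip(df['b'], df['d'])), and the lines would come from reading the file rather than from a hard-coded list.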