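I have an input file, shown below, whose records need to be printed in order: the keys should be sorted in ascending numeric order, and keys that do not exist in a record should be output last. I am getting the data in the required format, but the ordering is wrong.
I tried using the sort() method, but it reports "list has no attribute sort". Please suggest a solution, and advise whether any modifications are needed.
Input file: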
3=1388|4=1388|5=IBM|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|362=0.46
3=1388|4=1388|5=IBM|8=157.73|9=100|1021=0|854=p|394=157.73|474=157.749977558|1584=89029|444=20160713|459=93001362639104|461=26142|55=93001362849000|22=89120|361=0.71|981=0|16=1468416601.372|18=1468416601.372|362=0.45
3=1388|4=1388|5=IBM|8=157.69|9=100|1021=600|854=p|394=157.69|474=157.749910415|1584=89129|444=20160713|459=93004178882560|461=27052|55=93004179085000|22=89328|361=0.67|981=1|16=1468416604.1916|18=1468416604.1916|362=0.43
Code I tried:
import pandas as pd
import numpy as np
df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
ds = [dict(w.split('=', 1) for w in x) for x in s]
p = pd.DataFrame.from_records(ds)
p1 = p.replace(np.nan,'n/a', regex=True)
st = p1.stack(level=0,dropna=False)
dfs = [g for i,g in st.groupby(level=0)]
#print st
i = 0
while i < len(dfs):
    # index of each record
    print('\nindex[%d]' % i)
    for (_, k), v in dfs[i].items():
        print(k, '\t', v)
    i = i + 1
Output obtained:
index[0]
1021 1500
1584 88929
16 1468416600.6006
18 1468416600.6006
22 89020
3 1388
361 0.73
362 0.46
388 157.75
394 157.75
4 1388
400 157.75
444 20160713
459 93000546718000
461 7
474 157.75
5 IBM
55 93000552181000
8 157.75
854 n
9 88929
981 0
index[1]
1021 0
1584 89029
16 1468416601.372
18 1468416601.372
22 89120
3 1388
361 0.71
362 0.45
388 n/a
394 157.73
4 1388
400 n/a
444 20160713
459 93001362639104
461 26142
474 157.749977558
5 IBM
55 93001362849000
8 157.73
854 p
9 100
981 0
Expected output:
index[0]
3 1388
4 1388
5 IBM
8 157.75
9 88929
16 1468416600.6006
18 1468416600.6006
22 89020
55 93000552181000
361 0.73
362 0.46
388 157.75
394 157.75
400 157.75
444 20160713
459 93000546718000
461 7
474 157.75
854 n
981 0
1021 1500
1584 88929
index[1]
3 1388
4 1388
5 IBM
8 157.75
9 88929
16 1468416600.6006
18 1468416600.6006
22 89020
55 93000552181000
361 0.73
362 0.46
394 157.75
444 20160713
459 93000546718000
461 7
474 157.75
854 n
981 0
1021 1500
1584 88929
388 n/a
400 n/a
Answer 0: (score: 0)
Try the following:
import pandas as pd
import numpy as np
df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
ds = [dict(w.split('=', 1) for w in x) for x in s]
p1 = pd.DataFrame.from_records(ds).fillna('n/a')
st = p1.stack(level=0,dropna=False)
for k, v in st.groupby(level=0):
    print(k, v.sort_index())
Answer 1: (score: 0)
Replace your ds line with
ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s]
to convert the keys to integers, so that they sort numerically rather than as strings.
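To see why the conversion matters, here is a small sketch using a few tags from the first input record. The variable names are only for illustration. String keys are compared character by character, so '1021' lands before '3', while integer comparison gives the order you expect.

```python
# A few tags from the first input record, as strings (what str.split produces).
tags = ['3', '1021', '16', '854', '5']

# Lexicographic order: strings are compared character by character.
lex_order = sorted(tags)
print(lex_order)   # ['1021', '16', '3', '5', '854']

# Numeric order: compare by integer value instead.
num_order = sorted(tags, key=int)
print(num_order)   # ['3', '5', '16', '854', '1021']
```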
To output the n/a values at the end, you can use pandas selection to output the non-null values first and then the null values, for example:
for (ix, series) in p.iterrows():
    print('\nindex[%d]' % ix)
    output_series(ix, series[pd.notnull])
    output_series(ix, series[pd.isnull].fillna('n/a'))
However, you can also simplify the stack, groupby, and print steps to:
for (ix, series) in p1.iterrows():
    print('\nindex[%d]' % ix)
    for tag, value in series.items():
        print(tag, '\t', value)
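So the whole script becomes:
import pandas as pd

def output_series(ix, series):
    # Print one tag/value pair per line (ix is not used here)
    for tag, value in series.items():
        print(tag, '\t', value)

df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
# Integer keys, so that the columns sort numerically rather than as strings
ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s]
# Sort the columns explicitly; newer pandas versions may keep the keys in insertion order
p = pd.DataFrame.from_records(ds).sort_index(axis=1)
for (ix, series) in p.iterrows():
    print('\nindex[%d]' % ix)
    output_series(ix, series[pd.notnull])               # tags present in this record
    output_series(ix, series[pd.isnull].fillna('n/a'))  # tags missing from this record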
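If you do not need pandas for anything else, the same ordering can be sketched with plain dictionaries. This is an alternative approach, not the method above: parse_record and ordered_rows are hypothetical helper names, and the two sample lines are shortened versions of the input records.

```python
def parse_record(line):
    """Split 'tag=value|tag=value' into a dict keyed by integer tag."""
    fields = (field.split('=', 1) for field in line.strip().split('|'))
    return {int(tag): value for tag, value in fields}

def ordered_rows(records):
    """Per record: present tags in ascending order, then missing tags as 'n/a'."""
    all_tags = sorted(set().union(*records))   # every tag seen in any record
    result = []
    for rec in records:
        present = [(t, rec[t]) for t in all_tags if t in rec]
        missing = [(t, 'n/a') for t in all_tags if t not in rec]
        result.append(present + missing)
    return result

# Shortened sample lines (assumption: same 'tag=value|...' layout as the input file)
lines = [
    '3=1388|5=IBM|388=157.75|1021=1500|16=1468416600.6006',
    '3=1388|5=IBM|1021=0|16=1468416601.372',
]
records = [parse_record(line) for line in lines]
rows = ordered_rows(records)
for ix, record_rows in enumerate(rows):
    print('\nindex[%d]' % ix)
    for tag, value in record_rows:
        print(tag, '\t', value)
```

In this sample, tag 388 is missing from the second record, so it is printed last for index[1] with the value n/a.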