Split list-valued data into DataFrame columns

Time: 2017-01-30 07:29:12

Tags: python json pandas

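I am a beginner in Python. I am working on a project where I have data in the following pattern.

The data in the json file looks like this:

"price_time":[1398823200,1403154000,1403247600,1403301600,1403380800], "price_value":[901,909,918,927,936], "salesRank_value":[2176,2318,2192,1801,1829]

The output of df.head() looks like this: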

>>> df.head() 
                                            1974-12-11 20:55:21
price_time    [1398823200, 1403154000, 1403247600, 140330160...
price_value   [901, 909, 918, 927, 936, 945, 954, 963, 972, ...
rating_time                                        [1475972640]
rating_value                                               [43]
review_count  [6558, 6560, 6561, 6562, 6564, 6566, 6568, 656...

df = pd.read_json('results.json')
In [] : print(df.head()) 
output : 
price_time       [1398823200, 1403154000, 1403247600, 140330160...
price_value      [901, 909, 918, 927, 936, 945, 954, 963, 972, ...
salesRank_value  [2176, 2318, 2192, 1801, 1829, 2207, 1757, 177...

I want to convert this data to the following pattern:

price_time   price_value  salesRank_value
1398823200   901          2176
1403154000   909          2318
1403247600   918          2192

And so on. The code I wrote is below, but I cannot get the desired result:

import pandas as pd


df1={}
df1['price_time'] = df.loc['price_time']
df1['price_value'] = df.loc['price_value']
print(df1)

output:
{'price_value': 1974-12-11 20:55:21    [901, 909, 918, 927, 936, 945, 954, 963, 972, ...
Name: price_value, dtype: object, 'price_time': 1974-12-11 20:55:21    [1398823200, 1403154000, 1403247600, 140330160...
Name: price_time, dtype: object}

2 answers:

Answer 0: (score: 0)

price_time = [1398823200, 1403154000, 1403247600, 1403301600]
price_value  =  [901, 909, 918, 927]
salesRank_value = [2176, 2318, 2192, 1801]

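# zip() pairs up the i-th element of each list; list() materializes the rows (needed on Python 3)
listdata = list(zip(price_time, price_value, salesRank_value))
print(listdata)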
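The zipped tuples are already one row per timestamp, so they can be handed to pandas to get the column layout asked for in the question. The following is a minimal sketch, assuming pandas is installed; the variable names rows and table are illustrative and not part of the original answer:

```python
import pandas as pd

price_time = [1398823200, 1403154000, 1403247600, 1403301600]
price_value = [901, 909, 918, 927]
salesRank_value = [2176, 2318, 2192, 1801]

# Each tuple produced by zip() becomes one row of the table.
rows = list(zip(price_time, price_value, salesRank_value))

# The column names are supplied explicitly, in the same order as the zipped lists.
table = pd.DataFrame(rows, columns=["price_time", "price_value", "salesRank_value"])
print(table)
```

This only works cleanly when all lists have the same length; zip() silently stops at the shortest one.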

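Answer 1: (score: 0)

I am guessing you have the data either in a single string (with rows separated by newlines) or in a file; in either case you can use one of the one-liners below. Assume the data is in a single string variable data, holding the text that df.head() prints, like this: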

'price_time       [1398823200, 1403154000, 1403247600]\nprice_value      [901, 909, 918]\nsalesRank_value  [2176, 2318, 2192]'

You can use the following to get the desired array:

array=[a.split() for a in data.replace("[","").replace(",","").replace("]","").split('\n')]

Output (a 2D array; each inner array holds one row, with the first element as the row name and the remaining elements as the data):

[['price_time', '1398823200', '1403154000', '1403247600'], ['price_value', '901', '909', '918'], ['salesRank_value', '2176', '2318', '2192']]
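Note that every value in this array is still a string. If numbers are needed, one possible follow-up (a sketch; the name columns is hypothetical) is to turn each inner array into a dictionary entry keyed by the row name, converting the rest to int:

```python
data = ('price_time       [1398823200, 1403154000, 1403247600]\n'
        'price_value      [901, 909, 918]\n'
        'salesRank_value  [2176, 2318, 2192]')

# Same one-liner as above: strip brackets and commas, then split on whitespace.
array = [a.split() for a in data.replace("[", "").replace(",", "").replace("]", "").split('\n')]

# First element of each inner array is the name; the remaining elements are the values.
columns = {row[0]: [int(x) for x in row[1:]] for row in array}
print(columns["price_value"])
```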

If you have the data in a file data.txt like this:

price_time       [1398823200, 1403154000, 1403247600]
price_value      [901, 909, 918]
salesRank_value  [2176, 2318, 2192]

Then use the following:

array=[line.replace("[","").replace(",","").replace("]","").split() for line in open('data.txt')]

Again, the output is a 2D array:

[['price_time', '1398823200', '1403154000', '1403247600'], ['price_value', '901', '909', '918'], ['salesRank_value', '2176', '2318', '2192']]

For the json file data you provided:

"price_time":[1398823200,1403154000,1403247600,1403301600,1403380800],"price_value":[901,909,918,927,936],"salesRank_value":[2176,2318,2192,1801,1829]

Use this, which does not need pandas:

array=[b.split() for b in  open('data.json').read().replace('"',"").replace(":["," ").replace("],","\n").replace(","," ").replace("]","").split('\n')]
print(array)

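(There are cleaner ways to strip the non-alphanumeric characters, but I needed the string formatted in a particular way, so I did it like this.) The output is a 2D array like the earlier ones: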

[['price_time', '1398823200', '1403154000', '1403247600', '1403301600', '1403380800'], ['price_value', '901', '909', '918', '927', '936'], ['salesRank_value', '2176', '2318', '2192', '1801', '1829']]
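One of the cleaner alternatives is the standard json module, which also keeps the values as integers. The sketch below assumes the content is a bare list of "key": [...] pairs exactly as shown above, so it wraps the text in braces to make it a valid JSON object; if the real file already has the outer braces (or more nesting), that step would need to change. The names raw, parsed, and rows are illustrative:

```python
import json

# Assumption: the text looks like the snippet above, without the outer braces.
raw = ('"price_time":[1398823200,1403154000,1403247600,1403301600,1403380800],'
       '"price_value":[901,909,918,927,936],'
       '"salesRank_value":[2176,2318,2192,1801,1829]')

# Wrapping in braces turns the key/value pairs into a JSON object.
parsed = json.loads("{" + raw + "}")

columns = ["price_time", "price_value", "salesRank_value"]

# zip(*...) regroups the three lists into one tuple per timestamp.
rows = list(zip(*(parsed[name] for name in columns)))

print(columns)
for row in rows:
    print(row)
```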

To view the result in tabular form:

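# Walk the 2D array position by position so each inner array becomes a column.
for z in range(len(array[0])):
    temp = ''
    for y in range(len(array)):
        temp += array[y][z] + '\t'
    temp += '\n'  # the extra newline leaves a blank line between printed rows
    print(temp)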

Output:

price_time      price_value     salesRank_value

1398823200      901     2176

1403154000      909     2318

1403247600      918     2192

1403301600      927     1801

1403380800      936     1829

For prettier output, use:

s = [[str(e) for e in row] for row in array]
lens = [max(map(len, col)) for col in zip(*s)]
fmt = ' '.join('{{:{}}}'.format(x) for x in lens)
table = [fmt.format(*row) for row in s]
print('\n'.join(table))

Output:

price_time      1398823200 1403154000 1403247600 1403301600 1403380800
price_value     901        909        918        927        936
salesRank_value 2176       2318       2192       1801       1829
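The aligned output above still has one series per line, whereas the question asked for one series per column. A possible variation (a sketch, not part of the original answer; records, widths, and lines are illustrative names) transposes the array with zip(*array) before applying the same padding idea:

```python
array = [
    ['price_time', '1398823200', '1403154000', '1403247600', '1403301600', '1403380800'],
    ['price_value', '901', '909', '918', '927', '936'],
    ['salesRank_value', '2176', '2318', '2192', '1801', '1829'],
]

# zip(*array) transposes: the first record is the header, the rest are data rows.
records = list(zip(*array))

# Each inner list of array is one output column, so its longest cell sets the width.
widths = [max(len(cell) for cell in col) for col in array]

# Build a format string such as '{:10} {:11} {:15}' and apply it to every record.
fmt = ' '.join('{{:{}}}'.format(w) for w in widths)
lines = [fmt.format(*rec) for rec in records]
print('\n'.join(lines))
```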