How do I import a DataFrame from an HTML series (string)?

Time: 2019-11-28 00:31:31

Tags: python string pandas dataframe

I have a piece of HTML that contains a series. I have changed the string into the format I thought a pandas Series needs:

s = {"2014-12-31":["price":385000,"count":3],"2013-12-31":["price":380000,"count":2],"2010-12-31":["price":400000,"count":2],"2019-10-31":["price":null,"count":null]}

How do I put this into a DataFrame?

I tried

df = pd.Series(s)

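I even tried removing "price" and "count"; neither attempt worked. Surely there must be a simple way to import a series from a string, as if it had been defined as a series in the first place. What am I missing?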

2 Answers:

Answer 0: (score: 0)

Starting from this...

s = '{"2014-12-31":["price":385000,"count":3],"2013-12-31":["price":380000,"count":2],"2010-12-31":["price":400000,"count":2],"2019-10-31":["price":null,"count":null]}'

If I remove the column headers from the data

import re
s = re.sub('"price":','',s)
s = re.sub('"count":','',s)

then this works... (you need to import json)

df = pd.DataFrame(json.loads(s))

This is the resulting DataFrame...

   2014-12-31  2013-12-31  2010-12-31  2019-10-31
0      385000      380000      400000        None
1           3           2           2        None

And

df.T

gives this

                 0     1
2014-12-31  385000     3
2013-12-31  380000     2
2010-12-31  400000     2
2019-10-31    None  None
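
A variant that keeps the column names instead of deleting them. This is only a minimal sketch, assuming every pair of square brackets in the string wraps key:value pairs (so each one is really a JSON object) and that no value contains a bracket of its own. The names fixed and data are made up for illustration.

```python
import json
import pandas as pd

s = '{"2014-12-31":["price":385000,"count":3],"2013-12-31":["price":380000,"count":2],"2010-12-31":["price":400000,"count":2],"2019-10-31":["price":null,"count":null]}'

# Assumption: every [...] wraps key:value pairs, so swapping the
# brackets for braces turns the string into valid JSON.
fixed = s.replace("[", "{").replace("]", "}")

# json.loads now returns a dict of dicts; JSON null becomes None.
data = json.loads(fixed)

# orient="index" makes each date a row and each inner key a column.
df = pd.DataFrame.from_dict(data, orient="index")

# Optional: turn the date strings into real timestamps and sort them.
df.index = pd.to_datetime(df.index)
df = df.sort_index()

print(df)
```

With this the columns come out named price and count, so there is no need to transpose or rename anything afterwards.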

Answer 1: (score: 0)

import pandas as pd

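# Build one Series per column, both indexed by the same date strings
priceSeries = pd.Series([385000,380000,400000], index= ["2014-12-31","2013-12-31","2010-12-31"])
countSeries = pd.Series([3,2,2], index= ["2014-12-31","2013-12-31","2010-12-31"])

# Combine the Series into a DataFrame; the dict keys become the column names
s = pd.DataFrame({"price": priceSeries,"count":countSeries})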

s
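
Note that the code above leaves out the 2019-10-31 row from the question. A sketch of the same approach with that row included, assuming None is an acceptable stand-in for the null values in the original string (the names dates, price_series and count_series are only illustrative):

```python
import pandas as pd

dates = ["2014-12-31", "2013-12-31", "2010-12-31", "2019-10-31"]

# None stands in for the null entries; pandas treats it as a missing value.
price_series = pd.Series([385000, 380000, 400000, None], index=dates)
count_series = pd.Series([3, 2, 2, None], index=dates)

# The dict keys become the column names; rows are aligned on the date index.
df = pd.DataFrame({"price": price_series, "count": count_series})

print(df)
```

This still types the values out by hand, so it only makes sense for a few rows; for anything longer, parsing the string is probably the better route.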