pandas: How to split a row into multiple columns?

Time: 2018-12-20 17:18:01

Tags: pandas

My data is in text format and looks like this:

[{"title": "System and Method for Maskless Direct Write Lithography", "lang": "en", "year": 2015, "references": ["354c172f-d877-4e60-a7eb-c1b1cf03ce4d", "76cf1064-b2b2-4245-940b-4e25dab9d41d"], "abstract": "A system and method for maskless direct write lithography are disclosed. The method includes receiving a plurality of pixels that represent an integrated circuit (IC) layout; identifying a first subset of the pixels that are suitable for a first compression method; and identifying a second subset of the pixels that are suitable for a second compression method. The method further includes compressing the first and second subset using the first and second compression method respectively, resulting in compressed data. The method further includes delivering the compressed data to a maskless direct writer for manufacturing a substrate. In embodiments, the first compression method uses a run-length encoding and the second compression method uses a dictionary-based encoding. Due to the hybrid compression method, the compressed data can be decompressed with a data rate expansion ratio sufficient for high-volume IC manufacturing.", "url": ["http://www.freepatentsonline.com/y2016/0211117.html", "http://www.google.com/patents/US20160211117", "https://www.google.de/patents/US20160211117"], "id": "0000002e-c2f2-4e25-9341-60d39130ac7a", "fos": ["Electronic engineering", "Computer hardware", "Engineering", "Engineering drawing"]}]

I want it to look like this:

title                                                    lang  year  id
System and Method for Maskless Direct Write Lithography  eng   2015  0000002e-c2f2-4e25-9341-60d39130ac7a

1 Answer:

Answer 0: (score: 0)

Store your JSON data in a string named data:

data = """
[{"title": "System and Method for Maskless Direct Write Lithography", "lang": "en", "year": 2015, "references": ["354c172f-d877-4e60-a7eb-c1b1cf03ce4d", "76cf1064-b2b2-4245-940b-4e25dab9d41d"], "abstract": "A system and method for maskless direct write lithography are disclosed. The method includes receiving a plurality of pixels that represent an integrated circuit (IC) layout; identifying a first subset of the pixels that are suitable for a first compression method; and identifying a second subset of the pixels that are suitable for a second compression method. The method further includes compressing the first and second subset using the first and second compression method respectively, resulting in compressed data. The method further includes delivering the compressed data to a maskless direct writer for manufacturing a substrate. In embodiments, the first compression method uses a run-length encoding and the second compression method uses a dictionary-based encoding. Due to the hybrid compression method, the compressed data can be decompressed with a data rate expansion ratio sufficient for high-volume IC manufacturing.", "url": ["http://www.freepatentsonline.com/y2016/0211117.html", "http://www.google.com/patents/US20160211117", "https://www.google.de/patents/US20160211117"], "id": "0000002e-c2f2-4e25-9341-60d39130ac7a", "fos": ["Electronic engineering", "Computer hardware", "Engineering", "Engineering drawing"]}]
"""

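import pandas as pd
from io import StringIO

# Wrap the string in StringIO: recent pandas versions deprecate
# passing a literal JSON string directly to read_json.
df = pd.read_json(StringIO(data), orient='records')
df[['title', 'lang', 'year']]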
# Output:
#                                                      title lang  year
# 0  System and Method for Maskless Direct Write Lithography  en   2015
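The question also asks for the id column, which the selection above leaves out. As a minimal sketch (not the answerer's code), the same table can be built with the standard json module and the pd.DataFrame constructor. The record below is a shortened stand-in for the one in the question, and the names records and result are only illustrative.

```python
import json

import pandas as pd

# Shortened stand-in for the record in the question (long fields omitted).
data = """
[{"title": "System and Method for Maskless Direct Write Lithography",
  "lang": "en", "year": 2015,
  "references": ["354c172f-d877-4e60-a7eb-c1b1cf03ce4d"],
  "id": "0000002e-c2f2-4e25-9341-60d39130ac7a"}]
"""

records = json.loads(data)   # list of dicts, one dict per row
df = pd.DataFrame(records)   # each top-level key becomes a column

# Keep only the four columns requested in the question.
result = df[["title", "lang", "year", "id"]]
print(result.to_string(index=False))
```

List-valued fields such as references stay in df as a single column holding Python lists; they are simply not selected here.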

If your data is in a file, you can read data from the file:

with open('file.txt') as f:
    data = f.read()
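As an alternative to reading the file into a string first, pd.read_json also accepts a path or an open file object. The sketch below assumes the file holds a JSON array of records like the one in the question; it writes a shortened sample to a temporary file.txt purely so the example is self-contained, and the temporary directory and the sample variable are illustrative, not part of the original answer.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Shortened sample record, written to a temporary file for illustration only.
sample = [{
    "title": "System and Method for Maskless Direct Write Lithography",
    "lang": "en",
    "year": 2015,
    "id": "0000002e-c2f2-4e25-9341-60d39130ac7a",
}]

tmp_dir = tempfile.mkdtemp()
path = Path(tmp_dir) / "file.txt"
path.write_text(json.dumps(sample), encoding="utf-8")

# read_json opens and parses the file itself when given a path.
df = pd.read_json(path, orient="records")
print(df[["title", "lang", "year", "id"]])
```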