I want to read data from a file into a DataFrame, but the file is in a special format. It contains many lines like these:
year = [1, 2, 3]
age = [4, 5, 6]
Here is a link to the file: https://github.com/cuongpiger/Py-for-ML-DS-DV/blob/master/Matplotlib/Chap6_data/dulieu_year_gap_pop_life.txt
Answer 0 (score: 2)
If you need all the values, create a dictionary of Series, using ast.literal_eval to parse each list, and then pass the dictionary to the DataFrame constructor:
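import ast
import pandas as pd

d = {}
with open('dulieu_year_gap_pop_life.txt') as file:
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Each line looks like "name = [v1, v2, ...]"
        h, data = line.split(' = ', 1)
        # Series of unequal length are aligned on the index and padded with NaN
        d[h] = pd.Series(ast.literal_eval(data))
df = pd.DataFrame(d)
print(df)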
     year    pop       gdp_cap  life_exp  life_exp1950
0    1950   2.53    974.580338    43.828         28.80
1    1951   2.57   5937.029526    76.423         55.23
2    1952   2.62   6223.367465    72.301         43.08
3    1953   2.67   4797.231267    42.731         30.02
4    1954   2.71  12779.379640    75.320         62.48
..    ...    ...           ...       ...           ...
146  2096  10.81           NaN       NaN           NaN
147  2097  10.82           NaN       NaN           NaN
148  2098  10.83           NaN       NaN           NaN
149  2099  10.84           NaN       NaN           NaN
150  2100  10.85           NaN       NaN           NaN

[151 rows x 5 columns]
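
The NaN values in the output come from index alignment: shorter Series are padded when the dictionary is turned into a DataFrame. The following is a minimal sketch of that behavior, and of keeping only two equal-length columns afterwards. The sample text is hypothetical data in the same "name = [values]" shape as the real file, not its actual contents.

```python
import ast
import io

import pandas as pd

# Hypothetical sample in the same "name = [values]" format as the real file
sample = io.StringIO(
    "year = [1950, 1951, 1952]\n"
    "pop = [2.53, 2.57, 2.62]\n"
    "gdp_cap = [974.58, 5937.03]\n"
)

columns = {}
for line in sample:
    name, values = line.strip().split(" = ", 1)
    columns[name] = pd.Series(ast.literal_eval(values))

# Shorter Series are aligned on the index and padded with NaN
df = pd.DataFrame(columns)

# Keep only the two columns that have the same length
two_cols = df[["year", "pop"]]
print(two_cols)
```

Selecting columns after construction keeps the parsing step identical for both cases; whether that is preferable to reading only the first two lines depends on the file size.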
Answer 1 (score: 1)
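Because the lists in the input file have different lengths, they cannot be assigned directly as columns of a single DataFrame. For the first two lists, which have the same length, the following approach works: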
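import pandas as pd
import requests

url = 'https://raw.githubusercontent.com/cuongpiger/Py-for-ML-DS-DV/master/Matplotlib/Chap6_data/dulieu_year_gap_pop_life.txt'
response = requests.get(url)
a = response.content.decode('utf-8')

df = pd.DataFrame()
# Only the first two lines (year and pop) hold lists of equal length
for i in a.splitlines()[:2]:
    # Token 0 is the column name, token 1 is '=', the remaining tokens are the values
    df[i.split()[0]] = [x.replace(']', '').replace('[', '').replace(',', '') for x in i.split()[2:]]
# Note: the values are stored as strings, not numbers
df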
Out:
     year    pop
0    1950   2.53
1    1951   2.57
2    1952   2.62
3    1953   2.67
4    1954   2.71
..    ...    ...
146  2096  10.81
147  2097  10.82
148  2098  10.83
149  2099  10.84
150  2100  10.85

[151 rows x 2 columns]
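
The point about differing list lengths can be checked before building anything. The sketch below counts the items in each list and picks the names whose lists are as long as the first one. It uses only the standard library, and the sample text is hypothetical data in the same format, not the real file.

```python
import ast

# Hypothetical sample in the same "name = [values]" format as the real file
text = """year = [1950, 1951, 1952]
pop = [2.53, 2.57, 2.62]
life_exp = [43.828, 76.423]"""

lengths = {}
for line in text.splitlines():
    name, values = line.split(" = ", 1)
    lengths[name] = len(ast.literal_eval(values))

print(lengths)

# Names whose lists are as long as the first one can share a DataFrame directly
first_len = next(iter(lengths.values()))
same_length = [name for name, n in lengths.items() if n == first_len]
print(same_length)
```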
Answer 2 (score: 0)
With the help of a regular expression:
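import ast
import re
import pandas as pd

# Empty DataFrame
df = pd.DataFrame()
with open('dulieu_year_gap_pop_life.txt', 'r') as file:
    for line in file:
        # Capture "name = [values]" as two groups
        group = re.match(r'(.*) = (.*)', line)
        if group is None:
            continue  # skip lines that do not match
        # ast.literal_eval parses the list literal without executing arbitrary code, unlike eval
        df[group[1]] = pd.Series(ast.literal_eval(group[2]))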
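
As a variation on the regular-expression approach, the pattern can be made stricter so that blank or malformed lines are skipped rather than raising an error. This is only a sketch: the pattern, the `parse_lines` helper, and the sample lines are illustrative assumptions, not part of the answer above.

```python
import ast
import re

# Stricter than '(.*) = (.*)': a word, " = ", then a bracketed list
LINE_RE = re.compile(r"^(\w+) = (\[.*\])$")


def parse_lines(lines):
    """Return {name: list} for every line that matches LINE_RE."""
    parsed = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # ignore blank or malformed lines
        parsed[match.group(1)] = ast.literal_eval(match.group(2))
    return parsed


# Hypothetical sample lines, including two that should be ignored
sample = ["year = [1950, 1951]", "", "not a data line", "pop = [2.53, 2.57]"]
result = parse_lines(sample)
print(result)
```

Each list in the resulting dictionary can then be wrapped in a pd.Series and assigned to a DataFrame column, as in the answer.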