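I'm still new to pandas. Is it possible to initialize a pandas DataFrame and append to it while looping over lines? My attempt is below, but it creates a DataFrame with 1 column instead of 6. Would it be easier to save the modified input to a csv file and then read that csv file with pandas? That's probably what I'll do for now. Thanks!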
import requests
import pandas as pd
url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame([s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        df = df.append(pd.DataFrame([s.strip() for s in l]))
    ## Lines with 7 columns.
    elif len(l) == 7:
        df = df.append(pd.DataFrame([l[i].strip() for i in (0, 2, 3, 4, 5, 6)]))
Answer 0 (score: 2)
You can load the whole file into a DataFrame as a csv stream instead of looping over every line.
import requests
import pandas as pd
import csv
url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
df = pd.DataFrame(list(csv.reader(r.text.splitlines(), delimiter='\t')))
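One thing to keep in mind with this approach: the file mixes 6-field and 7-field rows, so the rows handed to `pd.DataFrame` have unequal lengths. A minimal sketch of what happens in that case, using a made-up inline `sample` string (hypothetical data, not taken from the real file) so it runs without a network request:

```python
import csv
import pandas as pd

# Hypothetical tab-separated sample: a 3-field header, a 3-field row,
# and a 4-field row with one extra field in the second position.
sample = "name\thaplogroup\tpos\nA\tB\t1\nC\textra\tD\t2\n"

rows = list(csv.reader(sample.splitlines(), delimiter="\t"))
df = pd.DataFrame(rows)

# Shorter rows are padded with missing values, so the frame is as wide
# as the longest row and the header is just the first data row.
print(df.shape)
print(df)
```

The extra field shifts the later values of the longer row one column to the right, so the rows still need to be normalized before the columns line up.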
Update
This should work now.
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame(columns = [s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        df = df.append(pd.DataFrame(columns=df.columns,data=[[s.strip() for s in l]]))
    ## Lines with 7 columns.
    elif len(l) == 7:
        df = df.append(pd.DataFrame(columns=df.columns, data=[[l[i].strip() for i in (0, 2, 3, 4, 5, 6)]]))
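The difference from the original attempt is that `data=[[...]]` is a list holding one row, whereas a flat list of strings is read as a single column with one value per row. Note also that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on a current install the loop above raises an `AttributeError`. A rough equivalent with `pd.concat`, assuming made-up column names and values purely for illustration:

```python
import pandas as pd

# Hypothetical column names and rows, standing in for the parsed lines.
columns = ["a", "b", "c"]
parsed_rows = [["1", "2", "3"], ["4", "5", "6"]]

# Build one single-row frame per parsed line, then concatenate once.
frames = [pd.DataFrame([values], columns=columns) for values in parsed_rows]
df = pd.concat(frames, ignore_index=True)

print(df)
```

Concatenating once at the end also avoids copying the whole frame on every iteration, which is what repeated appends did.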
Answer 1 (score: 0)
Inspired by this answer, I went with this solution:
import requests
import pandas as pd
url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
table = []
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The first line is the header.
    if i == 0:
        table.append([s.strip() for s in l])
    ## Rows with 6 columns.
    elif len(l) == 6:
        table.append([s.strip() for s in l])
    ## Rows with 7 columns: drop the second field.
    elif len(l) == 7:
        table.append([l[i].strip() for i in (0, 2, 3, 4, 5, 6)])
    ## Skip rows with neither 6 nor 7 columns.
    else:
        pass
df = pd.DataFrame(table)
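As written, the header ends up as row 0 of the frame and the columns are labeled 0 to 5. If the header should become the column names instead, one possible variation is to split it off when building the frame. A small sketch with a hypothetical `table` (the field names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical table in the same shape as above: header first, then data rows.
table = [
    ["id", "name", "pos"],
    ["1", "A", "100"],
    ["2", "B", "200"],
]

# Use the first row as column labels and the remaining rows as data.
df = pd.DataFrame(table[1:], columns=table[0])

print(df)
```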