Question

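I'm still new to pandas. Is it possible to initialize a pandas DataFrame and append to it while looping over lines? My attempt is below, but it creates a DataFrame with 1 column instead of 6. Would it be easier to save the modified input to a CSV file and then read that file with pandas? That's probably what I'll do for now. Thanks!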

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame([s.strip() for s in l])
    ## Lines with 6 columns.
    elif len(l) == 6:
        df = df.append(pd.DataFrame([s.strip() for s in l]))
    ## Lines with 7 columns.
    elif len(l) == 7:
        df = df.append(pd.DataFrame([l[i].strip() for i in (0, 2, 3, 4, 5, 6)]))

Answer 1

You can load the whole file into a DataFrame as a CSV stream instead of iterating over each line.

import requests
import pandas as pd
import csv

url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
df = pd.DataFrame(list(csv.reader(r.text.splitlines(), delimiter='\t')))
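A related sketch, not taken from the answer itself: pandas can also parse tab-separated text directly through `pd.read_csv` when it is given a file-like object. The sample text and its column names below are made up for illustration; with the real download, `r.text` would take the place of `sample_text`. Note that `read_csv` expects a consistent number of fields per row, so a file that mixes 6-field and 7-field lines may need extra handling.

```python
import io

import pandas as pd

# Made-up tab-separated sample standing in for r.text.
sample_text = "name\thaplogroup\tposition\nM1\tA\t100\nM2\tB\t200\n"

# sep='\t' marks the fields as tab-delimited; the first line becomes the header.
df = pd.read_csv(io.StringIO(sample_text), sep='\t')
print(df.shape)
```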

Update

This should work now.

for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The header is on the first line.
    if i == 0:
        df = pd.DataFrame(columns=[s.strip() for s in l])
        continue
    ## Lines with 6 columns.
    elif len(l) == 6:
        values = [s.strip() for s in l]
    ## Lines with 7 columns: drop the second field.
    elif len(l) == 7:
        values = [l[j].strip() for j in (0, 2, 3, 4, 5, 6)]
    ## Skip lines with any other number of columns.
    else:
        continue
    ## DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    row = pd.DataFrame(columns=df.columns, data=[values])
    df = pd.concat([df, row], ignore_index=True)
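The reason the attempt in the question ended up with a single column is that pandas reads a flat list as one column, while a list of lists is read as rows; that is why the update wraps each row in an extra pair of brackets. A minimal sketch with made-up values and column names:

```python
import pandas as pd

fields = ['a', 'b', 'c']  # made-up values for one parsed line

# A flat list is interpreted as one column with one value per row.
as_column = pd.DataFrame(fields)

# A list containing one list is interpreted as a single row.
as_row = pd.DataFrame([fields], columns=['x', 'y', 'z'])

print(as_column.shape)  # (3, 1)
print(as_row.shape)     # (1, 3)
```

Growing a DataFrame one row at a time copies the existing data on every step, so for large files it is usually cheaper to collect rows in a plain list and build the DataFrame once at the end.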

Answer 2

Inspired by this answer, I went with this solution:

import requests
import pandas as pd

url = 'https://raw.githubusercontent.com/23andMe/yhaplo/master/input/isogg.2016.01.04.txt'
r = requests.get(url)
table = []
for i, line in enumerate(r.text.splitlines()):
    l = line.strip().split('\t')
    ## The first line is the header.
    if i == 0:
        table.append([s.strip() for s in l])
    ## Rows with 6 columns.
    elif len(l) == 6:
        table.append([s.strip() for s in l])
    ## Rows with 7 columns.
    elif len(l) == 7:
        table.append([l[i].strip() for i in (0, 2, 3, 4, 5, 6)])
    ## Skip rows with neither 6 nor 7 columns.
    else:
        pass
df = pd.DataFrame(table)
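With the solution above, the header line ends up as the first data row and the columns are labeled 0 to 5. If named columns are wanted, one option is to split the header off when constructing the DataFrame. A small sketch, using a made-up `table` in place of the one built by the loop:

```python
import pandas as pd

# Made-up stand-in for the `table` list built by the loop.
table = [
    ['name', 'group', 'pos'],
    ['M1', 'A', '100'],
    ['M2', 'B', '200'],
]

# The first row supplies the column names; the remaining rows are the data.
df = pd.DataFrame(table[1:], columns=table[0])
print(df)
```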

Append to a pandas DataFrame while looping over lines?
