I'm trying to convert a table from a Wikipedia page into a DataFrame. The headings are shifted
to the right: 'Launches' should be where 'Successes' is now.
I tried the skiprows option, but it didn't help.
import pandas as pd
df = pd.read_html(r'https://en.wikipedia.org/wiki/2018_in_spaceflight', skiprows=[1, 2])[7]
df2 = df[df.columns[1:5]]
               1          2         3                 4
0       Launches  Successes  Failures  Partial failures
1          India          1         1                 0
2          Japan          3         3                 0
3    New Zealand          1         1                 0
4         Russia          3         3                 0
5  United States          8         8                 0
6             24         23         0                 1
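Answer 0 (score: 1)
The problem is that the first column of the original table contains merged cells. To parse it fully, you would have to write your own parser. For now, you can try: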
df = pd.read_html(r'https://en.wikipedia.org/wiki/2018_in_spaceflight', header=0)[7]
df.columns = [""] + list(df.columns[:-1])
df.iloc[-1] = [""] + list(df.iloc[-1][:-1])
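To see what the two shifting lines do without fetching the page, here is a minimal sketch on a small hand-made frame. The column names, the values, and the helper name shift_labels_right are assumptions made up for illustration; the real table returned by read_html may be laid out differently.

```python
import pandas as pd

# Hypothetical stand-in for the parsed table: the header labels and the
# totals row sit one column too far left compared with the country rows.
raw = pd.DataFrame(
    [
        [None, "India", 1, 1, 0, 0],
        [None, "Japan", 3, 3, 0, 0],
        ["World", 24, 23, 0, 1, None],
    ],
    columns=["Country", "Launches", "Successes", "Failures",
             "Partial failures", "Unnamed: 5"],
    dtype=object,  # keep mixed text/number cells without dtype coercion
)


def shift_labels_right(df):
    """Return a copy with the header labels and the last row moved one column right."""
    out = df.copy()
    # Prepend an empty label and drop the last one, so every label moves right.
    out.columns = [""] + list(out.columns[:-1])
    # Apply the same shift to the totals row, which follows the header layout.
    out.iloc[-1] = [""] + out.iloc[-1].tolist()[:-1]
    return out


fixed = shift_labels_right(raw)
print(fixed)
```

After the shift, the country names sit under 'Country' and the counts under 'Launches', 'Successes' and so on. Note that the trailing label and the last cell of the totals row are dropped, which is only safe if they carry no data.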