I have already done the web scraping with the following code:
from io import StringIO
import pandas as pd

# "soup" and "item_text" come from the earlier BeautifulSoup scraping step
Number = soup.find('th', text="Number of samples").find_next_sibling("td").text

for x in range(1, int(Number) + 1):  # loop to parse each sample into the format I want
    item = item_text.split('tooltip')[x].split("class")[0].replace('"', '').replace(',', '').replace(':', '').replace("<br>", " ").replace("/", "").replace("\\", "")
    # print(item)
    TESTDATA = StringIO(item)
    df = pd.read_csv(TESTDATA, sep=" ", header=None)
    print(df)
The output currently looks like this:
0 1 2 3 4 5 6 7 8 9 \
0 TCGA-KK-A7B3-01A Male NaN Stage not reported NaN Alive FPKM 5.5
10 11 12 13 14
0 Living days 899 (2.5 years)
0 1 2 3 4 5 6 7 8 9 \
0 TCGA-G9-6347-01A Male NaN Stage not reported NaN Alive FPKM 14.2
10 11 12 13 14
0 Living days 2089 (5.7 years)
...
My question now is: how can I combine these separate DataFrames into a single DataFrame, so that it is easier to save everything to one CSV file?

Thanks
Answer 0 (score: 0):
all_dataframes = []
for x in range(1, int(Number) + 1):
    # ... same parsing code as in the question ...
    df = pd.read_csv(TESTDATA, sep=" ", header=None)
    all_dataframes.append(df)
concat_df = pd.concat(all_dataframes)
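To go from the combined DataFrame to a single CSV, a minimal sketch along these lines should work; "samples.csv" is just a placeholder filename, and ignore_index=True is optional but avoids every row carrying the index 0 it had in its own one-row frame:

import pandas as pd

# Combine all per-sample frames into one and write a single CSV file.
# ignore_index=True renumbers rows 0..N-1 instead of repeating each frame's index.
concat_df = pd.concat(all_dataframes, ignore_index=True)

# header=False omits the numeric column names (0, 1, 2, ...) that come from header=None;
# drop it if you want those written as a header row.
concat_df.to_csv("samples.csv", index=False, header=False)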