Writing a DataFrame to .xlsx is too slow

Time: 2016-08-17 03:42:41

Tags: pandas

I have a 40 MB DataFrame, 'dfScore', that I am writing to .xlsx. The code is as follows:


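writer = pandas.ExcelWriter('test.xlsx', engine='xlsxwriter')
dfScore.to_excel(writer, sheet_name='Sheet1')
writer.save()

The dfScore.to_excel call takes almost an hour, and writer.save() takes an hour as well. Is this normal? Is there a good way to get this done in under 10 minutes?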

I have already searched Stack Overflow, but the suggestions I found did not solve my problem.

2 answers:

Answer 0: (score: 1)

Why not save it as .csv? I have worked with heavier DataFrames on my personal laptop and had the same problem when writing to xlsx.

your_dataframe.to_csv('my_file.csv',encoding='utf-8',columns=list_of_dataframe_columns)

You can then convert it to .xlsx with MS Excel or an online converter.

Answer 1: (score: 0)

  

The code dfScore.to_excel takes almost an hour, and the code writer.save() takes an hour. Is this normal?

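That sounds far too high. I ran an XlsxWriter test that wrote 1,000,000 rows x 5 columns, and it took about 100 seconds. The time will vary with the CPU and memory of the test machine, but 1 hour is 36 times slower than that, which doesn't seem right.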
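A test along those lines could look roughly like the sketch below. It is not the exact script used for the numbers above; the function name `time_xlsx_write`, the integer cell values, and the temporary output path are assumptions made for illustration, and the demo uses only 10,000 rows so that it finishes quickly.

```python
import os
import tempfile
import time

import xlsxwriter  # third-party package: pip install XlsxWriter


def time_xlsx_write(path, n_rows, n_cols=5):
    """Write n_rows x n_cols integers with XlsxWriter; return elapsed seconds."""
    start = time.perf_counter()
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    for row in range(n_rows):
        # write_row(row, col, data) writes one list of values starting at (row, col)
        worksheet.write_row(row, 0, [row + col for col in range(n_cols)])
    workbook.close()  # the .xlsx file is saved when the workbook is closed
    return time.perf_counter() - start


# Hypothetical demo size; raise n_rows to 1_000_000 to approximate
# the test described above.
demo_path = os.path.join(tempfile.mkdtemp(), "bench.xlsx")
elapsed = time_xlsx_write(demo_path, n_rows=10_000)
print(f"wrote 10,000 rows x 5 columns in {elapsed:.2f} s")
```

Timing the pure XlsxWriter write separately from the pandas call is one way to see whether the hour is spent in XlsxWriter itself or somewhere else in the pipeline.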

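Note that Excel, and therefore XlsxWriter, only supports 1,048,576 rows per worksheet, so you are effectively throwing away 3/4 of your data and wasting time doing it.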
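If the data really has to end up in .xlsx, one way to stay within that limit is to spread the rows over several worksheets. The sketch below only computes the row ranges; the helper name `sheet_slices`, the single header row per sheet, and the 4,000,000-row total are assumptions for illustration, not details taken from the question.

```python
EXCEL_MAX_ROWS = 1_048_576  # per-worksheet row limit mentioned above


def sheet_slices(n_rows, header_rows=1, max_rows=EXCEL_MAX_ROWS):
    """Return (start, stop) data-row ranges, one range per worksheet."""
    per_sheet = max_rows - header_rows  # leave room for the header row(s)
    if per_sheet <= 0:
        raise ValueError("header_rows must be smaller than max_rows")
    return [(start, min(start + per_sheet, n_rows))
            for start in range(0, n_rows, per_sheet)]


# Hypothetical row count; substitute len(dfScore) for real data.
slices = sheet_slices(4_000_000)
for i, (start, stop) in enumerate(slices):
    print(f"Sheet{i + 1}: rows {start}..{stop - 1}")
```

Each range could then be written with something like dfScore.iloc[start:stop].to_excel(writer, sheet_name=...), one call per worksheet, so that no rows are silently dropped.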

  

Is there a good way to get this done in under 10 minutes?

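For pure XlsxWriter programs, pypy gives a good speedup. For example, rerunning my 1,000,000 rows x 5 columns test case with pypy brought the time down from 99.15 seconds to 16.49 seconds. I don't know whether pandas works with pypy, though.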