Problem pivoting a DataFrame - Python - ValueError: Length of passed values is X, index implies Y

Asked: 2019-03-25 22:35:50

Tags: python dataframe stack pivot melt

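I am trying to reshape a pandas DataFrame that was previously melted. The problem is that the names I want as columns are repeated in the var_name column (here, categories).

Here is the current sample: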

+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| Duration_survey | Q1_gender |    Q2_age     | ….. |          Valdidation          |                                                                           categories                                                                            | judgements |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
|             657 | Male      | Older than 40 |     | En la variedad hay placer.    | Timing - First Click                                                                                                                                            |     12.085 |
|             480 | Male      | 31-40         |     | en la variedad esta el placer | Timing - First Click                                                                                                                                            |     10.777 |
|             657 | Male      | Older than 40 |     | En la variedad hay placer.    | Timing - Last Click                                                                                                                                             |     12.085 |
|             480 | Male      | 31-40         |     | en la variedad esta el placer | Timing - Last Click                                                                                                                                             |     10.777 |
|             657 | Male      | Older than 40 |     | En la variedad hay placer.    | Timing - Page Submit                                                                                                                                            |     12.899 |
|             480 | Male      | 31-40         |     | en la variedad esta el placer | Timing - Page Submit                                                                                                                                            |     11.906 |
|             657 | Male      | Older than 40 |     | En la variedad hay placer.    | Timing - Click Count                                                                                                                                            |          1 |
|             480 | Male      | 31-40         |     | en la variedad esta el placer | Timing - Click Count                                                                                                                                            |          1 |
|             657 | Male      | Older than 40 |     | En la variedad hay placer.    | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism |          8 |
|             480 | Male      | 31-40         |     | en la variedad esta el placer | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism |          7 |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+

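*Note: this is a simplified version; as the code below shows, there are more columns.

This is what I want to end up with, based on the sample above: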

+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------------------+----------------------+----------------------+------------+
| Duration_survey | Q1_gender |    Q2_age     | ….. |          Valdidation          |                                                                           categories                                                                            | Timing - First Click | Timing - Last Click | Timing - Page Submit | Timing - Click Count | judgements |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------------------+----------------------+----------------------+------------+
|             657 | Male      | Older than 40 |     | En la variedad hay placer.    | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism |               12.085 |              12.085 |               12.899 |                    1 |          8 |
|             480 | Male      | 31-40         |     | en la variedad esta el placer | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism |               10.777 |              10.777 |               11.906 |                    1 |          7 |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------------------+----------------------+----------------------+------------+

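*There are also more than 1000 tweets in the "categories" column.

My idea was to use pivot first so that every tweet becomes a column. That would also put Timing - First Click, Timing - Last Click, Timing - Page Submit and Timing - Click Count into the shape I want. I would then melt the frame back down, passing those 4 timing columns as id_vars so they keep their shape. But I don't even get that far, because the pivot does not work: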

    import pandas as pd

    # Identifier columns, shared by the melt and the pivot below
    id_cols = ['Duration_survey', 'Q1_gender', 'Q2_age', 'Q3_country', 'Q4_level_of_study',
               'Q4_level_of_study_other',
               'Q5_native_lang', 'Q6_second_lang', 'Q7_english_test', 'Q8_english_test_name',
               'Q8_english_test_name_other', 'Q9_english_test_time', 'Q10_english_test_result',
               'Q11_IELTS_test_result',
               'Q12_twitter_usage', 'Valdidation']

    # First melt: every non-identifier column becomes a (categories, judgements) row
    df_clean = pd.melt(df, id_vars=id_cols, var_name='categories', value_name='judgements')

    # Drop rows with empty judgements
    df_wo_na = df_clean.dropna(subset=['judgements'])

    # Pivot (this is the call that raises the ValueError)
    df_p = df_wo_na.pivot(index=id_cols, columns='categories', values='judgements')

This is where it fails. It raises: ValueError: Length of passed values is 5202, index implies 16

Does anyone know how to fix this?

Thanks in advance
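
One possible way around this, offered as a sketch rather than a verified fix. The 16 in the message matches the number of names passed to index, which suggests that the pandas version in use treats the list as index values rather than column names (as far as I know, list support for index in DataFrame.pivot only arrived in pandas 1.1). The example below rebuilds a tiny version of the melted frame from the sample table, widens it with set_index plus unstack instead of pivot, and then melts it back with the four timing columns kept as id_vars. Assumptions: only four of the identifier columns are used, "tweet_1" is a hypothetical short label standing in for the long tweet text, and each combination of identifier columns plus category is unique (unstack raises on duplicates). Missing values in the identifier columns have not been considered here.

```python
import pandas as pd

# Subset of the identifier columns from the question (the real frame has 16).
id_cols = ["Duration_survey", "Q1_gender", "Q2_age", "Valdidation"]
timing_cols = ["Timing - First Click", "Timing - Last Click",
               "Timing - Page Submit", "Timing - Click Count"]

# Toy stand-in for df_wo_na, using the values from the sample table.
respondents = [
    (657, "Male", "Older than 40", "En la variedad hay placer."),
    (480, "Male", "31-40", "en la variedad esta el placer"),
]
values = {
    "Timing - First Click": (12.085, 10.777),
    "Timing - Last Click": (12.085, 10.777),
    "Timing - Page Submit": (12.899, 11.906),
    "Timing - Click Count": (1, 1),
    "tweet_1": (8, 7),  # hypothetical label for the long tweet text
}
rows = [
    resp + (category, vals[i])
    for category, vals in values.items()
    for i, resp in enumerate(respondents)
]
df_wo_na = pd.DataFrame(rows, columns=id_cols + ["categories", "judgements"])

# Step 1: widen. set_index + unstack plays the role of pivot with several
# index columns; each value in "categories" becomes its own column.
wide = (
    df_wo_na
    .set_index(id_cols + ["categories"])["judgements"]
    .unstack("categories")
    .reset_index()
)
wide.columns.name = None  # drop the leftover "categories" axis label

# Step 2: melt only the tweet columns back down, keeping the timing
# columns wide by listing them as id_vars.
tweet_cols = [c for c in wide.columns if c not in id_cols + timing_cols]
result = (
    wide
    .melt(id_vars=id_cols + timing_cols, value_vars=tweet_cols,
          var_name="categories", value_name="judgements")
    .dropna(subset=["judgements"])
    .reset_index(drop=True)
)

# Reorder to match the target layout from the question.
result = result[id_cols + ["categories"] + timing_cols + ["judgements"]]
print(result)
```

If the original wide df still has the four Timing columns as separate columns, it may be simpler to add them to id_vars in the first pd.melt call, which should give the same shape without any pivot step.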

0 answers:

No answers