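I am trying to reshape a pandas DataFrame that was previously melted. The problem is that the names I want as columns are repeated in the var_name column.
Here is the current example: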
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| Duration_survey | Q1_gender | Q2_age | ….. | Valdidation | categories | judgements |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| 657 | Male | Older than 40 | | En la variedad hay placer. | Timing - First Click | 12.085 |
| 480 | Male | 31-40 | | en la variedad esta el placer | Timing - First Click | 10.777 |
| 657 | Male | Older than 40 | | En la variedad hay placer. | Timing - Last Click | 12.085 |
| 480 | Male | 31-40 | | en la variedad esta el placer | Timing - Last Click | 10.777 |
| 657 | Male | Older than 40 | | En la variedad hay placer. | Timing - Page Submit | 12.899 |
| 480 | Male | 31-40 | | en la variedad esta el placer | Timing - Page Submit | 11.906 |
| 657 | Male | Older than 40 | | En la variedad hay placer. | Timing - Click Count | 1 |
| 480 | Male | 31-40 | | en la variedad esta el placer | Timing - Click Count | 1 |
| 657 | Male | Older than 40 | | En la variedad hay placer. | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism | 8 |
| 480 | Male | 31-40 | | en la variedad esta el placer | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism | 7 |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
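*Note: this is a simplified version; as the code below shows, there are more columns.
This is what I would like to end up with, based on the example above: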
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------------------+----------------------+----------------------+------------+
| Duration_survey | Q1_gender | Q2_age | ….. | Valdidation | categories | Timing - First Click | Timing - Last Click | Timing - Page Submit | Timing - Click Count | judgements |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------------------+----------------------+----------------------+------------+
| 657 | Male | Older than 40 | | En la variedad hay placer. | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism | 12.085 | 12.085 | 12.899 | 1 | 8 |
| 480 | Male | 31-40 | | en la variedad esta el placer | Anyways, despite the urgency it’s fraught. Just check out media #twitter’s reaction to the ambiguity around who gets to spend this money. #cdnmedia #journalism | 10.777 | 10.777 | 11.906 | 1 | 7 |
+-----------------+-----------+---------------+-----+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------------------+----------------------+----------------------+------------+
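*The 'categories' column also contains more than 1000 tweets.
I thought I could use pivot to first turn all the tweets into columns, which would also put Timing - First Click, Timing - Last Click, Timing - Page Submit and Timing - Click Count into the shape I want, and then melt again with those 4 Timing-related columns specified as id_vars so that they keep their shape. But I don't even get that far; the pivot doesn't work: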
import pandas as pd

# df is the original wide survey DataFrame (loaded elsewhere)

# first melt: every column not listed in id_vars becomes a row
df_clean = pd.melt(df,
                   id_vars=['Duration_survey', 'Q1_gender', 'Q2_age', 'Q3_country', 'Q4_level_of_study',
                            'Q4_level_of_study_other',
                            'Q5_native_lang', 'Q6_second_lang', 'Q7_english_test', 'Q8_english_test_name',
                            'Q8_english_test_name_other', 'Q9_english_test_time', 'Q10_english_test_result',
                            'Q11_IELTS_test_result',
                            'Q12_twitter_usage', 'Valdidation'],
                   var_name='categories', value_name='judgements')

# drop rows with empty judgements
df_wo_na = df_clean.dropna(subset=['judgements'])

# pivot (this is the call that raises the error)
df_p = df_wo_na.pivot(index=['Duration_survey', 'Q1_gender', 'Q2_age', 'Q3_country', 'Q4_level_of_study',
                             'Q4_level_of_study_other',
                             'Q5_native_lang', 'Q6_second_lang', 'Q7_english_test', 'Q8_english_test_name',
                             'Q8_english_test_name_other', 'Q9_english_test_time', 'Q10_english_test_result',
                             'Q11_IELTS_test_result',
                             'Q12_twitter_usage', 'Valdidation'],
                      columns='categories', values='judgements')
This is where it fails; the pivot raises an error: ValueError: Length of passed values is 5202, index implies 16
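Two things may be worth checking here, sketched below on toy data. The 16 in the message happens to equal the number of names in the index list, which would fit a pandas version older than 1.1.0, where (as far as I know) DataFrame.pivot did not yet accept a list for index. Separately, pivot needs each index/columns combination to be unique. The helper pivot_diagnostics and the toy frame are hypothetical names made up for this illustration, not part of my real code.

```python
import pandas as pd


def pivot_diagnostics(frame, index_cols, column_col):
    """Report the pandas version and count repeated index/column keys."""
    # Only the major and minor parts of the version string are compared.
    major, minor = (int(part) for part in pd.__version__.split(".")[:2])
    key = list(index_cols) + [column_col]
    return {
        "pandas_version": pd.__version__,
        # Assumption: list-valued `index` in DataFrame.pivot needs pandas >= 1.1.
        "list_index_supported": (major, minor) >= (1, 1),
        # Rows that repeat an earlier index/column combination.
        "duplicate_keys": int(frame.duplicated(subset=key).sum()),
    }


# Toy frame (made up): the third row repeats the key of the first row.
toy = pd.DataFrame({
    "Duration_survey": [657, 480, 657],
    "Q1_gender": ["Male", "Male", "Male"],
    "categories": ["Timing - First Click"] * 3,
    "judgements": [12.085, 10.777, 12.085],
})

report = pivot_diagnostics(toy, ["Duration_survey", "Q1_gender"], "categories")
print(report)
```

If list_index_supported comes back False, or duplicate_keys is above zero on the real data, either one could explain why the pivot call fails.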
Does anyone know how to solve this?
Thanks in advance.
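For reference, a minimal sketch of the reshape described above, on toy data. It assumes the id columns together identify one respondent, and it keeps only three of them for brevity. The tweet text is a placeholder, and the names melted, timing_wide and result are made up for this illustration. Instead of pivoting everything and melting again, it splits the melted frame into timing rows and tweet rows, spreads only the timing rows into columns with set_index and unstack, and merges them back.

```python
import pandas as pd

# Only three of the id columns are kept in this toy example.
id_cols = ["Duration_survey", "Q1_gender", "Q2_age"]
timing_cols = ["Timing - First Click", "Timing - Last Click",
               "Timing - Page Submit", "Timing - Click Count"]

# Toy stand-in for the melted frame; "tweet text 1" is a placeholder.
rows = [
    (657, "Male", "Older than 40", "Timing - First Click", 12.085),
    (480, "Male", "31-40", "Timing - First Click", 10.777),
    (657, "Male", "Older than 40", "Timing - Last Click", 12.085),
    (480, "Male", "31-40", "Timing - Last Click", 10.777),
    (657, "Male", "Older than 40", "Timing - Page Submit", 12.899),
    (480, "Male", "31-40", "Timing - Page Submit", 11.906),
    (657, "Male", "Older than 40", "Timing - Click Count", 1),
    (480, "Male", "31-40", "Timing - Click Count", 1),
    (657, "Male", "Older than 40", "tweet text 1", 8),
    (480, "Male", "31-40", "tweet text 1", 7),
]
melted = pd.DataFrame(rows, columns=id_cols + ["categories", "judgements"])

# Separate the timing measurements from the tweet judgements.
is_timing = melted["categories"].isin(timing_cols)

# Timing rows go back to wide form: one column per timing measure.
# unstack requires each id/category combination to appear only once.
timing_wide = (
    melted[is_timing]
    .set_index(id_cols + ["categories"])["judgements"]
    .unstack("categories")
    .reset_index()
)
timing_wide.columns.name = None

# Tweet rows stay long; the timing columns are attached via the id columns.
result = melted[~is_timing].merge(timing_wide, on=id_cols, how="left")
result = result[id_cols + ["categories"] + timing_cols + ["judgements"]]
print(result)
```

On the real data the id_cols list would be the full set of 16 id_vars. Missing values in those columns (for example in the "_other" fields) may need filling first, since they can complicate both the unstack and the merge.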