Question

I am trying to transform my data-frame from wide format to long format. I have seen many questions already posted here regarding this, but it is not quite what I am looking for / I do not see how to apply it to my problem.
The data-frames share some columns like Name, SharedVal etc. but also have columns the other dataset does not have.
What I want to achieve: Merge these two dataframes based on the UserId, but per UserID have as many rows as there are MeasureNo.
So if there have been two measurements for a user, there will be two rows with the same user id. And the rows have the same length, but some columns have different entries/no entry at all.

Example:

Dataset1:

UserID Name MeasureNo SharedVal1 SpecificVal1
     1 Anna         1         42            8
     2 Alex         1         28           50

and

Dataset2:

UserID Name MeasureNo SharedVal1 DifferentVal1
     1 Anna         2         15            99
     2 Alex         2         33            45

And they should be merged into:

UserID Name MeasureNo SharedVal1 SpecificVal1 DifferentVal1
     1 Anna         1         42            8             -
     1 Anna         2         15            -            99
     2 Alex         1         28           50             -
     2 Alex         2         33            -            45

and so on...

problem is, the dataset is huge and there are a lot of rows and columns, so I thought that somehow merging them on the id and than reshaping is the most generic approach. But I could not achieve the expected behaviour.

What I am trying to say programatically is: "Merge the two dataframes based on the userid and create a as much rows per userid as there are different times of measurement(MeasureNo). Both rows obviously have the same amount of columns. So im both rows, some values in certain columns cannot be filled.

Sorry I am new to SO and this was my best approach to visualizing a table with rows starting in a new line and the Key:Val representing a columing inside that row.

Answer 1

您可以进行外部联接：

new_df <- merge(df1, df2, all = T)

How to properly move from wide to long format with different/same variables

1 个答案: