Question

I have 2 dataframes that look like this. DF1:

Product, Region, ProductScore
AAA, R1,100
AAA, R2,100
BBB, R2,200
BBB, R3,200

DF2:

Region, RegionScore
R1,1
R2,2

How can I join these 2 into 1 dataframe, with the following result:

Product, Region, ProductScore, RegionScore
AAA, R1,100,1
AAA, R2,100,2
BBB, R2,200,2

Thanks a lot!

EDIT1:

I used df.merge(df_new) and got this error message:

  File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 4071, in merge
    suffixes=suffixes, copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 37, in merge
    copy=copy)
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
    self.join_names) = self._get_merge_keys()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 318, in _get_merge_keys
    self._validate_specification()
  File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 409, in _validate_specification
    if not self.right.columns.is_unique:
AttributeError: 'list' object has no attribute 'is_unique'

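EDIT2: I realized that my df_new is a Series (created using groupby) rather than a DataFrame. I have now converted it to a DataFrame; here is the info:

print(df.info())

Int64Index: 1111 entries, 0 to 1110
Data columns (total 8 columns):
product              1111 non-null object
reviewuserId         1111 non-null object
reviewprofileName    1111 non-null object
reviewelpfulness     881 non-null float64
reviewscore          1111 non-null float64
reviewtime           1111 non-null int64
reviewsummary        1111 non-null object
reviewtext           1111 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 56.4+ KB
None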

print(df_new_2.info())

<class 'pandas.core.frame.DataFrame'>
Index: 1089 entries, A100Y8WSLFJN7Q to AZWBQPQN96SS6
Data columns (total 1 columns):
reviewelpfulnessbyuserid    864 non-null float64
dtypes: float64(1)
memory usage: 12.8+ KB
None

print(df.head())

      product    reviewuserId                         reviewprofileName  \
0  B003AI2VGA  A141HP4LYPWMSR          Brian E. Erland "Rainbow Sphinx"   
1  B003AI2VGA  A328S9RN3U5M68                                Grady Harp   
2  B003AI2VGA  A1I7QGUDP043DG                 Chrissy K. McVay "Writer"   
3  B003AI2VGA  A1M5405JH9THP9                              golgotha.gov   
4  B003AI2VGA   ATXL536YX71TR  KerrLines "&#34;MoviesMusicTheatre&#34;"   

   reviewelpfulness  reviewscore  reviewtime  \
0               1.0            3  1182729600   
1               1.0            3  1181952000   
2               0.8            5  1164844800   
3               1.0            3  1197158400   
4               1.0            3  1188345600   

                                       reviewsummary  \
0  There Is So Much Darkness Now ~ Come For The M...   
1  Worthwhile and Important Story Hampered by Poo...   
2                      This movie needed to be made.   
3                  distantly based on a real tragedy   
4  What's going on down in Juarez and shining a l...   

                                          reviewtext  
0  Synopsis: On the daily trek from Juarez Mexico...  
1  THE VIRGIN OF JUAREZ is based on true events s...  
2  The scenes in this film can be very disquietin...  
3  THE VIRGIN OF JUAREZ (2006)<br />directed by K...  
4  Informationally this SHOWTIME original is esse...

print(df_new_2.head())

                reviewelpfulnessbyuserid
reviewuserId                            
A100Y8WSLFJN7Q                       NaN
A103VZ3KDF2RT5                  0.555556
A1041HQGJDKFG5                  0.000000
A10FBJXMQPI0LL                  0.333333
A10LIHFA4SSK3F                  0.000000

Now the error message looks like this:

  File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12245)
KeyError: 'reviewuserId'

After printing this information, I was able to fix the problem just by adding the following: df_new_2 = df_new.to_frame().reset_index()
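The KeyError most likely arises because, after the groupby, reviewuserId lives in the index of df_new_2 rather than in a column, so merge cannot find it as a key. A minimal sketch of the fix follows, using a made-up miniature of the review data (the user IDs and scores are hypothetical, and the mean aggregation is an assumption, since the original groupby call is not shown). It also shows an alternative that keeps the index and passes right_index=True.

```python
import pandas as pd

# Hypothetical miniature of the review data; the values are made up.
df = pd.DataFrame({
    "reviewuserId": ["U1", "U2", "U1", "U3"],
    "reviewelpfulness": [1.0, 0.5, 0.0, 1.0],
})

# Selecting one column after groupby gives a Series indexed by reviewuserId.
# (mean is an assumed aggregation; the original one is not shown.)
df_new = df.groupby("reviewuserId")["reviewelpfulness"].mean()
df_new.name = "reviewelpfulnessbyuserid"

# Option 1: move the index back into a regular column, then merge on it.
df_new_2 = df_new.to_frame().reset_index()
merged = df.merge(df_new_2, on="reviewuserId")

# Option 2: keep the index and tell merge to match the left column against it.
merged_idx = df.merge(df_new.to_frame(), left_on="reviewuserId", right_index=True)

print(merged)
```

Both options attach the per-user value to every review row; which one reads better depends on whether the grouped result is needed as a plain table elsewhere.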

Answer 1

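Since your expected output skips the row with R3, what you are asking for is not a left merge; you just want to perform an inner merge: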

In [120]:
df.merge(df1)

Out[120]:
  Product Region  ProductScore  RegionScore
0     AAA     R1           100            1
1     AAA     R2           100            2
2     BBB     R2           200            2
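For reference, here is a self-contained sketch that rebuilds the two frames from the question and writes out what merge does by default. The names df and df1 follow the snippet above; on="Region" and how="inner" are the defaults spelled out explicitly, since Region is the only column the two frames share.

```python
import pandas as pd

# The two frames from the question.
df = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df1 = pd.DataFrame({"Region": ["R1", "R2"], "RegionScore": [1, 2]})

# Equivalent to df.merge(df1): join on the shared column, keep only matches.
inner = df.merge(df1, on="Region", how="inner")
print(inner)
```

Spelling out on= is worth doing in real code, because the implicit key changes if the frames later gain another column with a common name.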

A left merge would result in:

In [121]:
df.merge(df1, how='left')

Out[121]:
  Product Region  ProductScore  RegionScore
0     AAA     R1           100            1
1     AAA     R2           100            2
2     BBB     R2           200            2
3     BBB     R3           200          NaN
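If it is unclear which rows a left merge failed to match, the indicator parameter of merge can help. The sketch below rebuilds the same two frames and adds the _merge column that indicator=True produces; the variable names left and unmatched are only illustrative.

```python
import pandas as pd

# The two frames from the question.
df = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df1 = pd.DataFrame({"Region": ["R1", "R2"], "RegionScore": [1, 2]})

# Keep every row of df; indicator=True adds a _merge column that is
# "both" for matched rows and "left_only" for rows with no match in df1.
left = df.merge(df1, on="Region", how="left", indicator=True)

# Rows of df whose Region has no RegionScore.
unmatched = left[left["_merge"] == "left_only"]
print(left)
print(unmatched)
```

Note that RegionScore becomes a float column in the left merge, because the missing value for R3 is stored as NaN.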
