Question

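I am trying to perform an outer join, but it keeps raising the error shown below. I also tried none_df = chunk.set_index('author').join(both_authors.set_index, on='author', how='outer', lsuffix='_left', rsuffix='_right'). That gives output with the columns [index, author, body, subreddit, subreddit_id, score], but it does not produce an author_right column from either df. The columns I need are [author, author_left, body, subreddit, subreddit_id, score, author_right].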

   chunk = chunk.astype(object)
   chunk.author=chunk.author.astype(object)
   chunk.info()
   both_authors =both_authors.astype(object)
   both_authors.info()
   neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')

Even though all of my dtypes are object, it still raises the error:

RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   author        10000 non-null  object
 1   body          10000 non-null  object
 2   subreddit     10000 non-null  object
 3   subreddit_id  10000 non-null  object
 4   score         10000 non-null  object
dtypes: object(5)
memory usage: 390.8+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10017 entries, 0 to 13410
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   author  10017 non-null  object
dtypes: object(1)
memory usage: 156.5+ KB





---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-108f6e06d14a> in <module>
     30     both_authors =both_authors.astype(object)
     31     both_authors.info()
---> 32     neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')
     33     neither_df = neither_df[neither_df['author_right'].isnull()]
     34     if neither_record_count < 10000 and not neither_df.empty:

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
   7207         """
   7208         return self._join_compat(
-> 7209             other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
   7210         )
   7211 

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
   7230                 right_index=True,
   7231                 suffixes=(lsuffix, rsuffix),
-> 7232                 sort=sort,
   7233             )
   7234         else:

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     84         copy=copy,
     85         indicator=indicator,
---> 86         validate=validate,
     87     )
     88     return op.get_result()

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    629         # validate the merge keys dtypes. We may need to coerce
    630         # to avoid incompat dtypes
--> 631         self._maybe_coerce_merge_keys()
    632 
    633         # If argument passed to validate,

c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in _maybe_coerce_merge_keys(self)
   1144                     inferred_right in string_types and inferred_left not in string_types
   1145                 ):
-> 1146                     raise ValueError(msg)
   1147 
   1148             # datetimelikes must match exactly

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Answer 1

You should use pd.concat([key1, key2], axis=1)
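
A likely cause of the error, for context: DataFrame.join with on='author' matches the caller's author column against the other frame's index, not against its author column. In the question, both_authors has an integer index (Int64Index), so pandas ends up comparing object keys with int64 keys regardless of the column dtypes. Below is a minimal sketch under that assumption. The tiny frames and their values are made up for illustration, and it uses merge with indicator=True rather than join, so the "_merge" column takes the place of the author_right null check.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the frames in the question.
chunk = pd.DataFrame({
    "author": ["alice", "bob", "carol"],
    "body": ["a", "b", "c"],
    "score": [1, 2, 3],
})
# Integer index, as in the question's both_authors (Int64Index).
both_authors = pd.DataFrame({"author": ["bob", "dave"]}, index=[0, 7])

# join(on="author") compares chunk["author"] (strings) with
# both_authors.index (integers), which is what triggers the ValueError.
try:
    chunk.join(both_authors, on="author", how="outer",
               lsuffix="_left", rsuffix="_right")
    join_error = None
except ValueError as exc:
    join_error = str(exc)

# merge matches the "author" columns of both frames directly.
# indicator=True adds a "_merge" column: left_only, right_only or both.
merged = chunk.merge(both_authors, on="author", how="outer", indicator=True)

# Rows of chunk whose author does not appear in both_authors.
neither_df = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# If only that filter is needed, isin avoids the join altogether.
neither_alt = chunk[~chunk["author"].isin(both_authors["author"])]
```

If the join form is preferred, moving author into the index of the right-hand frame first (both_authors.set_index('author'), called with the column name) should also avoid the dtype mismatch, although the result then has no separate author_right column.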
