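I am trying to perform an outer join, but it keeps raising the error shown below. I also tried none_df = chunk.set_index('author').join(both_authors.set_index('author'), on='author', how='outer', lsuffix='_left', rsuffix='_right'). That produces the output columns [index, author, body, subreddit, subreddit_id, score], but not the column author_right, even though author exists in both DataFrames. The columns I need are [author, author_left, body, subreddit, subreddit_id, score, author_right].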
chunk = chunk.astype(object)
chunk.author=chunk.author.astype(object)
chunk.info()
both_authors =both_authors.astype(object)
both_authors.info()
neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')
Even though all of my columns have dtype object, it still raises the error:
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 author 10000 non-null object
1 body 10000 non-null object
2 subreddit 10000 non-null object
3 subreddit_id 10000 non-null object
4 score 10000 non-null object
dtypes: object(5)
memory usage: 390.8+ KB
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10017 entries, 0 to 13410
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 author 10017 non-null object
dtypes: object(1)
memory usage: 156.5+ KB
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-18-108f6e06d14a> in <module>
30 both_authors =both_authors.astype(object)
31 both_authors.info()
---> 32 neither_df = chunk.join(both_authors, on='author', how='outer', lsuffix='_left', rsuffix='_right')
33 neither_df = neither_df[neither_df['author_right'].isnull()]
34 if neither_record_count < 10000 and not neither_df.empty:
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
7207 """
7208 return self._join_compat(
-> 7209 other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
7210 )
7211
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
7230 right_index=True,
7231 suffixes=(lsuffix, rsuffix),
-> 7232 sort=sort,
7233 )
7234 else:
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
84 copy=copy,
85 indicator=indicator,
---> 86 validate=validate,
87 )
88 return op.get_result()
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
629 # validate the merge keys dtypes. We may need to coerce
630 # to avoid incompat dtypes
--> 631 self._maybe_coerce_merge_keys()
632
633 # If argument passed to validate,
c:\users\nimal\appdata\local\programs\python\python36\lib\site-packages\pandas\core\reshape\merge.py in _maybe_coerce_merge_keys(self)
1144 inferred_right in string_types and inferred_left not in string_types
1145 ):
-> 1146 raise ValueError(msg)
1147
1148 # datetimelikes must match exactly
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
Answer 0 (score: 0)
You should use pd.concat([key1, key2], axis=1)
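As a possible alternative, it may help to look at where the int64 in the error message comes from. DataFrame.join(other, on='author') compares chunk['author'] with the index of both_authors (an Int64Index in the info() output above), not with both_authors['author'], so casting the columns to object does not change the comparison. Below is a minimal sketch, assuming the goal is to keep the rows of chunk whose author does not appear in both_authors (which the isnull() filter in the traceback suggests). The sample values are made up for illustration, and the merge with indicator=True is a swapped-in technique, not the asker's original join.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames; the values are invented.
chunk = pd.DataFrame({
    "author": ["alice", "bob", "carol"],
    "body": ["a", "b", "c"],
    "score": [1, 2, 3],
})
# An integer index, similar to the Int64Index reported by both_authors.info().
both_authors = pd.DataFrame({"author": ["bob", "dave"]}, index=[5, 9])

# Merge on the 'author' column of both frames instead of on the right index.
# indicator=True adds a '_merge' column recording where each row came from.
merged = chunk.merge(
    both_authors.drop_duplicates(),
    on="author",
    how="outer",
    indicator=True,
)

# Keep only the rows whose author occurs in chunk but not in both_authors.
neither_df = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(neither_df)
```

Because both frames are merged on a column with the same name, the result has a single author column rather than author_left and author_right; the _merge column plays the role that author_right.isnull() played in the original code.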