我有两个熊猫系列的文本列,我怎么能得到它们的交集?
print(df)
0 {this, is, good}
1 {this, is, not, good}
print(df1)
0 {this, is}
1 {good, bad}
我正在寻找类似下面的输出。
print(df2)
0 {this, is}
1 {good}
我已经尝试过了,但是它返回了
df.apply(lambda x: x.intersection(df1))
TypeError: unhashable type: 'set'
答案 0 :(得分:1)
这种方法对我有用
import pandas as pd
import numpy as np
data = np.array([{'this', 'is', 'good'},{'this', 'is', 'not', 'good'}])
data1 = np.array([{'this', 'is'},{'good', 'bad'}])
df = pd.Series(data)
df1 = pd.Series(data1)
df2 = pd.Series([df[i] & df1[i] for i in xrange(df.size)])
print(df2)
答案 1 :(得分:1)
看起来很简单:
s1 = pd.Series([{'this', 'is', 'good'}, {'this', 'is', 'not', 'good'}])
s2 = pd.Series([{'this', 'is'}, {'good', 'bad'}])
s1 - (s1 - s2)
#Out[122]:
#0 {this, is}
#1 {good}
#dtype: object
答案 2 :(得分:1)
我很感谢上面的回答。如果您有 DataFrame ,这是一个简单的示例,可以解决相同的问题(据我猜测,在调查了df
和df1
之类的变量名之后,您曾要求使用< strong> DataFrame 。)。
此df.apply(lambda row: row[0].intersection(df1.loc[row.name][0]), axis=1)
将执行此操作。让我们看看我如何到达解决方案。
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({
... "set": [{"this", "is", "good"}, {"this", "is", "not", "good"}]
... })
>>>
>>> df
set
0 {this, is, good}
1 {not, this, is, good}
>>>
>>> df1 = pd.DataFrame({
... "set": [{"this", "is"}, {"good", "bad"}]
... })
>>>
>>> df1
set
0 {this, is}
1 {bad, good}
>>>
>>> df.apply(lambda row: row[0].intersection(df1.loc[row.name][0]), axis=1)
0 {this, is}
1 {good}
dtype: object
>>>
>>> df.apply(lambda x: print(x.name), axis=1)
0
1
0 None
1 None
dtype: object
>>>
>>> df.loc[0]
set {this, is, good}
Name: 0, dtype: object
>>>
>>> df.apply(lambda row: print(row[0]), axis=1)
{'this', 'is', 'good'}
{'not', 'this', 'is', 'good'}
0 None
1 None
dtype: object
>>>
>>> df.apply(lambda row: print(type(row[0])), axis=1)
<class 'set'>
<class 'set'>
0 None
1 None
dtype: object
>>> df.apply(lambda row: print(type(row[0]), df1.loc[row.name]), axis=1)
<class 'set'> set {this, is}
Name: 0, dtype: object
<class 'set'> set {good}
Name: 1, dtype: object
0 None
1 None
dtype: object
>>> df.apply(lambda row: print(type(row[0]), type(df1.loc[row.name])), axis=1)
<class 'set'> <class 'pandas.core.series.Series'>
<class 'set'> <class 'pandas.core.series.Series'>
0 None
1 None
dtype: object
>>> df.apply(lambda row: print(type(row[0]), type(df1.loc[row.name][0])), axis=1)
<class 'set'> <class 'set'>
<class 'set'> <class 'set'>
0 None
1 None
dtype: object
>>>
答案 3 :(得分:0)
类似于上面,除非您要将所有内容保留在一个数据框中
Current df:
df = pd.DataFrame({0: np.array([{'this', 'is', 'good'},{'this', 'is', 'not', 'good'}]), 1: np.array([{'this', 'is'},{'good', 'bad'}])})
Intersection of series 0 & 1
df[2] = df.apply(lambda x: x[0] & x[1], axis=1)