Question

I may be overcomplicating this, but I can't seem to find a simple solution.

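I have two DataFrames. Let's call them df1 and df2. To keep things simple, say df1 has one column called "Some Data", and df2 has two columns called "some data" and "other data".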

Example:

df1

Some Data
"Lebron James 123"
"Lebron James 234"

df2

some data                        other data
"Lebron James 123 + other text"  "I want this in df1["New?"]"
"Michael Jordan"                 "Doesn't Matter"

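So basically, I want to create a new column in df1 called "New?". If df1["Some Data"] is not found in df2["some data"], this new column (in df1) should say "New". However, if there is a match in df2["some data"], df1["New?"] should be set to the value of df2["other data"] for that specific row.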

Desired result after running:

df1

Some Data           New?
"Lebron James 123"  "I want this in df1["New?"]"
"Lebron James 234"  "New"

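So as you can see, the "New?" column contains the value from the other data column for that specific row. Lebron James 234 does not appear anywhere in df2's some data, so it is new.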

I can use the .isin() method to get True or False, but I don't know how to get the index in the other df and pull the value from its other data column.

Thanks

Edit:

From what I know, this works:

df1["New?"] = df1["Some Data"].isin(df2["some data"])

which gives

df1["New?"]

True
False

So I want True to become "I want this in df1["New?"]" and False to become "New".

Answer 1

First, join your df1 series to create a regular expression:

rgx = '|'.join(df1['some data'])

Now use np.where:

df1.assign(data=np.where(df2['some data'].str.match(rgx), df2['other data'], 'New'))

          some data                        data
0  Lebron James 123  I want this in df1["New?"]
1  Lebron James 234                         New
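
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question (with the df1 column named 'some data', as in the output above) and adds re.escape, which is not in the original answer, so that characters like "+" or "?" in the df1 values are matched literally. Note that rows are paired by position, so it relies on df1 and df2 having the same number of rows.

```python
import re

import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (column name as used in this answer)
df1 = pd.DataFrame({"some data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Assumption: escape each value so regex metacharacters are treated literally
rgx = "|".join(re.escape(s) for s in df1["some data"])

# str.match anchors the pattern at the start of each df2 string
hit = df2["some data"].str.match(rgx)

# Rows are paired by position: row i of df2 decides row i of the result
result = df1.assign(data=np.where(hit, df2["other data"], "New"))
print(result)
```

If the df1 text can appear in the middle of the df2 string rather than at the start, str.contains would be the call to reach for instead of str.match.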

An example with mismatched shapes:

df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'a': ['aaaaa', 'bbbb', 'ffff', 'gggg', 'hhhh']})

rgx = '({})'.format('|'.join(df1.a))
m = df2.assign(flag=df2.a.str.extract(rgx))

df1.set_index('a').join(m.set_index('flag')).fillna('New').reset_index()

  index      a
0     a  aaaaa
1     b   bbbb
2     c    New
3     d    New
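
The same extract idea can be applied to the column names from the question. The sketch below is one possible variant, not the exact code above: it swaps the set_index/join step for Series.map, and the third df2 row is made-up data added only so the two frames have different lengths. re.escape and expand=False are also my additions.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    # the last row is hypothetical, added so the lengths differ
    "some data": ["Lebron James 123 + other text", "Michael Jordan", "Kobe Bryant 8"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter", "Also ignored"],
})

# One capture group containing every df1 value, escaped to match literally
rgx = "({})".format("|".join(re.escape(s) for s in df1["Some Data"]))

# expand=False returns a Series: the matched df1 value, or NaN when nothing matches
flag = df2["some data"].str.extract(rgx, expand=False)

# Lookup from matched df1 value to df2's "other data" (first match wins)
lookup = (
    df2.assign(flag=flag)
    .dropna(subset=["flag"])
    .drop_duplicates("flag")
    .set_index("flag")["other data"]
)

# map aligns by value, so the two frames may have different lengths
df1["New?"] = df1["Some Data"].map(lookup).fillna("New")
print(df1)
```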

Answer 2

Based on your description, it looks like you only need a simple np.where (if the dfs have the same length):

df1['New?'] = np.where(df1["Some Data"].isin(df2["some data"]), df2['other data'], 'New')

    Some Data                       New?
0   Lebron James 123 + other text   I want this in df1["New?"]
1   Lebron James 234                New

For different lengths,

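# mask has one entry per row of df2: True where df2's value also appears in df1
mask = df2["some data"].isin(df1["Some Data"]).values
# note: a positional mask of len(df2) only indexes df1 correctly when the rows line up
df1.loc[mask, 'New?'] = df2.loc[mask, 'other data']

df1['New?'] = df1['New?'].fillna('New')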

Explanation

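Basically, you build a mask and use the same mask to filter both dataframes. Given the description, this yields the same number of results on both dfs, and you assign the "other data" values from the filtered df2 rows to the matching rows of df1.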
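
One caveat: a boolean mask built from df2 has one entry per df2 row, so it cannot index df1 directly when the two frames differ in length or row order. A possible way around that, sketched below as an alternative rather than as the method above, is to turn df2 into a lookup Series and use Series.map. The data here is made up for illustration, and like isin this compares whole strings exactly (no substring matching).

```python
import pandas as pd

# Hypothetical data: different lengths and a different row order
df1 = pd.DataFrame({"Some Data": ["Lebron James 123", "Lebron James 234", "Michael Jordan"]})
df2 = pd.DataFrame({
    "some data": ["Michael Jordan", "Lebron James 123"],
    "other data": ["MJ row", "LJ row"],
})

# Build a value -> "other data" lookup; keep the first row if a value repeats
lookup = df2.drop_duplicates("some data").set_index("some data")["other data"]

# map aligns by value rather than by row position; unmatched rows become NaN
df1["New?"] = df1["Some Data"].map(lookup).fillna("New")
print(df1)
```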

If a row value of one dataframe is in a column of another dataframe, create a new column and get that index