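I'm trying to do something seemingly simple with pandas, but after several unsuccessful attempts I'm stuck.
Here's the situation. I have a DataFrame (let's call it streets) with only two columns: the street name and the gender associated with it: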
name gender
0 Abraham Lincoln Avenue undefined
1 Donald Trump Dead End undefined
2 Hillary Clinton Street undefined
...
1754 Ziggy Marley Boulevard undefined
I also have another DataFrame (let's call it fnames), which is very large. It has four columns:
gender gender_detail main_gender first_name
0 F Female Female Aaf
1 F Female Female Aafke
2 F Female Female Aafkea
3 M Male Male Aafko
...
40211 F Female Female Zyta
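As you've probably guessed, I want to use the 'first_name' column of fnames to check whether those first names appear in the 'name' column of streets.
If a first name is found, I want to update the 'gender' column in streets with the corresponding value from the 'gender' column of fnames. If not, I leave it as 'undefined'.
Obviously, given the size of the DataFrames, I can't use two nested for loops... Is there a fast way to do this?
For example, should I build a dictionary with first names as keys and genders as values to make this more efficient?
PS: I don't know whether it simplifies the problem, but both of my DataFrames are sorted alphabetically!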
Answer 0 (score: 2)
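Yes, I think you can use a dict with map: split the name column on whitespace, select the first value with str[0], and finally replace NaN with fillna:
print (df1)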
name gender
0 Abraham Lincoln Avenue undefined
1 Donald Trump Dead End undefined
2 Hillary Clinton Street undefined
3 Aaf Street undefined
1754 Ziggy Marley Boulevard undefined
print (df2)
gender gender_detail main_gender first_name
0 F Female Female Aaf
1 F Female Female Aafke
2 F Female Female Aafkea
3 F Female Female Aafko
40211 F Female Female Zyta
d = df2.set_index('first_name')['gender'].to_dict()
print (d)
{'Zyta': 'F', 'Aaf': 'F', 'Aafkea': 'F', 'Aafke': 'F', 'Aafko': 'F'}
print (df1['name'].str.split().str[0])
0 Abraham
1 Donald
2 Hillary
3 Aaf
1754 Ziggy
Name: name, dtype: object
df1['gender'] = df1['name'].str.split().str[0].map(d).fillna('undefined')
print (df1)
name gender
0 Abraham Lincoln Avenue undefined
1 Donald Trump Dead End undefined
2 Hillary Clinton Street undefined
3 Aaf Street F
1754 Ziggy Marley Boulevard undefined
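For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The frames are small stand-ins built from the sample rows in the question, and the 'Avenue Zyta' row is a hypothetical addition used only to show the difference between the two variants. The second variant (matching any word of the street name rather than only the first) is not part of the answer above; it is an assumption about what the question may need when the first name is not the leading word.

```python
import pandas as pd

# Stand-in data: sample rows from the question plus one hypothetical row
# ("Avenue Zyta") where the first name is not the leading word.
streets = pd.DataFrame({
    "name": ["Abraham Lincoln Avenue", "Donald Trump Dead End",
             "Hillary Clinton Street", "Aaf Street", "Avenue Zyta"],
    "gender": ["undefined"] * 5,
})
fnames = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "F"],
    "first_name": ["Aaf", "Aafke", "Aafkea", "Aafko", "Zyta"],
})

# first_name -> gender lookup; drop_duplicates guards against repeated names,
# since mapping through a Series needs a unique index.
lookup = fnames.drop_duplicates("first_name").set_index("first_name")["gender"]

# Variant 1 (as in the answer): look up only the first word of each street name.
first_word = streets["name"].str.split().str[0]
streets["gender"] = first_word.map(lookup).fillna("undefined")

# Variant 2 (assumption, not from the answer): look up every word and keep
# the first hit per street. explode() repeats the row index for each word,
# and groupby(level=0).first() returns the first non-null value per row.
words = streets["name"].str.split().explode()
any_word = words.map(lookup).groupby(level=0).first()
streets["gender_any_word"] = any_word.fillna("undefined")

print(streets)
```

Both variants avoid Python-level nested loops: each word is looked up once against the index of `lookup`, so the cost grows with the number of words in streets rather than with the product of the two frame sizes.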
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 1)]
public struct DM
{
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(AnsiNullTerminatedString))]
public string shader;
[MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(AnsiNullTerminatedString))]
public string texture;
public uint flags;
public float m_min_scale;
public float m_max_scale;
public uint num_vertices;
public uint num_indices;
[MarshalAs(UnmanagedType.ByValArray, ArraySubType = UnmanagedType.Struct, SizeParamIndex = 5)]
public DMVertex[] vb;
[MarshalAs(UnmanagedType.ByValArray, ArraySubType = UnmanagedType.U2, SizeParamIndex = 6)]
public ushort[] ib;
}
[StructLayout(LayoutKind.Sequential, Pack = 4)]
public struct DMVertex
{
public Vector3 point;
public Vector2 texcoord;
}
public static T MarshalStruct<T>(byte[] data) where T : struct
{
GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
T temp = (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
handle.Free();
return temp;
}