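Generalized Series splitting in pandas (one-to-many mapping)

Date: 2016-05-26 22:00:37

Tags: python pandas

I want to split tuples or dictionaries contained in the cells of a pandas.Series into two columns. I know there is a method for strings that splits string cells.

Is there a general, idiomatic pandas way to split a pandas.Series into a DataFrame with two or more columns?

Example: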

import json
import pandas as pd

b = {'B3GALT6': '{"a": 0, "b": 0}',
 'BC033949': '{"b": 2, "c": 0}',
 'C1orf159': '{"a": 3, "c": 1}',
 'ISG15': '{"a": 5, "b": 3}',
 'LOC643837': '{"b": 4, "a": 0}',
 'NOC2L': '{"a": 0, "c": 0}',
 'SDF4': '{"b": 0, "c": 0}',
 'TNFRSF18': '{"a": 0, "b": 0}',
 'TNFRSF4': '{"a": 0, "c": 0}',
 'WASH7P': '{"a": 0, "c": 0}'}
ds = pd.Series(list(b.values()), index = b.keys())
ds.map(json.loads).apply(lambda x: (x["a"] if "a" in x else None, x["b"] if "b" in x else None))

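Now I want to split the tuples and unstack them into columns "a" and "b".

2 Answers:

Answer 0: (score: 2)

If you return a Series from the function passed to apply, the result is split into columns (a DataFrame is returned):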

In [11]: ds.map(json.loads).apply(lambda x: pd.Series([x["a"] if "a" in x else None, x["b"] if "b" in x else None]))
Out[11]:
             0    1
TNFRSF18   0.0  0.0
SDF4       NaN  0.0
TNFRSF4    0.0  NaN
B3GALT6    0.0  0.0
C1orf159   3.0  NaN
BC033949   NaN  2.0
ISG15      5.0  3.0
WASH7P     0.0  NaN
NOC2L      0.0  NaN
LOC643837  0.0  4.0
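
The output above has the default column labels 0 and 1. As a minimal sketch (not part of the original answer), returning a labelled Series should give named columns instead. The three-row sample and the variable name `named` are chosen here purely for illustration:

```python
import json
import pandas as pd

# Small subset of the question's data, used only for illustration.
ds = pd.Series({
    "ISG15": '{"a": 5, "b": 3}',
    "BC033949": '{"b": 2, "c": 0}',
    "C1orf159": '{"a": 3, "c": 1}',
})

# The labels of the Series returned from the lambda become the column names.
# dict.get returns None for a missing key, which shows up as a missing value.
named = ds.map(json.loads).apply(
    lambda d: pd.Series({"a": d.get("a"), "b": d.get("b")})
)
print(named)
```

Using `d.get("a")` is equivalent to the `x["a"] if "a" in x else None` expression in the question.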

Note: you may want to look at read_json, which could let you avoid this step altogether.

Answer 1: (score: 1)

Building on @Andy Hayden's elegant solution:
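
One possible reading of that note, assuming it refers to `pandas.read_json`: if every cell holds one JSON object, the cells can be joined into newline-delimited JSON and parsed in a single call. This is a sketch under that assumption. The sample data and the names `buffer` and `df` are illustrative, and `lines=True` requires a reasonably recent pandas.

```python
import io
import pandas as pd

# Small subset of the question's data, used only for illustration.
ds = pd.Series({
    "ISG15": '{"a": 5, "b": 3}',
    "BC033949": '{"b": 2, "c": 0}',
    "C1orf159": '{"a": 3, "c": 1}',
})

# Join the cells into newline-delimited JSON: one object per line.
buffer = io.StringIO("\n".join(ds))

# lines=True parses each line as one record; keys missing from a record become NaN.
df = pd.read_json(buffer, lines=True)

# read_json produces a fresh integer index, so restore the original labels.
# Row order is preserved, which makes a positional assignment safe here.
df.index = ds.index
print(df)
```

Unlike the apply-based approach, this keeps every key found in the data ("a", "b" and "c"), not only the ones requested.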

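
For the general case asked about in the question (cells holding either dicts or tuples), here is a hedged sketch that is not the original answerer's code. It swaps in a different technique: building the frame with the `pd.DataFrame` constructor from a list of cells, rather than returning a Series from apply. The helper name `split_series` is hypothetical.

```python
import json
import pandas as pd


def split_series(s, columns=None):
    """Hypothetical helper: expand a Series of dicts or tuples into a DataFrame.

    Dict cells yield one column per key; tuple cells yield one column per
    position, optionally named through `columns`.
    """
    return pd.DataFrame(s.tolist(), index=s.index, columns=columns)


# Small subset of the question's data, used only for illustration.
dicts = pd.Series({
    "ISG15": '{"a": 5, "b": 3}',
    "BC033949": '{"b": 2, "c": 0}',
    "C1orf159": '{"a": 3, "c": 1}',
}).map(json.loads)

# Dict cells: every key becomes a column, missing keys become NaN.
wide = split_series(dicts)

# Tuple cells, as in the question: name the positions explicitly.
tuples = dicts.map(lambda d: (d.get("a"), d.get("b")))
pairs = split_series(tuples, columns=["a", "b"])

print(wide)
print(pairs)
```

Constructing the frame once avoids creating a Series object per row, which tends to matter on larger inputs, though that has not been measured here.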