Merging more than 2 Python pandas DataFrames

Time: 2016-04-26 21:19:35

Tags: python pandas dataframe merge concat

I have several DataFrames like these:

num  a    --  num  b    --  num  c    --   num  d
101  0        101  1        102  0         101  1
102  1        103  1        103  0         102  0
103  0        104  0        104  1         103  1
104  0        105  0        105  1         104  1
105  1        107  1        106  1         106  0
106  1        108  1        107  1         107  0

I put them in a list called frames. I would like to do something like pd.concat(frames) and get this result:

num   a    b    c    d
101   0    1    NaN  1
102   1    NaN  0    0
103   0    1    0    1
104   0    0    1    1
105   1    0    1    NaN
106   1    NaN  1    0
107   NaN  1    1    0
108   NaN  1    NaN  NaN

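But I think I should use pd.merge, with num as the column to join on. With merge, I believe I can only combine 2 DataFrames at a time. Should I call it in a loop to merge all of my DataFrames? Or can I do this with concat, or is there another (better) way?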

2 Answers:

Answer 0: (score: 1)

Update:

Let's set num as the index:

for i in range(len(dfs)):
    dfs[i].set_index('num', inplace=True)

df = pd.concat(dfs, axis=1)

which yields:

In [116]: df
Out[116]:
       a    b    c    d
num
101  0.0  1.0  NaN  1.0
102  1.0  NaN  0.0  0.0
103  0.0  1.0  0.0  1.0
104  0.0  0.0  1.0  1.0
105  1.0  0.0  1.0  NaN
106  1.0  NaN  1.0  0.0
107  NaN  1.0  1.0  0.0
108  NaN  1.0  NaN  NaN

Old answer:

Try pd.concat(..., axis=1):

pd.concat(frames, axis=1)

It concatenates your frames horizontally, aligning rows by index, so you may want to set an appropriate index beforehand.
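As a self-contained sketch of the index-based approach, the following rebuilds the four frames from the question and concatenates them on num. The names indexed, combined, and combined_flat are illustrative and not from the answer. The list comprehension is a substitution for the inplace=True loop, chosen so the original frames stay unmodified.

```python
import pandas as pd

# The four frames from the question.
a = pd.DataFrame({"num": [101, 102, 103, 104, 105, 106], "a": [0, 1, 0, 0, 1, 1]})
b = pd.DataFrame({"num": [101, 103, 104, 105, 107, 108], "b": [1, 1, 0, 0, 1, 1]})
c = pd.DataFrame({"num": [102, 103, 104, 105, 106, 107], "c": [0, 0, 1, 1, 1, 1]})
d = pd.DataFrame({"num": [101, 102, 103, 104, 106, 107], "d": [1, 0, 1, 1, 0, 0]})
frames = [a, b, c, d]

# Index every frame by "num" so that concat aligns rows on that key.
indexed = [f.set_index("num") for f in frames]

# axis=1 places the frames side by side; the default join="outer"
# keeps the union of all index values and fills the gaps with NaN.
# sort_index() makes the row order explicit rather than relying on defaults.
combined = pd.concat(indexed, axis=1).sort_index()

# Turn "num" back into an ordinary column if that layout is preferred.
combined_flat = combined.reset_index()

print(combined)
```

Because missing entries become NaN, the integer columns are upcast to float, which is why the output shows 0.0 and 1.0 rather than 0 and 1.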

Answer 1: (score: 1)

Besides pd.concat, you can also use pd.merge:

import pandas as pd
import io
a = pd.read_csv(
    io.StringIO(
        "num,a\n101,0\n102,1\n103,0\n104,0\n105,1\n106,1\n"
    ),
    header = 0
)

b = pd.read_csv(
    io.StringIO(
        "num,b\n101,1\n103,1\n104,0\n105,0\n107,1\n108,1\n"
    ),
    header = 0
)

c = pd.read_csv(
    io.StringIO(
        "num,c\n102,0\n103,0\n104,1\n105,1\n106,1\n107,1\n"
    ),
    header = 0
)

d = pd.read_csv(
    io.StringIO(
        "num,d\n101,1\n102,0\n103,1\n104,1\n106,0\n107,0\n"
    ),
    header = 0
)

mylist = [a, b, c, d]

for i in range(4):
    if i == 0:
        result = mylist[i]
    else:
        result = pd.merge(
            result,
            mylist[i],
            how = 'outer',
            on = 'num'
        )

Then you get this result:

In [14]: result
Out[14]: 

   num    a    b    c    d
0  101  0.0  1.0  NaN  1.0
1  102  1.0  NaN  0.0  0.0
2  103  0.0  1.0  0.0  1.0
3  104  0.0  0.0  1.0  1.0
4  105  1.0  0.0  1.0  NaN
5  106  1.0  NaN  1.0  0.0
6  107  NaN  1.0  1.0  0.0
7  108  NaN  1.0  NaN  NaN
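The explicit loop can also be written with functools.reduce, which applies a two-frame merge cumulatively across the list. This is a sketch of an alternative, not the answerer's code; the variable names are illustrative, and the final sort_values call is an added step to make the row order explicit.

```python
from functools import reduce

import pandas as pd

# The four frames from the question.
a = pd.DataFrame({"num": [101, 102, 103, 104, 105, 106], "a": [0, 1, 0, 0, 1, 1]})
b = pd.DataFrame({"num": [101, 103, 104, 105, 107, 108], "b": [1, 1, 0, 0, 1, 1]})
c = pd.DataFrame({"num": [102, 103, 104, 105, 106, 107], "c": [0, 0, 1, 1, 1, 1]})
d = pd.DataFrame({"num": [101, 102, 103, 104, 106, 107], "d": [1, 0, 1, 1, 0, 0]})
frames = [a, b, c, d]

# reduce folds the list pairwise: merge(merge(merge(a, b), c), d).
# how="outer" keeps every num that appears in any frame.
result = reduce(
    lambda left, right: pd.merge(left, right, how="outer", on="num"),
    frames,
)

# Order rows by the key and rebuild a clean 0..n-1 index.
result = result.sort_values("num").reset_index(drop=True)

print(result)
```

Each merge step builds a new intermediate DataFrame, so for a long list of frames the index-based pd.concat approach may be the more economical choice.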