Question

I am trying to merge two huge dataframes (about 15 GB), shown below:

a_df:

            A       B       C       D
2018-10-16  a16     b16     c16     d16
2018-10-17  a17     b17     c17     d17
2018-10-18  a18     b18     c18     d18
2018-10-19  a19     b19     c19     d19
2018-10-20  a20     b20     c20     d20

b_df:

            A       C       Y       Z
2018-10-21  a21     c21     y21     z21
2018-10-22  a22     c22     y22     z22

to obtain this final dataframe:

            A       B       C       D       Y       Z
2018-10-16  a16     b16     c16     d16
2018-10-17  a17     b17     c17     d17
2018-10-18  a18     b18     c18     d18
2018-10-19  a19     b19     c19     d19
2018-10-20  a20     b20     c20     d20
2018-10-21  a21             c21             y21     z21
2018-10-22  a22             c22             y22     z22

I have tried some code for this, but is there a more efficient way?

Answer 1

For the desired output, a plain pd.concat over a list of the dataframes, followed by fillna, produces the result:

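import pandas as pd

dfs = [a_df, b_df]
# sort=True orders the combined column labels; fillna('') blanks the missing cells
df_out = pd.concat(dfs, sort=True).fillna('')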

Out[176]:
              A    B    C    D    Y    Z
2018-10-16  a16  b16  c16  d16
2018-10-17  a17  b17  c17  d17
2018-10-18  a18  b18  c18  d18
2018-10-19  a19  b19  c19  d19
2018-10-20  a20  b20  c20  d20
2018-10-21  a21       c21       y21  z21
2018-10-22  a22       c22       y22  z22
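
To try the answer end to end, here is a minimal sketch that rebuilds small stand-ins for a_df and b_df and stacks them. The sample values and the DatetimeIndex are assumptions taken from the tables in the question (the real index type is not stated), and sort=False is a variation on the answer's call that keeps the first-seen column order instead of sorting the labels.

```python
import pandas as pd

# Hypothetical sample frames mirroring the tables in the question.
a_df = pd.DataFrame(
    {col: [f"{col.lower()}{day}" for day in range(16, 21)] for col in "ABCD"},
    index=pd.date_range("2018-10-16", periods=5),
)
b_df = pd.DataFrame(
    {col: [f"{col.lower()}{day}" for day in range(21, 23)] for col in "ACYZ"},
    index=pd.date_range("2018-10-21", periods=2),
)

# Stack the frames vertically; a column absent from one frame is filled with NaN.
# sort=False keeps the first-seen column order rather than sorting the labels.
df_out = pd.concat([a_df, b_df], sort=False)

print(df_out.shape)          # (7, 6)
print(list(df_out.columns))  # ['A', 'B', 'C', 'D', 'Y', 'Z']
```

The sketch leaves the missing cells as NaN. Calling fillna('') afterwards, as in the answer, makes a second pass over the data and turns any numeric column that had gaps into object dtype, so on very large frames it may be worth skipping unless blank strings are really needed. The sketch does not measure speed on 15 GB inputs.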
