Question

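I am trying to read, merge and append a large amount of data from CSV files. The basics all work. However, I keep overwriting my result set and cannot figure out how to fix it.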

The data in both files is very simple. Here is the code:

import pandas as pd

# Read CSVs
inventory_1 = pd.read_csv("file1.csv")

# Create new DF to hold the merge results
bucket = pd.DataFrame(columns=list("ABC"))

# Use chunk to read in the large file, merge and append the data
for chunk in pd.read_csv("file2.csv",chunksize=2):
    chunk_merge = pd.merge(
        inventory_1, chunk,
        left_on=['A'],
        right_on=['A'],
        how='left')
    result = bucket.append(chunk_merge)
    print(result)

Output printed on each loop iteration:

# 1st Loop
  A    B   B_x  B_y    C     C_x    C_y
0  1  NaN   2.0  2.0  NaN  1000.0  101.0
1  3  NaN   4.0  4.0  NaN  2000.0  102.0
2  9  NaN  10.0  NaN  NaN  3000.0    NaN

# 2nd Loop
   A    B   B_x   B_y    C     C_x    C_y
0  1  NaN   2.0   NaN  NaN  1000.0    NaN
1  3  NaN   4.0   NaN  NaN  2000.0    NaN
2  9  NaN  10.0  10.0  NaN  3000.0  103.0

The merge works fine on the data within each chunk, but the earlier results are overwritten in result, as the two loop outputs above show.

The answer I need is:

   A  B_x   C_x  B_y  C_y
0  1    2  1000    2  101
1  3    4  2000    4  102
2  9   10  3000   10  103

I feel like the answer is staring me in the face, but I can't see it. Any help would be appreciated.

Answer 1

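As I said in the comments, the overwriting comes from how you use append on the DataFrame: append returns a new DataFrame and leaves bucket unchanged, so every time you reassign result the data from the previous chunk is lost. Using the example you provided, you can instead append each chunk_merge to a list inside the loop and then call pd.concat once at the end.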

import pandas as pd

inventory_1 = pd.read_csv("file1.csv")
list_to_concat = []  # collects the merged result of each chunk
for chunk in pd.read_csv("file2.csv", chunksize=2):
    list_to_concat.append(pd.merge(
        inventory_1, chunk,
        on='A',        # a single `on` is enough because the key column is named 'A' in both frames
        how='inner'))  # keeps only matching rows; with how='left' a dropna is necessary afterwards
result = pd.concat(list_to_concat)  # add .dropna() here if you used how='left' above

Using your data, I artificially split your "large dataset" into one df of 2 rows and another of 1 row to recreate the idea, and in the end I get:

result
Out[366]: 
   A  B_x  C_x  B_y   C_y
0  1    2  101    2  1000
1  3    4  102    4  2000
0  9   10  103   10  3000

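Note that C_x and C_y are swapped compared with your expected output (B is too, but you cannot tell with your data) because inventory_1 comes first in the merge; apart from that, it is what you want.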
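
If you want to try the list-then-concat idea without the CSV files, here is a minimal self-contained sketch. The in-memory CSV strings are only stand-ins for file1.csv and file2.csv, with contents assumed from the output shown in the question, and merge_in_chunks is a hypothetical helper name, not something pandas provides.

```python
import io

import pandas as pd

# Assumed stand-ins for file1.csv and file2.csv (contents inferred from the question's output)
FILE1 = "A,B,C\n1,2,1000\n3,4,2000\n9,10,3000\n"
FILE2 = "A,B,C\n1,2,101\n3,4,102\n9,10,103\n"


def merge_in_chunks(left, right_csv, chunksize=2):
    """Merge `left` with a CSV read in chunks and return one combined DataFrame."""
    pieces = []
    for chunk in pd.read_csv(right_csv, chunksize=chunksize):
        # The inner join keeps only the rows of `left` whose key appears in this chunk
        pieces.append(pd.merge(left, chunk, on="A", how="inner"))
    # ignore_index=True renumbers the rows instead of repeating each chunk's own index
    return pd.concat(pieces, ignore_index=True)


inventory_1 = pd.read_csv(io.StringIO(FILE1))
result = merge_in_chunks(inventory_1, io.StringIO(FILE2))
print(result)
```

Passing ignore_index=True is optional; it just avoids the repeated index label 0 that you can see in my output above.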

Answer 2

>>> df1=pd.DataFrame({'A': [1,3,9], 'B': [2,4,10], 'C': [101,102,103]})
>>> df2=pd.DataFrame({'A': [1,3,9], 'B': [2,4,10], 'C': [1000, 2000, 3000]})
>>> 
>>> df2.merge(df1, on='A')
   A  B_x   C_x  B_y  C_y
0  1    2  1000    2  101
1  3    4  2000    4  102
2  9   10  3000   10  103
>>>
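
A side note on the _x and _y column names: they are merge's default suffixes for overlapping columns, with _x going to the left frame and _y to the right one. If the order is confusing, a sketch like the following names them explicitly. The labels "_inv" and "_chunk" are purely illustrative choices, not anything from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3, 9], "B": [2, 4, 10], "C": [101, 102, 103]})
df2 = pd.DataFrame({"A": [1, 3, 9], "B": [2, 4, 10], "C": [1000, 2000, 3000]})

# suffixes replaces the default ("_x", "_y"); the first applies to the left frame (df2 here)
merged = df2.merge(df1, on="A", suffixes=("_inv", "_chunk"))
print(merged.columns.tolist())
```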

Merging and appending data with Python Pandas
