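I am trying to read, merge and append a large amount of data from CSV files. The basics all work, but I keep overwriting my result set and I can't figure out how to fix it.
The data in the two files is very simple. Sample code: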
import pandas as pd

# Read CSVs
inventory_1 = pd.read_csv("file1.csv")

# Create new DF to hold the merge results
bucket = pd.DataFrame(columns=list("ABC"))

# Use chunk to read in the large file, merge and append the data
for chunk in pd.read_csv("file2.csv", chunksize=2):
    chunk_merge = pd.merge(
        inventory_1, chunk,
        left_on=['A'],
        right_on=['A'],
        how='left')
    result = bucket.append(chunk_merge)
    print(result)
The merge works fine on the data within each chunk, but each result overwrites the previous one. So, in the example above, I get:
# 1st Loop
   A    B   B_x   B_y    C     C_x    C_y
0  1  NaN   2.0   2.0  NaN  1000.0  101.0
1  3  NaN   4.0   4.0  NaN  2000.0  102.0
2  9  NaN  10.0   NaN  NaN  3000.0    NaN
# 2nd Loop
   A    B   B_x   B_y    C     C_x    C_y
0  1  NaN   2.0   NaN  NaN  1000.0    NaN
1  3  NaN   4.0   NaN  NaN  2000.0    NaN
2  9  NaN  10.0  10.0  NaN  3000.0  103.0
The answer I need is:
   A  B_x   C_x  B_y  C_y
0  1    2  1000    2  101
1  3    4  2000    4  102
2  9   10  3000   10  103
I feel like the answer is staring me in the face, but I can't see it. Any help would be greatly appreciated.
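Answer 0 (score: 0)
As I said in the comments, the overwriting problem comes from the way you use append on the dataframe. append returns a new DataFrame and leaves bucket untouched, so bucket stays empty and each time you reassign result the data from the previous chunk is lost. (DataFrame.append was also removed in pandas 2.0.) With the example you provided, you can instead append chunk_merge to a list on each loop iteration and then call pd.concat once at the end.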
import pandas as pd

inventory_1 = pd.read_csv("file1.csv")
list_to_concat = []  # empty list you will append each chunk_merge to
for chunk in pd.read_csv("file2.csv", chunksize=2):
    list_to_concat.append(pd.merge(
        inventory_1, chunk,
        on='A',  # simple on, as the column has the same name in both frames
        how='inner'))  # this helps for the concat; if you want to keep left, then a dropna is necessary
# concatenate once, after the loop has collected every chunk
result = pd.concat(list_to_concat)  # add .dropna() if you used how='left' above
Based on your data, I artificially split your "large" dataset into one df of 2 rows and another of 1 row to recreate the idea, and in the end I get:
result
Out[366]:
   A  B_x  C_x  B_y   C_y
0  1    2  101    2  1000
1  3    4  102    4  2000
0  9   10  103   10  3000
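Note that C_x and C_y are swapped (B too, but you can't see it with your data) because inventory_1 comes first in the merge, but other than that it is what you want.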
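The list-and-concat approach can be sketched end to end as follows. This is only a minimal illustration under assumptions: the real files were not shown, so the contents below are inferred from the output printed in the question, io.StringIO stands in for file1.csv and file2.csv, and merge_in_chunks is a hypothetical helper name. Passing ignore_index=True to pd.concat renumbers the rows so the index does not restart with each chunk.

```python
import io

import pandas as pd

# Hypothetical stand-ins for file1.csv and file2.csv.
# The values are assumptions inferred from the output shown in the question.
FILE1 = "A,B,C\n1,2,1000\n3,4,2000\n9,10,3000\n"
FILE2 = "A,B,C\n1,2,101\n3,4,102\n9,10,103\n"


def merge_in_chunks(left, right_source, chunksize=2):
    """Merge `left` with each chunk of `right_source` and stack the pieces."""
    pieces = []
    for chunk in pd.read_csv(right_source, chunksize=chunksize):
        # An inner merge keeps only the rows of `left` matched by this chunk,
        # so no all-NaN placeholder rows pile up across chunks.
        pieces.append(left.merge(chunk, on="A", how="inner"))
    # Concatenate once, after the loop, and renumber the rows 0..n-1.
    return pd.concat(pieces, ignore_index=True)


inventory_1 = pd.read_csv(io.StringIO(FILE1))
result = merge_in_chunks(inventory_1, io.StringIO(FILE2))
print(result)
```

With these assumed inputs, inventory_1 is on the left of the merge, so its columns get the _x suffix and the chunk's columns get _y. Swapping which frame is on the left swaps the suffixes.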
Answer 1 (score: 0)
>>> df1=pd.DataFrame({'A': [1,3,9], 'B': [2,4,10], 'C': [101,102,103]})
>>> df2=pd.DataFrame({'A': [1,3,9], 'B': [2,4,10], 'C': [1000, 2000, 3000]})
>>>
>>> df2.merge(df1, on='A')
   A  B_x   C_x  B_y  C_y
0  1    2  1000    2  101
1  3    4  2000    4  102
2  9   10  3000   10  103
>>>