How to append a df to another df inside a loop

Time: 2019-04-16 07:06:41

Tags: python python-3.x pandas

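This code fetches a set of user ids and then runs one query per id inside a loop until the loop is finished.

So, after each pass completes, I need to append the data it returned to a df that stores the results.

Code: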

a = "SELECT id FROM USER WHERE time >'2018-03-01'"
dataa = pd.read_sql_query(a, con=engine)
print(dataa)

for userid in dataa:
   x=f"SELECT idbody FROM col1 WHERE user_id='{userid}'"
   data = pd.read_sql_query(x,con = engine)

The data returned on each pass is different, so it needs to be appended to a df that stores all of the processed data.

4 answers:

Answer 0 (score: 1)

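I assume you get the same number of columns on every pass and that the columns have the same names. For example, this is the basic idea: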

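import pandas as pd

df = pd.DataFrame()  # this will hold all of your data

df1 = pd.DataFrame([(1, 2, 3)], columns=['a', 'b', 'c'])  # 1st iteration data
df2 = pd.DataFrame([(11, 22, 33)], columns=['a', 'b', 'c'])  # 2nd iteration data
df3 = pd.DataFrame([(111, 222, 333)], columns=['a', 'b', 'c'])  # 3rd iteration data, etc.

for data in [df1, df2, df3]:
    # Append the current frame (data), not df1, and renumber the index.
    # DataFrame.append was removed in pandas 2.0; there, use
    # df = pd.concat([df, data], ignore_index=True) instead.
    df = df.append(data, ignore_index=True)

print(df)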

     a    b    c
0    1    2    3
1   11   22   33
2  111  222  333
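`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent pandas the toy example above can be written with `pd.concat` instead. The following is a minimal sketch using the same three sample frames; `ignore_index=True` is what produces the 0, 1, 2 index in the output shown above.

```python
import pandas as pd

df1 = pd.DataFrame([(1, 2, 3)], columns=['a', 'b', 'c'])        # 1st iteration data
df2 = pd.DataFrame([(11, 22, 33)], columns=['a', 'b', 'c'])     # 2nd iteration data
df3 = pd.DataFrame([(111, 222, 333)], columns=['a', 'b', 'c'])  # 3rd iteration data

df = pd.DataFrame()  # accumulator for all rows
for data in [df1, df2, df3]:
    # Stack the accumulated rows and the new rows, renumbering the index.
    df = pd.concat([df, data], ignore_index=True)

print(df)
```

Without `ignore_index=True`, every row would keep the index label 0 it had in its original one-row frame.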

What you need to do is:

a = "SELECT id FROM USER WHERE time >'2018-03-01'"
dataa = pd.read_sql_query(a, con=engine)
print(dataa)

df_all = pd.DataFrame()  # create an empty df to store all returns
for userid in dataa['id']:  # iterate over the id values, not the column labels
    x = f"SELECT idbody FROM col1 WHERE user_id='{userid}'"
    data = pd.read_sql_query(x, con=engine)
    # add the new rows to df_all (on pandas 2.0+, use pd.concat instead of append)
    df_all = df_all.append(data, ignore_index=True)

Answer 1 (score: 1)

You can also use concat

a = "SELECT id FROM USER WHERE time >'2018-03-01'"
dataa = pd.read_sql_query(a, con=engine)
print(dataa)

df = pd.DataFrame()
for userid in dataa['id']:  # iterate over the id values, not the column labels
    x = f"SELECT idbody FROM col1 WHERE user_id='{userid}'"
    data = pd.read_sql_query(x, con=engine)
    df = pd.concat([df, data])  # concatenate onto df itself (df_all was never defined here)

Now:

print(df)

will print the desired output.

Answer 2 (score: 1)

Append each result to a list, either in a loop or with a list comprehension, and call concat only once:

a = "SELECT id FROM USER WHERE time >'2018-03-01'"
dataa = pd.read_sql_query(a, con=engine)

# Option 1: plain loop
dfs = []
for userid in dataa['id']:  # iterate over the id values, not the column labels
    x = f"SELECT idbody FROM col1 WHERE user_id='{userid}'"
    data = pd.read_sql_query(x, con=engine)
    dfs.append(data)

df = pd.concat(dfs, ignore_index=True)

# Option 2: the same thing as a list comprehension
dfs = [pd.read_sql_query(f"SELECT idbody FROM col1 WHERE user_id='{userid}'", con=engine)
       for userid in dataa['id']]

df = pd.concat(dfs, ignore_index=True)
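To try the list-then-concat pattern without access to the real database, here is a minimal self-contained sketch. It assumes an in-memory SQLite database standing in for `engine`; the table name `users`, the sample rows, and the `?` placeholder style are illustrative assumptions, not part of the original question.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database standing in for the real `engine`.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER, time TEXT);
    CREATE TABLE col1 (idbody INTEGER, user_id INTEGER);
    INSERT INTO users VALUES (1, '2018-04-01'), (2, '2018-05-01'), (3, '2017-01-01');
    INSERT INTO col1 VALUES (10, 1), (11, 1), (20, 2), (30, 3);
""")

users = pd.read_sql_query("SELECT id FROM users WHERE time > '2018-03-01'", con)

dfs = []
# .tolist() converts numpy integers to plain Python ints, which sqlite3 can bind.
for userid in users['id'].tolist():
    data = pd.read_sql_query(
        "SELECT idbody FROM col1 WHERE user_id = ?", con, params=(userid,)
    )
    dfs.append(data)

# One concat at the end instead of growing a DataFrame on every pass.
df = pd.concat(dfs, ignore_index=True)
print(df)
```

Collecting the frames in a list and concatenating once avoids re-copying all previously accumulated rows on every pass, which is what repeated `append` or `concat` calls inside the loop do.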

Answer 3 (score: 1)

As an alternative to looping, why not join all of the user ids into a single string and make one call to the database using an SQL IN clause:

a = "SELECT id FROM USER WHERE time >'2018-03-01'"
dataa = pd.read_sql_query(a, con=engine)

userids = ', '.join([f'"{x}"' for x in dataa['id'].astype(str).values])
x = f"SELECT idbody FROM col1 WHERE user_id IN ({userids})"

data = pd.read_sql_query(x,con = engine)

Example

dataa = pd.DataFrame({'id': ['123', '124', '125', '126']})

userids = ', '.join([f'"{x}"' for x in dataa['id'].astype(str).values])
x = f"SELECT idbody FROM col1 WHERE user_id IN ({userids})"
print(x)

[Out]

# SELECT idbody FROM col1 WHERE user_id IN ("123", "124", "125", "126")
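The same single-query idea can also be sketched with bound parameters instead of formatting the ids into the SQL text, which sidesteps quoting issues. This is only an illustration under stated assumptions: it uses an in-memory SQLite database with made-up rows, and the `?` placeholder is the sqlite3 driver's style (other drivers use `%s` or named placeholders).

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory table standing in for the real col1.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE col1 (idbody INTEGER, user_id TEXT);
    INSERT INTO col1 VALUES (1, '123'), (2, '124'), (3, '125'), (4, '999');
""")

dataa = pd.DataFrame({'id': ['123', '124', '125', '126']})
ids = dataa['id'].astype(str).tolist()

# Build one "?" per id and let the driver bind the values.
placeholders = ', '.join('?' for _ in ids)
x = f"SELECT idbody FROM col1 WHERE user_id IN ({placeholders})"

data = pd.read_sql_query(x, con, params=ids)
print(data)
```

Databases cap the number of bound parameters per statement, so for a very long id list it may be necessary to split the ids into chunks and concatenate the per-chunk results.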