How to update a SQL table with two columns by comparing it against a two-column DataFrame in Python?

Time: 2019-01-30 10:24:53

Tags: python mysql sql pandas dataframe

A SQL table with columns

I need to update the SQL table's column using a DataFrame.

I have a DataFrame ABC with these values:

    0    1
0  789 | NA     
1  123 | NA
2  456 | NA

My output in the SQL table should be:

    0          1
0  123   |  pass1
1  456   |  pass2
2  789   |  pass3

Suggestions that use SQL are welcome too.

3 Answers:

Answer 0: (score: 3)

Setting up the data

First, create the DataFrame in a reproducible way:

import datetime as dt
import pandas as pd

# provided data
data = [('2019-08-23', '10'), ('2019-06-23', '18'),('2019-07-21', '05'),
    ('2019-09-09', '09'), ('2019-09-19', '04'), ('2019-08-27', '22'),
    ('2019-05-03', '02'), ('2019-06-27', '07'), ('2019-05-25', '19'),
    ('2019-04-27', '02'), ('2019-01-19', '02'), ('2019-05-28', '10'),
    ('2019-02-22', '09'), ('2019-01-25', '06'), ('2019-10-22', '17'),
    ('2019-11-02', '13'), ('2019-10-29', '17'), ('2019-03-11', '18'),
    ('2019-03-11', '19'), ('2019-10-19', '19'), ('2019-02-17', '12'),
    ('2019-10-21', '01'), ('2019-09-01', '08'), ('2019-01-15', '09'),
    ('2019-11-15', '08'), ('2019-10-10', '18'), ('2019-03-31', '01'),
    ('2019-08-17', '01'), ('2019-05-27', '07'), ('2019-02-24', '20'),
    ('2019-11-03', '21'), ('2019-06-28', '21'), ('2019-01-06', '00'),
    ('2019-03-30', '23'), ('2019-06-27', '04'), ('2019-03-08', '19'),
    ('2019-01-30', '09'), ('2019-11-15', '02'), ('2019-06-04', '09'),
    ('2019-05-03', '14'), ('2019-07-01', '08'), ('2019-09-20', '19'),
    ('2019-05-15', '12'), ('2019-05-17', '02'), ('2019-09-21', '20'),
    ('2019-02-14', '14')]

# create df
df = pd.DataFrame.from_records(data, columns=('date', 'amount'))

You appear to be working with object dtypes; this is much easier with proper dtypes:

# convert dtypes
df['date'] = pd.to_datetime(df['date'], errors='coerce')
df['amount'] = df['amount'].astype('int')
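
As a small illustration of what errors='coerce' does in the conversion above, here is a minimal sketch. The sample frame `raw` and its malformed entry are hypothetical and not part of the original data; the point is only that unparseable strings become NaT instead of raising.

```python
import pandas as pd

# hypothetical sample with one malformed date string
raw = pd.DataFrame({'date': ['2019-03-08', 'not a date', '2019-07-21'],
                    'amount': ['19', '05', '08']})

# strings that cannot be parsed become NaT instead of raising an error
raw['date'] = pd.to_datetime(raw['date'], errors='coerce')
# zero-padded strings such as '05' become plain integers
raw['amount'] = raw['amount'].astype('int')

# rows whose date could not be parsed, and the rows that survived
bad_rows = raw[raw['date'].isna()]
clean = raw.dropna(subset=['date'])
```

Checking `bad_rows` before dropping anything is a cheap way to see whether the coercion silently discarded data.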

To visualize what we are looking at, I sorted the data so the results are easier to evaluate:

df = df.sort_values(['date', 'amount']).reset_index(drop=True)
df.head()
    date    amount
0   2019-01-06  0
1   2019-01-15  9
2   2019-01-19  2
3   2019-01-25  6
4   2019-01-30  9

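Getting the data

Recommendation

Getting a collection/list/dict of DataFrames can get messy, so you may want to consider whether that is truly a requirement. If not, you can filter ad hoc from a single DataFrame by slicing it in a number of ways through df['date'].dt: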

# getting things in a certain month
mar_df = df[df['date'].dt.month == 3]  # only filtered on month
mar_df = df[(df['date'].dt.month == 3) & (df['date'].dt.year == 2019)]  # month & year

# getting values in a range of months
mar_jul_df = df[df['date'].dt.month.between(3, 7)]
mar_jul_df = df[(df['date'].dt.year == 2019) & (df['date'].dt.month.between(3, 7))]

# getting values between two dates
mar_jul_df = df[(df['date'] >= dt.datetime(2019, 3, 1)) & (df['date'] <= dt.datetime(2019, 7, 31))]

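Doing it this way lets you gather filtered DataFrames as needed, with more control and better readability. Filtering on the month number alone does not cover the case where the data you need starts in December 2018 and ends in April 2019.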
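
For the cross-year case just mentioned, one possible sketch is to compare against full timestamps rather than month numbers. The small frame below is hypothetical sample data, and the exclusive upper bound is a choice made for this illustration.

```python
import pandas as pd

# hypothetical data spanning a year boundary
df = pd.DataFrame({
    'date': pd.to_datetime(['2018-11-30', '2018-12-05', '2019-01-19',
                            '2019-04-27', '2019-05-03']),
    'amount': [4, 10, 2, 2, 14],
})

# month numbers alone cannot express "Dec 2018 through Apr 2019";
# comparing against full timestamps can
start = pd.Timestamp(2018, 12, 1)
end = pd.Timestamp(2019, 5, 1)   # exclusive upper bound
dec_apr_df = df[(df['date'] >= start) & (df['date'] < end)]
```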

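Using pd.date_range

A date range can give us just the lower and upper bounds we need, or a series of dates at a specified frequency, which makes this approach more flexible.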

# getting upper and lower bounds
>>> start_stop_date = pd.date_range(end=dt.datetime(2019, 8, 1), freq='5MS', periods=2)
>>> start_stop_date
DatetimeIndex(['2019-03-01', '2019-08-01'], dtype='datetime64[ns]', freq='5MS')

With these bounds we can filter the values:

# setting two conditions -- on or after start & before end
mar_jul_df = df[(df['date'] >= start_stop_date[0]) & (df['date'] < start_stop_date[1])]
# a DatetimeIndex is immutable, so keep the adjusted bounds in new variables
# (the end bound is moved back one day to exclude 2019-08-01)
start, stop = start_stop_date[0], start_stop_date[1] - dt.timedelta(days=1)
mar_jul_df = df[df['date'].between(start, stop)]
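
One caveat worth sketching: moving the end bound back by a day and using between works when every date sits at midnight, but it can drop rows that carry a time of day on the last day of the range. The frame `ts` below is hypothetical and only illustrates the difference between the two filters.

```python
import pandas as pd

# hypothetical timestamps, one of them with a time of day on the last day
ts = pd.DataFrame({'date': pd.to_datetime(['2019-03-01 00:00',
                                           '2019-07-31 15:30',
                                           '2019-08-01 00:00'])})

# same bounds as above: 2019-03-01 and 2019-08-01
start, stop = pd.date_range(end=pd.Timestamp(2019, 8, 1), freq='5MS', periods=2)

# half-open interval: keeps everything before 2019-08-01, including 15:30 on Jul 31
half_open = ts[(ts['date'] >= start) & (ts['date'] < stop)]

# closed interval ending at 2019-07-31 00:00: misses the 15:30 row
closed = ts[ts['date'].between(start, stop - pd.Timedelta(days=1))]
```

If the column may contain times, the half-open comparison is the safer of the two.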

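Sets of DataFrames

The simplest case

If your solution needs to return five separate DataFrames, the simplest approach is probably a list comprehension over the months of interest, provided your data always falls within the same year: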

# list comprehension
df_list = [df[df['date'].dt.month == mo] for mo in range(3, 8)]

# returning individual dfs
mar_df, apr_df, may_df, jun_df, jul_df = iter(df_list)

A realistic case

Beyond this simple case, you will need pd.date_range:

# getting range of dates
>>> boundary_dates = pd.date_range(end=dt.datetime(2019, 8, 1), freq='MS', periods=6)
>>> boundary_dates
DatetimeIndex(['2019-03-01', '2019-04-01', '2019-05-01', '2019-06-01', '2019-07-01', '2019-08-01'],
              dtype='datetime64[ns]', freq='MS')

This gives you six dates, which provide five pairs of boundaries. You can use zip to create the list of boundaries:

>>> [[l_bound, u_bound] for l_bound, u_bound in zip(boundary_dates, boundary_dates[1:])]
[[Timestamp('2019-03-01 00:00:00', freq='MS'), Timestamp('2019-04-01 00:00:00', freq='MS')],
 [Timestamp('2019-04-01 00:00:00', freq='MS'), Timestamp('2019-05-01 00:00:00', freq='MS')],
 [Timestamp('2019-05-01 00:00:00', freq='MS'), Timestamp('2019-06-01 00:00:00', freq='MS')],
 [Timestamp('2019-06-01 00:00:00', freq='MS'), Timestamp('2019-07-01 00:00:00', freq='MS')],
 [Timestamp('2019-07-01 00:00:00', freq='MS'), Timestamp('2019-08-01 00:00:00', freq='MS')]]

To take advantage of pd.Series.between, subtract dt.timedelta(days=1) again:

boundaries = [[l_bound, u_bound - dt.timedelta(days=1)] for
    l_bound, u_bound in zip(boundary_dates, boundary_dates[1:])]

# between() takes the lower and upper bound as separate arguments
df_list = [df[df['date'].between(b[0], b[1])] for b in boundaries]
mar_df, apr_df, may_df, jun_df, jul_df = iter(df_list)

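Since you need something dynamic, you don't have to name each DataFrame by hand every time. Returning them as a dictionary assigns each DataFrame to a key (built with dt.datetime.strftime) so it can be pulled out more easily:

df_dict = {b[0].strftime('%b_%y_df'):
        df[df['date'].between(b[0], b[1])] for b in boundaries}

Since each value holds a DataFrame, you can still easily access the individual DataFrames with df_dict.values().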
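
As an alternative to building boundary pairs, a similar dictionary can be sketched with groupby on the calendar month. This is a different technique from the one in this answer, shown only for comparison; the sample frame is hypothetical. Note that months with no rows simply do not appear as keys.

```python
import pandas as pd

# hypothetical sample data
df = pd.DataFrame({
    'date': pd.to_datetime(['2019-03-08', '2019-03-31', '2019-04-27', '2019-05-03']),
    'amount': [19, 1, 2, 14],
})

# group rows by calendar month; each group is its own DataFrame,
# keyed in the same 'Mar_19_df' style used above
by_month = {period.strftime('%b_%y_df'): group
            for period, group in df.groupby(df['date'].dt.to_period('M'))}
```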

Creating a function

To wrap these steps into a function that lets you choose the year and month to look back from, and how many months to return:

def monthly_dfs(df, year, month, n=5):
    """return a number of dataframes for the n months preceding a given month"""
    # generate list of boundaries for months of interest
    before_dt = dt.datetime(year, month, 1)
    boundary_dates = pd.date_range(end=before_dt, freq='MS', periods=n+1)
    # get boundary pairs
    boundaries = [[l_bound, u_bound - dt.timedelta(days=1)] for 
        l_bound, u_bound in zip(boundary_dates, boundary_dates[1:])]
    # return df within each boundary pair with key according to month start
    return {b[0].strftime('%b_%y_df'): 
        df[df['date'].between(b[0], b[1])] for b in boundaries}
df_dict = monthly_dfs(df, 2019, 8)
mar_df, apr_df, may_df, jun_df, jul_df = df_dict.values()

Output

Reformatted a bit, here is df_dict:

{
    'Mar_19_df':
           date        amount
        9  2019-03-08      19
        10 2019-03-11      18
        11 2019-03-11      19
        12 2019-03-30      23
        13 2019-03-31       1,
    'Apr_19_df':
           date        amount
        14 2019-04-27       2,
    'May_19_df':
           date        amount
        15 2019-05-03       2
        16 2019-05-03      14
        17 2019-05-15      12
        18 2019-05-17       2
        19 2019-05-25      19
        20 2019-05-27       7
        21 2019-05-28      10,
    'Jun_19_df':
           date        amount
        22 2019-06-04       9
        23 2019-06-23      18
        24 2019-06-27       4
        25 2019-06-27       7
        26 2019-06-28      21,
    'Jul_19_df':
           date        amount
        27 2019-07-01       8
        28 2019-07-21       5
}

The individual DataFrames can be accessed with the keys that were created, for example:

>>>df_dict['Mar_19_df']
    date    amount
9   2019-03-08  19
10  2019-03-11  18
11  2019-03-11  19
12  2019-03-30  23
13  2019-03-31  1

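Answer 1: (score: 2)

The solution is to first build lists of months and years, because for month 3 of 2019 the preceding months can be months 1 and 2 of 2019 and months 10, 11 and 12 of 2018, and then filter the months by string matching.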

year = 2019
month = 3
month_list=[]
year_list=[]
for i in range(5):
    if month-i-2<0:
        month_list.append((month-i-2)%12)
        year_list.append(year-1)
    else:
        month_list.append((month-i-2))
        year_list.append(year)

month_list =  ["%02d" % (x+1) for x in month_list]
month_names = ['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec']
print(month_list)
dataframe_collection = {}

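# note: this assumes df['date'] still holds 'YYYY-MM-DD' strings (object dtype)
for i in range(5):
    ## filtering year
    df_temp = df[df['date'].str.contains(str(year_list[i]))]
    ## filtering month (applied to the year-filtered frame, not the full df)
    df_temp = df_temp[df_temp['date'].str.contains('-' + month_list[i] + '-')]

    dataframe_collection[month_names[int(month_list[i])-1]] = df_temp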

for i in dataframe_collection:
    print(i)
    print(dataframe_collection[i])
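
The month and year bookkeeping above can also be sketched with pd.Period arithmetic, which handles the year rollover without manual modulo logic. This is an alternative to the loop in this answer, not a restatement of it; the sample frame and the 'YYYY-MM' dictionary keys are assumptions made for the illustration.

```python
import pandas as pd

year, month, n = 2019, 3, 5   # same inputs as above

# the n months preceding the given month, newest first
ref = pd.Period(year=year, month=month, freq='M')
periods = [ref - i for i in range(1, n + 1)]
labels = [p.strftime('%Y-%m') for p in periods]

# hypothetical string-typed dates, matched on the combined 'YYYY-MM' prefix
df = pd.DataFrame({'date': ['2018-12-05', '2019-01-19', '2019-12-24'],
                   'amount': [1, 2, 3]})
frames = {label: df[df['date'].str.startswith(label)] for label in labels}
```

Matching on the combined year and month prefix keeps December 2018 and December 2019 apart, which matching on the month alone would not.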

Answer 2: (score: 0)

You didn't post any code, so I can only point you in a direction:

Read the table into pandas as df_dbtable, join the two DataFrames on column 0, and create a new df_new from the result. Then truncate the SQL table and insert the new DataFrame.
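
A minimal sketch of that direction, under stated assumptions: an in-memory SQLite database stands in for the MySQL server so the example is self-contained, and the table name `creds` and column names `id` and `secret` are hypothetical stand-ins for columns 0 and 1 in the question.

```python
import sqlite3
import pandas as pd

# in-memory SQLite as a stand-in for the real MySQL connection (assumption)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE creds (id INTEGER, secret TEXT)')   # hypothetical table
conn.executemany('INSERT INTO creds VALUES (?, ?)',
                 [(123, 'pass1'), (456, 'pass2'), (789, 'pass3')])

# DataFrame whose second column is still empty, as in the question
df_abc = pd.DataFrame({'id': [789, 123, 456], 'secret': [None, None, None]})

# 1. read the table into pandas
df_dbtable = pd.read_sql_query('SELECT id, secret FROM creds', conn)

# 2. join on the key column and take the second column from the table
df_new = (df_abc[['id']]
          .merge(df_dbtable, on='id', how='left')
          .sort_values('id')
          .reset_index(drop=True))

# 3. empty the table and write the combined rows back
#    (SQLite has no TRUNCATE; on MySQL this step would be TRUNCATE TABLE)
conn.execute('DELETE FROM creds')
df_new.to_sql('creds', conn, if_exists='append', index=False)
conn.commit()

rows = conn.execute('SELECT id, secret FROM creds ORDER BY id').fetchall()
```

Emptying and reloading the table is simple but rewrites every row; for large tables a targeted UPDATE per key may be preferable.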

Have fun.