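Rewriting a program that replaces strings with numeric sequences using pandas

Time: 2018-11-02 08:09:00

Tags: python pandas

Hi everyone, I have a program that reads a CSV file and replaces each string with a sequence number. The file also has other columns, such as Date/Time, for which only the date should be printed. The program works fine for all of these operations, but I would like to do the same thing with a pandas DataFrame. Could someone please convert this code so that all the operations use pandas? I know very little about pandas, so I would be grateful for your help. Thanks.

Here is the code: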

import csv
import datetime

# tempFile is the path to the input CSV (defined elsewhere)
with open(tempFile, 'r', encoding="utf8") as csvfile:
    # Create a csv reader object
    reader = csv.DictReader(csvfile, delimiter=',')

    # Restructure the data into a dict of column lists: {key_1: [], key_2: []}
    data = {}
    for row in reader:
        for header, value in row.items():
            data.setdefault(header, []).append(value)

    # Next we want to give each value in each list a unique identifier.
    # Loop through all keys
    for key in data.keys():
        values = data[key]

        # Unique values of this column, in order of first appearance
        things = list(sorted(set(values), key=values.index))

        for i, x in enumerate(data[key]):
            # Replace empty cells with the current timestamp
            if data[key][i] == "":
                data[key][i] = datetime.datetime.now().isoformat()

with open('ram5.csv', "w", newline='') as outfile:
    writer = csv.writer(outfile)
    # Write headers
    writer.writerow(data.keys())
    # Make one row equal to one value from each list
    rows = zip(*data.values())
    # Write rows
    writer.writerows(rows)

Here is the input data:

job_Id      Name        Address     Email            Date/Time
1        snehil singh   marathalli  ss@gmail.com     12/10/2011:02:03:20
2        salman         marathalli  ss@gmail.com     12/11/2011:03:10:20
3        Amir           HSR         ar@gmail.com    
4        Rakhesh        HSR         rakesh@gmail.com 09/12/2010:02:03:55
5        Ram            marathalli  r@gmail.com 
6        Shyam          BTM         ss@gmail.com     12/11/2012:01:03:20
7        salman         HSR         ss@gmail.com    
8        Amir           BTM         ar@gmail.com     07/10/2013:04:02:30
9        snehil singh   Majestic    sne@gmail.com    03/03/2018:02:03:20 

Here is the desired output:

job_Id  Name  Address  Email  Date/Time
1       1     1        1      12/10/2011
2       2     1        1      12/11/2011
3       3     2        2      11/02/2018
4       4     2        3      09/12/2010
5       5     1        4      11/02/2018
6       6     3        1      12/11/2012
7       2     2        1      11/02/2018
8       3     3        2      07/10/2013
9       1     4        5      03/03/2018

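Note: empty Date/Time cells are replaced with the current date. My program produces all the required data correctly, and the output above is the output of the program I wrote. However, I want to write the whole program using a pandas DataFrame. Any help would be appreciated. Thanks.

1 Answer:

Answer 0: (score: 4)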

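Use split together with str[0] to select the first element of each resulting list, and replace missing values with the current datetime converted to a string by Timestamp.strftime: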

now = pd.Timestamp.now().strftime('%d/%m/%Y')
df['Date/Time'] = df['Date/Time'].str.split(':').str[0].fillna(now)
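
To make the intermediate steps visible, here is a minimal sketch of the same chain on a small, made-up Series. The sample values and the fixed fill string are assumptions chosen so the result is reproducible; the answer itself fills with the current date.

```python
import pandas as pd

# Hypothetical sample: two full timestamps and one missing value.
s = pd.Series(['12/10/2011:02:03:20', None, '07/10/2013:04:02:30'])

# Fixed fill value (an assumption for reproducibility; the answer uses today's date).
fill = '02/11/2018'

parts = s.str.split(':')     # each non-missing cell becomes a list of pieces
dates = parts.str[0]         # first piece is the date; missing cells stay missing
result = dates.fillna(fill)  # missing cells receive the fill string

print(result.tolist())
```

Note that this approach never parses the dates, so malformed values pass through unchanged.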

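An alternative is to convert the column with to_datetime, replace the missing values with the current time, and finally convert it back to strings with Series.dt.strftime: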

df['Date/Time'] = (pd.to_datetime(df['Date/Time'], format='%d/%m/%Y:%H:%M:%S')
                     .fillna(pd.Timestamp.now())
                     .dt.strftime('%d/%m/%Y'))
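
As a rough illustration of the to_datetime route, the sketch below runs the same three steps on a made-up Series. The sample values and the fixed fill timestamp are assumptions for reproducibility; in the answer the fill value is the current time.

```python
import pandas as pd

# Hypothetical sample: two full timestamps and one missing value.
s = pd.Series(['12/10/2011:02:03:20', None, '09/12/2010:02:03:55'])

# Parse with an explicit day/month/year format; missing cells become NaT.
parsed = pd.to_datetime(s, format='%d/%m/%Y:%H:%M:%S')

# Fixed fill timestamp (an assumption; the answer uses the current time).
fill = pd.Timestamp('2018-11-02')

# Fill the gaps, then format every value back to a date-only string.
result = parsed.fillna(fill).dt.strftime('%d/%m/%Y')

print(result.tolist())
```

Unlike the split approach, this one validates the input: a value that does not match the format raises an error unless errors='coerce' is passed to to_datetime.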

Then use factorize together with apply to process multiple columns:

cols = ['Name','Address','Email']
# factorize returns (codes, uniques); codes start at 0, so add 1
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)
print(df)
   job_Id  Name  Address  Email   Date/Time
0       1     1        1      1  12/10/2011
1       2     2        1      1  12/11/2011
2       3     3        2      2  02/11/2018
3       4     4        2      3  09/12/2010
4       5     5        1      4  02/11/2018
5       6     6        3      1  12/11/2012
6       7     2        2      1  02/11/2018
7       8     3        3      2  07/10/2013
8       9     1        4      5  03/03/2018
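
Putting the pieces together, the whole task might look like the sketch below. It assumes the input is comma-separated and uses only the first four rows of the question's data, loaded from an in-memory string rather than a file; the fixed fill date is also an assumption made so the result is reproducible.

```python
import io
import pandas as pd

# First four rows of the question's data, written as comma-separated text (assumed layout).
csv_text = """job_Id,Name,Address,Email,Date/Time
1,snehil singh,marathalli,ss@gmail.com,12/10/2011:02:03:20
2,salman,marathalli,ss@gmail.com,12/11/2011:03:10:20
3,Amir,HSR,ar@gmail.com,
4,Rakhesh,HSR,rakesh@gmail.com,09/12/2010:02:03:55
"""
df = pd.read_csv(io.StringIO(csv_text))  # the empty Date/Time cell is read as missing

# Keep only the date part; fill missing cells with a fixed date
# (an assumption; the answer uses the current date).
fill = '02/11/2018'
df['Date/Time'] = df['Date/Time'].str.split(':').str[0].fillna(fill)

# Number each distinct value by order of first appearance, starting at 1.
cols = ['Name', 'Address', 'Email']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)

# CSV text without the index column, comparable to what the csv.writer version writes.
out = df.to_csv(index=False)
print(out)
```

Passing a file name such as 'ram5.csv' to to_csv would write the result to disk instead of returning it as a string.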