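Hi everyone, I have a program that reads a CSV file and replaces the strings in it with numeric identifiers. The file also has other columns, such as Date/Time, for which only the date should be printed. The program works fine for all of these operations, but I would like to do the same thing with a Pandas DataFrame. Could someone please convert this code to Pandas? I know very little about Pandas, so I would be very grateful. Thanks.
Here is the code: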
import csv
import datetime

# tempFile is the path to the input CSV, defined elsewhere
with open(tempFile, 'r', encoding="utf8") as csvfile:
    # creating a csv reader object
    reader = csv.DictReader(csvfile, delimiter=',')
    # next(reader, None)
    # Restructure the data into a dict of column lists: {key_1: [], key_2: []}
    data = {}
    for row in reader:
        # print(row)
        for header, value in row.items():
            try:
                data[header].append(value)
            except KeyError:
                data[header] = [value]

# Next we want to give each value in each list a unique identifier.
# Loop through all keys
for key in data.keys():
    values = data[key]
    # Unique values in order of first appearance
    things = list(sorted(set(values), key=values.index))
    for i, x in enumerate(data[key]):
        # Replace empty cells with the current timestamp
        if data[key][i] == "":
            data[key][i] = datetime.datetime.now().isoformat()

with open('ram5.csv', "w", newline='') as outfile:
    writer = csv.writer(outfile)
    # Write headers
    writer.writerow(data.keys())
    # Make one row equal to one value from each list
    rows = zip(*data.values())
    # Write rows
    writer.writerows(rows)
Here is the input data:
job_Id  Name          Address     Email             Date/Time
1       snehil singh  marathalli  ss@gmail.com      12/10/2011:02:03:20
2       salman        marathalli  ss@gmail.com      12/11/2011:03:10:20
3       Amir          HSR         ar@gmail.com
4       Rakhesh       HSR         rakesh@gmail.com  09/12/2010:02:03:55
5       Ram           marathalli  r@gmail.com
6       Shyam         BTM         ss@gmail.com      12/11/2012:01:03:20
7       salman        HSR         ss@gmail.com
8       Amir          BTM         ar@gmail.com      07/10/2013:04:02:30
9       snehil singh  Majestic    sne@gmail.com     03/03/2018:02:03:20
Here is the desired output:
job_Id  Name  Address  Email  Date/Time
1       1     1        1      12/10/2011
2       2     1        1      12/11/2011
3       3     2        2      11/02/2018
4       4     2        3      09/12/2010
5       5     1        4      11/02/2018
6       6     3        1      12/11/2012
7       2     2        1      11/02/2018
8       3     3        2      07/10/2013
9       1     4        5      03/03/2018
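Note: empty Date/Time values are replaced with the current date. My program already produces all of the required data correctly, and the output above is what it generates. However, I want to write the whole program using a Pandas DataFrame. Any help would be appreciated. Thanks.
Answer 0 (score: 4)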
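Use split with str[0] to select the first element of each resulting list, and replace missing values with the current datetime converted to a string by Timestamp.strftime: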
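# Current date formatted as a day/month/year string
now = pd.Timestamp.now().strftime('%d/%m/%Y')
# Keep the part before the first ':' and fill missing values with today's date
df['Date/Time'] = df['Date/Time'].str.split(':').str[0].fillna(now)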
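To see the split approach in isolation, here is a minimal sketch on a small hypothetical frame that mirrors a few rows of the question's Date/Time column. The sample values and the fixed fill date are assumptions made so the result is reproducible; in real use the fill value would come from the current date, and the data from a file (where pandas normally reads empty CSV fields as missing values).

```python
import pandas as pd

# Hypothetical sample mirroring the question's Date/Time column;
# None marks a missing value.
df = pd.DataFrame({
    'Date/Time': ['12/10/2011:02:03:20', None, '09/12/2010:02:03:55'],
})

# A fixed date stands in for pd.Timestamp.now() so the output is stable.
now = pd.Timestamp('2018-11-02').strftime('%d/%m/%Y')

# Split on ':' (date first, then hour, minute, second), keep the first
# piece, and fill the rows that were missing with the formatted date.
date_only = df['Date/Time'].str.split(':').str[0].fillna(now)

print(date_only.tolist())
# ['12/10/2011', '02/11/2018', '09/12/2010']
```

Because this works purely on text, it never checks that the values are valid dates; it just keeps whatever precedes the first colon.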
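An alternative is to convert the column with to_datetime, replace the missing values with the current time, and finally convert it back to strings with Series.dt.strftime: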
# Parse to datetimes, fill missing values with the current time, format as date only
df['Date/Time'] = (pd.to_datetime(df['Date/Time'], format='%d/%m/%Y:%H:%M:%S')
                     .fillna(pd.Timestamp.now())
                     .dt.strftime('%d/%m/%Y'))
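The to_datetime variant can be sketched the same way. The sample Series and the fixed fill timestamp below are assumptions for illustration only. One difference from the string split worth noting: with an explicit format, to_datetime by default raises an error on values that do not match, so malformed rows are caught rather than silently passed through.

```python
import pandas as pd

# Hypothetical sample; None marks a missing value.
s = pd.Series(['12/10/2011:02:03:20', None, '09/12/2010:02:03:55'])

# A fixed timestamp stands in for pd.Timestamp.now() so the output is stable.
fill = pd.Timestamp('2018-11-02')

# Parse with the explicit format; missing values become NaT.
parsed = pd.to_datetime(s, format='%d/%m/%Y:%H:%M:%S')

# Fill NaT with the stand-in timestamp, then format back to date-only text.
result = parsed.fillna(fill).dt.strftime('%d/%m/%Y')

print(result.tolist())
# ['12/10/2011', '02/11/2018', '09/12/2010']
```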
# Encode each distinct string as a 1-based integer, in order of first appearance
cols = ['Name', 'Address', 'Email']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)
print(df)
   job_Id  Name  Address  Email   Date/Time
0       1     1        1      1  12/10/2011
1       2     2        1      1  12/11/2011
2       3     3        2      2  02/11/2018
3       4     4        2      3  09/12/2010
4       5     5        1      4  02/11/2018
5       6     6        3      1  12/11/2012
6       7     2        2      1  02/11/2018
7       8     3        3      2  07/10/2013
8       9     1        4      5  03/03/2018
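Putting the pieces together, the whole read, transform, and write cycle from the question might look roughly like the sketch below. It is only an illustration under stated assumptions: the data is a shortened, comma-separated version of the question's sample held in memory with io.StringIO, the fill date is fixed instead of taken from the clock, and the result is produced as CSV text rather than written to ram5.csv. pd.factorize numbers values in order of first appearance starting at 0, hence the + 1; missing values get the code -1 and would therefore become 0.

```python
import io
import pandas as pd

# Hypothetical comma-separated sample based on rows of the question's data.
csv_text = """job_Id,Name,Address,Email,Date/Time
1,snehil singh,marathalli,ss@gmail.com,12/10/2011:02:03:20
2,salman,marathalli,ss@gmail.com,12/11/2011:03:10:20
3,Amir,HSR,ar@gmail.com,
4,Rakhesh,HSR,rakesh@gmail.com,09/12/2010:02:03:55
7,salman,HSR,ss@gmail.com,
"""

# Empty fields are read as missing values (NaN).
df = pd.read_csv(io.StringIO(csv_text))

# A fixed date stands in for pd.Timestamp.now() so the output is stable.
fill = pd.Timestamp('2018-11-02').strftime('%d/%m/%Y')
df['Date/Time'] = df['Date/Time'].str.split(':').str[0].fillna(fill)

# Replace each distinct string with a 1-based code, column by column.
cols = ['Name', 'Address', 'Email']
df[cols] = df[cols].apply(lambda x: pd.factorize(x)[0] + 1)

# Serialize without the index; pass a file path to to_csv to write a file.
out = df.to_csv(index=False)
print(out)
```

With a real file, pd.read_csv(path) and df.to_csv(path, index=False) would take the place of the in-memory strings.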