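I have two CSV files, 1.csv and 2.csv. 1.csv has 50 rows and 2.csv has 75 rows. I am trying to find the rows whose user name and feature appear in both files, and then write those rows to a new file.
My code so far is: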
with open('1.csv') as a:
    c = pd.read_csv(a)
with open('2.csv') as b:
    d = pd.read_csv(b)
if (c['User'] == d['User'] and c['Feature'] == d['Feature'] and c['ipaddress'] == d['ipaddress']):
    c.to_csv('3.csv')
But with this code I get the following error:
Traceback (most recent call last):
  File "path/main.py", line 181, in <module>
    if (c['User'] == d['User'] and c['Feature'] == d['Feature'] and c['ipaddress'] == d['ipaddress']):
  File "path\lib\site-packages\pandas\core\ops.py", line 1190, in wrapper
    raise ValueError("Can only compare identically-labeled "
ValueError: Can only compare identically-labeled Series objects
Any help would be appreciated.
1.csv :
name feature start_date
aaaa apple 2018-02-10
bbbb mango 2018-03-11
cccc orange 2018-04-12
dddd guava 2018-05-13
2.csv :
name feature end_date
aaaa apple 2018-02-13
bbbb mango 2018-03-16
cccc orange 2018-04-15
dddd guava 2018-05-18
eeee Avocado 2018-06-14
ffff Banana 2018-07-13
gggg Bilberry 2018-08-09
Expected output 3.csv
name feature start_date end_date difference
aaaa apple 2018-02-10 2018-02-13 3days.
bbbb mango 2018-03-11 2018-03-16 5days.
cccc orange 2018-04-12 2018-04-15 3days.
dddd guava 2018-05-13 2018-05-18 5days.
Answer 0 (score: 1)
You can do this easily in a few lines of code:
import pandas as pd
# To read from the files instead, uncomment the next two lines
# and remove the pd.DataFrame definitions of a and b below.
# a = pd.read_csv('1.csv')
# b = pd.read_csv('2.csv')
a = pd.DataFrame({'name': ['aaaa', 'bbbb', 'cccc', 'dddd'],
                  'feature': ['apple', 'mango', 'orange', 'guava'],
                  'start_date': ['2018-02-10', '2018-03-11', '2018-04-12', '2018-05-13']})
b = pd.DataFrame({'name': ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff', 'gggg'],
                  'feature': ['apple', 'mango', 'orange', 'guava', 'Avocado', 'Banana', 'Bilberry'],
                  'end_date': ['2018-02-13', '2018-03-16', '2018-04-15', '2018-05-18', '2018-06-14', '2018-07-13', '2018-08-09']})
# Use on=['name', 'feature', 'ipaddress'] if needed: your sample data
# has no 'ipaddress' column, but your code compares it.
c = pd.merge(a, b, how='inner', on=['name', 'feature'])
c['difference'] = pd.to_datetime(c['end_date']) - pd.to_datetime(c['start_date'])
print(c)
# Uncomment to save to a file (index=False leaves out the row-index column)
# c.to_csv('3.csv', index=False)
Checking the variables shows they match your example exactly.
print(a)
name feature start_date
0 aaaa apple 2018-02-10
1 bbbb mango 2018-03-11
2 cccc orange 2018-04-12
3 dddd guava 2018-05-13
print(b)
name feature end_date
0 aaaa apple 2018-02-13
1 bbbb mango 2018-03-16
2 cccc orange 2018-04-15
3 dddd guava 2018-05-18
4 eeee Avocado 2018-06-14
5 ffff Banana 2018-07-13
6 gggg Bilberry 2018-08-09
print(c)
name feature start_date end_date difference
0 aaaa apple 2018-02-10 2018-02-13 3 days
1 bbbb mango 2018-03-11 2018-03-16 5 days
2 cccc orange 2018-04-12 2018-04-15 3 days
3 dddd guava 2018-05-13 2018-05-18 5 days
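As for the error in the question, here is a minimal sketch of what is probably going on. It assumes the two columns have different lengths (50 and 75 rows in your files; 2 and 3 values here). The names users_1 and users_2 are made up for illustration. Comparing with == needs both Series to carry identical index labels, whereas isin (like merge) matches on the values themselves:

```python
import pandas as pd

# Hypothetical stand-ins for the user columns of the two files
users_1 = pd.Series(['aaaa', 'bbbb'])
users_2 = pd.Series(['aaaa', 'bbbb', 'cccc'])

# == compares label by label, so Series with different indexes are rejected
try:
    users_1 == users_2
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# isin tests each value for membership, regardless of index labels
mask = users_2.isin(users_1)
print(mask.tolist())  # [True, True, False]
```

Even with equally long columns, joining the comparisons with `and` would fail, because a whole Series has no single truth value. That is why merging on the key columns is the simpler route here.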
Hope this helps!
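Answer 1 (score: 0)
Use merge.
import pandas as pd
df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
# how='left' keeps every row of 1.csv; use how='inner' to keep only rows found in both files
df3 = df1.merge(df2, on=['name', 'feature'], how='left')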
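Then you can subtract the dates; how you do that depends on the data type of the date columns (if they were read as strings, convert them to datetime first).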
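The subtraction step could look roughly like the sketch below. It rests on a few assumptions: the sample rows from the question are built inline so that it runs without the CSV files, the date columns arrive as strings, and how='inner' is used so that only rows present in both files remain (the question's stated goal). With how='left', rows found only in 1.csv would be kept with a missing end_date.

```python
import pandas as pd

# Inline copies of the question's sample data (stand-ins for 1.csv and 2.csv)
df1 = pd.DataFrame({'name': ['aaaa', 'bbbb', 'cccc', 'dddd'],
                    'feature': ['apple', 'mango', 'orange', 'guava'],
                    'start_date': ['2018-02-10', '2018-03-11', '2018-04-12', '2018-05-13']})
df2 = pd.DataFrame({'name': ['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff', 'gggg'],
                    'feature': ['apple', 'mango', 'orange', 'guava', 'Avocado', 'Banana', 'Bilberry'],
                    'end_date': ['2018-02-13', '2018-03-16', '2018-04-15', '2018-05-18',
                                 '2018-06-14', '2018-07-13', '2018-08-09']})

# Keep only the (name, feature) pairs that occur in both frames
df3 = df1.merge(df2, on=['name', 'feature'], how='inner')

# Convert the string columns to datetime before subtracting
df3['start_date'] = pd.to_datetime(df3['start_date'])
df3['end_date'] = pd.to_datetime(df3['end_date'])

# The subtraction yields a timedelta; .dt.days reduces it to whole days
df3['difference'] = (df3['end_date'] - df3['start_date']).dt.days
print(df3)
```

When reading from the actual files, passing parse_dates=['start_date'] (or ['end_date']) to pd.read_csv should make the explicit conversion unnecessary.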