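Hello, I am trying to combine a few existing columns into one new column and then drop the three original columns from a CSV file. I have been trying to do this with pandas, but without much luck. I am fairly new to Python.
My code first combines multiple CSV files in the same directory, then attempts to manipulate the columns. The first step works and I get an output.csv with the combined data, but the column combination does not.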
This is my attempt at solving it:
import glob
import pandas as pd

interesting_files = glob.glob("*.csv")

# Merge every CSV in the directory into output.csv, writing the header only once
header_saved = False
with open('output.csv','wb') as fout:
    for filename in interesting_files:
        with open(filename) as fin:
            header = next(fin)
            if not header_saved:
                fout.write(header)
                header_saved = True
            for line in fin:
                fout.write(line)

# Try to build the combined column (this is the step that does not work)
df = pd.read_csv("output.csv")
df['HostAffected']=df['Host'] + "/" + df['Protocol'] + "/" + df['Port']
df.to_csv("newoutput.csv")
The data looks something like this:
Host,Protocol,Port
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,49707
10.0.0.10,tcp,49672
10.0.0.10,tcp,49670
However, the CSV also contains other columns.
I am not a coder, I am just trying to solve a problem; any help is much appreciated.
Answer 0 (score: 2)
I think we have three options.
Timings:
10 loops, best of 3: 39.7 ms per loop
10 loops, best of 3: 35.9 ms per loop
10 loops, best of 3: 162 ms per loop
Although it is the slowest, I think this would be the most readable approach for you:
import io
import pandas as pd

data = '''\
ID,Host,Protocol,Port
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,445
1,10.0.0.10,tcp,49707
1,10.0.0.10,tcp,49672
1,10.0.0.10,tcp,49670'''
df = pd.read_csv(io.StringIO(data)) # Recreates a sample dataframe
cols = ['Host','Protocol','Port']
# Cast to str (Port is numeric), then join each row's values with '/'
newcol = ['/'.join(i) for i in df[cols].astype(str).values]
# Add the new column and drop the three original columns
df = df.assign(HostAffected=newcol).drop(columns=cols)
print(df)
Returns:
   ID         HostAffected
0   1    10.0.0.10/tcp/445
1   1    10.0.0.10/tcp/445
2   1    10.0.0.10/tcp/445
3   1    10.0.0.10/tcp/445
4   1    10.0.0.10/tcp/445
5   1    10.0.0.10/tcp/445
6   1    10.0.0.10/tcp/445
7   1  10.0.0.10/tcp/49707
8   1  10.0.0.10/tcp/49672
9   1  10.0.0.10/tcp/49670
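The answer mentions three options without listing them. As a minimal sketch, here are three common ways to build the combined column; these are assumptions for illustration and not necessarily the ones the author timed. The sample data and the names opt1, opt2 and opt3 are hypothetical. In each case Port has to be cast to a string first, because pandas reads it as an integer column.

```python
import io

import pandas as pd

# Small sample modeled on the question; ID stands in for the "other columns".
data = """ID,Host,Protocol,Port
1,10.0.0.10,tcp,445
2,10.0.0.10,tcp,49707
"""
df = pd.read_csv(io.StringIO(data))
cols = ["Host", "Protocol", "Port"]

# Option 1: vectorized string concatenation with an explicit cast of Port.
opt1 = df["Host"] + "/" + df["Protocol"] + "/" + df["Port"].astype(str)

# Option 2: Series.str.cat joins several string columns with a separator.
opt2 = df["Host"].str.cat([df["Protocol"], df["Port"].astype(str)], sep="/")

# Option 3: a plain list comprehension over the zipped columns.
opt3 = [
    "/".join(parts)
    for parts in zip(df["Host"], df["Protocol"], df["Port"].astype(str))
]

# Attach the new column and drop the three originals; other columns stay.
result = df.assign(HostAffected=opt1).drop(columns=cols)
print(result)
```

All three produce the same strings, so the choice mostly comes down to readability and speed on your own data.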
Answer 1 (score: 0)
Here is how you can do it:
import io
import pandas as pd

dt = """Host,Protocol,Port
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,445
10.0.0.10,tcp,49707
10.0.0.10,tcp,49672
10.0.0.10,tcp,49670"""
tdf = pd.read_csv(io.StringIO(dt))
# Format each row as Host/Protocol/Port
tdf['HostsAffected'] = tdf.apply(lambda x: '{}/{}/{}'.format(x['Host'], x['Protocol'], x['Port']), axis=1)
# Keep only the combined column
tdf = tdf[['HostsAffected']]
tdf.to_csv(<path-to-save-csv-file>)
This will be the output:
         HostsAffected
0    10.0.0.10/tcp/445
1    10.0.0.10/tcp/445
2    10.0.0.10/tcp/445
3    10.0.0.10/tcp/445
4    10.0.0.10/tcp/445
5    10.0.0.10/tcp/445
6    10.0.0.10/tcp/445
7  10.0.0.10/tcp/49707
8  10.0.0.10/tcp/49672
9  10.0.0.10/tcp/49670
If you are reading the CSV from a file, edit the read_csv line as follows:
tdf = pd.read_csv(<path-to-the-file>)
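For the full workflow in the question (merge several CSV files, combine the three columns, drop the originals, keep everything else), a sketch along these lines may help. The function name merge_and_combine and its parameters are hypothetical, and it assumes every input file has Host, Protocol and Port columns.

```python
import glob
import os

import pandas as pd


def merge_and_combine(pattern, out_path):
    """Merge CSV files matching `pattern`, combine three columns, write `out_path`."""
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself so a re-run does not read its own result.
    paths = sorted(p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs)
    merged = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

    cols = ["Host", "Protocol", "Port"]
    # Cast to str first (Port is read as an integer), then join each row with "/".
    merged["HostAffected"] = merged[cols].astype(str).apply("/".join, axis=1)
    # Drop only the three source columns; any other columns are kept.
    merged = merged.drop(columns=cols)

    # index=False keeps the numeric row index out of the written file.
    merged.to_csv(out_path, index=False)
    return merged
```

Calling merge_and_combine("*.csv", "newoutput.csv") in the directory holding the files would replace both steps of the original script. Letting pandas concatenate the frames avoids handling the header line by hand.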