How to add a duplicate CSV column using pandas

Asked: 2016-08-15 12:58:46

Tags: python csv pandas dataframe

My CSV has only a single column of domains, with no header row, similar to this:

google.com
yahoo.com
cnn.com
toast.net

I want to add a duplicate column and add the headers domain and matching, so that my CSV looks like:

domain       matching
google.com   google.com
yahoo.com    yahoo.com
cnn.com      cnn.com
toast.net    toast.net

I tried the following in a Python script using pandas:

import pandas as pd

df = pd.read_csv('temp.csv')
df.columns = ['domain', 'matching']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)

But I get the following error:

  

ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements

I assume I need to add another column first? Can I do this with pandas?

1 Answer:

Answer 0 (score: 1)

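You can pass the names parameter to read_csv, so the file is read without a header row and the single column is named domain: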

import pandas as pd
import io

temp=u"""google.com
yahoo.com
cnn.com
toast.net"""

# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), names=['domain'])
#real data
#df = pd.read_csv('temp.csv', names=['domain'])

print (df)
       domain
0  google.com
1   yahoo.com
2     cnn.com
3   toast.net

df['matching'] = df['domain']

# real data
# df.to_csv('temp.csv', index=False)
print(df.to_csv(index=False))
domain,matching
google.com,google.com
yahoo.com,yahoo.com
cnn.com,cnn.com
toast.net,toast.net
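
The steps above can be wrapped in a small helper. This is only a sketch: the function name add_duplicate_column and its parameters are hypothetical, not part of pandas, and it assumes the input has exactly one column and no header row.

```python
import io

import pandas as pd


def add_duplicate_column(source, column="domain", copy_name="matching"):
    """Read a headerless one-column CSV and return it with a duplicated column.

    Hypothetical helper for illustration; `source` is anything that
    pd.read_csv accepts (a path or a file-like object).
    """
    # names=[...] tells read_csv there is no header row and names the column.
    df = pd.read_csv(source, names=[column])
    # Assigning to a new label creates the second column as a copy of the first.
    df[copy_name] = df[column]
    return df


# Sample data stands in for temp.csv so the sketch runs without a file.
sample = io.StringIO("google.com\nyahoo.com\ncnn.com\ntoast.net\n")
result = add_duplicate_column(sample)
csv_text = result.to_csv(index=False)
print(csv_text)
```

For the real file, pass 'temp.csv' instead of the StringIO object and write the result back with result.to_csv('temp.csv', index=False).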

Your own solution can also be modified, but then the first row is lost, because it is read as the column name:

df = pd.read_csv(io.StringIO(temp))
# real data
# df = pd.read_csv('temp.csv')
print(df)
  google.com
0  yahoo.com
1    cnn.com
2  toast.net

df.columns = ['domain']
df['matching'] = df['domain']

df.to_csv('temp.csv', index=False)
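
To make the lost row and the original error concrete, here is a minimal sketch using the same sample data. The variable names are my own. It counts the rows read with the default header handling and then repeats the two-name assignment from the question.

```python
import io

import pandas as pd

raw = "google.com\nyahoo.com\ncnn.com\ntoast.net\n"

# With the default header handling, the first line becomes the column name.
df = pd.read_csv(io.StringIO(raw))

lines_in_file = len(raw.splitlines())
rows_read = len(df)
print(lines_in_file, rows_read)  # one data row fewer than lines in the file
print(list(df.columns))          # the first domain ended up as the header

# Assigning two names to a one-column frame raises the error from the question.
try:
    df.columns = ["domain", "matching"]
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The ValueError is raised because the frame has one column while the list supplies two names. Adding the second column must be done by assignment, not by renaming.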

Alternatively, add the parameter header=None to read_csv and remove the second value from df.columns = ['domain', 'matching'], because the DataFrame initially has only one column:

import pandas as pd
import io

temp=u"""google.com
yahoo.com
cnn.com
toast.net"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=None)
# real data
# df = pd.read_csv('temp.csv', header=None)
print(df)
            0
0  google.com
1   yahoo.com
2     cnn.com
3   toast.net

df.columns = ['domain']
df['matching'] = df['domain']

df.to_csv('temp.csv', index=False)
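
As a quick sanity check, the sketch below compares the two approaches (names=['domain'] versus header=None followed by renaming) on the same sample data. The variable names are illustrative, and the comparison is done on the CSV text each approach produces.

```python
import io

import pandas as pd

raw = "google.com\nyahoo.com\ncnn.com\ntoast.net\n"

# Approach 1: name the column while reading.
df_names = pd.read_csv(io.StringIO(raw), names=["domain"])
df_names["matching"] = df_names["domain"]

# Approach 2: read without a header, then rename the single column.
df_header = pd.read_csv(io.StringIO(raw), header=None)
df_header.columns = ["domain"]
df_header["matching"] = df_header["domain"]

csv_from_names = df_names.to_csv(index=False)
csv_from_header = df_header.to_csv(index=False)
same_output = csv_from_names == csv_from_header
print(same_output)
```

Both variants keep all four rows. Passing names=['domain'] simply saves the separate renaming step.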