My CSV has only one column of domains, similar to this:
google.com
yahoo.com
cnn.com
toast.net
I want to add a duplicate column and give the two columns the headers domain and matching, so that my CSV looks like:
domain matching
google.com google.com
yahoo.com yahoo.com
cnn.com cnn.com
toast.net toast.net
I tried the following in a Python script using pandas:
import pandas as pd

df = pd.read_csv('temp.csv')
df.columns = ['domain', 'matching']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)
But I get the following error:
"ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 elements".
I assume I need to add another column first? Can I do this with pandas?
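The error comes from assigning two labels to df.columns while the DataFrame has only one column. A minimal sketch of that mismatch, assuming a small hand-built DataFrame stands in for temp.csv; it also shows that the assumption in the question holds, because once a second column exists a two-label assignment is accepted:

```python
import pandas as pd

# A one-column frame, standing in for what was read from temp.csv
df = pd.DataFrame({'domain': ['google.com', 'yahoo.com']})

try:
    # Two labels for one column: pandas rejects the length mismatch
    df.columns = ['domain', 'matching']
except ValueError as exc:
    print('ValueError:', exc)

# Adding the duplicate column first gives the frame two columns,
# so assigning two labels is now valid
df['matching'] = df['domain']
df.columns = ['domain', 'matching']
print(df)
```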
Answer 0 (score: 1)
You can add the parameter names to read_csv:
import pandas as pd
import io
temp=u"""google.com
yahoo.com
cnn.com
toast.net"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), names=['domain'])
#real data
#df = pd.read_csv('temp.csv', names=['domain'])
print (df)
domain
0 google.com
1 yahoo.com
2 cnn.com
3 toast.net
df['matching'] = df['domain']
print (df.to_csv(index=False))
#real data
#df.to_csv('temp.csv', index=False)
domain,matching
google.com,google.com
yahoo.com,yahoo.com
cnn.com,cnn.com
toast.net,toast.net
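The commented "real data" lines can be exercised end to end against an actual file. A sketch, assuming a temporary directory stands in for the location of the asker's temp.csv:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'temp.csv')

    # Recreate the header-less, one-column file from the question
    with open(path, 'w') as f:
        f.write('google.com\nyahoo.com\ncnn.com\ntoast.net\n')

    # names= supplies the header, so no data row is used as a column name
    df = pd.read_csv(path, names=['domain'])
    df['matching'] = df['domain']
    df.to_csv(path, index=False)

    # Read the rewritten file back as plain text
    with open(path) as f:
        result = f.read()

print(result)
```

Writing back to the same path overwrites the original file, as in the answer; keep a copy of temp.csv if the original single-column version is still needed.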
You could modify your own solution instead, but then the first row is lost, because read_csv reads it as the column name:
df = pd.read_csv(io.StringIO(temp))
print (df)
#real data
#df = pd.read_csv('temp.csv')
google.com
0 yahoo.com
1 cnn.com
2 toast.net
df.columns = ['domain']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)
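To see exactly which row goes missing, a quick check of what read_csv returns with neither names nor header=None, using the same in-memory sample as the answer:

```python
import io

import pandas as pd

temp = "google.com\nyahoo.com\ncnn.com\ntoast.net"
df = pd.read_csv(io.StringIO(temp))

# The first line became the header, so it is no longer a data row
print(list(df.columns))  # ['google.com']
print(len(df))           # 3
```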
But you can add the parameter header=None to read_csv and remove the second value from df.columns = ['domain', 'matching'], because the DataFrame has only one column at that point:
import pandas as pd
import io
temp=u"""google.com
yahoo.com
cnn.com
toast.net"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), header=None)
print (df)
#real data
#df = pd.read_csv('temp.csv', header=None)
0
0 google.com
1 yahoo.com
2 cnn.com
3 toast.net
df.columns = ['domain']
df['matching'] = df['domain']
df.to_csv('temp.csv', index=False)
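As a check that header=None keeps all four domains, a sketch that uses the same in-memory sample and prints the CSV text instead of overwriting temp.csv:

```python
import io

import pandas as pd

temp = "google.com\nyahoo.com\ncnn.com\ntoast.net"
df = pd.read_csv(io.StringIO(temp), header=None)

# With header=None the single column gets the integer label 0
print(list(df.columns))  # [0]

# Rename the one existing column, then add the duplicate
df.columns = ['domain']
df['matching'] = df['domain']

# to_csv with no path returns the CSV as a string
out = df.to_csv(index=False)
print(out)
```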