I have the following code:
##Overall reported expertise men vs women
import sys, re
import numpy as np
import smtplib
import matplotlib.pyplot as plt
from random import randint
import csv
import pylab as pl
import math
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-inVar', '--x', help = 'independent variable')
if len(sys.argv) == 1:
    parser.print_help()
    sys.exit(1)
args = parser.parse_args()
##Manipulating data so it can be graphed more easily
df1 = pd.read_csv('atc17-pcinfo.csv')
df1['Gender'] = df1['Gender'].replace(['M'], int(1))
df1['Gender'] = df1['Gender'].replace(['F'], int(2))
df1['Gender'] = df1['Gender'].convert_objects(convert_numeric = True)
x = df1['Gender']
y = df1['topic: Big data infrastructure']
print list(df1)
ax = df1.plot.scatter(x = x, y = y)
labels = [item.get_text() for item in ax.get_xticklabels()]
labels[1] = 'M'
labels[6] = 'F'
ax.set_xticklabels(labels)
#ax.set_title(y + ' vs. ' + x, fontsize=20)
plt.xlabel(x, fontsize=16)
plt.ylabel(y, fontsize=16)
plt.show()
I am trying to create a scatter plot from the data in df1. After my manipulation, the data looks like this:
Gender topic: Big data infrastructure
0 2 NaN
1 1 -1
2 1 -1
3 1 -1
4 2 1
5 1 NaN
6 1 NaN
7 1 NaN
8 1 -2
9 1 1
10 2 1
11 1 NaN
12 1 1
13 1 -1
14 1 1
15 1 NaN
16 1 NaN
17 1 NaN
18 1 -1
19 1 -2
20 2 1
21 1 NaN
22 1 NaN
23 2 2
24 1 -2
25 2 2
26 1 NaN
27 1 2
28 1 1
29 1 NaN
30 1 2
31 1 NaN
32 1 NaN
33 2 2
34 1 2
But I get this error:
KeyError('%s not in index' % objarr[mask])
KeyError: '[ nan -1. -1. -1. 1. nan nan nan -2. 1. 1. nan 1. -1. 1.\n nan nan nan -1. -2. 1. nan nan 2. -2. 2. nan 2. 1. nan\n 2. nan nan 2. 2.] not in index'
Can someone help me figure out why? I have looked at several sources, but I don't see how they relate to my case.
Answer 0 (score: 3)
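I think you have two problems:
The first is that you are misusing the x and y parameters of the scatter method. They should be passed the names of the desired columns, not the actual values! So it should be used like this: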
ax = df1.plot.scatter(x = "Gender", y = "topic: Big data infrastructure")
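The "not in index" wording of the error suggests that pandas tried to look up the passed values as column labels. A minimal sketch of that lookup, without any plotting, is shown here; the small DataFrame is made-up sample data shaped like the one in the question, and `y_values` and `raised` are names introduced only for this illustration:

```python
import numpy as np
import pandas as pd

# Made-up sample data shaped like the question's DataFrame (an assumption).
df = pd.DataFrame({
    "Gender": [2, 1, 1, 1, 2],
    "topic: Big data infrastructure": [np.nan, -1, -1, -1, 1],
})

# This is a Series of values, not a column name.
y_values = df["topic: Big data infrastructure"]

# Selecting by column names works as expected.
selected = df[["Gender", "topic: Big data infrastructure"]]

# Using the values as a key makes pandas treat each value as a column label,
# and none of nan, -1.0, 1.0 is a column of df, so the lookup fails.
try:
    df[y_values]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

Passing the column name as a string avoids this, because the name is then the label that gets looked up.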
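Your second problem is that you have not converted your 'Big data' column to numeric values, as you did with the 'Gender' one.
This should do the job: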
df1['topic: Big data infrastructure'] = df1['topic: Big data infrastructure'].convert_objects(convert_numeric = True)
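Note that `convert_objects` was deprecated and is no longer available in recent pandas releases. If it is missing in your version, `pd.to_numeric` with `errors="coerce"` is a possible substitute. The sketch below uses made-up string values (an assumption, not the real CSV contents) to show the effect:

```python
import pandas as pd

# Made-up column of strings, including entries that are not numbers.
df = pd.DataFrame({"Big_Data": ["-1", "2", "n/a", None]})

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
df["Big_Data"] = pd.to_numeric(df["Big_Data"], errors="coerce")

print(df["Big_Data"].dtype)
print(df)
```

With `errors="coerce"`, unparseable entries become NaN, which matches the NaN rows already present in the question's data.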
Since you will use column names a lot while working with the DataFrame, I suggest using shorter and simpler names.
Below is a working example:
import pandas as pd

# Read your copied df, saved as test.csv
df1 = pd.read_csv("test.csv", sep=",")
# Rename the columns for easier work
df1.columns = ["Gender", "Big_Data"]
# Convert strings into floats/integers
df1 = df1.convert_objects(convert_numeric=True)
# Create the figure by selecting the desired columns as input x and y
ax = df1.plot.scatter(x="Gender", y="Big_Data")
# Save the figure to disk
fig = ax.get_figure()
fig.savefig('its_working.png')
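For readers on a recent pandas release, the same steps could be sketched as follows. This is only an illustration under stated assumptions: the CSV text is made-up sample data read from memory rather than from test.csv, `convert_objects` is swapped for `pd.to_numeric`, the M/F mapping uses `Series.map`, and the tick positions are set explicitly instead of editing the existing tick labels by position as the question does.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up sample data standing in for test.csv (an assumption).
csv_text = """Gender,Big_Data
F,
M,-1
M,-1
F,1
M,unknown
F,2
"""
df1 = pd.read_csv(io.StringIO(csv_text))

# Encode gender as numbers so it can be used on a numeric axis.
df1["Gender"] = df1["Gender"].map({"M": 1, "F": 2})

# Convert every column to numbers; unparseable entries become NaN.
df1 = df1.apply(pd.to_numeric, errors="coerce")

# Pass column names, not the column values, to scatter.
ax = df1.plot.scatter(x="Gender", y="Big_Data")

# Put ticks only at the two encoded values and label them M and F.
ax.set_xticks([1, 2])
ax.set_xticklabels(["M", "F"])

fig = ax.get_figure()
fig.savefig("its_working.png")
```

Setting the ticks at 1 and 2 directly keeps the labels tied to the encoded values rather than to whatever tick positions matplotlib happens to choose.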