Apache Spark MultilayerPerceptronClassifier error: Field "features" does not exist.

Time: 2017-09-01 10:58:00

Tags: python apache-spark neural-network pyspark

I get the error Field "features" does not exist. when I try to use MultilayerPerceptronClassifier. Here is part of the input data:

Link to the XML file I extract the data from: https://www.dropbox.com/s/3q2ekgwvlculiqa/Aug%2019%202016%20AlN-GaN%20Sapphire.xrdml?dl=0 (I only extract data from the tags 'intensities', 'startPosition' and 'endPosition', where the position axis is 2Theta).

Link to the output of collecting the data after Spark: https://www.dropbox.com/s/mv5khfhx2favkbp/DataFrame_collectData.txt?dl=0

Link to an image showing a plot of the data: https://www.dropbox.com/s/nnzjkh39ejrq921/Screenshot%20from%202017-09-15%2010-06-25.png?dl=0

Part of my code:

# name_group has to be read from the request before it is used in the filter
name_group = request.POST.get('analize_list')
xml_files = upload.objects.filter(file_upl_group__upload_group = name_group)


intensities = xml_intensities.objects.all()
dList = spark.createDataFrame(intensities)
#dList = dataframe.rdd.map(apache_separate_data)
select_list = dList.select(dList['intensities'])
dList.createOrReplaceTempView("dList")


def df_extract_1(x,y,z):
    intens = re.split(' ', x)

    count = len(intens)
    theta_dif = (z - y)/count
    theta = y
    s = []
    j = 0
    for i in intens:
        theta += theta_dif
        v = Vectors.dense(theta, int(i))
        j += 1
        s.append(Row(j, v))
    a = []
    a.extend((s))
    return a


list_inten = spark.sql('SELECT intensities, startPosition, endPosition FROM dList').rdd.map(lambda x: df_extract_1(x[0], x[1], x[2]))
struct_schema = ArrayType(StructType([StructField("label", LongType()), StructField("features", VectorUDT())]))
list_inten_df = spark.createDataFrame(list_inten, schema = struct_schema)

list_inten_df.createOrReplaceTempView("list_inten_df")    



# specify layers for the neural network:
# input layer of size 2500 (features), two intermediate of size 25 and 14
# and output of size 1 (classes)
layers = [2500, 25, 14, 1]

trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)

model = trainer.fit(list_inten_df)

I would be very grateful if anyone could help me solve this problem.

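Actually, I only posted part of the data. The real data is a list of lists of numbers (vectors). I set 2500 because the input size differs for each sublist, and I have only one class as output because I need to decide whether the data is good or not (I need to classify it as good or not good based on the examples I use to train the network). This code is part of a web-based application I created to extract data from XML files, and I want to build a neural network that judges whether the data is good, based on data I already know to be good.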
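One possible way to reconcile the two constraints described here (scans of different lengths, and a single good / not good decision per scan) is to turn each scan into one fixed-length feature vector with a binary label. The following is only a minimal sketch under that assumption: the helper scan_to_features, the pad value 0.0 and the 1.0 / 0.0 label convention are illustrative and are not part of the original code.

```python
def scan_to_features(intensities, size=2500, pad_value=0.0):
    """Turn a whitespace-separated intensity string into exactly `size` floats.

    Shorter scans are padded at the end with `pad_value`; longer scans are
    rejected, since silently truncating them would drop measured points.
    """
    counts = [float(tok) for tok in intensities.split()]
    if len(counts) > size:
        raise ValueError("scan has %d points, more than size=%d" % (len(counts), size))
    return counts + [pad_value] * (size - len(counts))


# Hypothetical label convention: 1.0 = known-good scan, 0.0 = not good.
good_scan = scan_to_features("1314 1350 1299 1401")
row = (1.0, good_scan)  # one (label, features) pair per scan

# One 2500-element vector per scan and two classes would then correspond to:
layers = [2500, 25, 14, 2]
print(len(good_scan), layers[0], layers[-1])
```

In Spark, each (label, features) pair would become one DataFrame row, with the feature list wrapped in Vectors.dense.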

Here is part of one of the XML files...

 <positions axis="2Theta" unit="deg">
             <startPosition>30.00728257</startPosition>
            <endPosition>37.99705961</endPosition>
        </positions>

        <commonCountingTime unit="seconds">148.920</commonCountingTime>
        <intensities unit="counts">1314 1350 1299 1401 1365 1423 1419 1343 1417 1360 1347 1397 1313 1306 1336 1366 1302 1393 1393 1400 1379 1370 1389 1335 1348 1374 1449 1387 1380 1371 1320 1413 1306 1327 1306 1331 1398 1336 1407 1297 1375 1335 1306 1279 1303 1327 1381 1377 1346 1328 1416 1339 1320 1370 1232 1311 1331 1315 1361 1297 1416 1347 1363 1314 1357 1361 1340 1285 1272 1381 1301 1297 1311 1360 1324 1338 1274 1293 1303 1379 1340 1383 1389 1352 1380 1275 1375 1293 1364 1355 1337 1345 1306 1367 1311 1354 1389 1295 1283 1328 1330 1380 1367 1306 1317 1317 1314 1332 1282 1351 1341 1267 1341 1305 1313 1315 1383 1335 1370 1317 1373 1333 1297 1332 1291 1326 1280 1368 1320 1314 1318 1321 1261 1369 1337 1304 1306 1336 1282 1327 1270 1291 1258 1314 1281 1231 1294 1286 1328 1314 1327 1332 1324 1361 1376 1313 1288 1268 1348 1309 1322 1310 1296 1338 1341 1275 1389 1363 1332 1255 1380 1297 1350 1267 1335 1317 1342 1369 1307 1301 1290 1285 1320 1323 1284 1349 1350 1323 1302 1373 1334 1300 1384 1312 1372 1317 1282 1319 1257 1329 1250 1395 1241 1386 1320 1325 1371 1343 1350 1265 1345 1269 1308 1343 1345 1376 1316 1284 1251 1321 1280 1283 1314 1392 1308 1243 1387 1370 1301 1308 1381 1306 1330 1363 1303 1201 1346 1298 1303 1311 1367 1316 1366 1313 1380 1317 1280 1275 1293 1305 1300 1381 1313 1427 1320 1339 1296 1262 1290 1307 1298 1188 1424 1269 1262 1343 1273 1333 1220 1310 1283 1335 1189 1295 1308 1319 1371 1321 1219 1221 1291 1280 1278 1307 1309 1342 1365 1339 1337 1324 1340 1327 1291 1277 1319 1318 1281 1366 1284 1286 1308 1348 1316 1297 1294 1306 1275 1350 1248 1263 1305 1281 1287 1230 1376 1246 1297 1354 1311 1233 1227 1334 1263 1395 1247 1321 1302 1259 1262 1313 1329 1330 1285 1295 1326 1318 1274 1265 1287 1285 1268 1241 1213 1316 1311 1283 1281 1282 1265 1276 1190 1272 1325 1341 1237 1274 1281 1392 1288 1300 1292 1326 1262 1292 1294 1349 1348 1320 1329 1306 1268 1237 1212 1281 1229 1222 1297 1274 1331 1310 1322 1296 1321 1229 1239 1261 1283 1264 1288 1359 1304 1280 1291 
1314 1290 1258 1345 1283 1376 1263 1290 1336 1302 1233 1296 1296 1314 1335 1312 1371 1313 1310 1259 1388 1302 1251 1264 1295 1370 1242 1326 1358 1316 1311 1196 1302 1302 1275 1259 1347 1215 1368 1285 1356 1345 1248 1284 1266 1377 1275 1270 1288 1328 1296 1254 1271 1251 1249 1262 1251 1308 1228 1241 1273 1224 1303 1238 1349 1238 1352 1328 1268 1328 1195 1317 1285 1277 1308 1326 1322 1329 1345 1281 1294 1328 1296 1299 1322 1315 1341 1302 1291 1255 1324 1274 1314 1258 1271 1249 1239 1231 1325 1260 1307 1306 1284 1221 1291 1257 1236 1327 1270 1334 1226 1342 1230 1201 1288 1221 1304 1282 1302 1230 1245 1221 1330 1267 1405 1305 1272 1283 1260 1273 1254 1239 1300 1273 1246 1319 1250 1273 1308 1268 1267 1295 1248 1293 1280 1353 1262 1294 1333 1338 1350 1375 1315 1284 1278 1335 1255 1337 1254 1344 1284 1338 1299 1303 1244 1358 1206 1256 1289 1308 1290 1269 1214 1258 1312 1239 1290 1351 1226 1290 1330 1265 1225 1253 1275 1305 1265 1245 1248 1256 1263 1334 1327 1255 1347 1262 1375 1270 1294 1244 1265 1290 1220 1291 1216 1241 1298 1297 1318 1332 1297 1241 1340 1310 1225 1287 1242 1212 1238 1228 1267 1330 1327 1373 1316 1332 1304 1329 1312 1219 1236 1256 1347 1338 1307 1294 1322 1228 1338 1251 1269 1301 1238 1374 1295 1331 1344 1312 1323 1308 1301 1280 1335 1217 1284 1281 1310 1277 1286 1297 1396 1303 1370 1346 1274 1262 1290 1294 1369 1296 1349 1319 1353 1308 1305 1331 1372 1349 1418 1400 1384 1384 1411 1443 1391 1347 1526 1384 1416 1370 1433 1381 1438 1480 1456 1499 1400 1498 1497 1581 1550 1613 1600 1597 1637 1700 1714 1740 1696 1673 1691 1715 1631 1764 1714 1744 1739 1844 1689 1817 1800 1809 1810 1800 1892 1799 1853 1811 1878 1791 1757 1841 1772 1755 1860 1751 1768 1801 1716 1737 1821 1733 1619 1718 1633 1715 1615 1572 1652 1580 1705 1555 1505 1634 1564 1578 1483 1488 1499 1492 1581 1466 1425 1457 1468 1465 1411 1432 1373 1471 1405 1347 1361 1376 1312 1387 1281 1355 1357 1397 1386 1324 1374 1339 1321 1348 1387 1296 1369 1326 1293 1303 1352 1327 1279 1334 1287 1394 1306 1300 
1371 1290 1281 1270 1356 1258 1289 1270 1298 1356 1287 1300 1344 1342 1267 1239 1284 1321 1264 1278 1320 1270 1327 1294 1273 1293 1294 1238 1263 1331 1267 1304 1380 1317 1349 1320 1246 1260 1277 1245 1285 1286 1267 1343 1336 1300 1341 1294 1322 1264 1274 1234 1347 1242 1387 1297 1278 1288 1300 1389 1285 1296 1327 1390 1295 1308 1283 1262 1322 1338 1284 1422 1324 1289 1348 1282 1281 1296 1311 1366 1337 1344 1260 1328 1365 1357 1347 1332 1335 1324 1352 1320 1399 1267 1336 1285 1390 1351 1326 1333 1316 1311 1403 1404 1385 1355 1397 1438 1340 1317 1376 1388 1333 1374 1374 1346 1340 1380 1436 1338 1361 1355 1361 1424 1405 1418 1414 1358 1350 1439 1377 1345 1468 1464 1505 1419 1521 1460 1350 1427 1390 1470 1392 1443 1464 1464 1431 1417 1469 1465 1467 1483 1478 1473 1484 1489 1368 1457 1506 1466 1471 1469 1467 1527 1464 1436 1507 1472 1511 1596 1558 1458 1525 1501 1541 1510 1518 1489 1583 1509 1529 1619 1608 1463 1488 1575 1557 1519 1505 1552 1543 1596 1537 1560 1551 1653 1617 1593 1672 1532 1632 1599 1599 1619 1615 1708 1606 1722 1616 1640 1684 1635 1695 1734 1667 1717 1591 1676 1724 1728 1749 1725 1781 1716 1802 1703 1779 1725 1746 1711 1839 1777 1778 1760 1803 1775 1728 1745 1778 1760 1737 1815 1816 1869 1719 1874 1845 1871 1866 1839 1879 1815 1836 1870 1793 1874 1847 1807 1868 1899 1916 1848 1891 1870 1949 1877 1832 1916 1909 1878 1771 1901 1815 1841 1821 1921 1877 1957 1838 1873 1922 1866 1900 1859 1995 1878 1826 1908 1803 1869 1824 1857 1819 1815 1840 1855 1855 1808 1906 1894 1864 1922 1768 1874 1824 1796 1761 1794 1722 1804 1867 1803 1856 1764 1750 1804 1743 1783 1778 1701 1778 1816 1747 1729 1641 1787 1726 1717 1786 1658 1587 1684 1724 1712 1694 1667 1672 1670 1738 1682 1621 1754 1582 1634 1594 1703 1605 1686 1593 1708 1664 1716 1650 1672 1547 1567 1605 1620 1628 1662 1626 1582 1649 1578 1671 1632 1646 1643 1801 1804 2048 2230 2478 2761 3050 2913 2594 2190 1968 1862 1804 1775 1703 1677 1584 1622 1497 1554 1575 1553 1578 1529 1584 1571 1483 1526 1508 1470 1447 1513 
1529 1516 1489 1413 1474 1478 1456 1423 1453 1465 1466 1379 1508 1424 1415 1453 1368 1521 1487 1428 1438 1451 1399 1429 1388</intensities>
    </dataPoints>
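For reference, the three values mentioned above (startPosition, endPosition and intensities) can be read from such a fragment with the standard library. This is a hedged sketch, not the extraction code used in the application: SAMPLE is a shortened, well-formed stand-in for the fragment shown above, parse_data_points is a hypothetical helper, and a real .xrdml file may declare an XML namespace, in which case the tag names would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the fragment above (only four intensity values).
SAMPLE = """<dataPoints>
  <positions axis="2Theta" unit="deg">
    <startPosition>30.00728257</startPosition>
    <endPosition>37.99705961</endPosition>
  </positions>
  <commonCountingTime unit="seconds">148.920</commonCountingTime>
  <intensities unit="counts">1314 1350 1299 1401</intensities>
</dataPoints>"""


def parse_data_points(xml_text):
    """Return (start, end, counts) for the 2Theta axis of one dataPoints element."""
    root = ET.fromstring(xml_text)
    # Select the positions element whose axis attribute is 2Theta.
    positions = root.find("positions[@axis='2Theta']")
    start = float(positions.findtext("startPosition"))
    end = float(positions.findtext("endPosition"))
    # split() with no argument also copes with line breaks inside the element.
    counts = [int(tok) for tok in root.findtext("intensities").split()]
    return start, end, counts


start, end, counts = parse_data_points(SAMPLE)
print(start, end, counts)
```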

Thanks!

1 Answer:

Answer 0: (score: 0)

Hope this helps!

from pyspark.sql import Row, SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import MultilayerPerceptronClassifier

# Reuse the active session, or start one when running as a standalone script
spark = SparkSession.builder.getOrCreate()

# I converted your data to the simplified example below:
# each row is (label, features), with features = [2Theta, intensity]
list_inten = [Row(0, Vectors.dense([30.0138, 1314.0])),
              Row(1, Vectors.dense([30.0204, 1350.0])),
              Row(0, Vectors.dense([30.027, 1299.0])),
              Row(1, Vectors.dense([30.0335, 1401.0])),
              Row(0, Vectors.dense([30.0401, 1365.0])),
              Row(1, Vectors.dense([6.0524, 1104.0])),
              Row(0, Vectors.dense([6.0786, 1041.0])),
              Row(1, Vectors.dense([6.1049, 1050.0])),
              Row(0, Vectors.dense([6.1311, 1238.0])),
              Row(1, Vectors.dense([6.1574, 1158.0]))]
list_inten_df = spark.createDataFrame(list_inten, ["label", "features"])
list_inten_df.show()

# specify layers for the neural network:
# input layer of size 2 (features), two intermediate of size 5 and 4
# and output of size 2 (classes) - note that in this example 'labels' has two values (i.e. 0 & 1)
layers = [2, 5, 4, 2]
trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
model = trainer.fit(list_inten_df)
model.transform(list_inten_df).show()
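A likely reason the DataFrame in the question has no "features" column: each RDD element there is a whole list of rows, and the schema passed to createDataFrame is an ArrayType. When the schema is not a StructType, createDataFrame wraps it in a struct with a single field named "value", so "label" and "features" never become top-level columns. The sketch below shows one way around this, assuming the same (intensities, startPosition, endPosition) rows as in the question; extract_points and to_dataframe are hypothetical helper names, and the Spark imports sit inside to_dataframe only so that the pure-Python part can be tried without a Spark installation.

```python
from itertools import chain


def extract_points(intensities, start, end):
    """Return one (label, [two_theta, intensity]) pair per measured point.

    Mirrors df_extract_1 from the question: the label is the 1-based point
    index and 2Theta advances by one step before each point.
    """
    counts = [int(tok) for tok in intensities.split()]
    step = (end - start) / len(counts)
    return [(i + 1, [start + (i + 1) * step, float(c)])
            for i, c in enumerate(counts)]


def to_dataframe(spark, scans_rdd):
    """Build a DataFrame with top-level 'label' and 'features' columns."""
    from pyspark.ml.linalg import Vectors, VectorUDT
    from pyspark.sql.types import LongType, StructField, StructType

    # A plain StructType (not ArrayType) so both fields become columns.
    schema = StructType([
        StructField("label", LongType()),
        StructField("features", VectorUDT()),
    ])
    # flatMap emits one row per point instead of one list per scan.
    rows = scans_rdd.flatMap(
        lambda r: [(label, Vectors.dense(values))
                   for label, values in extract_points(r[0], r[1], r[2])])
    return spark.createDataFrame(rows, schema=schema)


# Pure-Python illustration of the flattening that flatMap performs.
scans = [("1314 1350 1299", 30.0, 30.3), ("1104 1041", 6.0, 6.2)]
flat = list(chain.from_iterable(extract_points(*s) for s in scans))
print(len(flat))
```

With the question's code this would be called roughly as to_dataframe(spark, spark.sql('SELECT intensities, startPosition, endPosition FROM dList').rdd). Note that a per-point index is an unusual choice of label for a classifier, which is the issue the remark about layers below touches on.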

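In your code, layers is defined incorrectly. The definition below means you have 2500 features (your example has only 2), two intermediate layers of size 25 and 14 (that's fine!), and 1 class (this is also wrong; in your example it appears to be 1218, so check the values of label):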

layers = [2500, 25, 14, 1]
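To make the rule concrete: the first entry of layers has to equal the length of each feature vector, and the last entry has to equal the number of classes. The helper below is a small sketch of that rule, not part of the Spark API; build_layers is a hypothetical name, the hidden sizes default to the (5, 4) used in the example above, and it assumes the labels are the integers 0 to k-1.

```python
def build_layers(feature_vectors, labels, hidden=(5, 4)):
    """Return [n_features, *hidden, n_classes] for MultilayerPerceptronClassifier.

    Assumes every feature vector has the same length and that labels are
    the integers 0 .. k-1, so the class count is max(label) + 1.
    """
    lengths = {len(v) for v in feature_vectors}
    if len(lengths) != 1:
        raise ValueError(
            "feature vectors must all have the same length, got %s" % sorted(lengths))
    n_features = lengths.pop()
    n_classes = int(max(labels)) + 1
    return [n_features, *hidden, n_classes]


features = [[30.0138, 1314.0], [30.0204, 1350.0], [6.0524, 1104.0]]
labels = [0, 1, 1]
print(build_layers(features, labels))  # [2, 5, 4, 2]
```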