I'm new to PySpark and would like to do the following.
Consider this code:
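import numpy as np
from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # in the pyspark shell, sc already exists
b = np.array([[1,2,100],[3,4,200],[5,6, 300],[7,8, 400]])
a = np.array([[1,2],[3,4],[11,6],[7,8], [1, 2], [7,8]])
RDDa = sc.parallelize(a)
RDDb = sc.parallelize(b)
# Key each row of b by its first two columns (a tuple, since lists are not hashable) with the third column as the value
dsmRDD = RDDb.map(lambda x: (tuple(x[:2].tolist()), x[2]))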
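For each row of RDDa, I want the value associated with that row when it is used as a key of dsmRDD (0 when there is no matching key), i.e.
result = [100, 200, 0, 400, 100, 400]
Thank you very much.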
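To pin down the intended semantics (one output per row of a, in the original order, with 0 when the pair does not occur in b, as the expected result implies), here is a plain-Python reference sketch without Spark. The function name expected_lookup is hypothetical, and plain lists stand in for the NumPy arrays.

```python
def expected_lookup(a, b, default=0):
    """Hypothetical non-Spark reference for the desired lookup."""
    # Map each (first, second) pair in b to its third column
    table = {(int(r[0]), int(r[1])): int(r[2]) for r in b}
    # Look up every row of a in order, using `default` when the pair is absent
    return [table.get((int(r[0]), int(r[1])), default) for r in a]

# Plain lists stand in for the NumPy arrays; np.array rows work the same way
b = [[1, 2, 100], [3, 4, 200], [5, 6, 300], [7, 8, 400]]
a = [[1, 2], [3, 4], [11, 6], [7, 8], [1, 2], [7, 8]]
print(expected_lookup(a, b))  # [100, 200, 0, 400, 100, 400]
```

Any Spark solution should produce the same list as this function for the same inputs.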
Answer 0 (score: 0)
If the data is not too large, you can use DataFrames instead: convert both RDDs to DataFrames and join them.
Answer 1 (score: 0)
As the other answer suggests, you can convert to DataFrames and use a join. If you would rather stick with RDDs only, you can do it like this:
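import numpy as np
from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # reuses the existing context in the pyspark shell
a = np.array([[1,2],[3,4],[11,6],[7,8], [1, 2], [7,8]])
b = np.array([[1,2,100],[3,4,200],[5,6, 300],[7,8, 400]])
RDDa = sc.parallelize(a)
RDDb = sc.parallelize(b)
# Key a's rows by their pair (as a hashable tuple) with value (0, index): 0 is the
# default result and index is the row's original position. Key b's rows by their
# first two columns, with the third column as the value to look up. After
# leftOuterJoin each element is (key, ((0, index), value or None)), which is mapped
# to (index, value) on a match and to (index, 0) otherwise.
dsmRDD = RDDa.zipWithIndex()\
    .map(lambda x: (tuple(x[0].tolist()), (0, x[1])))\
    .leftOuterJoin(RDDb.map(lambda x: (tuple(x[:2].tolist()), int(x[2]))))\
    .map(lambda x: (x[1][0][1], x[1][1]) if x[1][1] is not None else (x[1][0][1], x[1][0][0]))
# Sort by the original index to restore a's order, then keep only the values
output = [v for _, v in sorted(dsmRDD.collect())]
print(output)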
which gives you the output:
[100, 200, 0, 400, 100, 400]