How can I skip indexing a document if it already exists?

Time: 2017-02-21 03:52:12

Tags: apache-spark elasticsearch

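I am using Spark to write a large amount of data to Elasticsearch. However, some of the documents (sometimes most of them) are duplicates that share the same id. Since writing data to ES takes a lot of time, I would like to know how to skip indexing a document when its id already exists in ES.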

Something like:

if doc.id in ES:
    continue
else:
    doc.index(ES)

1 answer:

Answer 0: (score: 0)

I don't know how to combine this with Spark, but in ES you can set the operation type

But the only problem

PUT twitter/tweet/1?op_type=create
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}
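
The same create-only behavior is also available per item in the `_bulk` API through the `create` action, which is closer to a bulk write workload. The sketch below only builds the request body and interprets a response; it does not talk to a cluster. The function names `build_bulk_create_body` and `split_bulk_results` are hypothetical helpers made up for illustration, and the action line assumes a typeless index (on older Elasticsearch versions with mapping types, such as the `twitter/tweet` example above, the action line would also need `_type`).

```python
import json


def build_bulk_create_body(docs, index):
    """Build an NDJSON _bulk body that uses the `create` action for every doc.

    `docs` is an iterable of (doc_id, source_dict) pairs. Hypothetical helper.
    """
    lines = []
    for doc_id, source in docs:
        # `create` makes this item fail if a document with this _id already exists.
        lines.append(json.dumps({"create": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def split_bulk_results(response):
    """Sort bulk response items into created, skipped (409 conflict), and failed ids."""
    created, skipped, failed = [], [], []
    for item in response["items"]:
        result = item["create"]
        if result["status"] == 409:
            # Conflict: the id already existed, so the document was not indexed.
            skipped.append(result["_id"])
        elif 200 <= result["status"] < 300:
            created.append(result["_id"])
        else:
            failed.append(result["_id"])
    return created, skipped, failed
```

A 409 item here means "already there, nothing written", which is the skip behavior the question asks for. For Spark itself, the elasticsearch-hadoop connector documents an `es.write.operation` setting that accepts `create`; whether conflicts are skipped or fail the job depends on the connector version, so that should be checked before relying on it.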