Anyone who hears the term Big Data automatically thinks of Hadoop and Spark, but for real-time analysis of Big Data nothing is better than Apache Storm. Before we look more closely at Apache Storm, this article describes how to install Storm with Vagrant and Docker.
Why use Vagrant?
Vagrant is a high-level wrapper around virtualization software. It makes it easy to start virtual machines from scratch, because all it needs is a single file (the Vagrantfile) that describes the type of machine, the software to be installed and how to access the machine.
Why use Docker?
Docker provides software containers in which applications run isolated from their surroundings. Docker uses the resource isolation features of the Linux kernel (such as cgroups and namespaces) to run multiple containers within a single Linux instance, avoiding the overhead of starting and maintaining virtual machines.
A Storm cluster needs at least three nodes (ZooKeeper, Nimbus and a Supervisor), so Docker is the best choice for running Storm on a single machine.
Create the following directory structure:
/home/vagrant/storm
/home/vagrant/storm/docker
/home/vagrant/storm/docker/storm
/home/vagrant/storm/docker/zookeeper
Inside the Vagrant VM, the path /home/vagrant/storm is mapped to /vagrant, because Vagrant by default shares the directory containing the Vagrantfile at /vagrant. Within the VM, the above structure therefore appears as follows:
/vagrant
/vagrant/docker
/vagrant/docker/storm
/vagrant/docker/zookeeper
/home/vagrant/storm/docker/storm and /home/vagrant/storm/docker/zookeeper are the directories that hold the Dockerfiles used to build the images.
Download and install Vagrant on your system. Packages for your operating system can be found here. The Vagrantfile below uses the VirtualBox provider, so VirtualBox must be installed as well.
Change into the directory /home/vagrant/storm and execute the command
$ vagrant init ubuntu/trusty64
This creates a file named Vagrantfile, which now needs a few adjustments. A ready-to-use Vagrantfile can be downloaded via the following link. If you are behind a proxy, its address is taken from the environment variable http_proxy.
The name Vagrant generates for the VM is not very meaningful, so we name the VM "storm_vm" to make it easier to identify. In addition, the VM's main memory has to be increased, because Vagrant's default of 512 MB is not sufficient. Both are done with the following instruction:
config.vm.provider "virtualbox" do |v|
  v.name = "storm_vm"
  v.memory = 2048
end
To reach the Storm UI from outside the VM, port forwarding is required. We map port 8080 of the VM to port 8888 on the host.
config.vm.network "forwarded_port", guest: 8080, host: 8888
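Once vagrant up has finished, a quick way to confirm from the host that the forwarded port answers is a plain TCP connection test. The following Ruby sketch uses only the standard library; the method name port_open? is our own, and a successful connection only shows that something listens on port 8888, not that the Storm UI itself is healthy.

```ruby
require "socket"
require "timeout"

# Hypothetical helper: true if something accepts TCP connections on
# host:port within timeout_s seconds, false otherwise.
def port_open?(host, port, timeout_s = 2)
  # Block form: the socket is closed automatically and the block's value is returned.
  Socket.tcp(host, port, connect_timeout: timeout_s) { true }
rescue SystemCallError, SocketError, IOError, Timeout::Error
  false
end

# Example: check the host port that Vagrant forwards to port 8080 in the VM.
if __FILE__ == $PROGRAM_NAME
  puts port_open?("localhost", 8888) ? "port 8888 is open" : "nothing listening on port 8888 yet"
end
```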
Vagrant has built-in support for Docker, but it does NOT bundle a specific Docker version. Instead, the latest version of Docker is installed the first time the Vagrant box is started, so make sure you are connected to the Internet.
config.vm.provision "docker"
Installs Docker in the virtual machine.
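config.vm.provision "shell", inline: <<-SHELL
  # Create the network only if it does not exist yet, so that re-running "vagrant provision" does not fail
  sudo docker network inspect cluster >/dev/null 2>&1 || sudo docker network create cluster
SHELL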
Creates a new network named cluster. All our containers are connected to this network. Because it is a user-defined network, the containers can reach each other by name, so the "--link" option is no longer needed when starting a container.
Now let’s build the images.
config.vm.provision "docker" do |d|
  d.build_image "/vagrant/docker/storm",
    args: "-t 'storm' --build-arg HTTP_PROXY=#{ENV['http_proxy']} --build-arg http_proxy=#{ENV['http_proxy']} --build-arg HTTPS_PROXY=#{ENV['http_proxy']} --build-arg https_proxy=#{ENV['http_proxy']}"
  d.build_image "/vagrant/docker/zookeeper",
    args: "-t 'zookeeper' --build-arg HTTP_PROXY=#{ENV['http_proxy']} --build-arg http_proxy=#{ENV['http_proxy']} --build-arg HTTPS_PROXY=#{ENV['http_proxy']} --build-arg https_proxy=#{ENV['http_proxy']}"
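This statement builds two Docker images, "storm" and "zookeeper", each from the Dockerfile in the given directory (the Dockerfiles are listed further below). The --build-arg options pass the value of http_proxy into the build under the names HTTP_PROXY, http_proxy, HTTPS_PROXY and https_proxy, so that the commands run during the build can use the proxy.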
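The four proxy arguments are identical for both images, and when http_proxy is not set they expand to empty values. Because a Vagrantfile is plain Ruby, one option is a small helper defined at the top of the Vagrantfile that adds the proxy arguments only when a proxy is configured. The name build_args is hypothetical, not part of Vagrant; inside the provisioner it would be used as d.build_image "/vagrant/docker/storm", args: build_args("storm").

```ruby
# Hypothetical helper (not a Vagrant API): builds the "docker build" argument
# string for an image tag and forwards the host proxy only when one is set.
PROXY_BUILD_ARGS = %w[HTTP_PROXY http_proxy HTTPS_PROXY https_proxy].freeze

def build_args(tag, proxy = ENV["http_proxy"])
  args = ["-t '#{tag}'"]
  unless proxy.nil? || proxy.empty?
    # Same four names, in the same order, as in the original build_image calls.
    PROXY_BUILD_ARGS.each { |name| args << "--build-arg #{name}=#{proxy}" }
  end
  args.join(" ")
end
```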
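After a successful build, the containers should be started automatically. All containers must be on the network "cluster" and must use their readable names as hostnames. The UI container additionally publishes port 8080 to the VM, which Vagrant then forwards to port 8888 on the host. To make life easier, we map the log directory of each Storm container to a directory under /vagrant/docker/storm, which is also accessible from outside the VM.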
  d.run "zookeeper",
    args: "--net=cluster --hostname=zookeeper"
  d.run "nimbus", image: "storm", cmd: "nimbus",
    args: "--net=cluster --hostname=nimbus -t -v /vagrant/docker/storm/nimbus:/opt/apache-storm-1.0.1/logs"
  d.run "supervisor", image: "storm", cmd: "supervisor",
    args: "--net=cluster --hostname=supervisor -t -v /vagrant/docker/storm/supervisor:/opt/apache-storm-1.0.1/logs"
  d.run "storm-ui", image: "storm", cmd: "ui",
    args: "--net=cluster -p 8080:8080 --hostname=storm-ui -t -v /vagrant/docker/storm/ui:/opt/apache-storm-1.0.1/logs"
end
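The four d.run calls repeat the network, hostname and log-volume options. As a sketch of one way to keep them consistent, the container list can be held in a table and the argument strings generated from it. CONTAINERS, STORM_LOGS and run_args are hypothetical names; the generated strings are identical to the ones written out by hand above.

```ruby
# Hypothetical table of the four containers; the keys are container names,
# which are also used as hostnames on the "cluster" network.
STORM_LOGS = "/opt/apache-storm-1.0.1/logs"

CONTAINERS = {
  "zookeeper"  => { image: "zookeeper" },
  "nimbus"     => { image: "storm", cmd: "nimbus",     tty: true, log_dir: "nimbus" },
  "supervisor" => { image: "storm", cmd: "supervisor", tty: true, log_dir: "supervisor" },
  "storm-ui"   => { image: "storm", cmd: "ui",         tty: true, log_dir: "ui", ports: ["8080:8080"] },
}.freeze

# Builds the "docker run" argument string for one container.
def run_args(name, spec)
  args = ["--net=cluster"]
  spec.fetch(:ports, []).each { |mapping| args << "-p #{mapping}" }
  args << "--hostname=#{name}"
  args << "-t" if spec[:tty]
  # Log directory shared with the host via /vagrant.
  args << "-v /vagrant/docker/storm/#{spec[:log_dir]}:#{STORM_LOGS}" if spec[:log_dir]
  args.join(" ")
end

# Assumed usage inside the Vagrantfile's docker provisioner block:
#   CONTAINERS.each do |name, spec|
#     opts = { image: spec[:image], args: run_args(name, spec) }
#     opts[:cmd] = spec[:cmd] if spec[:cmd]
#     d.run name, **opts
#   end
```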
Place the following contents into a file named Dockerfile in /home/vagrant/storm/docker/storm.
FROM ubuntu:14.04
MAINTAINER Carsten Zaddach

RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y java-common
RUN apt-get install -y ca-certificates-java
RUN apt-get install -y openjdk-7-jre
RUN apt-get install -y python

RUN cd /opt \
    && curl -fsSL http://mirror.softaculous.com/apache/storm/apache-storm-1.0.1/apache-storm-1.0.1.tar.gz \
    | tar -xz

COPY storm.yaml /opt/apache-storm-1.0.1/conf/storm.yaml

ENTRYPOINT ["/opt/apache-storm-1.0.1/bin/storm"]
Create a file named storm.yaml in /home/vagrant/storm/docker/storm with the following contents. It tells every Storm process to find ZooKeeper and Nimbus under their container hostnames.
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
    - "zookeeper"
#
nimbus.seeds: ["nimbus"]
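A typo or a leftover # in storm.yaml only shows up once the containers are running. The following Ruby sketch, using the standard yaml library, checks before vagrant up that both settings are present; check_storm_yaml is a hypothetical helper. Storm's keys contain dots, so they are read as literal top-level keys, not as nested maps.

```ruby
require "yaml"

# Hypothetical check: returns a list of problems (empty if the file looks usable).
def check_storm_yaml(text)
  conf = YAML.safe_load(text)
  conf = {} unless conf.is_a?(Hash)  # empty file or a bare scalar
  problems = []
  zk = conf["storm.zookeeper.servers"]
  problems << "storm.zookeeper.servers must be a non-empty list" unless zk.is_a?(Array) && !zk.empty?
  seeds = conf["nimbus.seeds"]
  problems << "nimbus.seeds must be a non-empty list" unless seeds.is_a?(Array) && !seeds.empty?
  problems
end
```

For example, p check_storm_yaml(File.read("/home/vagrant/storm/docker/storm/storm.yaml")) prints [] when both settings are present and names the missing ones otherwise.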
Place the following contents into a file named Dockerfile in /home/vagrant/storm/docker/zookeeper.
FROM ubuntu:14.04
MAINTAINER Carsten Zaddach

RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y zookeeper

CMD /usr/share/zookeeper/bin/zkServer.sh start-foreground
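If Nimbus or the Supervisor cannot connect, it helps to check ZooKeeper first with its "ruok" four-letter command, to which a healthy server replies "imok". The Ruby sketch below assumes that four-letter commands are enabled (the default in the ZooKeeper 3.4 releases) and that it runs somewhere that can reach the zookeeper container on port 2181, for example inside the VM against the container's IP address, since the setup above does not publish that port. The method name zookeeper_ok? is hypothetical.

```ruby
require "socket"
require "timeout"

# Hypothetical helper: sends ZooKeeper's "ruok" command and returns true
# if the server answers "imok" within timeout_s seconds.
def zookeeper_ok?(host, port = 2181, timeout_s = 2)
  Socket.tcp(host, port, connect_timeout: timeout_s) do |sock|
    sock.write("ruok")
    # Wait for the reply; ZooKeeper closes the connection after answering.
    if IO.select([sock], nil, nil, timeout_s)
      sock.read.to_s.strip == "imok"
    else
      false
    end
  end
rescue SystemCallError, SocketError, IOError, Timeout::Error
  false
end
```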
Now create the VM and its Docker containers:
$ vagrant up
After provisioning has completed successfully, open your browser and go to http://localhost:8888.
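Besides the browser view, the Storm UI serves a REST API under /api/v1/ on the same port. The sketch below queries the cluster summary through the forwarded port with Ruby's standard library. The endpoint path comes from Storm's UI REST API; the JSON field names stormVersion, supervisors and slotsFree are assumptions to verify against the response of your Storm version, and the method names are hypothetical.

```ruby
require "json"
require "net/http"
require "uri"

# URL of the cluster summary endpoint behind the forwarded port.
def cluster_summary_uri(host = "localhost", port = 8888)
  URI("http://#{host}:#{port}/api/v1/cluster/summary")
end

# Turns the JSON body into a one-line status (field names are assumptions).
def summarize(json_text)
  data = JSON.parse(json_text)
  "Storm #{data['stormVersion']}: #{data['supervisors']} supervisor(s), " \
    "#{data['slotsFree']} free slot(s)"
end

# Performs the HTTP GET; requires the VM and the storm-ui container to be running.
def fetch_summary(host = "localhost", port = 8888)
  summarize(Net::HTTP.get(cluster_summary_uri(host, port)))
end
```

After vagrant up, puts fetch_summary should show at least one supervisor once the supervisor container has registered with Nimbus.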
If you have any questions or suggestions, do not hesitate to contact us.
Part 2: Install Web Based Integrated Development Environment (IDE)