Skip to content

hausmane/Twitter_app

Repository files navigation

Twitter Realtime Data Analytics with Storm and Kafka

This application is an example of how to scalably process any big data stream with Storm and Kafka.

The aim of this project is to have an application that performs the trending topics in Twitter for a given period of time and frecuency. For this purpose, it is used sliding window analysis algorithm which is one of the most common algorithms in complex event processing.

Index

  • twitterApp.sh
    • Usage
  • twitterTrendingTopics.jar
    • Usage
  • Contact information
  • Blog explaining the basics steps to develop this project

twitterApp

The startTwitterApp.sh script is a java application that reads tweets from both the Twitter Streaming API (MODE 2) and a preloaded log file (MODE 1). This application uses Apache Kafka to store the tweets and send them through a topic "twitter-topic" with a storm bolt as a consumer of this topic.

Usage:

This script file has the following arguments associated:

  • mode: 1 means read from file, 2 means read from the Twitter API.
  • apiKey: key associated with the Twitter app consumer.
  • apiSecret: secret associated with the Twitter app consumer.
  • tokenValue: access token associated with the Twitter app.
  • tokenSecret: access token secret.
  • Kafka Broker URL: String in the format IP:port corresponding with the Kafka Broker
  • filenameLogsPath: path to the file with the tweets (the path is related to the filesystem of the node that will be used to run the Twitter app)

To execute this application you must have installed kafka (http://kafka.apache.org/) and execute the following commands:

First, it is needed to start kafka and zookeper servers

------------ Start kafka -----------------------------------------------------------------------

1. Start Zookeeper server in Kafka using following script in your kafka installation folder
 ./bin/zookeeper-server-start.sh config/zookeeper.properties & 
2. Start Kafka server using following script
 ./bin/kafka-server-start.sh config/server.properties & 

---------- Verify the Topic and Messages -------------------------------------------------------

1. Check if topic is there using
 $./kafka-topics.sh --list --zookeeper localhost:2181 
2. Consume messages on topic twitter-topic to verify the incoming message stream.
 $ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic twitter-topic --from-beginning 

Once you have adquired the key and secret associated with your twitter account (It is explained in further details in the blog provided at the end of this file), an example of execution could be:

(MODE 1) In order to read from file (mode 1) The following arguments are required: ./startTwitterApp.sh mode kafkaBrokerURL filenameLogsPath

 ./startTwitterApp.sh "1" "node4:9092" "../../tweetsLogFile.log" 

(MODE 2) In order to read from twitter API (mode 2) The following arguments are required: ./startTwitterApp.sh mode apiKey apiSecret tokenValue tokenSecret KafkaBrokerURL

 ./startTwitterApp.sh "2" "FjFVvfxNkx3aqv2X0xIP" "7IIB5CafrQIRlkHuO3283ciJYud7DdqTC0Kem" "475871668-W6d8hmIVwpjaypzxSDEmjrufRP5mNaR" "zFJUvJoOspsb39E2fZBfYNuGGZhgQJ0gfJ9Df" "node4:9092" 

twitterTrendingTopics.jar

The twitterTrendingTopics.jar has implemented the topology needed to process twitter topics using the Storm API (http://storm.apache.org/index.html)

![Alt text] (https://theredqueeneffectblog.files.wordpress.com/2015/12/captura-de-pantalla-de-2015-12-08-1931461.png?w=1108 "Storm Topology")

Usage

This script file has the following arguments associated:

  • langList: String with the list of languages (“lang” values) we are interested in. The list is in CSV format, example: en,pl,ar,es
  • Zookeeper URL: String IP:port of the Zookeeper node.
  • winParams: String with the window parameters size and advance using the format: size,advance. The time units are seconds.
  • topologyName: String identifying the topology in the Storm Cluster.
  • Folder: path to the folder used to store the output files (the path is relative to the filesystem of the node that will be used to run the Storm Supervisor)

Once you have installed Storm and configurated the environment (It is explained in further details in the blog provided at the end of this file), an example of execution in a local environment could be:

 java -jar trendingTopology.jar "es,en,pl,ar" "localhost:2181" "60,30" "trending-topology" "/home/yolanda" 
To ilustrate an execution of this topology in a cluster mode, it is provided the example below:
1) Connection to the cluster

First it is needed to access the cluster and start the necessary instances associated to this project:

ssh [email protected] -p 51005 # --> twitterApp ssh [email protected] -p 51004 # --> kafka ./kafka_2.10-0.8.2.1/bin/kafka-server-start.sh ./kafka_2.10-0.8.2.1/config/server.properties ./kafka-topics.sh --create --topic twitter-topic --zookeeper node2 --partitions 2 --replication-factor 1 ./kafka-topics.sh --describe --zookeeper node2 zssh [email protected] -p 51002 # --> storm H2 ./zookeeper-3.4.6/bin/zkServer.sh start ./zookeeper-3.4.6/bin/zkServer.sh stop Nimbus: ./apache-storm-0.10.0/bin/storm nimbus UI: ./apache-storm-0.10.0/bin/storm ui Supervisor: ./apache-storm-0.10.0/bin/storm supervisor ssh [email protected] -p 51003 # --> storm H3 Supervisor: ./apache-storm-0.10.0/bin/storm supervisor 

The Storm UI has a friendly user interface to monitor the performance of each process and control the execution of each thread:

![Alt text] (https://theredqueeneffectblog.files.wordpress.com/2016/04/storm_ui1.png?w=1108 "Storm UI")

URL: http://138.4.110.141:41002

2) Compile the project locally and copy files to the cluster
mvn clean compile package appassembler:assemble scp -P 51005 -r appassembler [email protected]:/home/masteruser1 scp -P 51005 tweetsLogFile.log [email protected]:/home/masteruser1 scp -P 51002 trendingTopology.jar [email protected]:/home/masteruser1 
3) Submit the topology to the cluster using the storm client, specifying the path to your jar, the classname to run, and any arguments it will use
mvn assembly:assembly storm jar trendingTopology.jar arg1 arg2 arg3 

Submition example of the storm topology

storm jar trendingTopology-1.0-SNAPSHOT-jar-with-dependencies.jar master2015.Top3App "es,en,pl,ar" "node2:2181" "60,30" "trending-topologya" "/home/masteruser1" 

About

Trending topic Twitter API via APACHE Kafka & Storm

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages