Spark Streaming with Twitter and Kafka

I created a Spark Streaming application that continuously reads tweets from Twitter, scores each one for sentiment, and sends the values to Apache Kafka. A pipeline built on Elasticsearch and Kibana then reads the data from Kafka and visualizes it. The application has two parts:

LinkWithConfluent: connects to the Kafka cluster on Confluent and checks that streaming works. (To see the source code, please download this HTML file to your computer.)
LinkWithTwitterAndAnalyzerAndProducerToKafka: connects to the Twitter API, predicts the sentiment of each tweet's text in real time, and produces messages to the Kafka topic. (To see the source code, please download this HTML file to your computer.)
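The second part can be pictured with a minimal sketch like the one below. It is not the code from the HTML file: the word lists are a toy stand-in for whatever sentiment model the application actually uses, and the `text` and `date` field names are assumptions made for illustration. Only the topic name and the -1/0/1 sentiment scale come from this write-up.

```python
import json
from datetime import datetime

# Toy word lists: a stand-in for the real sentiment model (assumption).
POSITIVE = {"good", "great", "recovered", "safe", "hope"}
NEGATIVE = {"bad", "sick", "death", "fear", "worse"}

TOPIC = "streaming_test_8"  # topic name taken from this write-up


def score_sentiment(text: str) -> int:
    """Return -1 (negative), 0 (neutral) or 1 (positive) for one tweet."""
    words = text.lower().split()
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Sign of the balance: positive -> 1, negative -> -1, tie -> 0.
    return (balance > 0) - (balance < 0)


def to_record(text: str, created_at: datetime) -> bytes:
    """Serialize one analyzed tweet as the JSON value of a Kafka message."""
    payload = {
        "text": text,                              # hypothetical field name
        "sentiment": score_sentiment(text),
        "date": created_at.strftime("%Y%m%d%H"),   # hypothetical field name
    }
    return json.dumps(payload).encode("utf-8")


record = to_record("Great news, my friend recovered from covid",
                   datetime(2021, 11, 11, 3))
print(TOPIC, record.decode("utf-8"))
```

In a real run, the bytes returned by `to_record` would be handed to a Kafka producer client for the topic instead of being printed.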

The steps I followed are shown below.


<topic of Kafka on Confluent> – the “streaming_test_8” topic receives messages from the Twitter API in real time.
<ElasticsearchSinkConnector on Confluent> – it links Kafka with Elastic Cloud for visualization.
<stream lineage of Confluent> – it visually shows how data streams from producers to downstream topics and consumers.
<Kibana graphical plot – data from the “streaming_test_8” topic of Confluent Kafka>

The plot above shows the average “sentiment” over time. I used “covid” as the search term and encoded sentiment as -1 (negative), 0 (neutral), and 1 (positive).

Dates in the data set are encoded as year, month, day, and hour. For example, “2021111103” means year: 2021, month: 11, day: 11, hour: 03.
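As a small illustration of this encoding, the sketch below converts between a timestamp and the ten-digit hour key using Python's standard `datetime` format codes. The helper names are hypothetical and are not taken from the project's source.

```python
from datetime import datetime

HOUR_KEY_FORMAT = "%Y%m%d%H"  # year, month, day, hour


def to_hour_key(ts: datetime) -> str:
    """Truncate a timestamp to the hour and encode it as a ten-digit key."""
    return ts.strftime(HOUR_KEY_FORMAT)


def from_hour_key(key: str) -> datetime:
    """Decode a ten-digit hour key back into a datetime at the start of that hour."""
    return datetime.strptime(key, HOUR_KEY_FORMAT)


parsed = from_hour_key("2021111103")
print(parsed)  # 2021-11-11 03:00:00
print(to_hour_key(datetime(2021, 11, 11, 3, 45)))  # 2021111103
```

Because the key is fixed-width and ordered from the largest unit to the smallest, sorting the keys as strings also sorts them chronologically.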

From “2021111103” to “2021111104”, the average sentiment value was 0.18.

From “2021111104” to “2021111105”, the average sentiment value was 0.22.

From “2021111105” to “2021111106”, the average sentiment value was 0.21.

From “2021111106” to “2021111107”, the average sentiment value was 0.14.
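To make the hourly figures concrete, here is a minimal sketch of how an average per hour can be computed from (hour key, sentiment) pairs. In the project this aggregation happens in Kibana; the function name and the sample records below are made up for illustration and are not the real Twitter data.

```python
from collections import defaultdict


def hourly_average(records):
    """Average the -1/0/1 sentiment values per hour key."""
    totals = defaultdict(lambda: [0, 0])  # hour key -> [sum, count]
    for hour_key, sentiment in records:
        totals[hour_key][0] += sentiment
        totals[hour_key][1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}


# Hypothetical sample records, not the real data.
sample = [
    ("2021111103", 1), ("2021111103", 0), ("2021111103", -1), ("2021111103", 1),
    ("2021111104", 1), ("2021111104", 0),
]
averages = hourly_average(sample)
print(averages)
```

Since every value is -1, 0, or 1, an hourly average equals the number of positive tweets minus the number of negative tweets, divided by the total number of tweets in that hour. An average of 0.18 therefore means that positive tweets exceeded negative ones by 18 percent of all tweets in that hour.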

The average sentiment is positive in every one of these hourly windows, so the overall average for “covid” is positive as well. Judging by this search term, people these days seem to take “covid” less seriously than they did in the past.

Skills

Spark (PySpark), Kafka, Elasticsearch, Kibana