What is Elasticsearch and why is it important?

 

Unlike other articles, here I am going to discuss the theoretical background of Elasticsearch in the form of questions and answers. I hope this is a more appropriate way to explain what Elasticsearch is and why it is important. So let's start.

 

1. What is Elasticsearch?

Elasticsearch is a distributed full-text search engine based on Apache Lucene. It is a kind of NoSQL database that stores data as JSON documents. Lucene is the underlying technology that Elasticsearch uses for extremely fast data retrieval.

In Elasticsearch everything is indexed (by default, every field in a document is indexed to make searching faster). Therefore it provides fast search operations across large data sets. Elasticsearch is commonly used to build search engines because it is optimized for exactly that.

 

Elasticsearch is built with speed and scalability as its primary goals.

SPEED : Elasticsearch is fast, really fast! Since everything is indexed, you can search and retrieve data very quickly thanks to the built-in indexing support.

SCALABILITY : The distributed architecture of Elasticsearch allows it to scale horizontally by adding nodes.
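To make this concrete, here is a minimal sketch of indexing a JSON document from Java. It assumes a locally running Elasticsearch 7.x node and the high-level REST client; the articles index and the document fields are made up for illustration.

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class EsIndexExample {
    public static void main(String[] args) throws Exception {
        // Connect to a local Elasticsearch node (default HTTP port 9200)
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Index a JSON document; Elasticsearch indexes every field by default
            IndexRequest request = new IndexRequest("articles")
                    .id("1")
                    .source("{\"title\":\"Intro to Elasticsearch\",\"views\":100}",
                            XContentType.JSON);
            IndexResponse response = client.index(request, RequestOptions.DEFAULT);
            System.out.println("Indexed with result: " + response.getResult());
        }
    }
}

Every field of the indexed document becomes searchable after the next index refresh (near real time, about one second by default).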

 

 

2. What are the advantages of Elasticsearch?

Fast search

Elasticsearch is really fast because everything is indexed. Complex search queries can therefore be performed on large data sets and still return results in very little time.

Horizontally Scalable Architecture

Elasticsearch has a distributed architecture and supports horizontal scaling by adding more nodes. An Elasticsearch cluster is not limited to a single machine; you can keep adding nodes to handle higher traffic and larger data sets.

Easy development and integration

Elasticsearch client libraries are available for many programming languages such as Java, PHP, Python, .NET and more. In addition, most application frameworks provide their own libraries or integrations for accessing Elasticsearch. Therefore it is easy to develop an application that uses Elasticsearch for search and data retrieval. Elasticsearch also provides a REST API for accessing data and performing search queries.
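To illustrate the Java client, here is a minimal sketch of a full-text match query, again assuming the Elasticsearch 7.x high-level REST client, a node on localhost:9200, and the hypothetical articles index from the earlier snippet:

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class EsSearchExample {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Full-text "match" query against the title field
            SearchRequest request = new SearchRequest("articles");
            request.source(new SearchSourceBuilder()
                    .query(QueryBuilders.matchQuery("title", "elasticsearch")));

            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            System.out.println("Hits: " + response.getHits().getTotalHits());
        }
    }
}

The same query can also be issued directly against the REST API as a JSON body posted to the index's _search endpoint.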

 

 

3. What are the limitations of Elasticsearch compared to other NoSQL databases?

No transaction support

Elasticsearch is not a transactional database and does not support transactions. It may therefore lose data, and data integrity issues may arise.

Slow insertion of new data

Adding new data may require creating or updating indexes. Therefore an insert operation can take a considerable amount of time compared to other NoSQL databases.

Lack of authentication and authorization

There is no built-in (free) authentication and authorization module. You have to buy a commercial plugin (such as X-Pack) if you need authentication and authorization support.

 

 

4. Can we use Elasticsearch as the main application database?

There is no direct answer to this question; it depends on the type of application you are building. Elasticsearch is not a transactional database and does not support transactions. In addition, it is slow at inserting data, as the indexes need to be updated.

If you are building a search engine that does not need transactional support, you can go ahead with Elasticsearch as the primary data store. But in the real world it is hard to find such applications: almost every application has some sort of transactional behaviour for which database transaction support is expected. Therefore Elasticsearch is typically used as a secondary store that speeds up data search and retrieval.

In most cases, the recommendation is to use Elasticsearch as a secondary store alongside a transactional persistence store. Your primary store can be any transactional database, relational or NoSQL, while Elasticsearch serves the fast search operations in the application. If your application has a large data set and you need to perform fast searches over it, you can migrate or replicate the relevant data to Elasticsearch and keep the rest in your primary persistence store.

If you already have a working schema in SQL but search is slow (search is not necessarily SQL's strength), you can copy the searchable data to Elasticsearch and use it to perform the searches.
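As a rough sketch of that copy step (the table, columns and index name are all made up), you could read the searchable columns over JDBC and index each row, reusing a RestHighLevelClient as in the earlier snippets. In practice this replication is often handled by tools such as Logstash with its JDBC input, or a change-data-capture pipeline.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class ProductSearchSync {
    // Copies searchable columns from SQL into the hypothetical "products" index
    public static void sync(Connection conn, RestHighLevelClient client) throws Exception {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name, description FROM products")) {
            while (rs.next()) {
                // NOTE: a real implementation should build JSON with a library
                // (e.g. Jackson) instead of naive string formatting
                String json = String.format("{\"name\":\"%s\",\"description\":\"%s\"}",
                        rs.getString("name"), rs.getString("description"));
                client.index(new IndexRequest("products")
                        .id(rs.getString("id"))
                        .source(json, XContentType.JSON), RequestOptions.DEFAULT);
            }
        }
    }
}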

 

5. What are the similarities between Elasticsearch and other NoSQL databases (like MongoDB)?

  • Both store the data as JSON documents.
  • Both allow querying the body of the JSON documents.

 

 

6. Why is transaction support so important in NoSQL databases?

In NoSQL databases, data is not normalized (it is denormalized), so it is not split up into multiple collections. In a relational database, by contrast, the data is normalized and split up into multiple tables, and related data is connected and referenced through foreign key constraints.

In the NoSQL world this is completely different: data is denormalized, and related (referential) data is managed as nested documents.

E.g. think about the employee and department relationship: each employee document contains a nested department document holding the information about that employee's department.
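For example, a hypothetical employee document (all field names invented for illustration) might look like this:

{
  "employee_id": "E-1001",
  "name": "Jane Doe",
  "department": {
    "dept_id": "D-10",
    "name": "Engineering"
  }
}

If the Engineering department is renamed, every employee document that embeds it has to be updated.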

In NoSQL, such data is duplicated across documents. Whenever the shared data changes, it needs to be updated in every document that contains it (multiple documents must be updated). Since Elasticsearch does not support transactions, we cannot guarantee the integrity of the data after such an operation: if the update fails halfway, the partially updated documents will not be rolled back, and this leads to data integrity issues.

 

 

7. What is the difference between MongoDB and Elasticsearch?

MongoDB is a general-purpose database that can be used as the main persistence store, and it provides transaction support. If we need to speed up data retrieval, we can create indexes on the collections, but that is optional and is decided by the developer.

Elasticsearch is a full-text search engine and does not provide transaction support. It performs fast search operations since everything in a document is indexed. As a result, an insert is slower than the equivalent insert in MongoDB, because the indexes need to be created or updated whenever a new entry is added. In addition, everything is indexed by default, whereas in MongoDB indexing is an explicit choice made by the developer.

 

 

8. Why do we need Elasticsearch? Why can't we use the full-text search feature of MongoDB?

MongoDB also provides a full-text search feature. It is fine to proceed with MongoDB's full-text search when you need basic full-text search over a small-to-medium data set. If you plan to proceed with MongoDB, make sure that text indexes are created on the relevant fields, as in the sketch below.
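Here is a minimal sketch using the MongoDB Java driver; the connection string, database, collection and field names are all assumptions. A text index is created on the searchable field and then queried with a $text search:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class MongoTextSearch {
    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> articles =
                    mongoClient.getDatabase("blog").getCollection("articles");

            // Create a text index on the searchable field (required for $text queries)
            articles.createIndex(Indexes.text("body"));

            // Run a full-text search that uses the text index
            for (Document doc : articles.find(Filters.text("elasticsearch"))) {
                System.out.println(doc.toJson());
            }
        }
    }
}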

If the data set is too large or the searches are complex, it is better to go with Elasticsearch: it has been developed and optimized to work as a full-text search engine with fast data retrieval.

 

I hope this gives you a proper introduction to and overview of Elasticsearch and its importance.

Upload and Delete files with Amazon S3 and Spring Boot

Introduction

As a developer, I am pretty sure you have come across scenarios where you need to store files for your application (either user-uploaded or generated by the application itself). There are several possibilities for storing files, as follows.

  • Store the file(s) somewhere in the hosting server where the application is deployed (if it is a web application).
  • Store the file(s) in the database as binary files.
  • Store the file using cloud storage services.

Here we are going to evaluate the third option given above, which is “Store the file using cloud storage services”.

Amazon Simple Storage Service (S3) is an AWS object storage platform that stores files in the form of objects and lets you store and retrieve any amount of data from anywhere. Each file stored in Amazon S3 (as an object) is identified by a key.

 

 

Spring Boot Application and Amazon S3 Cloud  

 

(diagram: Spring Boot application communicating with Amazon S3)

The AWS SDK for Java provides various APIs for the Amazon S3 service, for working with files stored in an S3 bucket.
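For a taste of what that looks like, here is a hedged sketch using version 1 of the AWS SDK for Java; the region, bucket name and key are made up, and credentials are assumed to come from the default provider chain:

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import java.io.File;

public class S3FileOps {
    public static void main(String[] args) {
        // Credentials resolved from the default chain (env vars, profile, IAM role)
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .build();

        // Upload a file; the key identifies the object inside the bucket
        s3.putObject("my-example-bucket", "images/photo.png", new File("photo.png"));

        // Delete the object by bucket name and key
        s3.deleteObject("my-example-bucket", "images/photo.png");
    }
}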

 

 

Amazon S3 Account Configuration

Please follow the instructions given in the Amazon S3 official documentation for creating and configuring the S3 account and bucket.

Click here to visit the official documentation. 

I will list the steps below for your convenience.

Continue reading “Upload and Delete files with Amazon S3 and Spring Boot”

Message Driven Microservices with Spring Cloud Stream and RabbitMQ (Publish and Subscribe messages) – using @StreamListener for header based routing – Part 3

In this article, I am not going to explain the basics of Spring Cloud Stream or the process of creating publishers and subscribers; those have been clearly described in Part 1 and Part 2 of this article series.

It is possible to send messages with headers. On the receiving end (the consumer application), there can be multiple message handlers (@StreamListener-annotated methods) that accept messages based on the headers of the message.

A copy of the message is sent to every handler method, and a handler accepts the message only if it matches the given condition. The condition is a SpEL (Spring Expression Language) expression that performs checks on header values. A sample condition is given as follows.

e.g:-

@StreamListener(target = OrderSink.INPUT,condition = "headers['payment_mode']=='credit'")

(Please refer to the source code for the complete example.)

In that way, you can use headers to route messages (message routing) among multiple message handlers. Here we will look at how to deliver messages to the correct recipient based on a header.
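A sketch of such header-based routing with two competing handlers is given below. OrderSink is the binding interface referenced in the article's snippet above; the 'cash' header value and the String payload type are assumptions for illustration.

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;

@EnableBinding(OrderSink.class)
public class OrderRoutingListener {

    // Receives only messages whose 'payment_mode' header equals 'credit'
    @StreamListener(target = OrderSink.INPUT,
            condition = "headers['payment_mode']=='credit'")
    public void handleCreditOrder(String order) {
        System.out.println("Credit-card order: " + order);
    }

    // Receives only messages whose 'payment_mode' header equals 'cash'
    @StreamListener(target = OrderSink.INPUT,
            condition = "headers['payment_mode']=='cash'")
    public void handleCashOrder(String order) {
        System.out.println("Cash order: " + order);
    }
}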

Continue reading “Message Driven Microservices with Spring Cloud Stream and RabbitMQ (Publish and Subscribe messages) – using @StreamListener for header based routing – Part 3”

Message Driven Microservices with Spring Cloud Stream and RabbitMQ (Publish and Subscribe messages) with custom bindings – Part 2

In the previous part, we tried the Spring Cloud Stream pre-built components such as Sink, Source and Processor for building message-driven microservices.

In this part, we will look at how to create custom binding classes with custom channels for publishing and retrieving messages with RabbitMQ.

 

Setting up the publisher application

The publisher application is almost the same as in the previous article, except for the bindings and related configuration.

The previous article uses the Source class (a Spring Cloud Stream built-in component) to configure the output message channel (@Output) for publishing messages. Here we are not going to use the built-in component; instead we will develop a custom output binding class to build and configure the output message channel.
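The full post embeds the class as a code snippet; a minimal version consistent with the description (the channel name and the create() method are taken from the text that follows) would be:

import org.springframework.cloud.stream.annotation.Output;
import org.springframework.messaging.MessageChannel;

public interface OrderSource {
    String OUTPUT = "orderPublishChannel";

    // Output channel used to publish order messages
    @Output(OUTPUT)
    MessageChannel create();
}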

 

Here we have declared a custom source class (OrderSource) with “orderPublishChannel” as the output message channel.

Now we need to bind this OrderSource class in the OrderController.
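The controller is also shown as an embedded snippet in the full post; a minimal sketch consistent with the description (the endpoint path and the String payload are assumptions) might look like this:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.messaging.support.MessageBuilder;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
@EnableBinding(OrderSource.class)
public class OrderController {

    @Autowired
    private OrderSource source;

    // Publishes the incoming request body to the "orderPublishChannel" channel
    @PostMapping("/orders")
    public String publishOrder(@RequestBody String order) {
        source.create().send(MessageBuilder.withPayload(order).build());
        return "order published";
    }
}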

 

source.create() will configure the output message channel whose name is “orderPublishChannel”. The published messages will be delegated to the RabbitMQ exchange through the “orderPublishChannel” channel.

We need to change the application.properties based on the channel name as follows.

spring.cloud.stream.bindings.orderPublishChannel.destination=orders-exchange

 

Now we have completed the development of the publisher application with custom source bindings for publishing messages. Let's move forward with developing the consumer application.

 

Setting up the consumer application

The consumer application is almost the same as in the previous article, except for the bindings and related configuration.

The previous article uses the Sink class (a Spring Cloud Stream built-in component) to configure the input message channel (@Input) for retrieving messages. Here we are not going to use the built-in component; instead we will develop a custom input binding class to build and configure the input message channel.
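As with the publisher, the class itself appears as an embedded snippet in the full post; a minimal sketch (the channel name orderConsumeChannel is an assumption, and the OrderSink name is reused from Part 3 of this series) would be:

import org.springframework.cloud.stream.annotation.Input;
import org.springframework.messaging.SubscribableChannel;

public interface OrderSink {
    // Channel name is an assumption; the post's actual name may differ
    String INPUT = "orderConsumeChannel";

    @Input(INPUT)
    SubscribableChannel consume();
}

The matching binding property would then be along the lines of spring.cloud.stream.bindings.orderConsumeChannel.destination=orders-exchange.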

Continue reading “Message Driven Microservices with Spring Cloud Stream and RabbitMQ (Publish and Subscribe messages) with custom bindings – Part 2”

Message Driven Microservices with Spring Cloud Stream and RabbitMQ (Publish and Subscribe messages) with Sink, Source and Processor bindings – Part 1

 

What is Spring Cloud Stream?

Spring Cloud Stream is a framework for developing message-driven or event-driven microservices. It uses an underlying message broker (such as RabbitMQ or Apache Kafka) to send and receive messages between services.

At the time of writing this article, there are two implementations of Spring Cloud Stream:

  1. Spring Cloud Stream implementation that uses RabbitMQ as the underlying message broker.
  2. Spring Cloud Stream implementation that uses Apache Kafka as the underlying message broker.

 

High Level Overview of Spring Cloud Stream

 

(image: Spring Cloud Stream application core; source: https://ordina-jworks.github.io/img/spring-cloud-stream/application-core.png)

An application defines Input and Output channels which are injected by Spring Cloud Stream at runtime. Through the use of so-called Binder implementations, the system connects these channels to external brokers.

The difficult parts are abstracted away by Spring, leaving it up to the developer to simply define the inputs and outputs of the application. How messages are transformed, directed, transported, received and ingested is all up to the binder implementation (e.g. RabbitMQ or Kafka).
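As a minimal illustration (a sketch, not code from the original post), a consumer that uses the built-in Sink binding looks roughly like this:

import org.springframework.cloud.stream.annotation.EnableBinding;
import org.springframework.cloud.stream.annotation.StreamListener;
import org.springframework.cloud.stream.messaging.Sink;

@EnableBinding(Sink.class)
public class MessageConsumer {

    // Invoked for every message arriving on the built-in "input" channel
    @StreamListener(Sink.INPUT)
    public void handle(String payload) {
        System.out.println("Received: " + payload);
    }
}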

Continue reading “Message Driven Microservices with Spring Cloud Stream and RabbitMQ (Publish and Subscribe messages) with Sink, Source and Processor bindings – Part 1”

Spring Cloud Config : Using Git Webhook to Auto Refresh the config changes with Spring Cloud Stream, Spring Cloud Bus and RabbitMQ (Part 3)

 

You can refer to the previous parts of this article series here:

Click here for Part 1 

Click here for Part 2

 

The Problem

In the previous article (Part 2 of this series), we discussed how to use Spring Cloud Bus to broadcast the refresh event (/actuator/bus-refresh) across all the connected services. Here the refresh event has to be triggered manually on one of the services connected to the Spring Cloud Bus (you can select any service you wish; the only requirement is that it is connected to the Spring Cloud Bus).
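As an aside, for the POST to /actuator/bus-refresh to work on Spring Boot 2.x, the endpoint has to be exposed over HTTP, typically with a property along these lines (an assumption, not shown in the original post):

management.endpoints.web.exposure.include=bus-refresh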

The main problem here is that whenever the properties change, the refresh event has to be triggered manually. Even if it is for just one service, it is still a manual process. What happens if the developer forgets to trigger the refresh event after updating the properties in the remote repository?

Wouldn't it be nicer if there were a way to automate this refresh event whenever the remote repository changes? To achieve this, the config server needs to listen for events from the remote repository. This can be done with the webhook feature provided by remote repository providers.

 

 

The Solution

Here is the architecture of the proposed solution.

 

(diagram: architecture of the proposed solution)

Continue reading “Spring Cloud Config : Using Git Webhook to Auto Refresh the config changes with Spring Cloud Stream, Spring Cloud Bus and RabbitMQ (Part 3)”

Spring Cloud Config : Refreshing the config changes with Spring Cloud Bus (Part 2)

You can refer to Part 1 of this article here:

Click here for Part 1 

 

The Problem

The previous article (click here to visit it) described how to use Spring Cloud Config Server as a centralized location for keeping the configuration properties of the application services (microservices). The application services act as config clients that communicate with the Config Server to retrieve the properties relevant to them.

If any property is changed, the related service needs to be notified by triggering a refresh event through Spring Boot Actuator (/actuator/refresh). The user has to trigger this refresh event manually. Once the event is triggered, all the beans annotated with @RefreshScope will be reloaded (their configuration will be re-fetched) from the Config Server.
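As a minimal sketch of such a bean (the property name app.message and the endpoint are invented for illustration):

import org.springframework.beans.factory.annotation.Value;
import org.springframework.cloud.context.config.annotation.RefreshScope;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RefreshScope
@RestController
public class MessageController {

    // Re-fetched from the Config Server when /actuator/refresh is triggered
    @Value("${app.message:default message}")
    private String message;

    @GetMapping("/message")
    public String getMessage() {
        return message;
    }
}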

In a real microservice environment there will be a large number of independent application services. It is therefore not practical for the user to manually trigger the refresh event for every affected service whenever a property is changed.

Continue reading “Spring Cloud Config : Refreshing the config changes with Spring Cloud Bus (Part 2)”