• Products
    LOG COLLECTOR
    NXLog Enterprise Edition
    Full feature multi-platform log collection
    NXLog Community Edition
    Open-source free log collector
    ADD-ONS FOR NXLOG ENTERPRISE EDITION
    NXLog Add-Ons
    Integration with various software
    AGENT MANAGER FOR NXLOG ENTERPRISE EDITION
    NXLog Manager
    Manage and monitor NXLog instances
    NXLog Minder
    Hyper-scalable, API-first agent management
    DATABASE FOR NXLOG ENTERPRISE EDITION
    Raijin Database Engine
    The schemaless SQL database for storing events
    more from nxlog
    Professional Services
    Compare NXLog EE and CE
  • Downloads
    NXLog Enterprise Edition
    Full feature multi-platform log collection
    NXLog Manager
    Manage and monitor NXLog instances
    NXLog Community Edition
    Open-source free log collector
  • Solutions
    Integrations
    With SIEM, Devices, SaaS...
    Specfic OS support
    AIX, Linux, FreeBSD
    SCADA/ICS
    Energy, Oil & Gas, Transport...
    Windows Event log
    Collect locally or remotely, ..
    DNS Logging
    Enterprise-grade DNS log...
    Log Collection Modes
    Agent-based, Agentless or Cloud
    Agent Management
    Agents management and monitoring
    FIM
    File Integrity Monitoring
    macOS Logging
    ULS events, Apple System Logs ...

    By Industry

    Financial Services
    Government & Education
    Entertainment & Gambling
    Telecommunications
    Medical & Healthcare
    Military & Defense
    Law Firms & Legal Counsel
    Industrial & Manufacturing
  • Partners
    Find a Reseller
    Look for our resellers worldwide
    Technology Ecosystem
    See all our partners and integrations
    Partner Program
    Join our community of partners
    Partner Portal →
  • Resources
    Documentation
    Products guides and integrations
    Blog
    Tutorials, updates and releases
    White papers
    Datasheets, infographics and more
    Videos
    Trainings and tutorial on specific topics
    Webinars
    Community events and webinars
    Community Forum →
  • Support
  • Why Nxlog
    About Us
    Our journey, team and mission
    Customers
    Testimonials and case studies
    Careers
    We are hiring!
    Contact Us →
Log In Sign Up
Request Trial
LOG COLLECTOR
NXLog Enterprise Edition
Full feature multi-platform log collection
NXLog Community Edition
Open-source free log collector
ADD-ONS FOR NXLOG ENTERPRISE EDITION
NXLog Add-Ons
Integration with various software
AGENT MANAGER FOR NXLOG ENTERPRISE EDITION
NXLog Manager
Manage and monitor NXLog instances
NXLog Minder
Hyper-scalable, API-first agent management
DATABASE FOR NXLOG ENTERPRISE EDITION
Raijin Database Engine
The schemaless SQL database for storing events
more from nxlog
Professional Services
Compare NXLog EE and CE
NXLog Enterprise Edition
Full feature multi-platform log collection
NXLog Manager
Manage and monitor NXLog instances
NXLog Community Edition
Open-source free log collector
Integrations
With SIEM, Devices, SaaS...
Specfic OS support
AIX, Linux, FreeBSD
SCADA/ICS
Energy, Oil & Gas, Transport...
Windows Event log
Collect locally or remotely, ..
DNS Logging
Enterprise-grade DNS log...
Log Collection Modes
Agent-based, Agentless or Cloud
Agent Management
Agents management and monitoring
FIM
File Integrity Monitoring
macOS Logging
ULS events, Apple System Logs ...

By Industry

Financial Services
Government & Education
Entertainment & Gambling
Telecommunications
Medical & Healthcare
Military & Defense
Law Firms & Legal Counsel
Industrial & Manufacturing
Find a Reseller
Look for our resellers worldwide
Technology Ecosystem
See all our partners and integrations
Partner Program
Join our community of partners
Partner Portal →
Documentation
Products guides and integrations
Blog
Tutorials, updates and releases
White papers
Datasheets, infographics and more
Videos
Trainings and tutorial on specific topics
Webinars
Community events and webinars
Community Forum →
Support
About Us
Our journey, team and mission
Customers
Testimonials and case studies
Careers
We are hiring!
Contact Us →
  • Loading...
Request Trial
August 9, 2022 databasesiemdeploymentcomparison

Raijin vs Elasticsearch

By John Kirch

Share
ALL SIEM STRATEGY SECURITY ANNOUNCEMENT DEPLOYMENT COMPLIANCE COMPARISON RSS

Log collection is most closely linked to enterprise security practices—​for example, aggregation and analysis in a SIEM. However, collecting certain logs for reasons other than security is often valuable. It may even be a requirement of your organization for the purposes of auditing, legal compliance, or data retention.

Storing all these logs in a database is the most efficient way to manage the data. Finding and managing logs stored as flat files or structured data can be challenging without a database. Even so, not all database engines are suited to ingesting and organizing vast amounts of log data. When choosing a database engine to manage your logs, consider the following "must-have" features:

Flexible schema

This is the number one "must-have" feature. Some log sources, such as those we associate with relational databases, exhibit an inflexible schema. However, suppose you consult Microsoft’s Windows Event Log Reference. In that case, you will soon realize that Microsoft’s XML-based documents to store event data cannot be easily mapped to a relational database schema. For instance, many Windows log sources contain an EventData field: a complex datatype that defines a nested data structure containing multiple subfields. If we examine Windows Sysmon events, the EventData field may have as few as three subfields or as many as 17 subfields, depending on the Event ID. Ideally, the database should be able to store such events without data loss and with minimum restructuring. Yes, a relational database could store such events in multiple tables joined by foreign key constraints. Still, a document database or a database engine that supports flexible schemas could do this without defining schemas manually for every log source.

Another core feature of analytics-based systems is their ability to aggregate data. Without flexible schemas, a database cannot aggregate logs from disparate log sources that monitor a common category of events. For instance, the only way to aggregate DNS events is to normalize the field/column names and their respective data types common to multiple log sources, such as query name, query type, query class, query severity, process ID, etc. On the other hand, fields/columns not common to multiple log sources, like client ID, query flags, Windows Domain, Provider GUID, etc., can contain valuable data you should not discard. Only a flexible schema can safeguard such log source-specific data.

Even for log sources with fixed schemas, you should consider the probability that their providers will change their schemas over time. Why wouldn’t a log source’s schema undergo updates to keep in step with its parent application? Do you really want to maintain schema changes for every log source you collect?

High ingestion rate

Ideally, log data should be ingested in real-time. Otherwise, consistently accumulating a backlog of events will not result in a sustainable log collection solution. A database infrastructure that cannot ingest all log data of interest in real time will become more of a liability than an asset.

Time series data optimization

One type of log data that is usually important exists as time series data. Unfortunately, most database engines are not equipped to handle this type of data efficiently. The most popular database engines optimized for time series data are MongoDB, Prometheus, and a newcomer, Raijin.

There are many other factors to consider, like total cost of ownership, scalability, ease of configuration, and reliability, to name a few. Still, these three basic requirements are essential and led to the selection of Raijin and Elasticsearch for this comparison. Although Raijin and Elasticsearch are very different in their design and goals, they share some important commonalities.

Introducing Raijin Database

Raijin is a columnar database that supports semi-structured data of varied forms, like logs. The impetus for creating Raijin was the need for a high-performance database engine tailor-made for log data that would also integrate easily and efficiently with NXLog agents. With native JSON support and a built-in REST API, there is no need to install any client-side tools, making it a perfect solution for integrating with web applications. For large organizations already using NXLog, Raijin is the logical choice for managing their logs because it provides the foundation needed for generating real-time analytics.

Under the hood, Raijin has been engineered for high performance. It leverages modern improvements in CPU design—​like optimized SIMD instructions (SSE2/AVX2)--to excel when working with vectorized data, which means it can process more data with fewer cycles. In addition, with its hybrid columnar data storage and block-level data compression, you will see a marked reduction in your data storage requirements and better performance with data ingestion and queries. Combined with vectorized execution, it can easily meet the data throughput that analytical workloads demand.

Perhaps the most exciting characteristic of the Raijin database engine is that it achieves its query performance without indexes. Most other database engines would only reach a fraction of their purported performance without explicit, user-defined index definitions.

Database types

Elasticsearch is a document-oriented database, which traditionally falls into the NoSQL category because its flexible schema supports nested data structures, similar to the XML-based documents Windows uses for event logs. On the other hand, Raijin is in a class all by itself with its flexible schemas, REST API, hybrid columnar data storage, and native SQL capabilities. It has many similarities with relational database engines. Because it doesn’t support nested data structures, Raijin stores its rows and columns of data in tables instead of documents. What sets it apart from the relational crowd is its unique ability to insert data into a newly-created table without first defining a schema, although it does support schema definitions. The other unique feature is its ability to ingest large batches of JSON data via its REST API, which gives it exceptional write performance. In addition to the standard INSERT INTO <table> SET syntax, it offers an alternative JSON syntax for inserting JSON-formatted data.

INSERT INTO events {
    "hostname":  "srv-04",
    "eventtime": "2015-06-24 12:20:12Z",
    "eventtype": "ERROR",
    "message":   "Database timeout!"
};

Query languages

Raijin natively speaks SQL. Like other popular NoSQL databases, native Elasticsearch queries follow JSON syntax. Still, the structure and reserved names used for the keys are vendor-specific, as shown in this unaltered example taken directly from Elastic’s documentation on Query and filter context:

Example 1. Example of a query and filter contexts

Below is an example of query clauses used in query and filter context in the search API. This query will match documents where all of the following conditions are met:

  • The title field contains the word search.

  • The content field contains the word elasticsearch.

  • The status field contains the exact word published.

  • The publish_date field contains a date from 1 Jan 2015 onwards.

GET /_search
{
  "query": { (1)
    "bool": { (2)
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [ (3)
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
1 The query parameter indicates query context.
2 The bool and two match clauses are used in the query context, which means that they are used to score how well each document matches.
3 The filter parameter indicates filter context. Its term and range clauses are used in the filter context. They will filter out documents that do not match but will not affect the score for matching documents.

Although Query DSL (Domain Specific Language) may be the best way to query for documents in Elasticsearch, Elastic is fully aware of the significant, long-term investments most larger organizations have made in SQL-based solutions. For this reason, they offer Elasticsearch SQL as a way to bridge this gap.

If a Raijin database contained the same documents used in the Elasticsearch example above, the equivalent SQL query shown here would work with Raijin right out of the box:

SELECT * FROM tbl
WHERE MATCH (title) AGAINST ('Search')
AND   MATCH (content) AGAINST ('Elasticsearch')
AND   status = 'published'
AND   DATE(publish_date) >= '2015-01-01';

Database performance

We conducted the following benchmarks to see how Raijin stacks up against Elasticsearch:

  • Data ingestion speed

  • Query processing speed

  • Average CPU usage

The CPU used in our tests that produced these benchmarks was an AMD Ryzen 7 4800H. We used the same set of 120,000 logs for benchmarking data ingestion and query processing speed.

Data ingestion speed

The Ingestion tests were performed using variable batch sizes (1-10-100-1000-10000) and client concurrency (1-2-3-4-5-6-7-8).

The test clients sent a fixed amount of data, recording the start and end time, then using the difference to calculate the resulting ingestion speed: number_of_records/elapsed_time.

The table below shows the maximum ingestion speed for Raijin and Elasticsearch. Using six threads, Raijin reached its maximum performance of 355,603 events per second, and Elasticsearch reached 43,282 events per second, representing a peak ingestion speed increase of 721% for Raijin.

Peak Processes Batch size

Raijin

355603 (EPS)

6

10000

Elastic

43282 (EPS)

8

10000

Multi-threaded peak ingestion speed between Raijin and Elasticsearch

Raijin’s single-thread performance proved to be more than sufficient. In the single-thread test case scenario, ingesting 100,000 records with a batch size of 10,000 events per HTTP request, Elasticsearch consumed 12,419 events per second. In contrast, Raijin ingested 114,058 events per second, showing a performance increase of 818% against Elasticsearch.

Single thread peak ingestion speed between Raijin and Elasticsearch

The effect of batch size

Elastisearch and Raijin support bundling multiple events or log entries in the same POST request; therefore, we examined the effect of request sizes on the throughput. The batch size is a configurable setting (BatchSize) in NXLog Enterprise Edition, so it’s essential to understand its effect on performance.

The measurements below were recorded with a single client with varying batch sizes.

Batch size Raijin Elasticsearch

1

494.15 EPS

170.68 EPS

10

4,563.75 EPS

1,458.41 EPS

100

31,244.90 EPS

6,879.46 EPS

1000

86,352.00 EPS

11,647.64 EPS

10000

114,058.59 EPS

12,419.30 EPS

Raijin vs Elasticsearch effect of batch size on data ingestion

The results show that both Elasticsearch and Raijin favor larger batch sizes, but the improvement is much more pronounced with Raijin.

The effect of client concurrency

We also took a look at how these databases scale with multiple clients. This is an important factor when designing a data delivery pipeline - the number of clients directly feeding into the database may need to be limited to maximize performance.

The measurements below were recorded with a fixed 10,000 event batch size.

Client count Raijin Elasticsearch

1

114,058.59 EPS

12,419.30 EPS

2

200,347.17 EPS

17,593.05 EPS

3

282,055.50 EPS

24,077.87 EPS

4

285,867.42 EPS

27,939.35 EPS

5

323,411.97 EPS

32,387.89 EPS

6

355,603.18 EPS

35,713.75 EPS

7

330,671.73 EPS

41,631.91 EPS

8

282,681.83 EPS

41,571.82 EPS

Raijin vs Elasticsearch effect of client count on data ingestion

Elasticsearch levels out after seven clients without a performance drop. Raijin peaks at six clients, with a marked decline in ingestion performance with higher concurrency. This result suggests that the number of clients feeding the database on the given system should not be higher than 6 for Raijin. Elasticsearch shows no performance drop with eight clients, and we did not notice a performance drop in the range of the tests.

Query processing speed

We executed 21 different queries against the database explicitly created for the tests, containing approximately 175 million records. Each query was run cold after a database restart.

Queries used in these tests
query1: SELECT AVG(case when c3 < 0 then c3 end) FROM logs GROUP BY uid
query2: SELECT AVG(c3) FROM logs GROUP BY uid
query3: SELECT COUNT(c3) FROM logs GROUP BY uid
query4: SELECT MAX(c3) FROM logs GROUP BY uid
query5: SELECT MIN(c3) FROM logs GROUP BY uid
query6: SELECT SUM(c3) FROM logs GROUP BY uid
query7: SELECT COUNT(c3) FROM logs WHERE uid = 24 GROUP BY uid
query8: SELECT COUNT(DISTINCT c3) FROM logs WHERE uid = 24 GROUP BY uid
query9: SELECT COUNT(*) FROM logs WHERE ip LIKE '32%.%.%.127'
query10: SELECT COUNT(c3) FROM logs WHERE c3 < 0
query11: SELECT COUNT(case when c3 < 0 then c3 end) FROM logs
query12: SELECT COUNT(CAST(c3 AS INT)) FROM logs WHERE c3 > -200
query13: SELECT COUNT(CAST(c3 AS FLOAT)) FROM logs WHERE c3 > -200
query14: SELECT COUNT(CAST(c3 AS DOUBLE)) FROM logs WHERE c3 > -200
query15: SELECT COUNT(CAST(c3 AS STRING)) FROM logs WHERE c3 > -200
query16: SELECT COUNT(c3) FROM logs WHERE MOD(c3,2) <> 0
query17: SELECT COUNT(c3) FROM logs WHERE MOD(c3,2) = 0
query18: SELECT FIRST(c3) FROM logs GROUP BY uid
query19: SELECT LAST(c3) FROM logs GROUP BY uid
query20: SELECT COUNT(IIF(success=false,True,null)) FROM logs where '2022-06-02'<time and time<'2022-06-02'
query21: SELECT COUNT(action) FROM logs where '2022-06-01'<time and time<'2020-06-30' and success=false

The measurements recorded database response times, so lower numbers are better.

Raijin vs Elasticsearch total test suite run time

Clearly, Raijin maintains a considerable lead for the total run time, but some of the queries favor Elasticsearch.

Raijin vs Elasticsearch query response times

To summarize these results, it is readily apparent that Raijin is a better choice for aggregation queries in general. Elasticsearch performed poorly in 11 of these 21 queries, taking over 20 seconds to return results. When executing six of these queries, Elasticsearch needed more than half a minute to return any results. The poorest performance was the roughly 55 seconds Elasticsearch needed to complete query0.

However, even when Elasticsearch was faster than Raijin while running query11 through query14 (probably due to the CAST function within COUNT()), Raijin’s poorer performance was still in the sub-5-second range, a far cry from taking more than half a minute to return results!

The biggest surprise was Elasticsearch performing better than Raijin on query20, which counts how many documents contain the action field and uses the time field to filter the results to a specific date range. However, the query only took under three seconds with Raijin, which is still a perfectly reasonable length of time.

Average CPU usage

Raijin vs Elasticsearch average CPU usage

Elasticsearch CPU usage increased with the number of CPU threads, reaching a maximum value of 637 cycles. On the other hand, Raijin reached a maximum of 485 cycles, which translates to a 31% lower average CPU usage than Elasticsearch.

On a side note, while benchmarking the query speeds, Raijin demonstrated lower CPU usage in 17 out of 21 tests, while Elasticsearch had high CPU usage in most tests.

Who is the winner?

Actually, both are winners. They both offer schemaless (schema flexibility) storage of log records, which is essential for maintaining the integrity of event logs. In addition, they both perform well when querying date/time ranges of time-series data.

However, each database engine has its area of expertise, which became evident during our performance testing. For example, if data ingestion rates are a problem in your enterprise, Raijin is the better choice for collecting and managing your logs. Also, Raijin’s performance with actual aggregation queries that involve a GROUP BY clause is phenomenal. So, if real-time data analytics is your goal, you will be better served using Raijin for that task.

Elasticsearch still reigns as the king of full-text search in the world of database engines. If full-text search performance is your most important feature, then you are probably already using Elasticsearch and should stick with it if that is your highest priority. For most use cases, Raijin’s full-text search performance is acceptable. However, the need to perform full-text searches will be significantly diminished when paired with NXLog’s powerful parsing capabilities. This is especially true once NXLog has parsed catch-all fields used for storing embedded but flattened, structured data and has sent these new fields to Raijin as dedicated columns in a flexible schema. Ideally, most log data should be parsed as structured data. The need for excessive full-text searches is often an indicator of a data model that could use some design improvements.

If you are unaware of this, Raijin offers a perk that you might find very appealing. Raijin can be downloaded for free and used as an on-premise database engine, meaning you are free to deploy it anywhere you like, and you can manage your logs without any subscription or data storage costs.

Solutions for common needs

Log storage and analytics

Most NoSQL solutions are inefficient or lack support for analytical queries. Raijin Database is optimized for high-volume log ingestion and exhibits exceptional query times when executing standard SQL aggregation queries. It can function as the aggregator and final resting place of logs in a centralized logging environment. It is the logical facilitator of enterprise-wide logging analytics.

Data retention compliance

With Raijin’s expression-based partitioning, you can create partitions dynamically based on a combination of date and other columns. Dynamic partitioning enables the precise selection of records to be purged based on your specific data retention requirements. For further details, including examples showing how easy it is to automate this process, see the Purging data guide in the Raijin Database Engine User Manual.

Summary

After looking at the benefits of storing log data in a database, we established some basic database features needed for effectively managing logs and gleaning analytics from their data, with both Raijin and Elasticsearch as contenders for handling these tasks.

The performance measured in this limited set of benchmarks may not be conclusive regarding overall capabilities since database engines are complex tools, each with its own design goals and optimizations for a set of tasks. However, given the challenge of finding the best overall database engine for collecting and aggregating logs in a centralized logging environment, we can only conclude that Raijin has a definite edge over Elasticsearch. Our criteria for this decision rely not only on the "must have" features we introduced early on but also on the everyday tasks that larger organizations engage with. For example, such engagements might be real-time high-performance log aggregation and data retention compliance mandates.

Raijin may be the new kid on the block and may not provide all the functionality other database engines have yet. Still, no single database engine is the perfect choice for every use case imaginable. If you want to collect logs in a database, why not try them both yourself? Go download Elasticsearch and set it up. Then download Kibana and set it up. Send Elasticsearch a relatively large set of events from your log sources, run some aggregation queries with a GROUP BY clause, and make notes of your overall experience. Then download Raijin and follow the Quickstart guide. Send Raijin the same logs you sent Elasticsearch, and run the same aggregation queries. Let us know which one was easier for you to deploy and how they integrated with your log sources. Please share your experience with us regarding the differences between both solutions regarding data ingestion and running queries.

GET STARTED TODAY:
CONTACT US Our experts are happy to help DOWNLOAD RAIJIN DATABASE Give Raijin Database a try RAIJIN USER GUIDE Learn more about Raijin
Disclaimer

While we endeavor to keep the information in this topic up to date and correct, NXLog makes no representations or warranties of any kind, express or implied about the completeness, accuracy, reliability, suitability, or availability of the content represented here.

The accurateness of the content was tested and proved to be working in our lab environment at the time of the last revision with the following software versions:

Raijin Database 0.9.3901 (M5)
Elasticsearch 8.3.1
Ubuntu 18.04.1

Last revision: 9 July 2022

NXLog Ltd. develops multi-platform log collection tools that support many different log sources, formats, transports, and integrations. The tools help administrators collect, parse, and forward logs so they can more easily respond to security issues, investigate operational problems, and analyze event data. NXLog distributes the free and open source NXLog Community Edition and offers additional features and support with the NXLog Enterprise Edition.

This document is provided for informational purposes only and is subject to change without notice. Trademarks are the properties of their respective owners.

  • raijin
  • elasticsearch
  • database
  • comparison
  • sql
Share

Facebook Twitter LinkedIn Reddit Mail
Related Posts

Using Raijin Database Engine to aggregate and analyze Windows security events
11 minutes | July 29, 2021
How NXLog can help meet compliance mandates
4 minutes | June 1, 2022
DNS Log Collection - Part 2
9 minutes | May 28, 2020

Stay connected:

Sign up

Keep up to date with our weekly digest of articles.

By clicking singing up, I agree to the use of my personal data in accordance with NXLog Privacy Policy.

Featured posts

Announcing NXLog Enterprise Edition 5.7
January 20, 2023
NXLog - 2022 in review
December 22, 2022
Need to replace syslog-ng? Changing to NXLog is easier than you think
November 23, 2022
The EU's response to cyberwarfare
November 22, 2022
Looking beyond Cybersecurity Awareness Month
November 8, 2022
GDPR compliance and log data
September 23, 2022
NXLog in an industrial control security context
August 10, 2022
Raijin vs Elasticsearch
August 9, 2022
NXLog provides native support for Google Chronicle
May 11, 2022
Aggregating macOS logs for SIEM systems
February 17, 2022
How a centralized log collection tool can help your SIEM solutions
April 1, 2020

Categories

  • SIEM
  • STRATEGY
  • SECURITY
  • ANNOUNCEMENT
  • DEPLOYMENT
  • COMPLIANCE
  • COMPARISON

Subscribe to our newsletter to get the latest updates, news, and products releases.

© Copyright 2023 NXLog Ltd.

PRIVACY POLICY TERMS OF USE

  • PRODUCTS

  • NXLOG ENTERPRISE EDITION
  • NXLOG COMMUNITY EDITION
  • NXLOG ADD-ONS
  • NXLOG MANAGER
  • NXLOG MINDER
  • RAIJIN DATABASE
  • MORE NXLOG

  • COMPARE SOLUTIONS
  • INDUSTRIES
  • INTERGRATIONS
  • FIND A RESELLER
  • PARTNER PROGRAM
  • RESOURCES

  • DOCUMENTATION
  • WHITE PAPERS
  • WEBINARS
  • TUTORIALS
  • BLOG
  • COMMUNITY FORUM
  • ABOUT US

  • WHY NXLOG
  • CUSTOMERS
  • CAREERS
  • CONTACT US
  • DOWNLOADS

  • NXLOG ENTERPRISE EDITION
  • NXLOG COMMUNITY EDITION
  • NXLOG MINDER
  • NXLOG MANAGER
  • NXLOG ADD-ONS
  • RAIJIN DATABASE