How to Set Up Centralized Logging in Linux — A Practical, Step-by-Step Guide

Centralized logging is a foundational capability for modern server operations and DevOps workflows. Collecting logs from multiple Linux servers into a single, searchable store simplifies troubleshooting, security investigations, compliance reporting, and capacity planning. This article presents a practical, step-by-step approach to designing and deploying a robust centralized logging system on Linux, with detailed technical guidance on components, protocols, parsing, storage, security, and operational management.

Why centralize logs: principles and objectives

At its core, centralized logging answers three questions: where logs are stored, how they are transported, and how they are processed and retained. A good design ensures:

  • Reliable transport — logs must reach the collector without loss (or with acceptable loss guarantees).
  • Structured processing — parsing and enrichment to make logs searchable and meaningful.
  • Scalable storage — indexing and retention policies to balance performance, cost, and compliance.
  • Security and integrity — encryption in transit, authentication, and tamper-evidence.
  • Operational visibility — health monitoring, alerting, and backup.

Common architectures and components

There are several established architectures. Below are the principal building blocks and common open-source solutions:

Log shippers / agents

  • Filebeat / Metricbeat (Elastic Beats): lightweight and efficient; Filebeat forwards log files, Metricbeat forwards host and service metrics.
  • Fluentd / Fluent Bit: flexible, with many plugins; Fluent Bit is lightweight for edge nodes.
  • rsyslog / syslog-ng: classic syslog daemons, support TCP/TLS and structured templates.
  • systemd-journald forwarders: forward journal logs to syslog or via journalctl to a shipper.

Ingest / pipeline

  • Logstash: powerful filters (grok, mutate, date, geoip), but heavier resource usage.
  • Fluentd: alternative to Logstash with many plugins.
  • Beats processors: lightweight ingest pipelines when using Elasticsearch.

Storage and search

  • Elasticsearch: full-text search and analytics; commonly paired with Kibana.
  • Graylog: integrates Elasticsearch and MongoDB, provides a UI and alerting.
  • OpenSearch: fork of Elasticsearch offering similar capabilities.

Visualization and alerting

  • Kibana or OpenSearch Dashboards for dashboards and discovery.
  • Alerting via ElastAlert, Watcher (commercial), or built-in OpenSearch alerts.

Step-by-step setup: a practical deployment using rsyslog + Filebeat + ELK

The following steps describe a common, balanced configuration: system-level syslogs and application logs collected at the host, shipped securely to a central ELK stack for parsing, indexing and visualization.

1. Plan sizing and retention

Estimate log volume (GB/day) across hosts. Key factors:

  • Number of servers and apps, log verbosity (INFO/DEBUG).
  • Retention policy (days/months) based on compliance.
  • Indexing overhead (Elasticsearch needs ~1.2–2x raw data depending on mappings and replicas).

Example: 50 servers × 500 MB/day ≈ 25 GB/day. For 30-day retention and 1.5x indexing factor, plan ~1.125 TB storage plus headroom. Size JVM, CPU, and disk IOPS accordingly.

2. Central server: install and secure Elasticsearch + Logstash + Kibana

On your central collector (ideally dedicated VPS instances with sufficient RAM and fast disks):

  • Install Elasticsearch/OpenSearch, Logstash (or Fluentd), and Kibana.
  • Configure cluster settings: heap size (50% of RAM, max 30–32GB), discovery, and replicas.
  • Enable TLS between nodes and enable authentication (native users or LDAP).
  • Set up persistent volumes and snapshots to remote storage (S3-compatible) for backups and DR.
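
A minimal elasticsearch.yml sketch reflecting these points, assuming a small single-node collector (cluster and node names, paths, and certificate locations are illustrative):

cluster.name: central-logging
node.name: log-es-1
path.data: /var/lib/elasticsearch
network.host: 0.0.0.0
discovery.type: single-node
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

Set the JVM heap in a file under /etc/elasticsearch/jvm.options.d/ (for example -Xms8g and -Xmx8g on a 16 GB node) rather than editing jvm.options directly.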

Basic Logstash pipeline example (pipeline.conf):

input {
  tcp {
    port => 5000
    codec => json_lines
    ssl_enable => true
    ssl_certificate => "/etc/ssl/certs/elk.crt"
    ssl_key => "/etc/ssl/private/elk.key"
  }
}

filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
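
Before starting the service, you can validate the pipeline syntax with Logstash's config test flag (the paths assume a standard package install):

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/pipeline.conf --config.test_and_exit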

3. Generate and deploy certificates

Use a CA (internal or public) to create server and client certificates. Use TLS for both shipper-to-ingest and node-to-node communication.

  • Generate CA key and cert.
  • Create server certificates (Logstash/Elasticsearch) signed by the CA.
  • Create client certificates for agents that require mutual TLS (optional but recommended).
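
As a rough sketch of these three steps with plain OpenSSL (file names and subject names are placeholders; Elasticsearch's bundled elasticsearch-certutil tool is a common alternative):

# 1. CA key and self-signed CA certificate
openssl genrsa -out ca.key 4096
openssl req -x509 -new -key ca.key -sha256 -days 3650 -subj "/CN=Internal Logging CA" -out ca.crt

# 2. Server key and CSR for the ingest endpoint, signed by the CA
#    (for production, add subjectAltName entries via an extensions file so clients can verify the hostname)
openssl genrsa -out elk.key 4096
openssl req -new -key elk.key -subj "/CN=elk.example.com" -out elk.csr
openssl x509 -req -in elk.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 825 -sha256 -out elk.crt

# 3. Optional client certificate for mutual TLS, issued the same way
openssl genrsa -out client.key 4096
openssl req -new -key client.key -subj "/CN=filebeat-host01" -out client.csr
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial -days 825 -sha256 -out client.crt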

4. Configure Linux hosts: system logging and shipper

Choose an agent: Filebeat for file logs and system logs, or rsyslog for syslog forwarding. Example with Filebeat:

  • Install Filebeat on each host.
  • Enable modules for system: filebeat modules enable system.
  • Configure output to Logstash with TLS and authentication:

filebeat.yml snippet:

filebeat.inputs:
  - type: log
    paths:
      - /var/log/*.log

output.logstash:
  hosts: ["elk.example.com:5044"]
  ssl.certificate_authorities: ["/etc/pki/ca/ca.crt"]
  ssl.certificate: "/etc/pki/client/client.crt"
  ssl.key: "/etc/pki/client/client.key"
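
Once the configuration is in place, Filebeat can verify both the config and the connection to Logstash before you enable the service:

filebeat test config
filebeat test output
systemctl enable --now filebeat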

For journald logs: use the journald input in Filebeat, or forward via rsyslog to a local file and let Filebeat pick it up.
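
For instance, on a Filebeat version that ships the journald input, a minimal sketch looks like this (the id is just an arbitrary label for the input):

filebeat.inputs:
  - type: journald
    id: system-journal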

5. Parsing and enrichment

Parsing makes logs searchable by fields. Use Logstash filters, or Elasticsearch ingest pipelines for a lighter setup that runs directly on the cluster instead of a separate processing tier. Common techniques:

  • Grok patterns for unstructured text (webserver logs, app logs).
  • Dissect for faster field splitting when structure is predictable.
  • GeoIP and user-agent enrichments for security analytics.
  • Use date filters to parse timestamps to @timestamp for proper time-series indexing.
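
As an illustration, a minimal Elasticsearch ingest pipeline combining these techniques might look like the following (the pipeline name is arbitrary, and the field names assume the legacy output of the COMBINEDAPACHELOG grok pattern):

curl -X PUT "https://localhost:9200/_ingest/pipeline/weblogs" -u elastic -H 'Content-Type: application/json' -d'
{
  "description": "Parse combined access logs, add geo and user-agent data, normalize the timestamp",
  "processors": [
    { "grok":       { "field": "message", "patterns": ["%{COMBINEDAPACHELOG}"] } },
    { "geoip":      { "field": "clientip", "target_field": "geo" } },
    { "user_agent": { "field": "agent" } },
    { "date":       { "field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"] } }
  ]
}'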

6. Index management and lifecycle

Configure Index Lifecycle Management (ILM) to automate rollover, shrink, and delete:

  • Create hot-warm-cold tiers if you have multiple node types to optimize cost.
  • Set rollover size (e.g., 50GB) and max age (e.g., 1d) to keep indices manageable.
  • Define policies for snapshotting older indices to cheaper storage.
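
For example, an ILM policy implementing a rollover-and-delete scheme along these lines (the policy name and thresholds are illustrative):

curl -X PUT "https://localhost:9200/_ilm/policy/logs-30d" -u elastic -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb", "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}'

Attach the policy to an index template so new logs-* indices pick it up automatically.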

7. Security and access control

Implement role-based access control (RBAC) and limit who can view or modify indices. Steps:

  • Enable TLS and basic authentication on Kibana/Elasticsearch.
  • Create read-only roles for analysts, write roles for ingest pipelines.
  • Log audit events (admin actions, config changes) to a separate index with stricter retention.
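
For example, a read-only analyst role can be sketched with the Elasticsearch security API (role name and index pattern are placeholders):

curl -X PUT "https://localhost:9200/_security/role/logs_readonly" -u elastic -H 'Content-Type: application/json' -d'
{
  "indices": [
    { "names": ["logs-*"], "privileges": ["read", "view_index_metadata"] }
  ]
}'

Users (or LDAP group mappings) are then assigned this role rather than broad superuser rights.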

8. Monitoring and alerting

Monitor the logging stack itself: JVM heap, GC times, disk usage, CPU, and ingestion latency. Useful tools:

  • Elasticsearch monitoring APIs and Metricbeat dashboards.
  • Alerting on high ingestion lag, disk threshold, or high 5xx rates in application logs.
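
A few quick health checks can be scripted or wired into existing monitoring, for example:

# Cluster status, shard allocation problems, pending tasks
curl -s -u elastic "https://localhost:9200/_cluster/health?pretty"
# Per-index size and document counts, largest first
curl -s -u elastic "https://localhost:9200/_cat/indices?v&s=store.size:desc"
# Disk usage per data node
curl -s -u elastic "https://localhost:9200/_cat/allocation?v"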

Application scenarios and best practices

Centralized logging supports multiple use cases. Below are scenarios and operational tips.

Operations and troubleshooting

  • Use centralized search to trace requests across microservices by correlating request IDs.
  • Index custom fields (environment, service, version) to narrow search scope.
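
For example, a Kibana query along these lines narrows a trace to a single request within one service (the field names are examples; use whatever your applications actually emit):

service.name: "checkout" and environment: "production" and request_id: "3f8c9a12"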

Security and forensics

  • Ingest firewall logs, authentication logs, and IDS alerts into the same system for cross-correlation.
  • Enable immutable indices or WORM storage for tamper-evidence if required by compliance.

Business analytics

  • Extract metrics (response times, error rates) and create dashboards for product and business owners.

Advantages compared to other approaches

Centralized logging offers several advantages over ad-hoc log storage on individual servers or temporary log aggregation tools:

  • Unified search across all services without SSHing into servers.
  • Faster Mean Time To Repair (MTTR) by enabling cross-service correlation and historical context.
  • Retention and compliance are enforced centrally, simplifying audits.
  • Scale-out capability by adding nodes or using managed services.

Trade-offs include operational overhead and cost — you must budget for cluster resources and backups. For smaller teams, managed logging services or hosted Elasticsearch/OpenSearch can reduce overhead.

Selection guidance: choose the right stack

When selecting technologies and hosting options, consider the following factors:

  • Volume and velocity of logs — high throughput favors agents like Filebeat + ingest pipelines.
  • Resource constraints — lightweight agents (Fluent Bit, Filebeat) for resource-limited VPS instances.
  • Compliance needs — choose storage and retention policies that meet legal/regulatory requirements.
  • Operational expertise — if you lack Elasticsearch operational experience, consider managed solutions or simpler stacks like Graylog.
  • Geographic distribution — if servers are dispersed, deploy regional collectors to reduce latency and then forward aggregated data.

Operational recommendations and hardening

  • Use compression (gzip/snappy) between shipper and server to reduce bandwidth (see the snippet after this list).
  • Throttle logging for noisy sources via application-side rate limiting or shipper processors.
  • Keep logs immutable where possible and restrict deletion rights.
  • Regularly test restore procedures from snapshots to ensure backups are usable.
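
On the compression point above, Filebeat's Logstash output exposes a compression level, for example (a minimal sketch; higher values trade CPU for bandwidth):

output.logstash:
  hosts: ["elk.example.com:5044"]
  compression_level: 3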

Summary and next steps

Centralized logging transforms raw logs into a searchable, actionable resource for operations, security, and business intelligence. A practical deployment balances reliable transport (TLS, persistent queues), smart parsing (grok/dissect, ingest pipelines), scalable storage (Elasticsearch + ILM), and robust monitoring. Start with a pilot: collect logs from a representative set of hosts, build parsing rules for your common log formats, and iterate on retention and resource sizing as you observe real traffic.

For small teams or to reduce operational burden, consider hosting your logging stack on reliable VPS providers. If you're looking for fast, geographically distributed instances to run collectors or small clusters, check out the VPS.DO offerings, and in particular their USA VPS options at https://vps.do/usa/, which are well-suited for deploying logging collectors and ELK components with competitive resource options.
