SAP Basis Operations: The Fine Line Between a Running System and a Resilient One

Beyond the Checklist: Expertise Forged by Experience

This article was born in the “kitchen” of the industry—written by those who have spent years in air-conditioned server rooms, had their sleep interrupted by midnight “Update terminated” messages, and experienced firsthand just how deep the rabbit hole of “The system is slow” complaints can go.

In the world of SAP Basis, junior colleagues are often handed a standard daily checklist: Check SM21, monitor ST22, verify backups, and consider the job done. While following these routines usually means your system is “up and running,” there is an invisible chasm between a merely functional system and a truly agile, resilient architecture—a gap that typically only becomes visible during a major system outage.

A resilient system is not just one that doesn’t throw errors; it is a system managed by an expert who can sense a problem by its “scent” before it even arrives. Let’s move beyond the clichés and dive into the layers of expertise shaped by years of hands-on experience.


1. The “Clean System Log” Fallacy: Listening to Silent Screams

It is common to see administrators enjoying their morning coffee in peace just because they didn’t see any red lines in SM21. However, a true senior professional focuses not on what the log “says,” but on what it “doesn’t say.”


A Real-World Case:

Once, our system logs were pristine, yet users were reporting random “connection lost” errors. None of the classic checks revealed anything. Upon a deeper dive, we discovered a network switch experiencing packet loss. SAP didn’t perceive this as a system error; it merely swallowed it as “tolerable network latency.”


Life-Saving Tip:

Monitor statistical deviations, not just explicit errors. If a system that usually generates a certain number of dumps (ST22) suddenly shows zero activity one morning, it doesn’t necessarily mean the system has “healed.” Sometimes, a database connection failure can leave the system so “blind” that it cannot even generate a dump. Likewise, when you see a critical dump like TSV_TNEW_PAGE_ALLOC_FAILED (memory allocation failure), recognize that this isn’t just a code error; it’s a sign that the system is literally “suffocating” at that moment.

The Expert Reflex:

When the logs are suspiciously “silent,” I ask myself the following:

  • How many events does this system normally generate per day?
  • Are metrics like network latency, enqueue wait, and dialog response time shifting in unison?
  • Is SAP logging this as an error, or is it masking the issue as a “tolerable delay”?

A system that produces zero errors is not proof of perfection; it is often evidence that the monitoring layer has become detached from the system’s actual state. Treat silence in the logs not as a sign of peace, but as a signal to be on high alert.
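
As a minimal sketch of this reflex, the check below compares today's event count with a baseline built from previous days and flags both spikes and suspicious silence. The function name, the sample counts, and the threshold of three standard deviations are illustrative assumptions, not values taken from any SAP tool; how the daily counts are collected (for example from ST22 or SM21 exports) is left open.

```python
from statistics import mean, stdev

def is_suspicious(history, today, threshold=3.0):
    """Return True if today's event count deviates strongly from the baseline.

    history: daily event counts from previous days (e.g. dumps per day).
    today: today's count.
    threshold: allowed deviation in standard deviations (assumed default).
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is a deviation
        return today != baseline
    return abs(today - baseline) / spread > threshold

# Hypothetical dump counts for the last seven days
dump_counts = [12, 15, 11, 14, 13, 16, 12]
print(is_suspicious(dump_counts, 0))   # a "silent" morning
print(is_suspicious(dump_counts, 14))  # an ordinary morning
```

Because the check is symmetric, a drop to zero is treated exactly like a sudden spike, which is the point of treating silence as a signal.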

2. The Black Box of Performance: ST04 and DB02

“The database space is at 90%, let’s add more disk.” This is the logic that merely keeps a system running. The resilient logic, however, asks: “Why did this space fill up?”


Case: Bloated Tables and the Archiving Gap

In SAP HANA systems, memory is expensive, but the real trouble often begins with unchecked table growth. We have seen cases where IDoc tables (such as EDIDC and EDID4) or RFC log tables (ARFCSSTATE, ARFCRSTATE) reach hundreds of millions of rows. The growth may not hurt on day one, but as data volume increases, the database has to read and process far more data for every access. A simple SELECT query then begins to strain the system and turns into “expensive SQL.”


Risk and Solution:

Risk: A system without an archiving strategy will eventually drown in “Expensive SQL” queries, regardless of how powerful your hardware is.

Solution: Analyze the top 20 largest tables once a month. If a log table is growing uncontrollably, it indicates an accumulation of “technical debt.” The solution is not adding more disk; it is managing the data lifecycle.

Expert Note:

Beyond ST04 and DB02, it is essential to regularly review EarlyWatch Alert (EWA) reports. Think of EWA as a “health report” where SAP scans the internal organs of your system for you. A yellow or red warning seen there today can manifest as a major crisis just a few weeks later.


The Life-Saving Detail:

The exact same table might cause no issues mid-month but could lock the system during the month-end closing. This is why I focus on the growth velocity and peak intensity times rather than just the total size. The question “When does this table grow?” is more critical than “How big has it grown?”
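
The idea of watching growth velocity instead of total size could be sketched as follows. The snapshot format, the function names, and the row counts are hypothetical; in practice the numbers would come from whatever table-size history you already collect.

```python
from datetime import date

def growth_per_day(snapshots):
    """Average rows added per day between consecutive snapshots.

    snapshots: list of (date, row_count) tuples sorted by date.
    Returns a list of (start_date, end_date, rows_per_day).
    """
    rates = []
    for (d0, r0), (d1, r1) in zip(snapshots, snapshots[1:]):
        days = (d1 - d0).days
        if days <= 0:
            continue  # skip duplicate or out-of-order snapshots
        rates.append((d0, d1, (r1 - r0) / days))
    return rates

def peak_interval(snapshots):
    """Return the interval with the fastest growth, or None if there is none."""
    rates = growth_per_day(snapshots)
    return max(rates, key=lambda r: r[2]) if rates else None

# Hypothetical row counts for a single table
edidc = [
    (date(2026, 1, 1), 100_000_000),
    (date(2026, 1, 15), 101_400_000),
    (date(2026, 1, 29), 102_800_000),
    (date(2026, 2, 1), 105_800_000),
]
start, end, rate = peak_interval(edidc)
print(f"Fastest growth: {start} to {end}, about {rate:,.0f} rows/day")
```

In this made-up data the table is only about 6% larger after a month, yet almost half of that growth lands in the three days around month-end, which is exactly the pattern a size-only check would miss.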


3. Hanging Jobs (SM37): The “Time-Bomb” Jobs

In a system that just “runs,” jobs are simply “Finished” or “Cancelled.” In a resilient system, jobs must be “Meaningful.” In the SAP world, a job sometimes neither finishes nor throws an error; it stays “Active” and holds system resources hostage.


A Note of Irony:

Developers often love to say, “I set the job to run every 5 minutes; if one doesn’t finish, the next one will start.” In the Basis world, this translates to: “I’ve set a time-bomb, and I have no idea when it will go off.”


Suggestion:

These “hanging” jobs fill up BTC (Background) work processes, causing system operations to grind to a halt. You must track the execution durations of critical jobs. If a job that normally finishes in 10 minutes suddenly takes 2 hours, something is wrong, even if the job eventually shows a “Finished” status.


The Expert Reflex:

I don’t panic if a job duration spikes once. But if it grows by 10% every week, I put that job on my radar before anyone complains. Jobs usually don’t kill a system on the day they fail; they kill it during the weeks they were slowly becoming sluggish.
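
A rough sketch of that distinction between a one-off spike and a steady creep is shown below. The function names, the three-week window, and the sample runtimes are assumptions for illustration; the weekly averages themselves would come from your own job history.

```python
def weekly_growth(durations):
    """Week-over-week relative change of weekly average runtimes (in minutes)."""
    return [
        (cur - prev) / prev
        for prev, cur in zip(durations, durations[1:])
        if prev > 0
    ]

def is_creeping(durations, min_growth=0.10, weeks=3):
    """True if the runtime grew by at least min_growth in each of the last `weeks` weeks."""
    changes = weekly_growth(durations)
    if len(changes) < weeks:
        return False  # not enough history to call it a trend
    return all(change >= min_growth for change in changes[-weeks:])

# Hypothetical weekly average runtimes in minutes
steady_creep = [10.0, 11.2, 12.5, 14.0]
single_spike = [10.0, 10.0, 120.0, 10.0]

print(is_creeping(steady_creep))  # sustained growth of 10% or more per week
print(is_creeping(single_spike))  # one bad week, then back to normal
```

Requiring several consecutive weeks of growth keeps a single bad run from raising an alarm while still catching the slow degradation described above.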

4. Buffers and Architectural Nuances (ST02)

In classic databases (Oracle, MSSQL, etc.), the “Swap” values in the ST02 screen serve as a report card: they show how often objects are displaced from the application buffers and have to be reloaded from the database, which means unnecessary database and disk activity. However, in the SAP HANA era, we must significantly update our perspective.

While ST02 remains essential in HANA environments, the focus shifts away from traditional “swapping.” Instead, expertise is demonstrated by monitoring the Buffer Hit Ratio in tandem with memory metrics in the DBA Cockpit. If you consistently observe a low hit ratio in ST02, it indicates that your system is repeatedly “bothering” the database for data rather than reading it from the application buffer.

True Expertise: The ability to merge ST02 observations with DBA Cockpit data to provide a definitive answer to the question: “Where exactly is the bottleneck?”
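
To make “consistently low” concrete, here is a small sketch that computes a hit ratio from two counters and only raises a flag when several consecutive samples fall below a limit. The counter names, the 95% limit, and the sample values are assumptions for illustration and are not ST02 field names or SAP recommendations.

```python
def hit_ratio(requests, database_reads):
    """Percentage of requests served from the buffer rather than the database."""
    if requests <= 0:
        return None  # no traffic, so no meaningful ratio
    return 100.0 * (requests - database_reads) / requests

def consistently_low(ratios, limit=95.0, samples=3):
    """True if the last `samples` measured ratios are all below `limit`."""
    recent = [r for r in ratios if r is not None][-samples:]
    return len(recent) == samples and all(r < limit for r in recent)

# Hypothetical (requests, database_reads) pairs taken at regular intervals
measurements = [(1000, 50), (1000, 120), (1000, 150), (1000, 200)]
ratios = [hit_ratio(req, db) for req, db in measurements]
print(ratios)
print(consistently_low(ratios))
```

A single low sample after a restart or a buffer reset is expected; the flag is only meaningful when the ratio stays low across several measurements.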


5. The Most Dangerous Mistake: “I Took a Backup”

The oldest and most accurate joke in the Basis world is this: “It’s not the Basis administrator who takes the backup who sleeps soundly, but the one who can restore it.”

A Real-World Risk

Do not blindly trust the “Backup Successful” green light. On a weekly or monthly basis, perform Restore Tests, preferably in a non-production (test) environment. Backup software can sometimes skip specific datafiles due to permission errors but still report “Successful” in the general log. If a backup for a 10 TB system results in a 100 GB file, that isn’t a miracle—it’s a disaster waiting to happen.
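
A size plausibility check like the one below catches the 10 TB versus 100 GB case automatically. It is only a sketch: the function name and the 10% floor are assumptions, and a realistic floor depends on compression and backup type, so it has to be calibrated per system. It also does not replace a restore test.

```python
def backup_size_plausible(db_size_gb, backup_size_gb, min_ratio=0.10):
    """Return False when a full backup is implausibly small for the database.

    min_ratio is an assumed floor, not a vendor recommendation; calibrate it
    against the sizes of backups you have successfully restored.
    """
    if db_size_gb <= 0:
        raise ValueError("db_size_gb must be positive")
    return backup_size_gb / db_size_gb >= min_ratio

# The case from the text: a 10 TB system producing a 100 GB backup
print(backup_size_plausible(10_000, 100))
# A hypothetical backup of the same system at 3.5 TB
print(backup_size_plausible(10_000, 3_500))
```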

A Niche Example

At a very large retail corporation, the backup software reported “Backup Successful” every single day. However, during a Disaster Recovery (DR) test, we discovered that while the database logs were being backed up, a partition containing the actual datafiles was excluded due to a permission error. The software had flagged this as a “warning” rather than a “failure,” which went unnoticed.


The Checklist Trap

In most checklists, backup verification is just a single line item. But in the real world, a backup is not an answer to the question “Was it taken?” Instead, it must answer:

  • “Under which scenario can we restore?”
  • “How long will it take?”
  • “Who is authorized to perform the restore?”

6. S/4HANA and the 2026 Reality: Hybrid Expertise

We are in 2026, and the era of the Basis administrator who “only performs installations” has undergone a fundamental transformation. While traditional skills remain our core pillar, we must now layer BTP (Business Technology Platform), Cloud Connector, and Advanced Security (SNOTE/Hot News) on top of that foundation.

A Basis professional must now act as a Connectivity Architect. In a landscape integrated with cloud services, failing to track certificate expiries or neglecting to apply critical Security Notes (Hot News) is equivalent to leaving the system’s front door wide open to the outside world.
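
Tracking certificate expiries can start with something as simple as the sketch below. The inventory is a hypothetical hand-maintained dictionary of names and expiry dates, and the 30-day warning window is an assumed default; pulling the dates automatically from the actual systems is outside the scope of this example.

```python
from datetime import datetime, timedelta, timezone

def expiring_soon(certificates, now, warn_days=30):
    """Return (name, days_left) for certificates expiring within warn_days.

    certificates: dict mapping a certificate name to its expiry
                  (timezone-aware datetime).
    Already expired certificates are included with a negative days_left.
    The result is sorted so the most urgent entry comes first.
    """
    limit = now + timedelta(days=warn_days)
    due = [
        (name, (expiry - now).days)
        for name, expiry in certificates.items()
        if expiry <= limit
    ]
    return sorted(due, key=lambda item: item[1])

# Hypothetical inventory with made-up names and dates
inventory = {
    "cloud-connector": datetime(2026, 3, 10, tzinfo=timezone.utc),
    "btp-trust": datetime(2026, 9, 1, tzinfo=timezone.utc),
    "legacy-rfc": datetime(2026, 2, 20, tzinfo=timezone.utc),
}
today = datetime(2026, 3, 1, tzinfo=timezone.utc)
for name, days_left in expiring_soon(inventory, today):
    print(f"{name}: {days_left} days left")
```

Passing `now` explicitly instead of reading the clock inside the function keeps the check easy to test and to run against a planned future date, such as the start of a month-end freeze.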


Running System vs. Resilient System

| Feature | Running System (Routine Management) | Resilient System (Expertise) |
|---|---|---|
| Monitoring | Reacts only when an error occurs. | Monitors trend deviations; prevents errors before they manifest. |
| Transport (STMS) | Simply presses the “Import” button. | Manages version conflicts and object dependencies. |
| Backup | Checks the log and moves on. | Validates the backup with regular restore tests. |
| Security | Grants authorizations upon request. | Proactively manages SNOTE (Hot News) and RFC security. |
| Scheduling | Does not consider peak times. | Pre-simulates peak-hour overlaps and concurrency. |


The Invisible Touch

A true SAP Basis expert is the one who, during high-pressure month-end closings or peak holiday sales traffic, can lean back and say, “The system is calm; we simulated every scenario.”

If you find yourself constantly “firefighting,” you are managing a routine, not practicing true expertise. Treat checklists as a “minimal requirement” and focus on the technical nuances the system whispers to you. After all, the devil is always hidden in those unexamined locks or unpurged log queues.
