SQLite3 versus Ext4 - A Simple Benchmark on Ubuntu

🔍 Introduction

SQLite is a popular lightweight database engine often used in embedded systems, mobile apps, and local desktop applications. But is it actually faster than plain file I/O on Linux filesystems like ext4?

In this blog post, I’ll benchmark SQLite vs. raw CSV file operations (read/write/filter) on an Ubuntu system running ext4. I’ll also explain how SQLite interacts with the filesystem, why it might be slower or faster, and under what conditions its advantages become obvious.


📦 Filesystem and Setup

  • Operating System: Ubuntu 22.04 LTS
  • Filesystem: ext4 (journaling filesystem)
  • Disk: SSD
  • Python: 3.10+
  • SQLite version: 3.37+

🧠 How SQLite Works with the File System

SQLite stores all its data in a single disk file, which it manages internally using a custom B-tree structure. When you insert or query data, SQLite:

  • Opens a file descriptor to the .db file
  • Reads and writes data pages (typically 4KB each)
  • Uses journaling or WAL (Write-Ahead Logging) to ensure atomicity and durability
  • Issues fsync() calls to flush changes to disk safely
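You can see these settings for yourself through SQLite's PRAGMA interface. A quick sketch (the database filename is a throwaway, just for inspection):

```python
import os
import sqlite3

# Hypothetical throwaway database, created only to inspect SQLite's defaults.
conn = sqlite3.connect("inspect_me.db")
page_size = conn.execute("PRAGMA page_size").fetchone()[0]       # typically 4096
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0] # 'delete' (rollback journal) by default
print(page_size, journal_mode)
conn.close()
os.remove("inspect_me.db")
```

Switching `PRAGMA journal_mode=WAL;` (as the benchmark below does) changes how those durability guarantees are implemented, trading rollback journals for a write-ahead log.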

🔍 What About ext4?

  • ext4 is a general-purpose journaling filesystem.
  • When you write a text file (e.g. a CSV), the bytes are simply laid down sequentially.
  • There’s no record structure, indexing, or optimization for queries or concurrent access.
  • ext4 doesn’t understand your data; to the filesystem, it’s just a byte stream.
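To make the byte-stream point concrete, here is a tiny sketch (the in-memory bytes stand in for what ext4 stores on disk): filtering a CSV means decoding and parsing every single line yourself, because there is no index to consult.

```python
import csv
import io

# What ext4 actually stores: undifferentiated bytes.
raw = b"id,name\n1,Alice\n2,Bob\n"

# Filtering means parsing every line; there is no shortcut.
rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
matches = [r for r in rows[1:] if int(r[0]) > 1]  # full scan over all rows
print(matches)  # [['2', 'Bob']]
```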

⚔️ SQLite vs. ext4 CSV File: When Is SQLite Faster?

| Operation        | SQLite                                      | CSV + ext4                              |
| ---------------- | ------------------------------------------- | --------------------------------------- |
| Sequential write | Slower (transaction overhead)               | Faster (direct write)                   |
| Full read        | Comparable                                  | Comparable                              |
| Filtered query   | ✅ Much faster (SQL engine over binary pages) | ❌ Slow (must scan and parse every line) |

SQLite becomes faster when structured access is needed — for example:

  • Filtered queries (WHERE, LIMIT)
  • Joins
  • Concurrent readers/writers
  • Frequent inserts + reads
  • Querying subsets of large data
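As a sketch of why structured access pays off: with an index, SQLite can replace the full table scan with a B-tree search, and EXPLAIN QUERY PLAN makes this visible. Table and index names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"Name_{i}") for i in range(1000)])
conn.execute("CREATE INDEX idx_users_id ON users(id)")

# The query plan shows a B-tree SEARCH instead of a full table SCAN.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE id > 990").fetchall()
print(plan)  # plan text mentions "USING INDEX idx_users_id"
conn.close()
```

Note that the benchmark below does not create an index, so SQLite's win there comes purely from scanning its compact binary pages faster than Python can parse text.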

🧪 Benchmarking on ext4 with Python

Let’s test three operations:

  1. Full write of 100,000 records (this populates both the CSV file and the SQLite database; it isn’t timed in the results below)
  2. Full read
  3. Filtered query (WHERE id > 99000)

🧬 Benchmark Code

import sqlite3
import time
import csv
import os

N = 100000
DATA = [(i, f"Name_{i}") for i in range(N)]

# --------- WRITE (SETUP, NOT TIMED) ---------
def benchmark_file_write():
    filename = "test_data.csv"
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerows(DATA)

def benchmark_sqlite_write():
    db = "test.db"
    conn = sqlite3.connect(db)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA synchronous=NORMAL;")
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS users")
    c.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    c.executemany("INSERT INTO users VALUES (?, ?)", DATA)
    conn.commit()
    conn.close()

# --------- READ BENCHMARKS ---------
def benchmark_file_read():
    filename = "test_data.csv"
    start = time.time()
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)
        data = [row for row in reader]
    end = time.time()
    return end - start

def benchmark_sqlite_read():
    db = "test.db"
    start = time.time()
    conn = sqlite3.connect(db)
    c = conn.cursor()
    c.execute("SELECT * FROM users")
    data = c.fetchall()
    conn.close()
    end = time.time()
    return end - start

# --------- FILTERED QUERY BENCHMARKS ---------
def benchmark_file_query():
    filename = "test_data.csv"
    start = time.time()
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        next(reader)
        data = [row for row in reader if int(row[0]) > 99000]
    end = time.time()
    return end - start

def benchmark_sqlite_query():
    db = "test.db"
    conn = sqlite3.connect(db)
    c = conn.cursor()
    start = time.time()
    c.execute("SELECT * FROM users WHERE id > 99000")
    data = c.fetchall()
    end = time.time()
    conn.close()
    return end - start

# --------- RUN ALL BENCHMARKS ---------
benchmark_file_write()
benchmark_sqlite_write()

file_read_time = benchmark_file_read()
sqlite_read_time = benchmark_sqlite_read()

file_query_time = benchmark_file_query()
sqlite_query_time = benchmark_sqlite_query()

# --------- CLEANUP ---------
os.remove("test_data.csv")
os.remove("test.db")

# --------- PRINT RESULTS ---------
print(f"\n=== Read Benchmark ===")
print(f"File read time:   {file_read_time:.4f} seconds")
print(f"SQLite read time: {sqlite_read_time:.4f} seconds")
read_diff = file_read_time - sqlite_read_time
read_percent = (read_diff / file_read_time) * 100
if read_percent > 0:
    print(f"SQLite is {read_percent:.2f}% faster than file system (read)")
else:
    print(f"File system is {-read_percent:.2f}% faster than SQLite (read)")

print(f"\n=== Filtered Query Benchmark ===")
print(f"File query time:   {file_query_time:.4f} seconds")
print(f"SQLite query time: {sqlite_query_time:.4f} seconds")
query_diff = file_query_time - sqlite_query_time
query_percent = (query_diff / file_query_time) * 100
if query_percent > 0:
    print(f"SQLite is {query_percent:.2f}% faster than file system (filtered query)")
else:
    print(f"File system is {-query_percent:.2f}% faster than SQLite (filtered query)")

📊 Benchmark Results (on EXT4)

=== Read Benchmark ===
File read time:   0.0305 seconds
SQLite read time: 0.0304 seconds
SQLite is 0.25% faster than file system (read)

=== Filtered Query Benchmark ===
File query time:   0.0188 seconds
SQLite query time: 0.0028 seconds
SQLite is 85.12% faster than file system (filtered query)

📌 Interpretation

  • Write performance: the flat file wins; a raw sequential write carries no transaction or journaling overhead.
  • Full read: no major difference; both stream all 100,000 rows quickly.
  • Filtered query: SQLite dominates. Even without an index, it scans compact binary pages in C instead of parsing text in Python, and with an index on id it could skip straight to the matching rows.

🧠 Final Thoughts

SQLite isn’t faster in all cases. In fact, flat files on ext4 are faster for simple write-heavy tasks. But when structure, queries, or filtered access matter, SQLite wins by a large margin.

When to Use CSV + ext4:

  • Append-only logs
  • Archival exports
  • Simplicity and portability
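For the append-only case, the entire write path is one sequential append, which is exactly the pattern where a plain file on ext4 is hard to beat (the filename here is hypothetical):

```python
import csv

# Appending one row to a CSV log: a single sequential write.
with open("events.csv", "a", newline="") as f:
    csv.writer(f).writerow(["2024-01-01T00:00:00", "startup", "ok"])
```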

When to Use SQLite:

  • Filtering/querying data
  • Frequent reads + writes
  • Need for data integrity
  • Multi-user access
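The data-integrity point is easy to demonstrate: using sqlite3's connection as a context manager, a failure mid-transaction rolls everything back (in-memory database and schema are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on exception
        conn.execute(
            "UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass

balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100: the partial update was rolled back
conn.close()
```

A half-written CSV, by contrast, just stays half-written.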

✅ Summary

| Feature          | CSV on ext4     | SQLite                     |
| ---------------- | --------------- | -------------------------- |
| Write speed      | ✅ Very fast     | ❌ Slower (WAL/ACID overhead) |
| Full read        | ⚖️ About equal  | ⚖️ About equal             |
| Filtered queries | ❌ Very slow     | ✅ Very fast                |
| ACID compliance  | ❌ No            | ✅ Yes                      |
| Query language   | ❌ None          | ✅ SQL                      |
| Storage format   | ❌ Plaintext     | ✅ Compact binary           |

If you’re running on ext4 and need structured queries, go with SQLite. If you just need to dump and store data as fast as possible, CSV may be fine.
