The CI/CD Pipeline Behind This Jekyll Blog

By Michael McGarrah · May 08, 202612 min read

Any content platform that publishes on a schedule, validates quality automatically, and scans for security regressions needs a CI/CD pipeline — even if the “platform” is a personal blog. This site runs three GitHub Actions workflows that form a feedback loop: build and deploy with scheduled future-post publishing, CodeQL security scanning of the workflow YAML itself, and an SEO health check that validates sitemaps, canonical URLs, structured data, and accessibility on every content push. Dependabot closes the loop on dependency hygiene across three package ecosystems.

Most Jekyll blogs on GitHub Pages use the default build. Push to main, GitHub builds it, done. That worked for me too — until I needed custom plugins, scheduled future posts, and wanted to stop deploying broken sitemaps.

Here’s how it all fits together.

The Pipeline at a Glance

graph LR
    A[git push to main] --> B[Jekyll Build & Deploy]
    A --> C[CodeQL Security Scan]
    A --> D[SEO Health Check]
    E[Daily 10:05 UTC] --> B
    F[Weekly Friday 02:43 UTC] --> C
    G[Weekly Monday 06:00 UTC] --> D
    H[Dependabot] --> I[PRs for dependency updates]
    I --> A

Workflow	File	Triggers	Purpose
Build & Deploy	`jekyll.yml`	Push, daily cron, manual	Build site, validate sitemap, deploy to GitHub Pages
CodeQL	`codeql.yml`	Push, PR, weekly cron	Security scanning of GitHub Actions workflows
SEO Health Check	`seo-health-check.yml`	Push (path-filtered), weekly cron, manual	Lighthouse CI, link checking, SEO validation
Dependabot	`dependabot.yml`	Daily (Actions, Bundler), weekly (npm)	Dependency update PRs

Workflow 1: Build and Deploy

This is the core workflow. It replaced the default GitHub Pages build in August 2024 when I needed custom plugins that aren’t in the GitHub Pages whitelist.

name: Deploy Jekyll site to Pages

on:
  push:
    branches: ["main"]
  schedule:
    - cron: '5 10 * * *'
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

Why a Custom Build?

The default GitHub Pages Jekyll build is convenient but limiting:

No custom plugins — Only whitelisted gems run. My tag/category generator and Pandoc exports plugin need a custom build.
No Ruby version pinning — GitHub controls the Ruby version. I pin to 3.2.6 for reproducibility.
No build validation — The default build deploys whatever Jekyll produces. I validate the sitemap before deploying.

Scheduled Builds for Future Posts

The schedule trigger is the key feature that makes future-dated posts work:

schedule:
  - cron: '5 10 * * *'  # 10:05 UTC daily (6:05 AM EDT)

See GitHub’s scheduled events documentation for cron syntax details.

Jekyll’s future: false setting (the default) excludes posts with dates in the future from the build output. When a post’s date arrives, the next build picks it up. The daily cron at 10:05 UTC (6:05 AM EDT) means a post dated 2026-05-15 will go live within 24 hours of that date — close enough for a blog.

Without this, I’d have to manually push a commit or trigger a build on the day I want a post to go live.

The `--future` Flag Gotcha

During local development, bundle exec jekyll serve also respects future: false — future-dated posts won’t render locally unless you add the --future flag:

bundle exec jekyll serve --future

This confused me early on. I’d write a post with tomorrow’s date, run the local server, and the post wouldn’t appear. The future: false config setting only controls the build output, not whether the file is recognized. The --future flag overrides it for local previewing. In production, the daily cron handles it — you never need future: true in _config.yml.

Sitemap Validation

After discovering that a bad build once deployed a sitemap full of localhost URLs, I added a pre-deploy validation step:

- name: Validate sitemap URLs
  run: |
    if grep -q 'localhost' ./_site/sitemap.xml; then
      echo "::error::sitemap.xml contains localhost URLs"
      grep 'localhost' ./_site/sitemap.xml | head -5
      exit 1
    fi
    echo "Sitemap OK: all URLs use production domain"

This is a cheap check that has saved me at least once. The JEKYLL_ENV: production environment variable is also critical — without it, Jekyll may use development URLs.

Build Steps

The full build job:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ruby/setup-ruby@v1.300.0
        with:
          ruby-version: '3.2.6'
          bundler-cache: true
          cache-version: 1
      - uses: actions/configure-pages@v6
      - name: Build with Jekyll
        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}"
        env:
          JEKYLL_ENV: production
      - name: Validate sitemap URLs
        run: |
          if grep -q 'localhost' ./_site/sitemap.xml; then
            echo "::error::sitemap.xml contains localhost URLs"
            exit 1
          fi
      - uses: actions/upload-pages-artifact@v4

Key details:

bundler-cache: true — Caches installed gems between runs. Cuts build time significantly.
cache-version: 1 — Increment this to force a fresh gem install if the cache gets corrupted.
configure-pages — Sets the base path for GitHub Pages. Required for correct URL generation.
cancel-in-progress: false — Don’t cancel a running deployment if a new push arrives. Let it finish.

Evolution

The build workflow has been through nine commits since August 2024:

Aug 2024 — Initial creation, replacing default GitHub Pages build
Jan 2025 — Ruby version updates and deploy comments
May 2025 — Fix caching issues during build
Sep 2025 — Add scheduled daily builds for future posts
Sep 2025 — Dependabot bumps for checkout and upload-pages-artifact
Apr 2026 — Add sitemap localhost validation
Apr 2026 — Update all actions to Node 24 compatible versions

Workflow 2: CodeQL Security Scanning

name: "CodeQL Advanced"

on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]
  schedule:
    - cron: '43 2 * * 5'  # Weekly Friday at 02:43 UTC

What Does CodeQL Scan on a Jekyll Blog?

Not much, honestly. The initial setup in September 2025 included Ruby and JavaScript language analysis, but those were removed the same day — CodeQL’s Ruby analysis isn’t useful for Jekyll plugins (they’re too simple), and the JavaScript is mostly CDN-loaded.

What remains is GitHub Actions workflow analysis (language: actions), which scans the workflow YAML files themselves for security issues like:

Untrusted input in run steps
Missing permission restrictions
Vulnerable action versions
Script injection via ${{ }} expressions

strategy:
  fail-fast: false
  matrix:
    include:
    - language: actions
      build-mode: none

Is It Worth It?

For a static blog? Marginally. The Actions language scanner has caught zero issues so far. But it’s free, runs weekly, and takes under a minute. The real value is that it’s already configured — if I add more complex JavaScript or Ruby in the future, I can re-enable those language scanners with one line change.

Workflow 3: SEO Health Check

This is the most complex workflow and has its own dedicated article. Here’s the summary of what it validates on every content push and weekly:

Triggers

on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly Monday at 6 AM UTC
  workflow_dispatch:
  push:
    branches: [main]
    paths:
      - '_config.yml'
      - '_layouts/**'
      - '_includes/**'
      - '_plugins/**'
      - '_posts/**'
      - '_sass/**'
      - 'assets/**'
      - 'robots.txt'
      - '.github/workflows/seo-health-check.yml'
      - '.lighthouserc.json'

The paths filter is important — this workflow only runs on pushes that change content or configuration, not on README edits or draft changes. This saves CI minutes.

What It Checks

The workflow builds the site, serves it locally, then runs a gauntlet of checks:

Lighthouse CI — Performance (≥0.8), accessibility (≥0.9), best practices (≥0.8), SEO (≥0.9 — hard fail). Runs 3 times per URL and averages. Blocks AdSense and Analytics scripts to get clean scores.
Canonical URL consistency — Every <link rel="canonical"> tag must use mcgarrah.org
Sitemap validation — Valid XML, correct domain, no www prefix
Sitemap index validation — References both blog and resume sitemaps
Robots.txt — Exists, references correct sitemap index
Meta tags — Description and Open Graph tags on homepage
RSS feed — Valid XML
Link checking — Lychee for broken links across all HTML
Structured data — JSON-LD presence
Image optimization — Missing alt text, oversized images (>500KB)
Content quality — Duplicate titles, duplicate meta descriptions, generic link text (“click here”, “read more”)
Mobile optimization — Viewport meta tag coverage
Accessibility indicators — Invalid anchors, small tap targets

Lighthouse Configuration

The .lighthouserc.json blocks ad and analytics scripts to get clean performance scores:

{
  "ci": {
    "collect": {
      "numberOfRuns": 3,
      "settings": {
        "blockedUrlPatterns": [
          "**/pagead/js/adsbygoogle.js*",
          "**/googlesyndication.com/**",
          "**/googletagmanager.com/**"
        ]
      }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["warn", {"minScore": 0.8}],
        "categories:accessibility": ["warn", {"minScore": 0.9}],
        "categories:seo": ["error", {"minScore": 0.9}]
      }
    }
  }
}

SEO is the only hard fail (error). Performance and accessibility are warnings — I want to know about regressions but don’t want to block deploys over a 0.79 performance score.

The Canonical URL Bug

The SEO health check went through five rapid-fire commits on April 8, 2026 — all fixing the same class of problem. The canonical URL check was matching mcgarrah.org in blog post content (syntax-highlighted code examples), not just in <link> tags. The fix narrowed the grep to match only actual <link rel="canonical"> tags:

find _site -name "*.html" -exec grep -h '<link[^>]*rel="canonical"' {} \;

Lesson: when your blog posts contain code examples about your own blog’s configuration, your CI checks will match the examples. Always be specific in your grep patterns.

Dependabot

version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "."
    schedule:
      interval: "daily"
  - package-ecosystem: "bundler"
    directory: "./"
    schedule:
      interval: "daily"
  - package-ecosystem: "npm"
    directory: "./"
    schedule:
      interval: "weekly"

Three ecosystems, three schedules:

GitHub Actions (daily) — Catches action version bumps quickly. These are the most security-sensitive since they run with repo permissions.
Bundler (daily) — Jekyll and Ruby gem updates. The Gemfile.lock pins exact versions.
npm (weekly) — Tracks CDN library versions in package.json for security scanning, even though the actual libraries load from CDN at runtime. See the security dependency management post for why.

Dependabot creates PRs automatically. Each PR triggers the build and CodeQL workflows, so I get a test build before merging.

How the Pieces Interact

The workflows aren’t isolated — they form a feedback loop:

Dependabot creates a PR to bump actions/checkout from v5 to v6
The PR triggers CodeQL (scans the updated workflow YAML) and Build (tests the build with the new action version)
I merge the PR → push to main
Push triggers all three workflows: Build deploys, CodeQL scans, SEO Health Check validates
If the SEO check finds a regression (broken link, missing meta tag), I fix it and push again

The daily cron on the build workflow handles future-dated posts without any manual intervention. The weekly crons on CodeQL and SEO catch drift — a dependency that introduced a vulnerability, or an external link that went dead.

Cost

All of this runs on GitHub’s free tier for public repositories. The monthly usage is minimal:

Build & Deploy: ~30 runs/month (daily cron + pushes), ~2 min each
CodeQL: ~8 runs/month (weekly + pushes), ~1 min each
SEO Health Check: ~12 runs/month (weekly + content pushes), ~4 min each
Total: ~100 minutes/month, well within the 2,000 free minutes

What I’d Add Next

HTML-Proofer — More thorough internal link validation than my custom script
Pa11y — Automated accessibility testing beyond Lighthouse
Build time tracking — Alert if build time exceeds a threshold (currently ~90 seconds)
Deployment notifications — Slack or email on successful deploy of future-dated posts

Advanced Jekyll SEO Health Checks — Deep dive into the SEO workflow
Building This Blog: Jekyll on GitHub Pages — Setup guide with CI/CD overview
Using GitHub Actions with pip-audit — Similar CI pattern for Python projects
Building a Custom Tag and Category Generator Plugin — One of the custom plugins that requires this pipeline
Your Jekyll Sitemap Is 60% Garbage — The sitemap problem that led to the validation step
How the Sausage Is Made — Full feature inventory
Ruby Gem Release Automation — CI/CD for the Pandoc exports gem

Tags: jekyll, github-actions, ci-cd, codeql, seo, lighthouse, dependabot, automation, github-pages, security

Categories: web-development, technical, jekyll

About the Author: Michael McGarrah is a Cloud Architect with 25+ years in enterprise infrastructure, machine learning, and system administration. He holds an M.S. in Computer Science (AI/ML) from Georgia Tech and a B.S. in Computer Science from NC State University, and is currently pursuing an Executive MBA at UNC Wilmington. LinkedIn · Substack · GitHub · ORCID · Google Scholar · Resume