McGarrah Technical Blog

From Publish to Reader: The Content Distribution Pipeline Behind This Blog

9 min read

I write a markdown file, push to GitHub, and GitHub Actions builds the site. That’s the publishing step. But publishing isn’t distribution — a post that exists on a server isn’t a post that reaches readers.

This blog has six distribution channels, each serving a different audience on a different timeline. The same pipeline concept applies to any content platform — documentation sites, developer portals, knowledge bases — the channels differ, but the architecture is the same. Here’s how they work together.

The Distribution Channels

graph TD
    A[git push] --> B[GitHub Actions Build]
    B --> C[GitHub Pages]
    B --> D[RSS Feed - feed.xml]
    B --> E[Sitemap - sitemap.xml]
    B --> F[Sitemap Index - sitemapindex.xml]
    F --> G[Google Search Console]
    C --> H[Direct Readers]
    D --> I[RSS Subscribers]
    G --> J[Google Search Results]
    C --> K[Substack Newsletter]
    K --> L[Newsletter Subscribers]
    C --> M[Social Sharing]
    M --> N[Reddit / LinkedIn / etc.]

| Channel | Audience | Timeline | Effort |
| --- | --- | --- | --- |
| Direct URL | Bookmarkers, repeat visitors | Immediate | Zero |
| RSS feed | Technical readers, feed aggregators | Minutes after build | Zero (automated) |
| Google Search | New readers searching for solutions | Days to weeks | Zero (automated via sitemap) |
| Substack newsletter | Subscribers, broader audience | Manual, batched | 2-4 hours per newsletter |
| Social sharing | Reddit, LinkedIn, Hacker News | Manual, per-post | 15 minutes per post |
| Cross-references | Readers of related posts | At publish time | Built into writing process |

RSS Feed

The RSS feed is the oldest and simplest distribution channel. The jekyll-feed plugin generates feed.xml automatically at build time.

# Gemfile
gem "jekyll-feed", "~> 0.17.0"

That’s it. Every post gets an entry in the feed with title, date, excerpt, and full content. Feed readers like Feedly, NewsBlur, and Miniflux pick it up automatically.

The feed URL is https://mcgarrah.org/feed.xml and is advertised via a <link> tag in the HTML head that feed readers auto-discover.
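Auto-discovery works because jekyll-feed’s `feed_meta` Liquid tag emits a `<link>` element into the page head. A sketch of what that markup looks like (the `title` attribute here is illustrative):

```html
<!-- Emitted into <head> by {% feed_meta %} from jekyll-feed.
     Feed readers scan for rel="alternate" links like this one. -->
<link type="application/atom+xml" rel="alternate"
      href="https://mcgarrah.org/feed.xml" title="McGarrah Technical Blog" />
```

Despite the `feed.xml` name, jekyll-feed generates an Atom feed, which is why the advertised MIME type is `application/atom+xml`.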

What RSS Gets Right

What RSS Misses

Sitemap and Sitemap Index

The sitemap tells search engines what pages exist and when they were last modified. This blog has a two-level sitemap structure because it hosts two Jekyll sites under one domain.

The Multi-Site Problem

The main blog lives at mcgarrah.org/ and the resume lives at mcgarrah.org/resume/. Each has its own sitemap.xml generated by jekyll-sitemap. But Google Search Console wants a single sitemap entry point for the domain.

The solution (added April 8, 2026) is a sitemapindex.xml at the domain root:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://mcgarrah.org/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://mcgarrah.org/resume/sitemap.xml</loc>
  </sitemap>
</sitemapindex>
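A sitemap index is only useful if every entry resolves on the right domain. A minimal sketch of how a validator (or a crawler) would read the index above — the function name and domain check are mine, not part of the build:

```python
# Sketch: read a sitemapindex the way a crawler would, rejecting any
# <loc> entry that points off-domain. Illustrative, not the real workflow.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_locs(index_xml: str, domain: str = "https://mcgarrah.org/") -> list:
    """Return the <loc> URLs in a sitemap index, failing on off-domain entries."""
    root = ET.fromstring(index_xml)
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc")]
    bad = [u for u in locs if not u.startswith(domain)]
    if bad:
        raise ValueError(f"off-domain sitemap entries: {bad}")
    return locs

index = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://mcgarrah.org/sitemap.xml</loc></sitemap>
  <sitemap><loc>https://mcgarrah.org/resume/sitemap.xml</loc></sitemap>
</sitemapindex>"""

print(sitemap_locs(index))
```

Note the namespace prefix on `loc`: sitemap files declare the `sitemaps.org` namespace, so a bare `root.iter("loc")` would find nothing.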

The robots.txt points to the index, not the individual sitemaps:

Sitemap: https://mcgarrah.org/sitemapindex.xml

This was a direct fix for the fragmented sitemap problem documented in Managing Multiple Jekyll Sites: Sitemap Challenges.

Sitemap Hygiene

The sitemap went through significant cleanup — from 434 URLs down to ~172 after excluding auto-generated tag pages, category pages, and pagination. That story is told in Your Jekyll Sitemap Is 60% Garbage. The SEO health check workflow validates the sitemap on every build.
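One way to keep thin pages out on every future build: jekyll-sitemap skips any page whose front matter sets `sitemap: false`, which a `defaults` rule can apply in bulk. A sketch, assuming tag and category pages are emitted under those paths (the actual cleanup is described in the linked post and may use a different mechanism):

```yaml
# _config.yml — illustrative sketch, not this blog's actual config.
# jekyll-sitemap honors `sitemap: false` in front matter; a defaults
# rule applies it to every page under a path.
defaults:
  - scope:
      path: "tags"        # assumed output path for auto-generated tag pages
    values:
      sitemap: false
  - scope:
      path: "categories"  # assumed output path for category pages
    values:
      sitemap: false
```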

Google Search Console

Google Search Console (GSC) is where the sitemap meets Google’s crawler. Submitting the sitemap index tells Google about every page on both the blog and resume sites.

The Indexing Journey

Getting Google to properly index the site was a multi-month process:

  1. Domain verification — Proved ownership of mcgarrah.org via DNS TXT record
  2. Sitemap submission — Submitted sitemapindex.xml pointing to both sitemaps
  3. Canonical URL fixes — Resolved “Duplicate without user-selected canonical” errors by aligning url and canonical_url in _config.yml (published December 2025)
  4. 404 cleanup — Removed testing artifacts from _site/ that were generating crawl errors
  5. Sitemap bloat fix — Excluded thin tag/category/pagination pages that Google flagged as “Discovered – currently not indexed”

The SEO health check GitHub Actions workflow now validates all of this automatically on every push — canonical URL consistency, sitemap XML validity, correct domain usage, and broken links.
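As a sketch of what one such check might look like — this is my illustration, not the workflow’s actual code — validating that the sitemap parses as XML and that every URL uses the canonical domain:

```python
# Sketch of one SEO health check: the sitemap must be well-formed XML
# and every <loc> must use the canonical domain. Function names and the
# sample URLs are illustrative.
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
CANONICAL = "https://mcgarrah.org/"

def check_sitemap(xml_text: str) -> list:
    """Return a list of problems found in a sitemap's <loc> URLs."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        return [f"invalid XML: {e}"]
    problems = []
    for loc in root.iter(f"{NS}loc"):
        url = (loc.text or "").strip()
        if not url.startswith(CANONICAL):
            problems.append(f"non-canonical URL: {url}")
    return problems

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mcgarrah.org/proxmox-ceph-nearfull/</loc></url>
  <url><loc>http://mcgarrah.github.io/old-page/</loc></url>
</urlset>"""

print(check_sitemap(sample))
```

Running this in CI and failing the build on a non-empty result is what turns a one-time cleanup into a standing guarantee.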

What GSC Tells You

Substack Newsletter

Substack is the highest-effort, highest-impact distribution channel. Each newsletter is a curated collection of blog posts with narrative connecting them, aimed at a broader audience than the blog’s typical reader.

The Cross-Posting Workflow

  1. Write blog posts — Individual technical articles published on the blog over weeks/months
  2. Identify a theme — Group related posts into a narrative arc
  3. Write the newsletter — A 2,000-3,000 word article that tells the story across multiple posts, with links back to each one
  4. Archive in _substack/ — Keep a markdown copy in the repository for version control

The _substack/ directory is excluded from the Jekyll build (the _ prefix ensures Jekyll ignores it). It’s purely for archival:

_substack/
├── README.md                                    # Publication schedule and tags
├── 2026-04-04-from-homelabs-to-machine-learning.md  # Published
└── 2026-04-20-from-markdown-to-production.md        # Published

Each Substack newsletter contains 20-25 links back to specific blog posts:

| Newsletter | Date | Inbound Links |
| --- | --- | --- |
| From Homelabs to Machine Learning | 2026-04-04 | 24 links |
| From Markdown to Production | 2026-04-20 | 23 links |
| **Total** | | 47 links |

These aren’t generic “visit my blog” links — each one points to a specific post URL like https://mcgarrah.org/proxmox-ceph-nearfull/. This is why permalink stability matters so much. If I changed the permalink structure, 47 newsletter links would break instantly, and I can’t edit published Substack articles retroactively.
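Because the newsletters are archived as markdown in the repository, the set of URLs they depend on is scriptable. A sketch of how a permalink change could be checked against the archive before it ships — the regex and helper names are assumptions, not existing tooling:

```python
# Sketch: scan an archived newsletter's markdown for mcgarrah.org links
# and diff them against the set of live permalinks. Illustrative only.
import re

BLOG_LINK = re.compile(r"https://mcgarrah\.org/[\w\-/]*")

def newsletter_links(markdown_text: str) -> set:
    """All mcgarrah.org URLs referenced by a newsletter's markdown."""
    return set(BLOG_LINK.findall(markdown_text))

def broken_links(markdown_text: str, live_urls: set) -> set:
    """Newsletter links that no longer resolve to a live permalink."""
    return newsletter_links(markdown_text) - live_urls

doc = "See [the Ceph post](https://mcgarrah.org/proxmox-ceph-nearfull/) for details."
live = {"https://mcgarrah.org/proxmox-ceph-nearfull/"}
print(broken_links(doc, live))  # empty set: every newsletter link still resolves
```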

The inbound links also serve as backlinks for SEO — links from external sites are among Google’s strongest ranking signals.

Publication Scheduling

Blog posts must be live before the newsletter that references them goes out. The DRAFTS.md tracker includes a dependency checklist for each Substack publication:

The Apr 20 Substack references the following blog posts that should be live:
- 2026-04-14-ceph-osd-recovery-power-failure.md ✅
- 2026-04-15-zfs-ceph-overlapping-failures.md ✅
- 2026-04-18-jekyll-markdown-feature-reference.md ✅
- 2026-04-19-setting-up-jekyll-blog-github-pages.md ✅
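Because Jekyll post filenames start with the publish date, the checklist above is mechanically verifiable: a referenced post is live only if its filename date is on or before the newsletter’s send date. A sketch (the date-parsing helper is mine; the filenames are from the checklist):

```python
# Sketch: verify newsletter dependencies from Jekyll post filenames.
# A post dated after the newsletter's send date isn't built yet.
from datetime import date

def post_date(filename: str) -> date:
    """Extract the publish date from a YYYY-MM-DD-slug.md Jekyll filename."""
    y, m, d = filename.split("-", 3)[:3]
    return date(int(y), int(m), int(d))

referenced = [
    "2026-04-14-ceph-osd-recovery-power-failure.md",
    "2026-04-15-zfs-ceph-overlapping-failures.md",
    "2026-04-18-jekyll-markdown-feature-reference.md",
    "2026-04-19-setting-up-jekyll-blog-github-pages.md",
]
send_date = date(2026, 4, 20)
not_yet_live = [f for f in referenced if post_date(f) > send_date]
print(not_yet_live)  # [] — all four posts precede the send date
```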

Planned Newsletters

| # | Theme | Status |
| --- | --- | --- |
| 1 | Infrastructure (Proxmox, Ceph, Dell Wyse, monitoring) | Published 2026-04-04 |
| 2 | Web Development (Jekyll, SEO, GDPR, Pandoc, Mermaid) | Published 2026-04-20 |
| 3 | Machine Learning (AI/ML research, phonemes, cloud DS) | Planned |

Social Sharing

Reddit, LinkedIn, and other platforms are manual, per-post distribution. The effort is low (15 minutes to write a post title and context) but the reach is unpredictable — a Reddit post might get 3 views or 3,000.

What Makes Posts Shareable

Every external share creates a permanent reference to a specific URL. A Reddit post from 2025 still points to mcgarrah.org/proxmox-8-dell-wyse-3040-upgrade/. That URL must work forever — or at least redirect via jekyll-redirect-from if the post is renamed.
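With jekyll-redirect-from, the safety net for a renamed post lives in its front matter: the old URL becomes a redirect page pointing at the new permalink. A sketch (title and old path are illustrative):

```yaml
# Front matter sketch for a renamed post, using jekyll-redirect-from.
# Requests to the old URL get a redirect page to this post's new permalink.
---
title: "Proxmox 8 on the Dell Wyse 3040"
redirect_from:
  - /proxmox-8-dell-wyse-3040-upgrade/
---
```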

This is the same constraint as Substack links, but harder to track. I know exactly which URLs my newsletters reference (they’re in the _substack/ archive). I don’t know which URLs have been shared on Reddit or bookmarked by readers.

Cross-References Between Posts

The newest distribution channel is the “Related Posts” section at the bottom of articles. Currently 16 of 139 posts have hand-curated cross-references — all from September 2025 onward.

These serve double duty: they guide readers to related content, and they give search engines internal links that keep older posts in the crawl.

How the Channels Reinforce Each Other

The channels aren’t independent — they form a flywheel:

  1. Blog post published → appears in RSS feed and sitemap automatically
  2. Google indexes it → organic search traffic starts arriving (days to weeks)
  3. Substack newsletter bundles multiple posts → drives traffic spike to all referenced posts
  4. Reddit/LinkedIn share → drives traffic spike to individual post
  5. Inbound links from Substack and social → improve Google ranking → more organic traffic
  6. Cross-references in new posts → drive traffic to older posts → keep them relevant

The daily GitHub Actions cron build ensures future-dated posts enter this pipeline automatically. The SEO health check ensures the pipeline stays healthy.
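The cron trigger itself is a few lines of workflow YAML. A sketch, assuming a daily schedule (the actual workflow file and time may differ):

```yaml
# Sketch of the daily scheduled build. A cron-triggered run rebuilds the
# site so posts whose dates have passed since the last push go live.
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 6 * * *"   # once a day, 06:00 UTC
```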

What I’d Add Next


About the Author: Michael McGarrah is a Cloud Architect with 25+ years in enterprise infrastructure, machine learning, and system administration. He holds an M.S. in Computer Science (AI/ML) from Georgia Tech and a B.S. in Computer Science from NC State University, and is currently pursuing an Executive MBA at UNC Wilmington. LinkedIn · GitHub · ORCID · Google Scholar · Resume