McGarrah Technical Blog

Advanced Jekyll SEO Health Checks: Comprehensive Automation and Monitoring

23 min read

Following up on my previous article about Jekyll SEO fixes, this post details the evolution of a basic SEO health check into a comprehensive monitoring system. What started as simple canonical URL validation has grown into a robust automation pipeline that combines custom Jekyll-specific checks with industry-standard tools like Lighthouse CI and Lychee link validation.

Evolution from Basic to Comprehensive SEO Monitoring

The original SEO health check workflow was functional but limited. Over several months of iteration and real-world usage, it evolved into a sophisticated monitoring system that provides actionable insights and maintains high SEO standards automatically.

Original Workflow Limitations

The initial implementation had several gaps:

Enhanced Workflow Capabilities

The current system addresses these limitations with:

Comprehensive SEO Health Check Architecture

Workflow Structure Overview

name: SEO Health Check

on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly on Monday at 6 AM UTC
  workflow_dispatch:  # Allow manual trigger
  push:
    branches: [main]
    paths:
      - '_config.yml'
      - '_layouts/**'
      - '_posts/**'
      - 'robots.txt'

permissions:
  contents: read
  actions: write

The workflow triggers on:

Security and Permissions

Following GitHub Actions security best practices:

permissions:
  contents: read    # Required for checkout and file operations
  actions: write    # Required for artifact uploads

This implements the principle of least privilege, granting only necessary permissions while maintaining full functionality.

Jekyll Site Preparation and Validation

Build Process Enhancement

The workflow begins with comprehensive site preparation:

- name: Setup Ruby
  uses: ruby/setup-ruby@v1
  with:
    ruby-version: '3.2'
    bundler-cache: true
    
- name: Install XML tools
  run: sudo apt-get update && sudo apt-get install -y libxml2-utils
    
- name: Build Jekyll site
  run: bundle exec jekyll build

Key improvements:

Local Server for Testing

For Lighthouse integration, the workflow serves the site locally:

- name: Serve site for Lighthouse
  run: |
    bundle exec jekyll serve --detach --port 4000
    sleep 5  # Wait for server to start

This enables localhost testing within GitHub Actions, allowing Lighthouse to analyze the actual rendered site without external dependencies.

Lighthouse CI Integration

Performance and SEO Scoring

The workflow integrates Google’s official Lighthouse CI for comprehensive performance and SEO analysis:

- name: Run Lighthouse CI
  uses: treosh/lighthouse-ci-action@v10
  with:
    urls: |
      http://localhost:4000
      http://localhost:4000/about/
      http://localhost:4000/archive/
    configPath: './.lighthouserc.json'
    uploadArtifacts: false
    temporaryPublicStorage: true

Lighthouse Configuration

The .lighthouserc.json configuration enforces quality thresholds:

{
  "ci": {
    "collect": {
      "numberOfRuns": 3,
      "settings": {
        "chromeFlags": "--no-sandbox --headless"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["warn", {"minScore": 0.8}],
        "categories:accessibility": ["warn", {"minScore": 0.9}],
        "categories:best-practices": ["warn", {"minScore": 0.9}],
        "categories:seo": ["error", {"minScore": 0.9}],
        "categories:pwa": "off"
      }
    },
    "upload": {
      "target": "temporary-public-storage"
    }
  }
}

Quality gates:

Lighthouse Results Management

Custom artifact handling provides timestamped Lighthouse reports:

- name: Upload Lighthouse results
  run: |
    TIMESTAMP=$(date +%Y%m%d-%H%M%S)
    echo "LIGHTHOUSE_ARTIFACT_NAME=lighthouse-results-$TIMESTAMP" >> $GITHUB_ENV
  
- name: Upload Lighthouse artifacts
  uses: actions/upload-artifact@v4
  with:
    name: $
    path: .lighthouseci/
    retention-days: 90

This creates artifacts like lighthouse-results-20260101-143022 containing:

Handling Lighthouse AdSense Issues

When running Lighthouse on sites with Google AdSense, you may encounter console errors:

TagError: adsbygoogle.push() error: Only one 'enable_page_level_ads' allowed per page.

Root Cause: Multiple AdSense initialization calls, duplicate ad configurations, or Google Analytics cookie issues.

Common Issues:

Solutions:

  1. Exclude AdSense from Lighthouse Testing:
    {
      "ci": {
     "collect": {
       "settings": {
         "chromeFlags": "--no-sandbox --headless",
         "blockedUrlPatterns": [
           "**/pagead/js/adsbygoogle.js*",
           "**/googleads.g.doubleclick.net/**",
           "**/googlesyndication.com/**",
           "**/google-analytics.com/**",
           "**/googletagmanager.com/**",
           "**/analytics.google.com/**"
         ]
       }
     }
      }
    }
    
  2. Conditional AdSense Loading:
    // Only load AdSense in production, not during testing
    if (window.location.hostname !== 'localhost') {
      (adsbygoogle = window.adsbygoogle || []).push({
     google_ad_client: "ca-pub-your-id",
     enable_page_level_ads: true
      });
    }
    
  3. GDPR-Aware AdSense Implementation:
    // From your GDPR implementation
    if (consent === 'all' && window.location.hostname !== 'localhost') {
      loadAdSenseScript();
    }
    
  4. Hostname Check in Cookie Consent Script:
function loadConsentBasedScripts(consentLevel) {
  // Skip loading scripts during localhost testing
  if (window.location.hostname === 'localhost' || window.location.hostname === '127.0.0.1') {
    return;
  }
  
  // Load analytics and ads only in production
  if (GA_ID && !document.querySelector('script[src*="googletagmanager.com/gtag"]')) {
    // Load Google Analytics
  }
}

function loadAdSense() {
  // Skip loading AdSense during localhost testing
  if (window.location.hostname === 'localhost' || window.location.hostname === '127.0.0.1') {
    return;
  }
  
  // Load AdSense only in production
}

These console errors don’t affect SEO scores but can clutter Lighthouse reports. The blocking approach provides cleaner testing results while maintaining production functionality.

Expected Warnings After Blocking: When analytics/ads are blocked, you may see:

Third-party cookies may be blocked in some contexts.
Google Analytics: ar_debug /g/collect?v=...

These warnings are normal and expected - they indicate the blocking is working correctly. The analytics requests are being blocked as intended, preventing cookie issues during testing.

Additional Solution: Hostname Checks in JavaScript

For sites with GDPR cookie consent, add hostname checks to prevent script loading during localhost testing:

// In your cookie consent script
function loadConsentBasedScripts(consentLevel) {
  // Skip loading scripts during localhost testing
  if (window.location.hostname === 'localhost' || window.location.hostname === '127.0.0.1') {
    return;
  }
  // Load analytics/ads only in production
}

This prevents third-party cookie warnings by ensuring tracking scripts never load during Lighthouse testing, while maintaining full functionality in production.

Lighthouse flags generic link text like “read more” as an SEO issue:

Descriptive link text helps search engines understand your content.
Link Text: "read more"

Problem: Generic link text provides no context about the destination.

Solution: Add descriptive link text validation to the workflow:

- name: Check descriptive link text
  run: |
    echo "Checking for non-descriptive link text..."
    
    # Find generic link text patterns
    GENERIC_LINKS=$(find _site -name "*.html" -exec grep -o '<a[^>]*>[^<]*</a>' {} \; | grep -E '(read more|click here|here|more|continue reading)' | wc -l || true)
    echo "⚠️  Generic link text found: $GENERIC_LINKS"
    
    if [ "$GENERIC_LINKS" -gt 0 ]; then
      echo "Files with generic link text:"
      find _site -name "*.html" -exec grep -l -E '(read more|click here|here|more|continue reading)' {} \; | head -5
    fi

Jekyll Template Fix:

<!-- Instead of generic "read more" -->
<a href="">read more</a>

<!-- Use descriptive text -->
<a href=""></a>
<!-- or -->
<a href="" aria-label="Read full article: ">read more</a>

CSS-Only Solution (maintains design while improving SEO):

.read-more-link {
  position: relative;
}

.read-more-link::after {
  content: ": " attr(data-title);
  position: absolute;
  left: -9999px;
  width: 1px;
  height: 1px;
  overflow: hidden;
}
<a href="" class="read-more-link" data-title="">read more</a>

Anchor Elements Without Proper href Attributes

Lighthouse flags anchor elements that don’t have proper href attributes:

Search engines may use href attributes on links to crawl websites.
Element: <a class="disabled">

Problem: Anchor elements without href attributes or with empty/invalid hrefs prevent crawling.

Solution: Add validation for improper anchor elements:

- name: Check anchor elements
  run: |
    echo "Checking for anchor elements without proper href..."
    
    # Find anchor elements without href or with empty/invalid href
    INVALID_ANCHORS=$(find _site -name "*.html" -exec grep -o '<a[^>]*>' {} \; | grep -v 'href="[^"]*"' | wc -l || true)
    echo "⚠️  Anchor elements without proper href: $INVALID_ANCHORS"
    
    # Find disabled/empty anchors
    DISABLED_ANCHORS=$(find _site -name "*.html" -exec grep -o '<a[^>]*class="[^"]*disabled[^"]*"[^>]*>' {} \; | wc -l || true)
    echo "⚠️  Disabled anchor elements: $DISABLED_ANCHORS"
    
    if [ "$INVALID_ANCHORS" -gt 0 ] || [ "$DISABLED_ANCHORS" -gt 0 ]; then
      echo "Files with invalid anchors:"
      find _site -name "*.html" -exec grep -l '<a[^>]*class="[^"]*disabled' {} \; | head -5
    fi

Jekyll Pagination Fix:

<!-- Instead of disabled anchor -->

  <span class="disabled">« newer posts</span>


<!-- Or use proper href with aria-disabled -->

  <a href="#" aria-disabled="true" onclick="return false;">« newer posts</a>

CSS for Disabled State:

.pagination .disabled {
  color: #6c757d;
  pointer-events: none;
  cursor: default;
}

a[aria-disabled="true"] {
  color: #6c757d;
  text-decoration: none;
  pointer-events: none;
}

Tap Target Sizing Validation

Lighthouse flags interactive elements that are too small for mobile users:

Tap targets are not sized appropriately
Interactive elements should be large enough (48x48px)
Tap Target: Archive <a href="/archive/"> Size: 13x18

Problem: Small tap targets (< 48x48px) are difficult to use on mobile devices.

CSS Solutions for Navigation Links:

/* Ensure minimum tap target size */
.nav-links a {
  display: inline-block;
  min-height: 48px;
  min-width: 48px;
  padding: 12px 16px;
  line-height: 24px;
  text-align: center;
}

/* For icon-only links */
.social-links a {
  display: inline-block;
  width: 48px;
  height: 48px;
  padding: 12px;
  text-align: center;
  line-height: 24px;
}

Jekyll Template Enhancement:

<!-- Add proper spacing and sizing classes -->
<nav class="sidebar-nav">
  <a href="/archive/" class="nav-link">Archive</a>
  <a href="/tags/" class="nav-link">Tags</a>
  <a href="/categories/" class="nav-link">Categories</a>
  <a href="/search/" class="nav-link">Search</a>
</nav>

## Advanced Link Validation

### Lychee Link Checker Integration

The workflow uses Lychee for comprehensive link validation:

```yaml
- name: Check links with Lychee
  uses: lycheeverse/lychee-action@v1
  with:
    args: --verbose --no-progress --exclude-path '_site/resume' '_site/**/*.html'
    fail: true
  env:
    GITHUB_TOKEN: $

Lychee advantages over custom scripts:

Complementing Lychee, custom validation handles Jekyll-specific scenarios:

- name: Check for broken internal links (custom)
  run: |
    echo "Checking for broken internal links (Jekyll-specific)..."
    
    find _site -name "*.html" -exec grep -l 'href="/' {} \; | head -5 | while read file; do
      echo "Checking links in $file..."
      
      grep -o 'href="[^"]*"' "$file" | grep 'href="/' | sed 's/href="//;s/"//' | while read link; do
        clean_link=$(echo "$link" | sed 's/#.*//')
        
        # Skip external links, special cases, and /resume/ links (separate deployment)
        if [[ "$clean_link" =~ ^https?:// ]] || [[ "$clean_link" == "/" ]] || [[ "$clean_link" =~ ^/resume ]]; then
          continue
        fi
        
        target_file="_site${clean_link}"
        if [[ "$clean_link" == */ ]]; then
          target_file="${target_file}index.html"
        elif [[ ! "$clean_link" =~ \. ]]; then
          target_file="${target_file}/index.html"
        fi
        
        if [ ! -f "$target_file" ]; then
          echo "⚠️  Potential broken link: $link in $file"
        fi
      done
    done

Custom validation features:

Enhanced Content Quality Validation

Structured Data Validation

The workflow validates JSON-LD structured data implementation:

- name: Check structured data
  run: |
    echo "Checking structured data..."
    
    JSON_LD_PAGES=$(find _site -name "*.html" -exec grep -l 'application/ld+json' {} \; | wc -l)
    echo "📊 Pages with JSON-LD: $JSON_LD_PAGES"
    
    if grep -q 'application/ld+json' _site/index.html; then
      echo "✅ Homepage has structured data"
    else
      echo "⚠️  Homepage missing structured data"
    fi

Image Optimization Analysis

Comprehensive image validation ensures SEO and performance best practices:

- name: Check image optimization
  run: |
    echo "Checking image optimization..."
    
    # Check for images without alt text
    MISSING_ALT=$(find _site -name "*.html" -exec grep -o '<img[^>]*>' {} \; | grep -v 'alt=' | wc -l || true)
    echo "⚠️  Images without alt text: $MISSING_ALT"
    
    # Check for large images (basic check)
    find _site -name "*.jpg" -o -name "*.png" -o -name "*.jpeg" | while read img; do
      SIZE=$(stat -f%z "$img" 2>/dev/null || stat -c%s "$img" 2>/dev/null || echo 0)
      if [ "$SIZE" -gt 500000 ]; then  # 500KB
        echo "⚠️  Large image: $img ($(($SIZE/1024))KB)"
      fi
    done

Content Quality Indicators

The workflow analyzes content quality metrics:

- name: Check content quality indicators
  run: |
    echo "Checking content quality..."
    
    # Check for duplicate titles
    find _site -name "*.html" -exec grep -o '<title>[^<]*</title>' {} \; | sort | uniq -d > duplicate-titles.txt
    DUPLICATE_TITLES=$(wc -l < duplicate-titles.txt)
    if [ "$DUPLICATE_TITLES" -gt 0 ]; then
      echo "⚠️  Duplicate titles found: $DUPLICATE_TITLES"
      head -5 duplicate-titles.txt
    else
      echo "✅ No duplicate titles"
    fi
    
    # Check for duplicate meta descriptions with file paths
    find _site -name "*.html" -exec grep -l 'name="description"' {} \; | xargs grep -o 'name="description" content="[^"]*"' | sort | uniq -d > duplicate-descriptions.txt
    DUPLICATE_DESC=$(wc -l < duplicate-descriptions.txt)
    if [ "$DUPLICATE_DESC" -gt 0 ]; then
      echo "⚠️  Duplicate meta descriptions: $DUPLICATE_DESC"
      echo "Files with duplicate descriptions:"
      find _site -name "*.html" -exec grep -l 'name="description"' {} \; | xargs grep -l "$(head -1 duplicate-descriptions.txt | sed 's/.*content="//;s/".*//')" | head -5
    else
      echo "✅ No duplicate meta descriptions"
    fi

Mobile Optimization Validation

Viewport Meta Tag Analysis

Ensuring mobile-friendly implementation across all pages:

- name: Check mobile optimization
  run: |
    echo "Checking mobile optimization..."
    
    VIEWPORT_PAGES=$(find _site -name "*.html" -exec grep -l 'name="viewport"' {} \; | wc -l)
    TOTAL_PAGES=$(find _site -name "*.html" | wc -l)
    echo "📊 Pages with viewport meta: $VIEWPORT_PAGES/$TOTAL_PAGES"
    
    if [ "$VIEWPORT_PAGES" -eq "$TOTAL_PAGES" ]; then
      echo "✅ All pages have viewport meta tag"
    else
      echo "⚠️  Some pages missing viewport meta tag"
      echo "Pages missing viewport meta:"
      find _site -name "*.html" -exec grep -L 'name="viewport"' {} \; | head -5
    fi

Comprehensive Reporting System

Detailed Issue Reporting

The enhanced reporting system provides specific file paths for quick issue resolution:

- name: Generate SEO report
  run: |
    echo "## SEO Health Check Report" > seo-report.md
    echo "Generated: $(date)" >> seo-report.md
    echo "" >> seo-report.md
    
    # Site statistics
    echo "### Site Statistics" >> seo-report.md
    echo "- Total HTML pages: $(find _site -name "*.html" | wc -l)" >> seo-report.md
    echo "- Sitemap URLs: $(grep -c "<url>" _site/sitemap.xml)" >> seo-report.md
    echo "- RSS feed items: $(grep -c "<item>" _site/feed.xml || echo "0")" >> seo-report.md
    echo "- Pages with JSON-LD: $(find _site -name "*.html" -exec grep -l 'application/ld+json' {} \; | wc -l)" >> seo-report.md
    echo "- Pages with viewport meta: $(find _site -name "*.html" -exec grep -l 'name="viewport"' {} \; | wc -l)" >> seo-report.md
    echo "" >> seo-report.md
    
    # SEO Issues with specific file paths
    echo "### SEO Issues" >> seo-report.md
    
    # Pages without meta descriptions (excluding /assets/)
    MISSING_DESC=$(find _site -name "*.html" -not -path "_site/assets/*" -exec grep -L 'name="description"' {} \; | wc -l)
    echo "- Pages missing meta descriptions: $MISSING_DESC" >> seo-report.md
    if [ "$MISSING_DESC" -gt 0 ]; then
      echo "  Files missing descriptions:" >> seo-report.md
      find _site -name "*.html" -not -path "_site/assets/*" -exec grep -L 'name="description"' {} \; | head -5 | sed 's/^/    - /' >> seo-report.md
    fi
    
    # Pages without titles
    MISSING_TITLE=$(find _site -name "*.html" -exec grep -L '<title>' {} \; | wc -l)
    echo "- Pages missing titles: $MISSING_TITLE" >> seo-report.md
    if [ "$MISSING_TITLE" -gt 0 ]; then
      echo "  Files missing titles:" >> seo-report.md
      find _site -name "*.html" -exec grep -L '<title>' {} \; | head -5 | sed 's/^/    - /' >> seo-report.md
    fi

Interactive Log Display

Reports are displayed directly in GitHub Actions logs for immediate visibility:

- name: Display SEO report
  run: |
    echo "📋 SEO Health Check Report Contents:"
    echo "==========================================="
    cat seo-report.md
    echo "==========================================="

Timestamped Artifact Management

Both SEO reports and Lighthouse results use consistent timestamped naming:

- name: Upload SEO report
  run: |
    TIMESTAMP=$(date +%Y%m%d-%H%M%S)
    echo "ARTIFACT_NAME=seo-health-report-$TIMESTAMP" >> $GITHUB_ENV
  
- name: Upload SEO report artifact
  uses: actions/upload-artifact@v4
  with:
    name: $
    path: seo-report.md
    retention-days: 90

This creates organized artifacts like:

Smart Filtering and Exclusions

Assets Directory Exclusion

The workflow intelligently excludes /assets/ directory from meta description requirements:

# Pages without meta descriptions (excluding /assets/)
MISSING_DESC=$(find _site -name "*.html" -not -path "_site/assets/*" -exec grep -L 'name="description"' {} \; | wc -l)

Rationale: Asset files (PDFs, generated content) don’t need SEO meta descriptions, reducing false positives.

Custom link validation excludes /resume/ links handled by separate deployment:

# Skip external links, special cases, and /resume/ links (separate deployment)
if [[ "$clean_link" =~ ^https?:// ]] || [[ "$clean_link" == "/" ]] || [[ "$clean_link" =~ ^/resume ]]; then
  continue
fi

This prevents false positives for links that are valid in production but not in the Jekyll build.

Performance Optimization Features

Page Performance Indicators

The workflow analyzes performance-related SEO factors:

- name: Check page performance indicators
  run: |
    echo "Checking performance indicators..."
    
    # Check for inline CSS (should be minimal)
    INLINE_CSS=$(find _site -name "*.html" -exec grep -o '<style[^>]*>.*</style>' {} \; | wc -l || true)
    echo "📊 Pages with inline CSS: $INLINE_CSS"
    
    # Check for external script count
    EXTERNAL_SCRIPTS=$(find _site -name "*.html" -exec grep -o 'src="http[^"]*"' {} \; | sort -u | wc -l || true)
    echo "📊 Unique external scripts: $EXTERNAL_SCRIPTS"

These metrics help identify performance bottlenecks that impact SEO rankings.

Real-World Results and Impact

Comprehensive Coverage Metrics

The enhanced workflow now provides complete SEO health visibility:

Site Statistics Example:

Issue Detection Improvements:

Actionable Issue Resolution

Before Enhancement:

❌ 2 pages missing meta descriptions

After Enhancement:

❌ Pages missing meta descriptions: 1
  Files missing descriptions:
    - _site/music/index.html

The enhanced reporting provides exact file paths for immediate issue resolution.

Performance Monitoring Integration

Lighthouse integration provides ongoing performance visibility:

Maintenance and Evolution

Automated Scheduling

The workflow runs automatically every Monday morning, providing:

Continuous Improvement Process

The workflow has evolved through iterative enhancement:

  1. Basic validationComprehensive monitoring
  2. Manual analysisAutomated reporting
  3. Generic alertsSpecific file paths
  4. Single toolMulti-tool integration
  5. Basic artifactsTimestamped organization

Best Practices and Recommendations

Implementation Guidelines

Start Simple: Begin with basic checks and gradually add complexity

# Phase 1: Basic Jekyll validation
- Canonical URL consistency
- Sitemap validation
- Basic meta tag checks

# Phase 2: Enhanced reporting
- Specific file paths
- Detailed statistics
- Artifact management

# Phase 3: Tool integration
- Lighthouse CI
- Advanced link checking
- Performance monitoring

Security Considerations: Always use minimal permissions

permissions:
  contents: read    # Only what's needed
  actions: write    # For artifacts only

Artifact Management: Implement consistent naming and retention

# Timestamped artifacts for organization
name: $
retention-days: 90  # Balance storage and utility

Customization for Different Sites

Jekyll-Specific Adaptations:

Multi-Site Management:

Future Enhancements

Planned Improvements

Advanced Analytics Integration:

Enhanced Reporting:

Performance Optimization:

Conclusion

The evolution from basic SEO validation to comprehensive health monitoring demonstrates the value of iterative improvement in automation. What began as simple canonical URL checking has become a robust system that:

The comprehensive SEO health check system ensures consistent site quality while reducing manual monitoring effort. By combining custom Jekyll validation with industry-standard tools like Lighthouse CI and Lychee, the workflow provides both depth and breadth in SEO monitoring.

For Jekyll site owners serious about SEO performance, implementing a similar comprehensive monitoring system pays dividends in search visibility, user experience, and maintenance efficiency. The investment in automation infrastructure enables focus on content creation while maintaining technical excellence.


This comprehensive SEO health check system has been running in production since late 2025, processing weekly validations and maintaining consistent site quality. The complete workflow implementation is available in the GitHub repository for reference and adaptation.