Web Harvesting Services - Library of Congress

SOL #: 030ADV26R0010
Combined Synopsis/Solicitation

Overview

Buyer

Library Of Congress
CONTRACTS SERVICES
Washington, DC, 20540, United States

Place of Performance

Washington, DC

NAICS

Computing Infrastructure Providers (518210)

PSC

Cloud Solutions Delivered As A Service. (DK10)

Set Aside

No set aside specified

Timeline

Posted: Jan 20, 2026
Last Updated: Feb 25, 2026
Submission Deadline: Feb 24, 2026, 10:00 PM

Qualification Details

Fit reasons
  • NAICS alignment with historical contract wins in similar service areas.
  • Scope strongly matches core technical capabilities and delivery model.
Risks
  • Past performance thresholds may require one additional teaming partner.
  • Potential clarification needed on staffing minimums before bid/no-bid.
Next steps

Validate eligibility requirements, assign capture owner, and schedule partner outreach to confirm teaming strategy before submission planning.

Quick Summary

The Library of Congress has issued a Combined Synopsis/Solicitation (RFP 030ADV26R0010) for Web Harvesting Services. This Unrestricted opportunity seeks contract support for systematic, at-scale web content harvesting, including temporary access, crawl reports, and content transfer for preservation and public access. The contract is an Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders, with an estimated value between $300,000 and $15,000,000. Proposals are due February 24, 2026, at 5:00 PM EST.

Purpose & Scope

The Library of Congress requires services to systematically harvest web content based on staff instructions, provide temporary access to content and crawl reports for quality review, and enable content transfer for preservation and public access. The scope includes capturing an estimated 350-700 Terabytes (TB) of data through various crawl types: weekly (1,700 seeds), monthly (4,000 seeds), extended (up to 10,000 seeds), and specific weekly crawls for the US Election 2026 (up to 1,500 seeds). Crawls generally ignore robots.txt and require deduplication.
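The deduplication requirement can be illustrated with a minimal sketch. The summary does not say how duplicates must be detected, so this example assumes a common approach: treating a capture as a duplicate when the same URL has already been seen with an identical payload digest. The function names and sample data are hypothetical.

```python
import hashlib


def payload_digest(payload: bytes) -> str:
    """Return a hex SHA-1 digest of a response payload."""
    return hashlib.sha1(payload).hexdigest()


def deduplicate(captures):
    """Split (url, payload) captures into unique records and duplicate URLs.

    Assumption: a capture is a duplicate when the same URL was already
    captured with a byte-identical payload.
    """
    seen = set()
    unique, duplicates = [], []
    for url, payload in captures:
        key = (url, payload_digest(payload))
        if key in seen:
            duplicates.append(url)
        else:
            seen.add(key)
            unique.append((url, payload))
    return unique, duplicates


# Hypothetical captures: the second one repeats the first byte for byte.
captures = [
    ("https://example.gov/", b"<html>v1</html>"),
    ("https://example.gov/", b"<html>v1</html>"),
    ("https://example.gov/", b"<html>v2</html>"),
]
unique, duplicates = deduplicate(captures)
print(len(unique), "unique captures;", len(duplicates), "duplicate")
```

In a real archive the duplicate would typically be recorded as a reference to the earlier capture rather than silently dropped, so that the crawl history stays complete.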

Key Requirements

Contractors must perform web content harvesting, packaging captured content into valid WARC (Web ARChive) files (ISO 28500:2017) with 11-field CDX indexes for transfer to the Library's S3 bucket via secure internet (HTTPS). Single BagIt bags should not exceed 1 TB, with target WARC files around 1 GB. Services include providing an access tool for quality review, generating detailed reports (ASCII text and XML) within 5 days of crawl completion, and developing a Quality Control Program (QCP). Infrastructure must utilize US-based servers with reliable and secure data storage. Strict information security policies apply, including restrictions on Generative AI use and mandatory IT Security Training. Key personnel (Program Manager, Crawl Engineer, Quality Assurance Lead) with specified experience are required.
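The bag-size limit implies that roughly a thousand 1 GB WARC files fit in one bag. A minimal sketch of that grouping follows, assuming decimal units (1 TB = 10^12 bytes), counting only WARC payload sizes (BagIt manifests and tag files would add a small overhead), and using hypothetical file names. It is one simple greedy strategy, not a method prescribed by the solicitation.

```python
TB = 10**12  # assumption: decimal terabyte
GB = 10**9   # assumption: decimal gigabyte
MAX_BAG_BYTES = 1 * TB


def pack_into_bags(warc_files, max_bag_bytes=MAX_BAG_BYTES):
    """Group (name, size_in_bytes) pairs, in order, into bags under the limit."""
    bags, current, current_bytes = [], [], 0
    for name, size in warc_files:
        if size > max_bag_bytes:
            raise ValueError(f"{name} alone exceeds the bag limit")
        # Start a new bag when adding this file would cross the limit.
        if current and current_bytes + size > max_bag_bytes:
            bags.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        bags.append(current)
    return bags


# Hypothetical crawl output: 2,500 WARC files of 1 GB each.
warcs = [(f"crawl-{i:05d}.warc.gz", 1 * GB) for i in range(2500)]
bags = pack_into_bags(warcs)
print([len(bag) for bag in bags])
```

Leaving some headroom below the limit (for example, passing a smaller `max_bag_bytes`) would be a reasonable precaution if the limit is interpreted strictly.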

Contract Details

  • Contract Type: Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders.
  • Period of Performance: Base period from June 1, 2026, to May 31, 2031.
  • Estimated Value: Minimum order of $300,000.00; Maximum order of $15,000,000.00.
  • Place of Performance: Contractor's own facilities.
  • Set-Aside: Unrestricted.
  • Product Service Code: DK10 - Cloud Solutions Delivered As A Service.

Submission Requirements & Deadlines

  • Proposals Due: February 24, 2026, at 5:00 PM EST.
  • Questions Due: February 2, 2026, at 12:00 PM EST.
  • Sample Web Crawl Size Notification Due: February 17, 2026, by 5:00 PM EST (via email to cdaly@loc.gov and jzwa@loc.gov).
  • Past Performance Questionnaires (PPQs) Due: February 24, 2026, by noon EST (submitted directly by references to the Contracting Team).
  • Proposal Content: Must include four volumes: Technical Approach (including a sample web crawl), Corporate Experience and Capabilities, Past Performance (using Attachment J3), and Price (using Attachment J4).
  • Submission Method: Electronically via email to cdaly@loc.gov and jzwa@loc.gov. Total email attachment size not to exceed 20MB. Proposals must be valid through June 6, 2026.

Evaluation Criteria

Award will be based on a Best-Value Trade-Off (BVTO) approach. Evaluation factors, in descending order of importance, are: Technical Approach, Corporate Experience and Capabilities, Past Performance, and Price. The non-price factors, when combined, are significantly more important than, or equally important to, price. The Library may award without discussions.

Technical Environment

The Library's existing environment uses Digiboard for seed management, provides seed lists in SURT format, and supports open-source tools like Heritrix, Brozzler, OpenWayback, and pywb. Harvested content is stored in WARC format, and data transfer occurs via AWS S3. Detailed crawl reports are required, including specific metrics on hosts, documents, MIME types, HTTP codes, and data sizes.
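Two ideas from this environment can be sketched briefly: SURT-form seeds, in which the host labels are reversed so that related hosts sort together, and per-crawl report counters. The sketch below is illustrative only. `to_surt` covers the basic host reversal and omits the further canonicalization rules that production tools apply, and the record fields (`url`, `mime`, `status`, `size`) are hypothetical rather than the Library's report schema.

```python
from collections import Counter
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Rewrite a URL so the host labels run from most to least general."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = ",".join(reversed(host.split("."))) + ","
    port = f":{parts.port}" if parts.port else ""
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://({labels}{port}){path}{query}"


def summarize(records):
    """Aggregate capture records into simple crawl-report counters."""
    report = {
        "documents": 0,
        "bytes": 0,
        "by_host": Counter(),
        "by_mime": Counter(),
        "by_status": Counter(),
    }
    for rec in records:
        host = urlsplit(rec["url"]).hostname
        report["documents"] += 1
        report["bytes"] += rec["size"]
        report["by_host"][host] += 1
        report["by_mime"][rec["mime"]] += 1
        report["by_status"][rec["status"]] += 1
    return report


# Hypothetical capture records for illustration.
records = [
    {"url": "https://www.loc.gov/", "mime": "text/html", "status": 200, "size": 5120},
    {"url": "https://www.loc.gov/a.pdf", "mime": "application/pdf", "status": 200, "size": 20480},
    {"url": "https://example.gov/missing", "mime": "text/html", "status": 404, "size": 512},
]
report = summarize(records)
print(to_surt("https://www.loc.gov/collections/"))
print(report["documents"], report["bytes"], dict(report["by_status"]))
```

Counters like these could then be serialized to the ASCII text and XML report formats the solicitation asks for.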

Contact Information

People

Points of Contact

Colleen Daly (Primary)
Jennifer Zwahlen (Secondary)

Versions

  • Version 8: Combined Synopsis/Solicitation, posted Feb 25, 2026
  • Version 7: Combined Synopsis/Solicitation, posted Feb 19, 2026
  • Version 6: Combined Synopsis/Solicitation, posted Feb 18, 2026
  • Version 5: Combined Synopsis/Solicitation, posted Feb 13, 2026
  • Version 4: Combined Synopsis/Solicitation, posted Feb 3, 2026
  • Version 3 (currently viewed): Combined Synopsis/Solicitation, posted Jan 28, 2026
  • Version 2: Combined Synopsis/Solicitation, posted Jan 23, 2026
  • Version 1: Combined Synopsis/Solicitation, posted Jan 20, 2026