Web Harvesting Services - Library of Congress

SOL #: 030ADV26R0010
Combined Synopsis/Solicitation

Overview

Buyer

Library of Congress
CONTRACTS SERVICES
Washington, DC, 20540, United States

Place of Performance

Washington, DC

NAICS

Computing Infrastructure Providers (518210)

PSC

Cloud Solutions Delivered As A Service (DK10)

Set Aside

No set aside specified

Timeline

Posted: Jan 20, 2026
Last Updated: Feb 25, 2026
Submission Deadline: Mar 3, 2026, 10:00 PM (time zone not stated; the solicitation summary gives 5:00 PM EST)

Qualification Details

Fit reasons
  • NAICS alignment with historical contract wins in similar service areas.
  • Scope strongly matches core technical capabilities and delivery model.
Risks
  • Past performance thresholds may require one additional teaming partner.
  • Potential clarification needed on staffing minimums before bid/no-bid.
Next steps

Validate eligibility requirements, assign capture owner, and schedule partner outreach to confirm teaming strategy before submission planning.

Quick Summary

The Library of Congress is soliciting proposals for Web Harvesting Services under an Unrestricted Indefinite-Delivery, Indefinite-Quantity (IDIQ) contract. This opportunity seeks support for systematic, at-scale web content harvesting, temporary access, crawl reports, and content transfer for preservation and public access. Proposals are due by March 3, 2026, at 5:00 PM EST.

Purpose and Scope

The Library of Congress requires contract support to enable the systematic harvesting of web content at scale, based on instructions from Library staff. This includes providing temporary access to the content, generating required crawl reports for quality review, and facilitating the transfer of content to the Library for preservation and public access. The goal is to enrich the Library's digital collections.

Key requirements include:

  • Performing web crawls based on Library specifications, seed lists, and scoping instructions.
  • Capturing all digital objects (HTML, images, PDFs, multimedia) needed to accurately replicate webpages.
  • Packaging captured content in valid WARC (Web ARChive) files with 11-field CDX indexes for transfer to the Library's AWS S3 bucket.
  • Providing an access tool for Library staff to review crawl results prior to transfer.
  • Generating detailed crawl reports (ASCII text and XML) within five days of crawl completion.
  • Utilizing US-based servers for crawling and maintaining secure data storage.
  • Adhering to strict information security policies, including restrictions on Generative AI use.
  • Key Personnel: Program Manager/Alternate, Crawl Engineer, and Quality Assurance Lead.
  • Estimated annual crawl volume: 300 to 700 terabytes.
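
To put the estimated annual crawl volume in perspective, the sketch below converts a yearly total into an average sustained transfer rate. It assumes decimal terabytes (10^12 bytes) and a 365-day year of continuous operation; neither assumption comes from the solicitation, and real crawls are bursty, so peak rates would be higher than these averages.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year, continuous operation
BYTES_PER_TB = 10**12                  # assumption: decimal terabytes


def average_rate_mbit_per_s(terabytes_per_year: float) -> float:
    """Return the average sustained rate in megabits per second."""
    bytes_per_second = terabytes_per_year * BYTES_PER_TB / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 10**6


if __name__ == "__main__":
    # Low and high ends of the estimated annual volume.
    for tb in (300, 700):
        print(f"{tb} TB/year ~ {average_rate_mbit_per_s(tb):.1f} Mbit/s on average")
```

Under these assumptions the range works out to roughly 76 to 178 Mbit/s averaged over a full year.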

Contract Details

  • Contract Type: Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders.
  • Set-Aside: Unrestricted.
  • Product Service Code: DK10 (Cloud Solutions Delivered As A Service).
  • Period of Performance: A base period from June 1, 2026, to May 31, 2031.
  • Estimated Value: Minimum order of $300,000.00; Maximum order of $15,000,000.00.
  • Place of Performance: Contractor's own facilities.

Submission and Evaluation

  • Proposal Due Date: March 3, 2026, 5:00 PM EST.
  • Past Performance Questionnaires (PPQs) Due Date: March 3, 2026, noon ET. PPQs must be sent directly from the past performance reference.
  • Sample Web Crawl Transfer Information Due Date: February 17, 2026, 5:00 PM ET.
  • Submission Method: Electronically via email to Jennifer Zwahlen (jzwa@loc.gov) and Colleen Daly (cdaly@loc.gov). Total email attachment size must not exceed 20MB, and no zipped files are permitted.
  • Proposal Content: Must include four volumes: Technical Approach (including a sample web crawl), Corporate Experience and Capabilities (including Key Personnel resumes), Past Performance (using Attachment J3), and Price (using Attachment J4).
  • Evaluation Criteria: Best-Value Trade-Off (BVTO) approach. Factors in descending order of importance are Technical Approach, Corporate Experience and Capabilities, Past Performance, and Price. The non-price factors, when combined, are either significantly more important than price or equal in importance to it.
  • SAM Registration: Offerors must be registered in SAM to be considered for award.
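
The email submission limits above (a 20MB total attachment size and no zipped files) can be checked before sending. The following is a minimal sketch, not an official tool: the function name `check_attachments`, the reading of "20MB" as 20 MiB, and the list of extensions treated as "zipped" are all assumptions made for illustration.

```python
from pathlib import Path

MAX_TOTAL_BYTES = 20 * 1024 * 1024             # assumption: "20MB" read as 20 MiB
ZIP_SUFFIXES = {".zip", ".7z", ".gz", ".rar"}  # assumption: extensions counted as "zipped"


def check_attachments(paths):
    """Return a list of human-readable problems; an empty list means the set passes."""
    problems = []
    total = 0
    for p in map(Path, paths):
        # Flag compressed archives by file extension only (contents are not inspected).
        if p.suffix.lower() in ZIP_SUFFIXES:
            problems.append(f"{p.name}: compressed archives are not permitted")
        total += p.stat().st_size
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    return problems
```

Calling `check_attachments(["volume1.pdf", "volume4.xlsx"])` with real file paths returns an empty list when both rules are met. Mail gateways may encode attachments and inflate their size, so leaving headroom below the limit is a reasonable precaution.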

Technical Requirements

The Library's technical environment uses tools such as Digiboard, Heritrix, Brozzler, OpenWayback, pywb, and OutbackCDX. Harvested content is stored in WARC format, and data transfer occurs via AWS S3. The required sample web crawl must be completed within 48 hours and should not respect robots.txt; its results (WARC, CDX, reports) must be delivered via SFTP.
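
Because the deliverables include 11-field CDX indexes, a quick structural check of each index line can catch malformed output before transfer. The sketch below assumes the widely used space-delimited layout declared by the header ` CDX N b a m s k r M S V g`; the solicitation summary does not spell out the field order, and the field names used here are illustrative labels rather than official ones.

```python
# Labels for the common 11-field layout "CDX N b a m s k r M S V g".
# Assumption: the required layout matches this common one.
CDX11_FIELDS = (
    "urlkey", "timestamp", "original_url", "mime_type", "status",
    "digest", "redirect", "meta_tags", "record_length", "offset", "filename",
)


def parse_cdx11_line(line: str) -> dict:
    """Split one space-delimited CDX data line into a dict keyed by field label.

    Header lines (those beginning with " CDX") should be skipped by the caller.
    """
    parts = line.strip().split(" ")
    if len(parts) != len(CDX11_FIELDS):
        raise ValueError(f"expected 11 fields, got {len(parts)}")
    return dict(zip(CDX11_FIELDS, parts))
```

For example, passing a data line such as `org,example)/ 20260101120000 http://example.org/ text/html 200 SHA1DIGEST - - 1234 5678 sample.warc.gz` yields a dict whose `filename` entry is `sample.warc.gz`. A line with any other field count raises `ValueError`.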

People

Points of Contact

Colleen Daly (Primary)
Jennifer Zwahlen (Secondary)

Versions

  • Version 8 (currently viewing): Combined Synopsis/Solicitation, posted Feb 25, 2026
  • Version 7: Combined Synopsis/Solicitation, posted Feb 19, 2026
  • Version 6: Combined Synopsis/Solicitation, posted Feb 18, 2026
  • Version 5: Combined Synopsis/Solicitation, posted Feb 13, 2026
  • Version 4: Combined Synopsis/Solicitation, posted Feb 3, 2026
  • Version 3: Combined Synopsis/Solicitation, posted Jan 28, 2026
  • Version 2: Combined Synopsis/Solicitation, posted Jan 23, 2026
  • Version 1: Combined Synopsis/Solicitation, posted Jan 20, 2026