Web Harvesting Services - Library of Congress

SOL #: 030ADV26R0010 (Combined Synopsis/Solicitation)

Overview

Buyer

Library Of Congress
CONTRACTS SERVICES
Washington, DC, 20540, United States

Place of Performance

Washington, DC

NAICS

Computing Infrastructure Providers (518210)

PSC

Cloud Solutions Delivered As A Service. (DK10)

Set Aside

No set aside specified

Timeline

Posted: Jan 20, 2026
Last Updated: Feb 25, 2026
Submission Deadline: Feb 24, 2026, 10:00 PM

Qualification Details

Fit reasons
  • NAICS alignment with historical contract wins in similar service areas.
  • Scope strongly matches core technical capabilities and delivery model.
Risks
  • Past performance thresholds may require one additional teaming partner.
  • Potential clarification needed on staffing minimums before bid/no-bid.
Next steps

Validate eligibility requirements, assign capture owner, and schedule partner outreach to confirm teaming strategy before submission planning.

Quick Summary

The Library of Congress is soliciting proposals for Web Harvesting Services under a Combined Synopsis/Solicitation (RFP 030ADV26R0010). This opportunity seeks contract support for systematic, at-scale web content harvesting, including temporary access, crawl reports, and content transfer for preservation and public access. The contract is an Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders. Proposals are due by 5:00 PM EST on February 24, 2026.
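The Timeline lists the submission deadline as 10:00 PM, while the summary states 5:00 PM EST. One plausible reading, which the listing does not confirm, is that the Timeline shows the time in UTC. A minimal sketch of that conversion, assuming a fixed UTC-5 offset for EST:

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is modeled as a fixed UTC-5 offset (no daylight saving in February).
EST = timezone(timedelta(hours=-5), name="EST")

# Proposal deadline as stated in the summary: February 24, 2026, 5:00 PM EST.
deadline_est = datetime(2026, 2, 24, 17, 0, tzinfo=EST)

# Convert to UTC to compare with the time shown in the Timeline.
deadline_utc = deadline_est.astimezone(timezone.utc)

print(deadline_est.isoformat())
print(deadline_utc.isoformat())
```

If the Timeline is in fact shown in UTC, the two times agree; bidders should still treat the time stated in the solicitation itself as authoritative.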

Purpose & Scope

The Library of Congress requires a contractor to perform web content harvesting based on Library staff instructions. This includes capturing various digital objects (HTML, images, PDFs, multimedia), providing temporary access for quality review, generating detailed crawl reports, and transferring content to the Library's S3 bucket in WARC format for preservation and public access. The goal is to enrich the Library's digital collections.

Key Requirements & Deliverables

  • Web Content Harvesting: Perform crawls based on Library specifications, seed lists (e.g., Attachment J2b), and scoping instructions, generally ignoring robots.txt. This includes weekly, monthly, and extended crawls, as well as crawls specific to US Election 2026, with an estimated data collection of 350-700 terabytes.
  • Data Packaging & Transfer: Package captured content in valid WARC (ISO 28500:2017) files with 11-field CDX indexes for transfer to the Library's S3 bucket over a secure internet connection (HTTPS).
  • Quality Review & Reporting: Provide an access tool for Library staff to review crawl results. Generate detailed reports (ASCII text and XML) within 5 days of crawl completion, including statistical information on captured content, resources, and crawl performance. Develop and maintain a Quality Control Program (QCP).
  • Infrastructure & Security: Utilize US-based servers for crawling, maintain reliable and secure data storage, and provide a web-based communication tool. Adhere to strict information security policies, including restrictions on Generative AI use and mandatory IT Security Training.
  • Key Personnel: Provide qualified Program Manager/Alternate, Crawl Engineer, and Quality Assurance Lead.
  • Technical Test Crawl: Offerors must perform a one-time technical test crawl using a Library-provided seed list (Attachment J2b) within 48 hours, delivering results in WARC format with CDX files and reports via SFTP.
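To illustrate the 11-field CDX index mentioned under Data Packaging & Transfer, here is a minimal parsing sketch. It assumes the widely used space-delimited field order with header ` CDX N b a m s k r M S V g`; the Library's exact layout is defined in the solicitation attachments and may differ. The names `CdxRecord` and `parse_cdx11_line`, and the sample line itself, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed field order for the common 11-field CDX layout
# (header " CDX N b a m s k r M S V g"); not confirmed by the listing.
CDX11_FIELDS = (
    "urlkey", "timestamp", "original_url", "mime_type", "status",
    "digest", "redirect", "meta_tags", "record_length", "offset", "filename",
)


@dataclass(frozen=True)
class CdxRecord:
    urlkey: str
    timestamp: str
    original_url: str
    mime_type: str
    status: str
    digest: str
    redirect: str
    meta_tags: str
    record_length: Optional[int]
    offset: Optional[int]
    filename: str


def _int_or_none(value: str) -> Optional[int]:
    """CDX uses '-' for an absent value; map it to None."""
    return None if value == "-" else int(value)


def parse_cdx11_line(line: str) -> CdxRecord:
    """Parse one space-delimited CDX line into a CdxRecord."""
    parts = line.strip().split(" ")
    if len(parts) != len(CDX11_FIELDS):
        raise ValueError(f"expected 11 fields, got {len(parts)}")
    fields = dict(zip(CDX11_FIELDS, parts))
    fields["record_length"] = _int_or_none(fields["record_length"])
    fields["offset"] = _int_or_none(fields["offset"])
    return CdxRecord(**fields)


# Hypothetical sample line for illustration only.
sample = (
    "gov,loc)/ 20260301120000 https://www.loc.gov/ text/html 200 "
    "EXAMPLEDIGEST0000000000000000000 - - 5120 1024 example-00000.warc.gz"
)
record = parse_cdx11_line(sample)
print(record.original_url, record.status, record.offset, record.filename)
```

A check like this could be run over each index file before transfer to catch lines with the wrong number of fields; it does not validate the WARC files themselves.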

Contract Details

  • Contract Type: Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders.
  • Period of Performance: Base period from June 1, 2026, to May 31, 2031.
  • Estimated Value: Minimum order of $300,000.00; Maximum order of $15,000,000.00.
  • Place of Performance: Contractor's own facilities.
  • Product Service Code: DK10 (Cloud Solutions Delivered As A Service).

Submission & Evaluation

  • Questions Due: February 2, 2026, at 12:00 PM EST to Jennifer Zwahlen (jzwa@loc.gov) and Colleen Daly (cdaly@loc.gov).
  • Proposals Due: February 24, 2026, at 5:00 PM EST.
  • Submission Method: Electronically via email to jzwa@loc.gov and cdaly@loc.gov. Total email attachment size not to exceed 20MB. No zipped files.
  • Proposal Content: Must include four volumes: Technical Approach (including a sample web crawl), Corporate Experience and Capabilities (including Key Personnel resumes), Past Performance (using Attachment J3), and Price (using Attachment J4).
  • Evaluation Criteria: Best-Value Trade-Off (BVTO) approach, in which the non-price factors (Technical Approach, Corporate Experience and Capabilities, Past Performance), when combined, are significantly more important than, or equally important to, Price. The Library may award without discussions.

Eligibility & Set-Aside

  • Set-Aside: Unrestricted.
  • Contractor Employee Fitness: Personnel with access to Library facilities or IT systems will undergo background investigations and continuous vetting.

Important Attachments

Bidders must review all attachments, including J1 (Description of Tech Environment), J2a (Sample Web Crawl Requirement), J2b (Sample Seed List), J3 (Past Performance Questionnaire), J4 (Price Schedule), J5 (Web Archiving FAQ), J6 (Heritrix 3 Configuration), J7 (ISO WARC file format), and J8 (Sample Task Order SOW) for detailed instructions and technical requirements.

People

Points of Contact

Colleen Daly (Primary)
Jennifer Zwahlen (Secondary)

Files

No files attached to this opportunity

Versions

Version 8: Combined Synopsis/Solicitation, posted Feb 25, 2026
Version 7: Combined Synopsis/Solicitation, posted Feb 19, 2026
Version 6: Combined Synopsis/Solicitation, posted Feb 18, 2026
Version 5: Combined Synopsis/Solicitation, posted Feb 13, 2026
Version 4: Combined Synopsis/Solicitation, posted Feb 3, 2026
Version 3: Combined Synopsis/Solicitation, posted Jan 28, 2026
Version 2: Combined Synopsis/Solicitation, posted Jan 23, 2026
Version 1 (currently viewing): Combined Synopsis/Solicitation, posted Jan 20, 2026