Web Harvesting Services - Library of Congress
Qualification Details
Fit reasons
- NAICS alignment with historical contract wins in similar service areas.
- Scope strongly matches core technical capabilities and delivery model.
Risks
- Past performance thresholds may require one additional teaming partner.
- Potential clarification needed on staffing minimums before bid/no-bid.
Next steps
Validate eligibility requirements, assign capture owner, and schedule partner outreach to confirm teaming strategy before submission planning.
Quick Summary
The Library of Congress is soliciting proposals for Web Harvesting Services to systematically collect web content at scale for preservation and public access. This is an Unrestricted competition for a Firm-Fixed-Price Indefinite-Delivery, Indefinite-Quantity (IDIQ) contract. Proposals are due February 24, 2026, at 5:00 PM EST.
Scope of Work
The contractor will provide comprehensive web harvesting services, including:
- Systematic Content Capture: Perform crawls based on Library specifications, seed lists, and scoping instructions, capturing various digital objects (HTML, images, PDFs, multimedia) while employing politeness factors.
- Data Packaging & Transfer: Package captured content into valid WARC files (ISO 28500:2017) with 11-field CDX indexes, and transfer them to the Library's S3 bucket over HTTPS. A single BagIt bag should not exceed 1 TB.
- Quality Review & Reporting: Provide an access tool for Library staff to review crawl results and generate detailed reports (ASCII text and XML) within five days of crawl completion. A Quality Control Program (QCP) must be developed and maintained.
- Infrastructure & Security: Utilize US-based servers for crawling, maintain reliable and secure data storage, and provide a web-based communication tool. Adherence to strict information security policies, including restrictions on Generative AI use, is mandatory.
- Key Personnel: The contractor must provide a qualified Program Manager (with an Alternate), Crawl Engineer, and Quality Assurance Lead.
- Estimated Volume: The anticipated crawl volume is 300-700 TB per year.
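The 1 TB per-bag limit in the Data Packaging & Transfer item implies that large crawls must be split across several BagIt bags. The following is a minimal sketch of one way to plan that split, using a first-fit-decreasing grouping. The function name `plan_bags`, the file names, and the reading of 1 TB as 10^12 bytes are assumptions for illustration; the solicitation summary does not specify a packing method or whether TB is decimal or binary, and the sketch ignores the small overhead of BagIt manifests and tag files.

```python
from typing import Dict, List

# Assumption: 1 TB is read as a decimal terabyte (10**12 bytes).
TB = 10**12


def plan_bags(warc_sizes: Dict[str, int], limit: int = TB) -> List[List[str]]:
    """Group WARC files into bags whose total size stays within `limit`.

    Uses first-fit decreasing: largest files first, each placed in the
    first bag that still has room, opening a new bag when none does.
    """
    bags: List[List[str]] = []
    totals: List[int] = []
    ordered = sorted(warc_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ordered:
        if size > limit:
            raise ValueError(f"{name} alone exceeds the bag limit")
        for i, total in enumerate(totals):
            if total + size <= limit:
                bags[i].append(name)
                totals[i] += size
                break
        else:
            # No existing bag has room, so start a new one.
            bags.append([name])
            totals.append(size)
    return bags


if __name__ == "__main__":
    # Hypothetical file names and sizes, in bytes.
    sizes = {
        "crawl-0001.warc.gz": 600 * 10**9,
        "crawl-0002.warc.gz": 500 * 10**9,
        "crawl-0003.warc.gz": 400 * 10**9,
    }
    for number, bag in enumerate(plan_bags(sizes), start=1):
        print(number, bag)
```

In practice a margin below the limit would likely be reserved for bag metadata, and the grouping might follow crawl or collection boundaries rather than size alone.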
Contract Details
- Contract Type: Firm-Fixed-Price Indefinite-Delivery, Indefinite-Quantity (IDIQ) with Task Orders.
- Period of Performance: Base period from June 1, 2026, to May 31, 2031.
- Estimated Value: Minimum order of $300,000.00; Maximum order of $15,000,000.00.
- Place of Performance: Contractor's own facilities.
- Set-Aside: Unrestricted.
- Product Service Code: DK10 - Cloud Solutions Delivered As A Service.
Submission Requirements
- Proposal Due Date: February 24, 2026, at 5:00 PM EST.
- Sample Web Crawl Size Information Due: February 17, 2026, by 5:00 PM ET.
- Past Performance Questionnaires (PPQs) Due: February 24, 2026, by noon ET. PPQs must be sent directly from the past performance reference to the Contracting Team.
- Submission Method: Electronically via email to Jennifer Zwahlen (jzwa@loc.gov) and Colleen Daly (cdaly@loc.gov). Total email attachment size must not exceed 20MB, and no zipped files are permitted. Proposals must remain valid through June 6, 2026.
- Proposal Content: Must include four volumes: Technical Approach (including a sample web crawl), Corporate Experience and Capabilities (including Key Personnel resumes), Past Performance (using Attachment J3), and Price (using Attachment J4).
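The email submission limits above (20MB total attachment size, no zipped files) can be checked before sending. The sketch below is illustrative only: the function name `check_attachments`, the reading of 20MB as 20 × 10^6 bytes, and the treatment of only the `.zip` extension as "zipped" are assumptions, not terms taken from the solicitation.

```python
from typing import Iterable, List, Tuple

# Assumption: 20MB is read as decimal megabytes.
MAX_TOTAL_BYTES = 20 * 10**6
# Assumption: only .zip is treated as a zipped file here.
ZIPPED_SUFFIXES = (".zip",)


def check_attachments(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Return a list of problems for (file name, size in bytes) pairs.

    An empty list means the set passes both checks in this sketch.
    """
    problems: List[str] = []
    total = 0
    for name, size in files:
        total += size
        if name.lower().endswith(ZIPPED_SUFFIXES):
            problems.append(f"zipped file not permitted: {name}")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    return problems


if __name__ == "__main__":
    # Hypothetical volume file names and sizes.
    print(check_attachments([("technical.pdf", 8_000_000), ("price.xlsx", 2_000_000)]))
```

Email transport encoding can make attachments larger in transit than on disk, so leaving headroom under the limit is a reasonable precaution.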
Evaluation Criteria
Proposals will be evaluated using a Best-Value Trade-Off (BVTO) approach. Evaluation factors, in descending order of importance, are: Technical Approach, Corporate Experience and Capabilities, Past Performance, and Price. Combined, the non-price factors are either significantly more important than price or equally important to it. The Library may award without discussions.
Key Clarifications
The Library will not consider approaches where crawl data is written directly into Library-owned cloud infrastructure. The pricing metric will be based on the total compressed WARC size.
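Because pricing is tied to total compressed WARC size, an offeror may want a simple way to measure that figure for a crawl. The sketch below sums the on-disk size of compressed WARC files under a directory. It assumes compressed WARCs carry a `.warc.gz` extension and that on-disk byte size is the relevant measure; neither detail is stated in the summary, and the function name `total_compressed_warc_bytes` is hypothetical.

```python
from pathlib import Path


def total_compressed_warc_bytes(root: str, pattern: str = "*.warc.gz") -> int:
    """Sum the sizes, in bytes, of files matching `pattern` under `root`.

    Assumption: compressed WARC files are named *.warc.gz.
    """
    return sum(p.stat().st_size for p in Path(root).rglob(pattern) if p.is_file())


if __name__ == "__main__":
    import sys

    total = total_compressed_warc_bytes(sys.argv[1] if len(sys.argv) > 1 else ".")
    # Report in decimal terabytes (assumption: 1 TB = 10**12 bytes).
    print(f"{total} bytes = {total / 10**12:.6f} TB")
```

For scale, the anticipated 300-700 TB per year corresponds to roughly 1,500-3,500 TB over the five-year period of performance if volume stays within that range each year.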