Web Harvesting Services - Library of Congress
Qualification Details
Fit reasons
- NAICS alignment with historical contract wins in similar service areas.
- Scope strongly matches core technical capabilities and delivery model.
Risks
- Past performance thresholds may require one additional teaming partner.
- Potential clarification needed on staffing minimums before bid/no-bid.
Next steps
Validate eligibility requirements, assign capture owner, and schedule partner outreach to confirm teaming strategy before submission planning.
Quick Summary
The Library of Congress is soliciting proposals for Web Harvesting Services under a Combined Synopsis/Solicitation (RFP 030ADV26R0010). The Library seeks contractor support to harvest web content systematically and at scale, provide temporary access for quality review, and transfer the content to the Library for preservation and public access. The contract is an Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price task orders. Proposals are due February 24, 2026, at 5:00 PM EST.
Scope of Work
The contractor will harvest web content according to Library specifications, seed lists, and scoping instructions, generally ignoring robots.txt. Crawls must capture the full range of digital objects (HTML, images, PDFs, multimedia) needed to replicate webpages accurately, while applying politeness factors. Key deliverables include captured content packaged in valid WARC files (ISO 28500:2017) with 11-field CDX indexes, transferred to the Library's S3 bucket over a secure internet connection. The Library estimates collecting 350-700 terabytes (TB) of data.
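For readers unfamiliar with CDX indexes, the following is a minimal Python sketch of how one index line might be parsed. The summary above does not specify the field layout, so the sketch assumes the widely used 11-field layout whose header line reads " CDX N b a m s k r M S V g". The sample line, the digest value, and the file name are hypothetical and are not taken from the solicitation.

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """One line of an 11-field CDX index (assumed layout: N b a m s k r M S V g)."""
    urlkey: str        # N: canonicalized ("massaged") URL
    timestamp: str     # b: capture time, YYYYMMDDhhmmss
    original_url: str  # a: URL as crawled
    mime_type: str     # m: content type of the captured object
    status: str        # s: HTTP response code
    digest: str        # k: content checksum
    redirect: str      # r: redirect target, or "-" if none
    meta_tags: str     # M: robots meta tags, or "-" if none
    length: str        # S: compressed record size in bytes
    offset: str        # V: byte offset of the record in the WARC file
    filename: str      # g: name of the WARC file holding the record


def parse_cdx_line(line: str) -> CdxRecord:
    """Split a space-delimited CDX line and reject lines without 11 fields."""
    fields = line.strip().split(" ")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    return CdxRecord(*fields)


# Hypothetical sample line, for illustration only.
sample = (
    "gov,loc)/ 20260601120000 https://www.loc.gov/ text/html 200 "
    "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA - - 5123 1024 example-00001.warc.gz"
)
record = parse_cdx_line(sample)
```

The offset and filename fields together let a reviewer locate a single capture inside a WARC file without reading the whole archive, which is presumably why the index is delivered alongside the WARC files.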
The scope also requires providing an access tool for Library staff to review crawl results, generating detailed reports (ASCII text and XML) within 5 days of crawl completion, and developing a Quality Control Program (QCP). Infrastructure must utilize US-based servers with reliable and secure data storage. Contractors must adhere to strict information security policies, including restrictions on Generative AI use and mandatory IT Security Training. Key personnel, including a Program Manager, Crawl Engineer, and Quality Assurance Lead, are required. Specific crawl types include weekly, monthly, extended, and US Election 2026 weekly crawls.
Contract Details
- Contract Type: Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders.
- Period of Performance: Base period from June 1, 2026, to May 31, 2031.
- Estimated Value: Minimum order of $300,000.00; Maximum order of $15,000,000.00.
- Set-Aside: Unrestricted.
- Place of Performance: Contractor's own facilities.
- Product/Service Code: DK10 (Cloud Solutions Delivered As A Service).
Submission Requirements
Proposals must be submitted electronically via email to both the Contracting Officer (jzwa@loc.gov) and Contract Specialist (cdaly@loc.gov). Total email attachment size must not exceed 20MB, and zipped files are not permitted. Proposals must be valid through June 6, 2026. The proposal should include four volumes:
- Technical Approach: Including a sample web crawl.
- Corporate Experience and Capabilities: Including Key Personnel resumes.
- Past Performance: Using Attachment J3.
- Price: Using Attachment J4, proposing firm-fixed-price amounts per terabyte for each of the five ordering periods.
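As a rough illustration of how per-terabyte pricing relates to the stated order limits, the sketch below totals a set of per-period prices against an assumed volume and compares the result with the $300,000 minimum and $15,000,000 maximum. The per-terabyte prices and the even split of volume across periods are hypothetical, and treating the minimum and maximum as bounds on cumulative ordered value is an assumption, not something this summary states.

```python
MIN_ORDER = 300_000.00      # minimum order from the contract details
MAX_ORDER = 15_000_000.00   # maximum order from the contract details
LOW_TB, HIGH_TB = 350, 700  # Library's estimated collection range


def contract_total(prices_per_tb, volumes_tb):
    """Sum price-per-TB times volume across ordering periods."""
    if len(prices_per_tb) != len(volumes_tb):
        raise ValueError("need one volume per ordering period")
    return sum(p * v for p, v in zip(prices_per_tb, volumes_tb))


# Hypothetical prices for the five ordering periods (USD per TB).
prices = [9_000.0, 9_200.0, 9_400.0, 9_600.0, 9_800.0]
# Assumption: the low-end estimate of 350 TB is spread evenly over five periods.
volumes = [LOW_TB / 5] * 5

total = contract_total(prices, volumes)
within_limits = MIN_ORDER <= total <= MAX_ORDER

# Average per-TB rate at which each volume estimate would reach the maximum.
ceiling_rate_high_volume = MAX_ORDER / HIGH_TB
ceiling_rate_low_volume = MAX_ORDER / LOW_TB

print(f"total: ${total:,.2f}, within limits: {within_limits}")
print(f"ceiling rate at {HIGH_TB} TB: ${ceiling_rate_high_volume:,.2f}/TB")
print(f"ceiling rate at {LOW_TB} TB: ${ceiling_rate_low_volume:,.2f}/TB")
```

Under these assumptions, the maximum order divided by the volume estimate gives an upper bound on the average per-terabyte rate, which may be a useful sanity check when drafting the price volume.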
Offerors must complete a technical test crawl within 48 hours, using a Library-provided seed list (Attachment J2b). Offerors must notify the Library of the sample crawl data size by February 17, 2026, at 5:00 PM EST.
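The email submission limits described above (a 20MB total attachment size and no zipped files) can be checked before sending. The following is a minimal sketch: the file names and sizes are hypothetical, and reading "20MB" as 20,000,000 bytes is a conservative assumption, since the summary does not define the unit.

```python
from pathlib import Path

# Assumption: "20MB" is read conservatively as 20 * 10^6 bytes.
MAX_TOTAL_BYTES = 20 * 1000 * 1000


def check_attachments(files):
    """Return a list of problems for (file_name, size_in_bytes) pairs."""
    problems = []
    total = 0
    for name, size in files:
        total += size
        # Zipped files are not permitted by the submission instructions.
        if Path(name).suffix.lower() == ".zip":
            problems.append(f"{name}: zipped files are not permitted")
    if total > MAX_TOTAL_BYTES:
        problems.append(
            f"total size {total} bytes exceeds limit of {MAX_TOTAL_BYTES} bytes"
        )
    return problems


# Hypothetical proposal volumes, for illustration only.
proposal_files = [
    ("volume1_technical_approach.pdf", 6_500_000),
    ("volume2_corporate_experience.pdf", 4_000_000),
    ("volume3_past_performance.pdf", 2_500_000),
    ("volume4_price.xlsx", 500_000),
]
issues = check_attachments(proposal_files)
```

Email encoding typically inflates attachments beyond their size on disk, so leaving headroom below the limit is a sensible precaution.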
Evaluation Criteria
Award will be based on a Best-Value Trade-Off (BVTO) approach. Evaluation factors, in descending order of importance, are: Technical Approach, Corporate Experience and Capabilities, Past Performance, and Price. When combined, the non-price factors are at least as important as price and may be significantly more important. The Library reserves the right to award without discussions.
Key Dates & Contacts
- Questions Due: February 2, 2026, 12:00 Noon EST.
- Past Performance Questionnaires Due: February 9, 2026, 12:00 Noon EST.
- Sample Web Crawl Notification Due: February 17, 2026, 5:00 PM EST.
- Proposals Due: February 24, 2026, 5:00 PM EST.
- Primary Contact: Colleen Daly (cdaly@loc.gov)
- Secondary Contact: Jennifer Zwahlen (jzwa@loc.gov)