Web Harvesting Services - Library of Congress
Qualification Details
Fit reasons
- NAICS alignment with historical contract wins in similar service areas.
- Scope strongly matches core technical capabilities and delivery model.
Risks
- Past performance thresholds may require one additional teaming partner.
- Staffing minimums may need clarification before the bid/no-bid decision.
Next steps
Validate eligibility requirements, assign capture owner, and schedule partner outreach to confirm teaming strategy before submission planning.
Quick Summary
The Library of Congress is soliciting proposals for Web Harvesting Services under a Combined Synopsis/Solicitation (RFP 030ADV26R0010). The contractor will harvest web content systematically and at scale, provide temporary access to harvested content for quality review, produce crawl reports, and transfer content to the Library for preservation and public access. The contract is an Indefinite-Delivery, Indefinite-Quantity (IDIQ) vehicle with firm-fixed-price task orders. Proposals are due by 5:00 PM EST on February 24, 2026.
Purpose & Scope
The Library of Congress requires a contractor to harvest web content according to instructions from Library staff. The work includes capturing digital objects (HTML, images, PDFs, multimedia), providing temporary access for quality review, generating detailed crawl reports, and transferring content in WARC format to the Library's S3 bucket for preservation and public access. The goal is to enrich the Library's digital collections.
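For readers new to the WARC container, the sketch below shows roughly what one harvested page looks like once packaged. It is a minimal, standard-library-only illustration assuming a WARC/1.1 `response` record (the version defined by ISO 28500:2017) whose content block is the raw HTTP response. The function names, the example URL, and the SHA-1/base32 block digest are illustrative choices, not requirements taken from the solicitation or Attachment J7; production crawlers such as Heritrix 3 write these records themselves.

```python
import base64
import gzip
import hashlib
import uuid
from datetime import datetime, timezone


def build_response_record(target_uri: str, http_bytes: bytes, when: datetime) -> bytes:
    """Serialize one uncompressed WARC/1.1 'response' record (illustrative sketch)."""
    # Conventional digest form: algorithm label plus base32-encoded SHA-1.
    block_digest = "sha1:" + base64.b32encode(hashlib.sha1(http_bytes).digest()).decode("ascii")
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        "WARC-Date: " + when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http;msgtype=response",
        f"WARC-Block-Digest: {block_digest}",
        f"Content-Length: {len(http_bytes)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from the block.
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # Every record is terminated by two CRLFs after its content block.
    return head + http_bytes + b"\r\n\r\n"


def to_gzip_member(record: bytes) -> bytes:
    """Compress one record as its own gzip member, the usual '.warc.gz' layout."""
    return gzip.compress(record)


if __name__ == "__main__":
    http = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
    record = build_response_record("https://www.example.gov/", http, datetime.now(timezone.utc))
    # Print only the WARC header section of the record.
    print(record.decode("utf-8").split("\r\n\r\n", 1)[0])
```

Compressing each record as a separate gzip member lets index tools point at a byte offset and decompress a single record without reading the whole file.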
Key Requirements & Deliverables
- Web Content Harvesting: Perform crawls according to Library specifications, seed lists (e.g., Attachment J2b), and scoping instructions, generally ignoring robots.txt. Crawl types include weekly, monthly, extended, and crawls specific to the 2026 US elections, with an estimated 350 to 700 terabytes (TB) of data to be collected.
- Data Packaging & Transfer: Package captured content in valid WARC files (ISO 28500:2017) with 11-field CDX indexes, and transfer them to the Library's S3 bucket over a secure internet connection (HTTPS).
- Quality Review & Reporting: Provide an access tool for Library staff to review crawl results. Generate detailed reports (ASCII text and XML) within 5 days of crawl completion, including statistical information on captured content, resources, and crawl performance. Develop and maintain a Quality Control Program (QCP).
- Infrastructure & Security: Use US-based servers for crawling, maintain reliable and secure data storage, and provide a web-based communication tool. Adhere to strict information security policies, including restrictions on generative AI use and mandatory IT Security Training.
- Key Personnel: Provide a qualified Program Manager and Alternate Program Manager, Crawl Engineer, and Quality Assurance Lead.
- Technical Test Crawl: Offerors must perform a one-time technical test crawl using a Library-provided seed list (Attachment J2b) within 48 hours and deliver the results (WARC files with CDX files and reports) via SFTP.
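The 11-field CDX index named in the deliverables is conventionally the space-separated layout whose header line is ` CDX N b a m s k r M S V g`: SURT lookup key, capture timestamp, original URL, MIME type, HTTP status, payload digest, redirect, meta tags, compressed record length, byte offset, and WARC file name. The sketch below assumes that layout; the exact field order the Library expects should be confirmed against its instructions. The `simple_surt` helper is hypothetical and deliberately simplified; real tools (for example the `surt` Python package) canonicalize URLs far more thoroughly.

```python
from urllib.parse import urlsplit

# Header line for the common 11-field CDX layout (note the leading space).
CDX_HEADER = " CDX N b a m s k r M S V g"


def simple_surt(url: str) -> str:
    """Hypothetical, simplified SURT key: reversed host labels, ')', then path and query.

    Real tools also normalize ports, query-argument order, case, and more.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


def cdx_line(url: str, timestamp14: str, mime: str, status: int,
             payload_sha1_b32: str, record_length: int, offset: int,
             warc_filename: str, redirect: str = "-", meta: str = "-") -> str:
    """Build one space-separated CDX line in N b a m s k r M S V g order."""
    fields = [
        simple_surt(url),    # N: SURT-form lookup key
        timestamp14,         # b: capture time, YYYYMMDDhhmmss
        url,                 # a: original URL
        mime,                # m: MIME type
        str(status),         # s: HTTP status code
        payload_sha1_b32,    # k: payload digest (base32 SHA-1)
        redirect,            # r: redirect target, '-' if none
        meta,                # M: meta tags, '-' if none
        str(record_length),  # S: compressed record length in bytes
        str(offset),         # V: byte offset of the record in the WARC file
        warc_filename,       # g: WARC file name
    ]
    return " ".join(fields)
```

For example, `cdx_line("https://www.example.gov/about?x=1", "20260601000000", "text/html", 200, digest, 1234, 0, "test.warc.gz")` yields a line keyed `gov,example)/about?x=1`. CDX files are conventionally sorted in byte order (for example with `LC_ALL=C sort`) so that replay tools can look captures up by key.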
Contract Details
- Contract Type: Indefinite-Delivery, Indefinite-Quantity (IDIQ) with firm-fixed-price Task Orders.
- Period of Performance: Base period from June 1, 2026, to May 31, 2031.
- Estimated Value: Minimum order of $300,000.00; maximum order of $15,000,000.00.
- Place of Performance: Contractor's own facilities.
- Product Service Code: DK10 (Cloud Solutions Delivered As A Service).
Submission & Evaluation
- Questions Due: February 2, 2026, at 12:00 PM EST to Jennifer Zwahlen (jzwa@loc.gov) and Colleen Daly (cdaly@loc.gov).
- Proposals Due: February 24, 2026, at 5:00 PM EST.
- Submission Method: Electronic submission via email to jzwa@loc.gov and cdaly@loc.gov. Total email attachment size must not exceed 20 MB, and zipped files are not accepted.
- Proposal Content: Must include four volumes: Technical Approach (including a sample web crawl), Corporate Experience and Capabilities (including Key Personnel resumes), Past Performance (using Attachment J3), and Price (using Attachment J4).
- Evaluation Criteria: Best-Value Trade-Off (BVTO) approach. When combined, the non-price factors (Technical Approach, Corporate Experience and Capabilities, Past Performance) are significantly more important than, or equal in importance to, Price. The Library may award without discussions.
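Because the solicitation caps total attachment size and bars zipped files, a bid team might run a quick pre-send check on each email. The sketch below is hypothetical: it reads "20 MB" conservatively as 20,000,000 bytes and treats `.zip`, `.7z`, and `.rar` files as "zipped"; both readings are assumptions to confirm with the Contracting Officer.

```python
from pathlib import Path

# Assumption: "20 MB" is read conservatively as 20,000,000 bytes per email.
MAX_TOTAL_BYTES = 20_000_000
# Assumption: these archive extensions count as "zipped files".
ARCHIVE_SUFFIXES = {".zip", ".7z", ".rar"}


def sizes_from_paths(paths: list[str]) -> dict[str, int]:
    """Map each attachment's file name to its size in bytes."""
    return {Path(p).name: Path(p).stat().st_size for p in paths}


def check_email_package(sizes: dict[str, int]) -> list[str]:
    """Return the problems found in one email's attachments (empty list if none)."""
    problems = []
    total = sum(sizes.values())
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total attachment size {total:,} bytes exceeds {MAX_TOTAL_BYTES:,}")
    for name in sorted(sizes):
        if Path(name).suffix.lower() in ARCHIVE_SUFFIXES:
            problems.append(f"{name}: zipped/archived files are not accepted")
    return problems
```

In practice, `check_email_package(sizes_from_paths([...]))` would be run once per outgoing email, since the four proposal volumes may need to be split across several messages to stay under the limit.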
Eligibility & Set-Aside
- Set-Aside: Unrestricted.
- Contractor Employee Fitness: Personnel with access to Library facilities or IT systems will undergo background investigations and continuous vetting.
Important Attachments
Bidders must review all attachments, including J1 (Description of Tech Environment), J2a (Sample Web Crawl Requirement), J2b (Sample Seed List), J3 (Past Performance Questionnaire), J4 (Price Schedule), J5 (Web Archiving FAQ), J6 (Heritrix 3 Configuration), J7 (ISO WARC file format), and J8 (Sample Task Order SOW) for detailed instructions and technical requirements.