The RSS connector ingests content from RSS and Atom feeds, with optional full article extraction using Mozilla Readability.
Import
import { rss } from '@deepagents/retrieval/connectors';
Basic Usage
import { rss } from '@deepagents/retrieval/connectors';
import { ingest, fastembed, SqliteStore } from '@deepagents/retrieval';
import Database from 'better-sqlite3';
const db = new Database('./vectors.db');
const store = new SqliteStore(db, 384);
const embedder = fastembed();
// Ingest RSS feed
await ingest({
  connector: rss('https://hnrss.org/frontpage'),
  store,
  embedder,
});
Configuration
function rss(feedUrl: string, options?: {
  maxItems?: number;           // Max items to ingest (default: 50)
  fetchFullArticles?: boolean; // Extract full article content (default: false)
}): Connector
Feed URL
Any valid RSS or Atom feed URL:
const connector = rss('https://hnrss.org/frontpage');
Max Items
Limit the number of items to ingest:
const connector = rss('https://example.com/feed', {
  maxItems: 10, // Only ingest the latest 10 items
});
Default is 50 items.
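A minimal sketch of how the documented defaults could be applied, assuming items are kept in feed order (most feeds list the newest entry first, so "latest 10" means the first 10 entries). RssOptions, resolveOptions, and takeItems are hypothetical names for illustration, not exports of @deepagents/retrieval.

```typescript
// Hypothetical option shape mirroring the documented rss() options.
interface RssOptions {
  maxItems?: number;
  fetchFullArticles?: boolean;
}

// Apply the documented defaults: 50 items, no full-article extraction.
function resolveOptions(options: RssOptions = {}): Required<RssOptions> {
  return {
    maxItems: options.maxItems ?? 50,
    fetchFullArticles: options.fetchFullArticles ?? false,
  };
}

// Keep only the first maxItems entries (assumes the feed lists newest first).
function takeItems<T>(items: T[], options: RssOptions = {}): T[] {
  return items.slice(0, resolveOptions(options).maxItems);
}
```

With no options, a feed of 80 items is cut down to 50; with maxItems: 10 only the first 10 are kept.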
Full Article Extraction
Fetch and extract full article content:
const connector = rss('https://blog.example.com/feed', {
  fetchFullArticles: true, // Extract full article text
});
When enabled:
Fetches the article URL from each feed item
Uses Mozilla Readability to extract main content
Falls back to RSS content if extraction fails
Significantly slower (one extra page download per item) but provides complete content
Feed Parsing
The connector supports:
RSS 2.0 - Standard RSS format
RSS 1.0 - RDF-based RSS
Atom - Atom Syndication Format
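The connector's own parser handles these formats; the sketch below only illustrates how the three are usually told apart by their root element (<rss>, <rdf:RDF>, <feed>). detectFeedFormat is a hypothetical helper, not part of the package, and a production parser should rely on an XML library rather than regular expressions.

```typescript
type FeedFormat = 'rss-2.0' | 'rss-1.0' | 'atom' | 'unknown';

// Guess the feed flavour from the first element of the document.
function detectFeedFormat(xml: string): FeedFormat {
  // Drop the XML declaration, processing instructions, comments, and DOCTYPE before the root.
  const body = xml
    .replace(/<\?[\s\S]*?\?>/g, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<!DOCTYPE[^>]*>/gi, '')
    .trimStart();
  if (/^<rss[\s>]/i.test(body)) return 'rss-2.0'; // <rss> root (RSS 2.0 and older 0.9x)
  if (/^<rdf:RDF[\s>]/.test(body)) return 'rss-1.0'; // RDF-based RSS 1.0
  if (/^<feed[\s>]/.test(body)) return 'atom'; // Atom Syndication Format
  return 'unknown';
}
```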
Parsed Fields
{
  title: string;
  description: string;
  link: string;
  language: string;
  lastBuildDate: string;
  items: Array<{
    title: string;
    description: string;    // Summary or snippet
    link: string;
    pubDate: string;
    author: string;
    categories: string[];
    guid: string;
    contentEncoded: string; // Full content (if available)
  }>;
}
Each feed item is ingested as:
Title: {title}
Author: {author}
Published: {pubDate}
Categories: {categories}
Link: {link}
Content:
{content}
Summary: {title} - {description}
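To make the template concrete, here is a sketch that renders one parsed item in that layout. FeedItem, pickContent, formatItemDocument, and formatItemSummary are hypothetical names; the content priority (extracted article text, then contentEncoded, then description) and joining categories with ", " are assumptions, not confirmed connector behavior.

```typescript
// Hypothetical item shape, a subset of the parsed fields listed above.
interface FeedItem {
  title: string;
  description: string;
  link: string;
  pubDate: string;
  author: string;
  categories: string[];
  contentEncoded: string;
}

// Assumed priority: extracted article text, then content:encoded, then description.
function pickContent(item: FeedItem, articleText = ''): string {
  return articleText || item.contentEncoded || item.description;
}

// Render one item using the document layout shown above.
function formatItemDocument(item: FeedItem, articleText = ''): string {
  return [
    `Title: ${item.title}`,
    `Author: ${item.author}`,
    `Published: ${item.pubDate}`,
    `Categories: ${item.categories.join(', ')}`,
    `Link: ${item.link}`,
    'Content:',
    pickContent(item, articleText),
  ].join('\n');
}

// The summary pairs the title with the feed-provided description.
function formatItemSummary(item: FeedItem): string {
  return `${item.title} - ${item.description}`;
}
```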
A special document contains feed metadata:
RSS Feed: {feed.title}
Description: {feed.description}
Website: {feed.link}
Language: {feed.language}
Last Updated: {lastBuildDate}
Total Items: {count}
This feed provides: {description}
Document ID: feed-info
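A sketch of how that feed-info document could be assembled from the feed-level fields. FeedInfo and formatFeedInfo are hypothetical names, and itemCount is assumed to be the number of items actually ingested (after maxItems is applied); the connector may count differently.

```typescript
// Hypothetical subset of the feed-level parsed fields.
interface FeedInfo {
  title: string;
  description: string;
  link: string;
  language: string;
  lastBuildDate: string;
}

// Build the special "feed-info" document; itemCount fills {count}.
function formatFeedInfo(feed: FeedInfo, itemCount: number): { id: string; content: string } {
  const content = [
    `RSS Feed: ${feed.title}`,
    `Description: ${feed.description}`,
    `Website: ${feed.link}`,
    `Language: ${feed.language}`,
    `Last Updated: ${feed.lastBuildDate}`,
    `Total Items: ${itemCount}`,
    `This feed provides: ${feed.description}`,
  ].join('\n');
  return { id: 'feed-info', content };
}
```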
Full Article Extraction
When fetchFullArticles: true:
How It Works
Fetch HTML - Download article page
Extract Content - Use Mozilla Readability
Validate - Ensure content is substantial (at least 200 characters)
Fallback - Use RSS content if extraction fails
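The four steps can be sketched as the control flow below. fetchHtml and extract are injected, hypothetical parameters (the real connector uses Mozilla Readability for extraction), and the 200-character threshold follows the content validation rule documented later on this page. This illustrates the flow; it is not the connector's source.

```typescript
// Minimum length taken from the documented validation step.
const MIN_ARTICLE_LENGTH = 200;

// Hypothetical extractor signature: plain text, or null when nothing usable was found.
type Extractor = (html: string, url: string) => string | null;

// Validate + fallback: keep the extracted text only if it is long enough.
function chooseContent(extracted: string | null, rssContent: string): string {
  const text = extracted?.trim() ?? '';
  return text.length >= MIN_ARTICLE_LENGTH ? text : rssContent;
}

// Fetch -> extract -> validate -> fallback; any error also falls back to the RSS content.
async function resolveArticle(
  url: string,
  rssContent: string,
  fetchHtml: (url: string) => Promise<string>,
  extract: Extractor,
): Promise<string> {
  try {
    const html = await fetchHtml(url);
    return chooseContent(extract(html, url), rssContent);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.warn(`Failed to fetch article content from ${url}:`, message);
    return rssContent;
  }
}
```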
Example
const connector = rss('https://blog.example.com/feed', {
  maxItems: 5,
  fetchFullArticles: true,
});
await ingest({ connector, store, embedder });
Readability Features
Removes navigation, ads, and clutter
Extracts main article text
Preserves title and structure
Works with most news sites and blogs
Error Handling
Extraction failures are logged but don’t stop ingestion:
// On failure, a warning is logged and an empty string is returned,
// which makes the connector fall back to the RSS content
console.warn(`Failed to fetch article content from ${url}:`, error.message);
return '';
Source ID
const connector = rss('https://example.com/feed');
console.log(connector.sourceId);
// "rss:https://example.com/feed"
Instructions
The connector includes AI agent instructions:
connector.instructions = `
You answer questions about articles and content from the RSS feed: ${feedUrl}.
Always cite the article title and link when referencing specific content.
The feed contains recent articles, blog posts, and news items.
When referencing content, include the publication date and author when available.
${fetchFullArticles ? 'Full article content has been extracted...' : 'Content includes RSS summaries...'}
`;
These instructions can be used with AI agents for context.
Examples
Hacker News Feed
import { rss } from '@deepagents/retrieval/connectors';
import { ingest, similaritySearch } from '@deepagents/retrieval';
// store and embedder are created as in Basic Usage
const connector = rss('https://hnrss.org/frontpage', {
  maxItems: 20,
});
await ingest({ connector, store, embedder });
const results = await similaritySearch(
  'What are the latest AI developments?',
  { connector, store, embedder },
);
console.log(results[0]?.content);
Blog with Full Articles
const connector = rss('https://blog.example.com/feed', {
  maxItems: 10,
  fetchFullArticles: true, // Extract complete articles
});
await ingest ({ connector , store , embedder });
Multiple Feeds
const feeds = [
  rss('https://hnrss.org/frontpage'),
  rss('https://news.ycombinator.com/rss'),
  rss('https://blog.example.com/feed'),
];
for (const connector of feeds) {
  await ingest({ connector, store, embedder });
  console.log(`Ingested: ${connector.sourceId}`);
}
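In the loop above, one failing feed throws and stops the remaining ingestions. A sketch of a more tolerant variant runs all feeds concurrently with Promise.allSettled and reports failures per source. ingestAll and SourceLike are hypothetical names; ingestOne stands in for a call such as (connector) => ingest({ connector, store, embedder }), assuming ingest returns a promise as in the examples above.

```typescript
// Minimal stand-in for a connector; only sourceId is needed here.
interface SourceLike {
  sourceId: string;
}

interface IngestReport {
  succeeded: string[];
  failed: { sourceId: string; reason: string }[];
}

// Run every ingestion concurrently and collect failures instead of aborting the batch.
async function ingestAll<C extends SourceLike>(
  connectors: C[],
  ingestOne: (connector: C) => Promise<unknown>,
): Promise<IngestReport> {
  // async wrapper turns synchronous throws into rejections
  const results = await Promise.allSettled(connectors.map(async (c) => ingestOne(c)));
  const report: IngestReport = { succeeded: [], failed: [] };
  results.forEach((result, i) => {
    const { sourceId } = connectors[i];
    if (result.status === 'fulfilled') {
      report.succeeded.push(sourceId);
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      report.failed.push({ sourceId, reason });
    }
  });
  return report;
}
```

Concurrency shortens total ingestion time at the cost of more simultaneous network requests; for a handful of feeds, either approach works.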
Search Across Multiple Feeds
const feeds = [
  rss('https://techcrunch.com/feed/'),
  rss('https://theverge.com/rss/index.xml'),
];
// Ingest all feeds
for (const connector of feeds) {
  await ingest({ connector, store, embedder });
}
// Search across all
const allResults = [];
for (const connector of feeds) {
  const results = await similaritySearch('AI news', {
    connector,
    store,
    embedder,
  });
  allResults.push(...results);
}
allResults.sort((a, b) => b.similarity - a.similarity);
console.log(`Found ${allResults.length} results across all feeds`);
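The same article can surface from several feeds, or as several chunks of one feed. A sketch of a merge step that keeps the best-scoring hit per link and returns the overall top results, assuming each result carries similarity and an optional metadata.link as in the examples on this page. SearchHit and mergeResults are hypothetical names; deduplicating by link is a design choice that keeps only one chunk per article.

```typescript
// Hypothetical result shape with only the fields used here.
interface SearchHit {
  content: string;
  similarity: number;
  metadata?: { link?: string };
}

// Merge per-feed lists, keep the best hit per link, return the top `limit` by similarity.
function mergeResults(lists: SearchHit[][], limit = 10): SearchHit[] {
  const byKey = new Map<string, SearchHit>();
  for (const hit of lists.flat()) {
    // Fall back to the content text when a hit has no link.
    const key = hit.metadata?.link ?? hit.content;
    const existing = byKey.get(key);
    if (!existing || hit.similarity > existing.similarity) byKey.set(key, hit);
  }
  return [...byKey.values()]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```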
Each document includes metadata:
metadata: {
  title: 'Article Title',
  author: 'Author Name',
  pubDate: '2024-01-01T12:00:00Z',
  categories: ['tech', 'AI'],
  link: 'https://example.com/article',
}
Access metadata in search results:
const results = await similaritySearch('query', { connector, store, embedder });
results.forEach((r) => {
  console.log(`Title: ${r.metadata?.title}`);
  console.log(`Author: ${r.metadata?.author}`);
  console.log(`Link: ${r.metadata?.link}`);
});
Without Full Articles
Fast ingestion using RSS content:
const connector = rss('https://example.com/feed', {
  maxItems: 50,
  fetchFullArticles: false, // Fast
});
With Full Articles
Slower due to article fetching:
const connector = rss('https://example.com/feed', {
  maxItems: 10,            // Reduce items for faster ingestion
  fetchFullArticles: true, // Slower
});
Timeout
Article fetches have a 10-second timeout:
signal: AbortSignal.timeout(10000) // 10 seconds
User Agent
Article requests use a custom user agent:
'User-Agent': 'Mozilla/5.0 (compatible; RSS-RAG-Bot/1.0)'
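Putting the timeout and User-Agent together, an article download could look like the sketch below. fetchArticleHtml and FetchLike are hypothetical names; the fetch implementation is injected so a stub can be used in tests, and with Node 18+ you would pass the global fetch, e.g. fetchArticleHtml(url, fetch).

```typescript
// Minimal fetch-compatible signature so a stub can be injected.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

const ARTICLE_TIMEOUT_MS = 10_000;
const USER_AGENT = 'Mozilla/5.0 (compatible; RSS-RAG-Bot/1.0)';

// Download an article page with the documented timeout and User-Agent.
async function fetchArticleHtml(url: string, fetchImpl: FetchLike): Promise<string> {
  const response = await fetchImpl(url, {
    headers: { 'User-Agent': USER_AGENT },
    // The signal aborts the request (rejecting the promise) after 10 seconds.
    signal: AbortSignal.timeout(ARTICLE_TIMEOUT_MS),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  return response.text();
}
```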
Error Handling
Feed Parsing Errors
try {
  await ingest({
    connector: rss('https://invalid-feed.com/feed'),
    store,
    embedder,
  });
} catch (error) {
  // In TypeScript, caught values are typed as unknown
  console.error('RSS parsing failed:', error instanceof Error ? error.message : error);
}
Article Extraction Errors
These are logged as warnings and don’t stop ingestion; the affected item falls back to its RSS content:
// Article extraction failure
console.warn(`Failed to fetch article: ${error.message}`);
// Falls back to RSS content
Caching Strategy
The RSS connector has no expiry option, and connector-level strategies such as ingestWhen, which other connectors like local offer, are not supported by it directly. For time-sensitive content, handle expiry yourself by re-running ingestion periodically:
const connector = rss('https://news.example.com/feed', {
  maxItems: 20,
});
// Re-run this call periodically (e.g. hourly) to pick up new items
await ingest({
  connector,
  store,
  embedder,
});
Content Validation
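Extracted articles must be at least 200 characters long; shorter results are treated as a failed extraction and the RSS content is used instead:
if (fullContent.length < 200) {
  throw new Error('Extracted content too short');
}
This filters out extractions that captured only a fragment of the page.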
Best Practices
Limit Items for Full Extraction
Full article extraction is slow. Limit items:
rss('https://example.com/feed', {
  maxItems: 10,
  fetchFullArticles: true,
})
Use RSS Content for Speed
For many feeds, RSS content is sufficient:
rss('https://example.com/feed', {
  maxItems: 50,
  fetchFullArticles: false,
})
Re-ingest Periodically
For news feeds, re-ingest regularly to get latest content:
// Every hour
setInterval(async () => {
  await ingest({ connector, store, embedder });
}, 60 * 60 * 1000);
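With a plain setInterval, a slow ingestion can still be running when the next tick fires, and a rejected promise from the async callback goes unhandled. A sketch of a wrapper that skips overlapping runs and logs failures; nonOverlapping is a hypothetical helper, not part of the package.

```typescript
// Wrap an async job so a new run is skipped while the previous one is still in flight.
function nonOverlapping(job: () => Promise<unknown>): () => boolean {
  let running = false;
  return () => {
    if (running) return false; // previous ingestion still running; skip this tick
    running = true;
    job()
      .catch((error: unknown) => {
        // Log instead of letting the rejection escape from the timer callback.
        console.error('Re-ingestion failed:', error);
      })
      .finally(() => {
        running = false;
      });
    return true;
  };
}

// Usage with the real package (not executed here):
// const tick = nonOverlapping(() => ingest({ connector, store, embedder }));
// setInterval(tick, 60 * 60 * 1000);
```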
Handle Metadata
Use metadata for filtering and display:
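const oneDayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000);
const recent = results.filter((r) => {
  // Skip results without a publication date
  if (!r.metadata?.pubDate) return false;
  return new Date(r.metadata.pubDate) > oneDayAgo;
});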
Next Steps
GitHub Connector: Ingest from GitHub
Local Files: Work with local files
Search: Search ingested content