RSS Feed Connector

The RSS connector ingests content from RSS and Atom feeds, with optional full article extraction using Mozilla Readability.

Import

import { rss } from '@deepagents/retrieval/connectors';

Basic Usage

import { rss } from '@deepagents/retrieval/connectors';
import { ingest, fastembed, SqliteStore } from '@deepagents/retrieval';
import Database from 'better-sqlite3';

const db = new Database('./vectors.db');
const store = new SqliteStore(db, 384);
const embedder = fastembed();

// Ingest RSS feed
await ingest({
  connector: rss('https://hnrss.org/frontpage'),
  store,
  embedder,
});

Configuration

function rss(feedUrl: string, options?: {
  maxItems?: number;          // Max items to ingest (default: 50)
  fetchFullArticles?: boolean; // Extract full article content (default: false)
}): Connector
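The defaults above can be expressed as a small resolver. This is a minimal sketch, not the connector's source: `RssOptions` and `resolveRssOptions` are hypothetical names. Only the two documented options and their defaults come from this page.

```typescript
// Hypothetical option type mirroring the documented rss() options.
interface RssOptions {
  maxItems?: number;
  fetchFullArticles?: boolean;
}

// Hypothetical helper: fills in the documented defaults
// (maxItems: 50, fetchFullArticles: false). `??` keeps explicit values such as 0.
function resolveRssOptions(options: RssOptions = {}): Required<RssOptions> {
  return {
    maxItems: options.maxItems ?? 50,
    fetchFullArticles: options.fetchFullArticles ?? false,
  };
}

console.log(resolveRssOptions());                 // { maxItems: 50, fetchFullArticles: false }
console.log(resolveRssOptions({ maxItems: 10 })); // { maxItems: 10, fetchFullArticles: false }
```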

Feed URL

Any valid RSS or Atom feed URL:

const connector = rss('https://hnrss.org/frontpage');

Max Items

Limit the number of items to ingest:

const connector = rss('https://example.com/feed', {
  maxItems: 10, // Only ingest latest 10 items
});

Default is 50 items.
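Conceptually, `maxItems` caps how many feed items become documents. The sketch below assumes "latest" means newest by `pubDate`. The connector may instead take the first N items in feed order, which for most feeds are already the newest. `latestItems` is a hypothetical helper.

```typescript
// Hypothetical helper: returns at most `maxItems` items, newest first.
// Items whose pubDate is missing or unparsable sort after dated items;
// Array.prototype.sort is stable, so ties keep their original feed order.
function latestItems<T extends { pubDate?: string }>(items: T[], maxItems = 50): T[] {
  const time = (item: T): number => {
    const t = item.pubDate ? Date.parse(item.pubDate) : NaN;
    return Number.isNaN(t) ? -Infinity : t;
  };
  return [...items]
    .sort((a, b) => {
      const ta = time(a);
      const tb = time(b);
      return ta === tb ? 0 : tb > ta ? 1 : -1;
    })
    .slice(0, maxItems);
}

const items = [
  { title: 'old', pubDate: '2024-01-01T00:00:00Z' },
  { title: 'new', pubDate: '2024-03-01T00:00:00Z' },
  { title: 'mid', pubDate: '2024-02-01T00:00:00Z' },
];
console.log(latestItems(items, 2).map((i) => i.title)); // [ 'new', 'mid' ]
```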

Full Article Extraction

Fetch and extract full article content:

const connector = rss('https://blog.example.com/feed', {
  fetchFullArticles: true, // Extract full article text
});

When enabled:

Fetches the page at each feed item's link
Uses Mozilla Readability to extract the main content
Falls back to the RSS content if extraction fails
Is significantly slower, because every item needs an extra HTTP request, but yields the complete article text

Feed Parsing

The connector supports:

RSS 2.0 - Standard RSS format
RSS 1.0 - RDF-based RSS
Atom - Atom Syndication Format
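The three formats can be told apart by their root element: `<rss>` for RSS 2.0 (and the older 0.9x versions), `<rdf:RDF>` for RSS 1.0, and `<feed>` for Atom. The sketch below is a hypothetical regex-based classifier for illustration only. It is not how the connector parses feeds, and production code should rely on a real XML or feed parser.

```typescript
type FeedFormat = 'rss2' | 'rss1' | 'atom' | 'unknown';

// Hypothetical helper: classifies a feed document by its root element name.
function detectFeedFormat(xml: string): FeedFormat {
  // Drop processing instructions (<?xml ...?>) and comments so they can't be
  // mistaken for the root; <!DOCTYPE ...> is skipped because the tag pattern
  // below requires a letter or underscore right after '<'.
  const body = xml.replace(/<\?[\s\S]*?\?>/g, '').replace(/<!--[\s\S]*?-->/g, '');
  const root = body.match(/<([A-Za-z_][\w.:-]*)/);
  if (!root) return 'unknown';
  const name = root[1];
  if (name === 'rss') return 'rss2';
  if (name === 'rdf:RDF' || name.endsWith(':RDF')) return 'rss1';
  if (name === 'feed' || name.endsWith(':feed')) return 'atom';
  return 'unknown';
}
```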

Parsed Fields

{
  title: string;
  description: string;
  link: string;
  language: string;
  lastBuildDate: string;
  items: Array<{
    title: string;
    description: string;    // Summary or snippet
    link: string;
    pubDate: string;
    author: string;
    categories: string[];
    guid: string;
    contentEncoded: string; // Full content (if available)
  }>;
}

Document Format

Each feed item is ingested as:

Title: {title}
Author: {author}
Published: {pubDate}
Categories: {categories}
Link: {link}
Content:
{content}

Summary: {title} - {description}
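The template above can be rendered with a few lines of string building. A minimal sketch: `RssItem` and `formatItemDocument` are hypothetical, and joining categories with `, ` is an assumption, since this page does not show how the list is serialized.

```typescript
// Hypothetical item shape based on the Parsed Fields listed above.
interface RssItem {
  title: string;
  description: string;
  link: string;
  pubDate: string;
  author: string;
  categories: string[];
}

// Hypothetical helper: renders one item using the documented template.
// `content` is either the extracted article text or the RSS content.
function formatItemDocument(item: RssItem, content: string): string {
  return [
    `Title: ${item.title}`,
    `Author: ${item.author}`,
    `Published: ${item.pubDate}`,
    `Categories: ${item.categories.join(', ')}`, // separator is an assumption
    `Link: ${item.link}`,
    'Content:',
    content,
    '',
    `Summary: ${item.title} - ${item.description}`,
  ].join('\n');
}
```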

Feed Information

In addition to one document per item, the connector ingests a feed-level document containing the feed metadata:

RSS Feed: {feed.title}
Description: {feed.description}
Website: {feed.link}
Language: {feed.language}
Last Updated: {lastBuildDate}
Total Items: {count}

This feed provides: {description}

Document ID: feed-info

Full Article Extraction

When fetchFullArticles: true:

How It Works

Fetch HTML - Download article page
Extract Content - Use Mozilla Readability
Validate - Reject content shorter than 200 characters
Fallback - Use RSS content if extraction fails
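The four steps can be sketched as one function that never throws, so a failed page cannot block ingestion. To keep the sketch runnable without network access or DOM libraries, the HTML fetcher and the text extractor are passed in as parameters. `articleContent`, `FetchHtml`, and `ExtractText` are hypothetical names; only the 200-character threshold and the fallback behavior come from this page.

```typescript
// Hypothetical dependency types so the sketch runs without network access.
type FetchHtml = (url: string) => Promise<string>;
type ExtractText = (html: string, url: string) => string | null;

const MIN_ARTICLE_LENGTH = 200; // documented minimum length for extracted content

// Hypothetical helper following the four documented steps.
async function articleContent(
  url: string,
  rssContent: string,
  fetchHtml: FetchHtml,
  extractText: ExtractText,
): Promise<string> {
  try {
    const html = await fetchHtml(url);                 // 1. Fetch HTML
    const text = extractText(html, url)?.trim() ?? ''; // 2. Extract content
    if (text.length < MIN_ARTICLE_LENGTH) {            // 3. Validate
      throw new Error('Extracted content too short');
    }
    return text;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    console.warn(`Failed to fetch article content from ${url}:`, message);
    return rssContent;                                 // 4. Fallback
  }
}
```

In a real setup, `extractText` could wrap Mozilla Readability, for example by building a DOM with `jsdom` (`new JSDOM(html, { url })`) and returning `new Readability(dom.window.document).parse()?.textContent ?? null`. That wiring is an assumption, not necessarily what the connector does internally.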

Example

const connector = rss('https://blog.example.com/feed', {
  maxItems: 5,
  fetchFullArticles: true,
});

await ingest({ connector, store, embedder });

Readability Features

Removes navigation, ads, and clutter
Extracts main article text
Preserves title and structure
Works with most news sites and blogs

Error Handling

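Extraction failures are logged as warnings and don’t stop ingestion. The article fetcher returns an empty string on failure, and the item then falls back to its RSS content:

// Excerpt from the article fetcher's error path
console.warn(`Failed to fetch article content from ${url}:`, error.message);
return ''; // Empty result: the item falls back to its RSS content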

Source ID

const connector = rss('https://example.com/feed');
console.log(connector.sourceId);
// "rss:https://example.com/feed"

Instructions

The connector includes AI agent instructions:

connector.instructions = `
You answer questions about articles and content from the RSS feed: ${feedUrl}.
Always cite the article title and link when referencing specific content.
The feed contains recent articles, blog posts, and news items.
When referencing content, include the publication date and author when available.
${fetchFullArticles ? 'Full article content has been extracted...' : 'Content includes RSS summaries...'}
`;

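Pass connector.instructions to your AI agent as context (for example, in its system prompt) so that its answers cite article titles, links, publication dates, and authors.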
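One way to use the instructions is to prepend them to the retrieved chunks when building an agent's system prompt. The sketch below assumes search results expose `content` and optional `metadata.title` / `metadata.link`, as in the Metadata section. `buildSystemPrompt` and `SearchHit` are hypothetical.

```typescript
// Hypothetical minimal result shape; only the fields used below.
interface SearchHit {
  content: string;
  metadata?: { title?: string; link?: string };
}

// Hypothetical helper: puts the connector's instructions first, then numbered
// retrieved chunks labelled with their title and link for citation.
function buildSystemPrompt(instructions: string, hits: SearchHit[]): string {
  const context = hits
    .map((hit, i) => {
      const source = [hit.metadata?.title, hit.metadata?.link].filter(Boolean).join(' - ');
      return `[${i + 1}] ${source}\n${hit.content}`;
    })
    .join('\n\n');
  return `${instructions.trim()}\n\nRetrieved content:\n\n${context}`;
}
```

Usage: `buildSystemPrompt(connector.instructions, results)`, where `results` comes from `similaritySearch`.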

Examples

Hacker News Feed

import { rss } from '@deepagents/retrieval/connectors';
import { ingest, similaritySearch } from '@deepagents/retrieval';

// store and embedder are created as in Basic Usage

const connector = rss('https://hnrss.org/frontpage', {
  maxItems: 20,
});

await ingest({ connector, store, embedder });

const results = await similaritySearch(
  'What are the latest AI developments?',
  { connector, store, embedder }
);

console.log(results[0].content);

Blog with Full Articles

const connector = rss('https://blog.example.com/feed', {
  maxItems: 10,
  fetchFullArticles: true, // Extract complete articles
});

await ingest({ connector, store, embedder });

Multiple Feeds

const feeds = [
  rss('https://hnrss.org/frontpage'),
  rss('https://news.ycombinator.com/rss'),
  rss('https://blog.example.com/feed'),
];

for (const connector of feeds) {
  await ingest({ connector, store, embedder });
  console.log(`Ingested: ${connector.sourceId}`);
}
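Because the loop above awaits each `ingest` call in turn, one unreachable or malformed feed throws and stops the remaining feeds. A minimal sketch that isolates failures per feed, assuming only that each connector has a `sourceId`. `ingestEach` is a hypothetical helper, and the ingest function is passed in so the sketch runs on its own.

```typescript
// Hypothetical minimal connector shape; only sourceId is used here.
interface HasSourceId {
  sourceId: string;
}

interface IngestFailure {
  sourceId: string;
  reason: string;
}

// Ingests connectors one at a time; a failing feed is recorded and skipped
// instead of aborting the whole loop.
async function ingestEach<C extends HasSourceId>(
  connectors: C[],
  ingestOne: (connector: C) => Promise<unknown>,
): Promise<IngestFailure[]> {
  const failures: IngestFailure[] = [];
  for (const connector of connectors) {
    try {
      await ingestOne(connector);
      console.log(`Ingested: ${connector.sourceId}`);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      console.error(`Failed: ${connector.sourceId}: ${reason}`);
      failures.push({ sourceId: connector.sourceId, reason });
    }
  }
  return failures;
}
```

With the feeds above you would call `await ingestEach(feeds, (connector) => ingest({ connector, store, embedder }))` and inspect the returned failures. Feeds are still processed one at a time, so the load on the embedder and store matches the original loop.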

Search Across Multiple Feeds

const feeds = [
  rss('https://techcrunch.com/feed/'),
  rss('https://theverge.com/rss/index.xml'),
];

// Ingest all feeds
for (const connector of feeds) {
  await ingest({ connector, store, embedder });
}

// Search across all
const allResults = [];
for (const connector of feeds) {
  const results = await similaritySearch('AI news', {
    connector,
    store,
    embedder,
  });
  allResults.push(...results);
}

allResults.sort((a, b) => b.similarity - a.similarity);
console.log(`Found ${allResults.length} results across all feeds`);
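Feeds often carry the same story, and a long article can match more than once, so the merged list may contain near-duplicates. A sketch that keeps only the best-scoring hit per link before sorting. It assumes results expose `similarity` (as the sort above does) and `metadata.link`; `mergeHits` is a hypothetical helper.

```typescript
// Hypothetical minimal result shape.
interface Hit {
  similarity: number;
  content: string;
  metadata?: { link?: string };
}

// Keeps the highest-scoring hit per article link (falling back to the content
// itself when there is no link), then sorts by similarity, highest first.
function mergeHits(hitLists: Hit[][], limit = 10): Hit[] {
  const best = new Map<string, Hit>();
  for (const hit of hitLists.flat()) {
    const key = hit.metadata?.link ?? hit.content;
    const current = best.get(key);
    if (!current || hit.similarity > current.similarity) best.set(key, hit);
  }
  return [...best.values()].sort((a, b) => b.similarity - a.similarity).slice(0, limit);
}
```

Call `mergeHits(resultsPerFeed)` where `resultsPerFeed` holds one `similaritySearch` result array per feed. Keying on the link collapses several matching chunks of one article into a single entry; drop the deduplication if you want every matching chunk.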

Metadata

Each document includes metadata:

metadata: {
  title: 'Article Title',
  author: 'Author Name',
  pubDate: '2024-01-01T12:00:00Z',
  categories: ['tech', 'AI'],
  link: 'https://example.com/article',
}

Access metadata in search results:

const results = await similaritySearch('query', { connector, store, embedder });
results.forEach(r => {
  console.log(`Title: ${r.metadata?.title}`);
  console.log(`Author: ${r.metadata?.author}`);
  console.log(`Link: ${r.metadata?.link}`);
});
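Since the connector's instructions ask agents to cite the title and link, a small formatter for these metadata fields can be handy. A sketch assuming the field names shown above. `RssMetadata` and `formatCitation` are hypothetical, and every field is optional because feeds don't always provide them.

```typescript
// Hypothetical metadata shape matching the fields listed above.
interface RssMetadata {
  title?: string;
  author?: string;
  pubDate?: string;
  categories?: string[];
  link?: string;
}

// Formats a one-line citation, skipping fields the feed did not provide.
// Dates are shown as YYYY-MM-DD in UTC; unparsable dates are omitted.
function formatCitation(metadata: RssMetadata = {}): string {
  const parts: string[] = [];
  if (metadata.title) parts.push(`"${metadata.title}"`);
  if (metadata.author) parts.push(`by ${metadata.author}`);
  if (metadata.pubDate) {
    const date = new Date(metadata.pubDate);
    if (!Number.isNaN(date.getTime())) parts.push(`(${date.toISOString().slice(0, 10)})`);
  }
  if (metadata.link) parts.push(`<${metadata.link}>`);
  return parts.join(' ');
}
```

For example, `results.map((r) => formatCitation(r.metadata))` produces one citation line per result.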

Performance Considerations

Without Full Articles

Fast ingestion using RSS content:

const connector = rss('https://example.com/feed', {
  maxItems: 50,
  fetchFullArticles: false, // Fast
});

With Full Articles

Slower due to article fetching:

const connector = rss('https://example.com/feed', {
  maxItems: 10, // Reduce items for faster ingestion
  fetchFullArticles: true, // Slower
});

Timeout

Article fetches have a 10-second timeout:

signal: AbortSignal.timeout(10000) // 10 seconds

User Agent

Article requests use a custom user agent:

'User-Agent': 'Mozilla/5.0 (compatible; RSS-RAG-Bot/1.0)'
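The timeout and User-Agent above combine into a single request. The sketch mirrors those two documented values. `fetchArticleHtml` is a hypothetical helper that assumes a runtime with global `fetch` and `AbortSignal.timeout` (Node.js 18 or later); `fetch` can be swapped for a stub so the function can be tested offline. Treating a non-2xx status as a failure is also an assumption.

```typescript
// Values documented above.
const ARTICLE_TIMEOUT_MS = 10_000;
const USER_AGENT = 'Mozilla/5.0 (compatible; RSS-RAG-Bot/1.0)';

// Hypothetical helper: fetches an article page with the documented
// User-Agent and a 10-second abort signal.
async function fetchArticleHtml(url: string, fetchImpl: typeof fetch = fetch): Promise<string> {
  const response = await fetchImpl(url, {
    headers: { 'User-Agent': USER_AGENT },
    // The request is aborted, and the promise rejects, once the timeout elapses.
    signal: AbortSignal.timeout(ARTICLE_TIMEOUT_MS),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  return response.text();
}
```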

Error Handling

Feed Parsing Errors

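try {
  await ingest({
    connector: rss('https://invalid-feed.com/feed'),
    store,
    embedder,
  });
} catch (error) {
  // A caught value is `unknown` in TypeScript, so narrow it before reading .message
  console.error('RSS parsing failed:', error instanceof Error ? error.message : error);
}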

Article Extraction Errors

Logged as warnings, don’t stop ingestion:

// Article extraction failure
console.warn(`Failed to fetch article: ${error.message}`);
// Falls back to RSS content

Caching Strategy

The RSS connector does not support ingestWhen directly and has no built-in expiry option. For time-sensitive feeds, re-run ingestion periodically or apply your own expiry logic before calling ingest:

const connector = rss('https://news.example.com/feed', {
  maxItems: 20,
});

// Re-run this call whenever the content should be refreshed (for example, hourly)
await ingest({
  connector,
  store,
  embedder,
});
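Manual expiry can be as simple as remembering when each feed was last ingested and skipping the call until a maximum age has passed. A minimal sketch; `shouldReingest` is a hypothetical helper, and where you store the timestamps is up to you.

```typescript
// Hypothetical helper: true when the feed was never ingested or the last
// successful ingestion is at least `maxAgeMs` old.
function shouldReingest(
  lastIngestedAt: Date | undefined,
  maxAgeMs: number,
  now: Date = new Date(),
): boolean {
  if (!lastIngestedAt) return true;
  return now.getTime() - lastIngestedAt.getTime() >= maxAgeMs;
}
```

For example, keep a `Map<string, Date>` keyed by `connector.sourceId`, call `ingest` only when `shouldReingest(lastIngested.get(connector.sourceId), 60 * 60 * 1000)` returns true, and record `new Date()` after a successful run. An in-memory map is lost on restart; persist the timestamps if that matters.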

Content Validation

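Extracted articles must be at least 200 characters long; shorter results are rejected and the item falls back to its RSS content:

if (fullContent.length < 200) {
  throw new Error('Extracted content too short');
}

This filters out extractions that likely missed the main article text.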

Best Practices

Limit Items for Full Extraction

Full article extraction is slow, so limit the number of items:

rss('https://example.com/feed', {
  maxItems: 10,
  fetchFullArticles: true,
})

Use RSS Content for Speed

For many feeds, the RSS content alone is sufficient:

rss('https://example.com/feed', {
  maxItems: 50,
  fetchFullArticles: false,
})

Re-ingest Periodically

For news feeds, re-ingest regularly to pick up the latest content:

// Every hour
setInterval(async () => {
  await ingest({ connector, store, embedder });
}, 60 * 60 * 1000);

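Handle Metadata

Use metadata for filtering and display:

const oneDayAgo = new Date(Date.now() - 24 * 60 * 60 * 1000);
const recent = results.filter(r => {
  const pubDate = r.metadata?.pubDate;
  // Skip items without a publication date
  return pubDate !== undefined && new Date(pubDate) > oneDayAgo;
});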
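With the setInterval pattern from Re-ingest Periodically, a slow run can overlap the next tick, and a rejected `ingest` promise becomes an unhandled rejection. A sketch of a wrapper that addresses both; `nonOverlapping` is a hypothetical helper, not part of `@deepagents/retrieval`.

```typescript
// Wraps an async task so that a tick is skipped while the previous run is
// still in progress, and errors are logged instead of becoming unhandled
// promise rejections. Resolves to false when the tick was skipped.
function nonOverlapping(task: () => Promise<unknown>): () => Promise<boolean> {
  let running = false;
  return async () => {
    if (running) return false; // previous run still in progress
    running = true;
    try {
      await task();
    } catch (error) {
      console.error('Scheduled ingestion failed:', error instanceof Error ? error.message : error);
    } finally {
      running = false;
    }
    return true;
  };
}
```

Usage: `setInterval(nonOverlapping(() => ingest({ connector, store, embedder })), 60 * 60 * 1000);`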

Next Steps

GitHub Connector

Ingest from GitHub

Local Files

Work with local files

Search

Search ingested content
