# Fetch Safety

## Resource Limits
| Limit | Default | Configurable |
|---|---|---|
| Maximum response body | 256 MiB | `FetchConfig.max_body_size` |
| Maximum redirect chain | 10 | `FetchConfig.max_redirects` |
| Connect timeout | 30 seconds | `FetchConfig.connect_timeout` |
| Total request timeout | 120 seconds | `FetchConfig.total_timeout` |
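The limits above could be gathered into a single config struct. A minimal sketch, assuming the field names from the "Configurable" column and the defaults from the "Default" column (the actual struct may carry more fields):

```rust
use std::time::Duration;

/// Hypothetical sketch of the fetch limits table; field names follow
/// the "Configurable" column, defaults follow the "Default" column.
pub struct FetchConfig {
    pub max_body_size: u64, // bytes
    pub max_redirects: u32,
    pub connect_timeout: Duration,
    pub total_timeout: Duration,
}

impl Default for FetchConfig {
    fn default() -> Self {
        FetchConfig {
            max_body_size: 256 * 1024 * 1024, // 256 MiB
            max_redirects: 10,
            connect_timeout: Duration::from_secs(30),
            total_timeout: Duration::from_secs(120),
        }
    }
}
```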
## Decompression Bomb Protection

Responses with `Content-Encoding: gzip` (or brotli, deflate) are decompressed with size validation. The decompressed size is checked against `Content-Length * reasonable_ratio` to prevent zip bomb attacks.
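The ratio check can be sketched as a standalone predicate. `max_ratio` here is a hypothetical parameter standing in for `reasonable_ratio`; the name is illustrative, not the actual API:

```rust
/// Sketch of the decompression-bomb check: the decompressed size must not
/// exceed the declared Content-Length times an allowed expansion ratio.
pub fn within_decompression_limit(content_length: u64, decompressed: u64, max_ratio: u64) -> bool {
    // saturating_mul avoids overflow for absurd Content-Length headers;
    // a declared length of 0 gives no budget and rejects any expansion.
    decompressed <= content_length.saturating_mul(max_ratio)
}
```

In practice the check runs incrementally during streaming decompression, so the fetch can be aborted as soon as the budget is exceeded rather than after buffering the full body.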
## Unsafe URL Schemes

Link extraction blocks unsafe URL schemes. These are logged but never followed:

- `javascript:` — code execution
- `data:` — embedded content (can be arbitrarily large)
- `blob:` — browser-internal references
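A filter for the schemes above might look like the following sketch (the function name is illustrative; matching is case-insensitive because URL schemes are):

```rust
/// Returns true for the blocked schemes listed above.
pub fn is_unsafe_scheme(url: &str) -> bool {
    let lower = url.trim_start().to_ascii_lowercase();
    ["javascript:", "data:", "blob:"]
        .iter()
        .any(|scheme| lower.starts_with(scheme))
}
```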
## HTML Sanitization

Before link extraction, `<script>` and `<style>` tag content is stripped entirely. This prevents extracting junk URLs from JavaScript source code (e.g., minified variable names that look like relative paths).
```rust
pub fn extract_links(html: &str, base_url: &Url) -> Vec<Url> {
    let cleaned = strip_tag_content(html, &["script", "style"]);
    // ... scan for href, src attributes
}
```
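One way `strip_tag_content` could work — a sketch under the assumption of well-formed, non-nested tags, not the actual implementation — is to repeatedly cut everything from each opening tag through its matching close, case-insensitively:

```rust
/// Remove the listed tags and everything inside them (case-insensitive).
/// Sketch only: assumes well-formed, non-nested tags.
pub fn strip_tag_content(html: &str, tags: &[&str]) -> String {
    let mut out = html.to_string();
    for tag in tags {
        let open = format!("<{tag}");
        let close = format!("</{tag}>");
        loop {
            // Lowercase a copy for searching; edit the original string.
            let lower = out.to_ascii_lowercase();
            let Some(start) = lower.find(&open) else { break };
            let Some(end_rel) = lower[start..].find(&close) else { break };
            let end = start + end_rel + close.len();
            out.replace_range(start..end, "");
        }
    }
    out
}
```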
## robots.txt Enforcement

Palimpsest respects robots.txt per RFC 9309:

- Fetches and caches `robots.txt` per origin before crawling
- Respects `Disallow` directives for the configured user agent
- Honors `Crawl-delay` when specified
- Blocked URLs are counted in metrics (`palimpsest_robots_blocked`)
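The `Disallow` matching above can be illustrated with a deliberately minimal sketch. This handles only a single user-agent group with prefix rules; full RFC 9309 matching (multiple `User-agent` lines per group, `Allow` precedence, `*` and `$` wildcards) is richer:

```rust
/// Minimal sketch of Disallow matching: true if `path` is blocked for
/// `user_agent`. Handles one User-agent line per group and plain prefix
/// rules only; not a full RFC 9309 parser.
pub fn is_disallowed(robots_txt: &str, user_agent: &str, path: &str) -> bool {
    let mut in_group = false;
    for line in robots_txt.lines() {
        // Strip comments, then surrounding whitespace.
        let line = line.split('#').next().unwrap_or("").trim();
        if let Some(ua) = line.strip_prefix("User-agent:") {
            let ua = ua.trim();
            in_group = ua == "*" || ua.eq_ignore_ascii_case(user_agent);
        } else if in_group {
            if let Some(rule) = line.strip_prefix("Disallow:") {
                let rule = rule.trim();
                // An empty Disallow value allows everything.
                if !rule.is_empty() && path.starts_with(rule) {
                    return true;
                }
            }
        }
    }
    false
}
```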