Introduction
Amazon S3 (Simple Storage Service) is the industry-standard object storage platform, commonly used for document archives, media libraries, data lakes, and backups. With Alchemyst’s Amazon S3 integration, you can sync entire buckets, search semantically across documents and media, and keep your agent’s knowledge synchronized with your storage. Your agent sees files as meaning, not just bytes.
Why Connect Amazon S3?
Traditional file-based workflows break down because:
- Files are scattered across services
- Manual operations are error-prone
- Large files exceed context windows
- There’s no way to query across files semantically
How to Connect
Prerequisites:
- AWS account with S3 access
- S3 bucket with files to sync
- IAM credentials with read permissions
Connection fields:
- Bucket Name (S3 bucket identifier)
- AWS Access Key ID (IAM user access key)
- AWS Secret Access Key (IAM user secret key)
- Region (AWS region, e.g., us-east-1)
- Prefix/Folder (optional path filter)
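Before entering the fields above, it can help to gather them in one place and confirm that nothing required is missing. The following is a minimal sketch, assuming the values are kept in environment variables. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION are the standard AWS variable names; S3_BUCKET_NAME, S3_PREFIX, S3ConnectionFields, and load_fields are hypothetical names chosen for illustration and are not part of Alchemyst's API.

```python
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class S3ConnectionFields:
    """Hypothetical container mirroring the connection fields listed above."""
    bucket_name: str
    access_key_id: str
    # repr=False keeps the secret out of logs and tracebacks.
    secret_access_key: str = field(repr=False)
    region: str
    prefix: Optional[str] = None  # optional path filter


def load_fields(env: Mapping[str, str]) -> S3ConnectionFields:
    """Build the connection fields from a mapping such as os.environ."""
    required = {
        "bucket_name": "S3_BUCKET_NAME",            # hypothetical variable name
        "access_key_id": "AWS_ACCESS_KEY_ID",
        "secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "region": "AWS_DEFAULT_REGION",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    values = {name: env[var] for name, var in required.items()}
    # S3_PREFIX is a hypothetical variable name; an empty value means "no filter".
    prefix = env.get("S3_PREFIX") or None
    return S3ConnectionFields(prefix=prefix, **values)


# Example with placeholder values (not real credentials).
example_env = {
    "S3_BUCKET_NAME": "example-bucket",
    "AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "placeholder-value",
    "AWS_DEFAULT_REGION": "us-east-1",
}
print(load_fields(example_env))
```

In practice the mapping passed to load_fields would be os.environ or a secrets manager lookup rather than a literal dictionary.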
What Gets Indexed
Alchemyst can index:
- Documents: PDF, DOCX, PPTX, TXT, MD
- Data files: CSV, JSON, JSONL, XML, Parquet, YAML
- Images: PNG, JPG, SVG (with OCR and vision models)
- Code: All text-based source files
IAM Permissions
Create a dedicated IAM user with read-only S3 access. Grant only the following permissions to the specific buckets you want to sync:
- s3:GetObject - Read objects from the bucket
- s3:ListBucket - List objects in the bucket
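The two permissions above translate into a short IAM policy document. The sketch below builds one as a Python dictionary and prints it as JSON, ready to paste into the IAM console. The function name read_only_policy and the bucket name example-bucket are placeholders. Note that s3:ListBucket applies to the bucket ARN, while s3:GetObject applies to the objects inside it (the ARN ending in /*).

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Return a least-privilege, read-only IAM policy for a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,          # bucket-level permission
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",   # object-level permission
            },
        ],
    }


# "example-bucket" is a placeholder; substitute the bucket you want to sync.
print(json.dumps(read_only_policy("example-bucket"), indent=2))
```

To cover several buckets, one option is to generate one policy per bucket so that access to each can be revoked independently.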
Security Best Practices
For Amazon S3 integrations:
- Use IAM roles instead of access keys when possible
- Grant only read-only permissions
- Enable bucket encryption (SSE-S3 or SSE-KMS)
- Restrict public access to sensitive buckets
- Use HTTPS-only access policies
- Enable S3 access logging to monitor activity
- Rotate credentials regularly
- Use bucket policies to enforce encryption in transit
- Enable versioning for critical data
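As an illustration of the HTTPS-only item above, a bucket policy can deny any request that does not arrive over TLS by testing the aws:SecureTransport condition key. This is a minimal sketch that only generates the policy JSON. The function name https_only_bucket_policy and the bucket name example-bucket are placeholders, and the policy should be reviewed against your existing bucket policy before it is applied.

```python
import json


def https_only_bucket_policy(bucket: str) -> dict:
    """Return a bucket policy that denies all S3 requests made without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder bucket name.
print(json.dumps(https_only_bucket_policy("example-bucket"), indent=2))
```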
Performance & Cost Optimization
Sync Strategies:
- Full Sync: For small buckets with static content
- Incremental Sync: For large buckets with frequent updates
- Event-Driven: For real-time updates using S3 event notifications
Cost Optimization:
- Use S3 Intelligent-Tiering for infrequently accessed data
- Set lifecycle policies to archive old files
- Limit sync frequency for static content
- Use prefix filters to avoid listing entire buckets
- Filter by file type to exclude unnecessary files
- Monitor data transfer costs and optimize accordingly
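To make the difference between a full and an incremental sync concrete, the sketch below compares a saved manifest from the previous run with the current bucket listing and returns only the keys that need work. It is an illustration of the general idea, not a description of how Alchemyst implements sync. It assumes each listing is a mapping from object key to ETag, and the names SyncPlan and plan_incremental_sync are hypothetical.

```python
from typing import Dict, List, NamedTuple


class SyncPlan(NamedTuple):
    to_index: List[str]   # new or changed objects to download and index
    to_remove: List[str]  # objects deleted from the bucket since the last run


def plan_incremental_sync(previous: Dict[str, str],
                          current: Dict[str, str]) -> SyncPlan:
    """Compare two {object key: ETag} listings and return the differences."""
    # A key is (re)indexed when it is new or its ETag differs from last time.
    to_index = sorted(key for key, etag in current.items()
                      if previous.get(key) != etag)
    # A key is removed from the index when it no longer exists in the bucket.
    to_remove = sorted(key for key in previous if key not in current)
    return SyncPlan(to_index, to_remove)


# Placeholder listings: b.csv changed, c.md was deleted, d.json is new.
previous_run = {"a.pdf": "etag-1", "b.csv": "etag-2", "c.md": "etag-3"}
current_run = {"a.pdf": "etag-1", "b.csv": "etag-9", "d.json": "etag-4"}

plan = plan_incremental_sync(previous_run, current_run)
print(plan.to_index)   # ['b.csv', 'd.json']
print(plan.to_remove)  # ['c.md']
```

A full sync corresponds to calling the function with an empty previous manifest, in which case every key lands in to_index. This is why it suits small, static buckets and becomes costly for large ones.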
Prefix Filtering
Use the Prefix/Folder field to sync only specific directories within your bucket:
- Leave empty to sync the entire bucket
- Use documents/ to sync only the documents folder
- Use data/2024/ to sync a specific year’s data
- Combine with file type filters for precise control
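S3 has no real directories. A prefix is matched against the start of each object key, so documents/ selects every key beginning with that string. The sketch below shows how a prefix and a file type filter narrow a listing. The function select_keys and the sample keys are hypothetical, and the extension matching is an assumption about how a file type filter might behave rather than documented Alchemyst behavior.

```python
from typing import Iterable, List, Optional, Sequence


def select_keys(keys: Iterable[str],
                prefix: str = "",
                extensions: Optional[Sequence[str]] = None) -> List[str]:
    """Keep keys that start with `prefix` and end with one of `extensions`."""
    selected = []
    for key in keys:
        # An empty prefix matches every key, i.e. the entire bucket.
        if not key.startswith(prefix):
            continue
        # Extensions are compared case-insensitively, e.g. (".pdf", ".csv").
        if extensions and not key.lower().endswith(tuple(extensions)):
            continue
        selected.append(key)
    return selected


# Placeholder object keys for illustration.
sample_keys = [
    "documents/report.pdf",
    "documents/notes.txt",
    "data/2024/sales.csv",
    "data/2023/sales.csv",
    "images/logo.png",
]

print(select_keys(sample_keys, prefix="documents/"))
# ['documents/report.pdf', 'documents/notes.txt']
print(select_keys(sample_keys, prefix="data/2024/", extensions=[".csv"]))
# ['data/2024/sales.csv']
```

Because the match is a plain string comparison, a prefix without a trailing slash such as data/2024 would also match a key like data/2024-archive/file.csv, so including the slash is usually the safer choice.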
Next Steps
Once Amazon S3 is connected, you can:
- Search across your files semantically
- Combine cloud storage with databases and other sources
- Enable real-time sync with webhooks
- Process multimodal content (PDFs, images, CSVs, JSON)

