Self Learning Postgres DB

by ruv

22 runs

2 users

About Self Learning Postgres DB

Self-learning vector database with GNN-powered index optimization. Features: vector search, RAG queries, embeddings, clustering, deduplication, batch ops, and data import/export. Scales with Raft consensus.

What does this actor do?

Self Learning Postgres DB is an actor on the Apify platform that runs a self-learning vector database in the cloud. It stores documents with embeddings and handles semantic search, RAG queries, clustering, deduplication, and data import/export.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the actor and download your results

Documentation

# Self-Learning Postgres DB - Vector Database for AI Agents

A distributed vector database that truly learns. Store embeddings, query with semantic search, and let the index improve itself through TRM (Tiny Recursive Models), SONA (Self-Optimizing Neural Architecture), and Graph Neural Networks.

## Key AI Features

| Feature | Description |
|---------|-------------|
| TRM | 7M parameter recursive reasoning (83% on GSM8K) |
| SONA | 3-tier learning (Instant/Background/Deep) |
| EWC++ | Anti-forgetting protection (λ=2000) |
| GNN | Graph Neural Network index optimization |
| Trajectory Tracking | Learn from query patterns |

---

## Features

30+ Operations for complete vector database management:

- Semantic Search - Find documents by meaning, not just keywords
- Batch Operations - Insert and search thousands of documents efficiently
- Hybrid Search - Combine vector similarity with keyword matching
- RAG Support - Built-in Retrieval-Augmented Generation queries
- Self-Learning - GNN training for index optimization
- Clustering - K-means document clustering
- Deduplication - Find and remove duplicate content
- Export/Import - JSON and CSV data migration

Zero Setup Required:

- Embedded PostgreSQL with ruvector extension
- Local AI embeddings (no OpenAI API key needed)
- Automatic table and index creation

---

## Quick Start (30 Seconds)

### Full Demo

```json
{
  "action": "full_workflow",
  "query": "How does machine learning work?",
  "documents": [
    {"content": "Machine learning is AI that learns patterns from data.", "metadata": {"category": "AI"}},
    {"content": "PostgreSQL is a powerful relational database.", "metadata": {"category": "Database"}},
    {"content": "Neural networks consist of layers of nodes.", "metadata": {"category": "AI"}},
    {"content": "Vector databases store embeddings for similarity search.", "metadata": {"category": "Database"}}
  ]
}
```

Result: Documents ranked by semantic relevance to your query.
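To illustrate what "ranked by semantic relevance" means, here is a minimal sketch of cosine-similarity ranking (cosine is the default `distanceMetric` in the parameters reference). The three-dimensional vectors are hand-made toy values, not real `all-MiniLM-L6-v2` embeddings, and `cosine_similarity` and `rank_by_cosine` are hypothetical helpers written for this example, not part of the actor.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, docs, top_k=10):
    """Score each (content, vector) pair against the query, highest first."""
    scored = [(cosine_similarity(query_vec, vec), content) for content, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors, assumed for illustration only (real embeddings have 384+ dimensions).
toy_docs = [
    ("Machine learning is AI that learns patterns from data.", [0.9, 0.1, 0.0]),
    ("PostgreSQL is a powerful relational database.", [0.0, 0.2, 0.9]),
    ("Neural networks consist of layers of nodes.", [0.7, 0.3, 0.1]),
]
toy_query = [1.0, 0.0, 0.0]  # stands in for the embedded query text

ranked = rank_by_cosine(toy_query, toy_docs, top_k=2)
for score, content in ranked:
    print(f"{score:.3f}  {content}")
```

With these toy values the two AI-related documents score highest and the database document is dropped by `top_k=2`, which mirrors the kind of ordering the demo describes.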
---

## All 38 Actions

### Document Operations

| Action | Description |
|--------|-------------|
| `insert` | Add documents with auto-generated embeddings |
| `batch_insert` | Efficiently insert large document sets |
| `get` | Retrieve single document by ID |
| `list` | List documents with filtering |
| `update` | Modify existing document content/metadata |
| `delete` | Remove documents by ID, IDs, or filter |
| `upsert` | Insert or update (smart merge) |

### Search Operations

| Action | Description |
|--------|-------------|
| `search` | Semantic similarity search |
| `batch_search` | Multiple queries in one call |
| `hybrid_search` | Vector + BM25 keyword combined |
| `multi_query_search` | Aggregate results from multiple queries |
| `mmr_search` | Maximal Marginal Relevance (diverse results) |
| `graph_search` | Graph-based similarity traversal |
| `range_search` | All results within distance threshold |

### Table Operations

| Action | Description |
|--------|-------------|
| `create_table` | Create new vector collection |
| `drop_table` | Delete collection |
| `list_tables` | Show all vector tables |
| `table_stats` | Collection statistics and metrics |
| `create_index` | Add HNSW or IVFFlat index |
| `reindex` | Rebuild indexes |

### Self-Learning / GNN / SONA

| Action | Description |
|--------|-------------|
| `train_gnn` | Train Graph Neural Network on data |
| `optimize_index` | Auto-tune HNSW parameters |
| `analyze_patterns` | Analyze data distribution |
| `sona_learn` | Trigger TRM/SONA background learning cycle |
| `sona_status` | Check SONA learning status and capabilities |

### Clustering & Deduplication

| Action | Description |
|--------|-------------|
| `cluster` | K-means document clustering |
| `find_duplicates` | Detect similar document pairs |
| `deduplicate` | Remove duplicate documents |

### Data Operations

| Action | Description |
|--------|-------------|
| `export` | Export to JSON or CSV |
| `import` | Import from JSON data |

### AI / RAG

| Action | Description |
|--------|-------------|
| `rag_query` | Build RAG context from search results |
| `summarize` | Document statistics and previews |

### Utility

| Action | Description |
|--------|-------------|
| `ping` | Test database connection |
| `version` | Get version and feature info |
| `embedding_models` | List available models |
| `generate_embedding` | Create embeddings without storing |
| `similarity` | Compare similarity of two texts |

---

## Use Cases

### 1. AI Agent Memory

```json
{
  "action": "insert",
  "tableName": "agent_memory",
  "documents": [
    {"content": "User prefers dark mode", "metadata": {"user_id": "123", "type": "preference"}},
    {"content": "User asked about Python tutorials", "metadata": {"user_id": "123", "type": "history"}}
  ]
}
```

Retrieve memories:

```json
{
  "action": "search",
  "tableName": "agent_memory",
  "query": "What does this user like?",
  "filter": "metadata->>'user_id' = '123'"
}
```

### 2. RAG Pipeline

```json
{
  "action": "rag_query",
  "query": "How do I return a product?",
  "topK": 5,
  "ragMaxTokens": 2000
}
```

Returns context ready to feed to your LLM.

### 3. Batch Document Processing

```json
{
  "action": "batch_insert",
  "batchSize": 100,
  "documents": [
    // ... thousands of documents
  ]
}
```

### 4. Find & Remove Duplicates

```json
{
  "action": "find_duplicates",
  "similarityThreshold": 0.95
}
```

Then:

```json
{
  "action": "deduplicate",
  "similarityThreshold": 0.95
}
```

### 5. Document Clustering

```json
{
  "action": "cluster",
  "numClusters": 10,
  "clusteringAlgorithm": "kmeans"
}
```

### 6. Index Optimization

```json
{
  "action": "optimize_index",
  "enableLearning": true
}
```

### 7. SONA Self-Learning

Check learning status:

```json
{
  "action": "sona_status"
}
```

Trigger learning cycle:

```json
{
  "action": "sona_learn",
  "ewcLambda": 2000,
  "patternThreshold": 0.7
}
```

---

## Parameters Reference

### Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `action` | string | `search` | Operation to perform |
| `connectionString` | string | embedded | PostgreSQL URL for persistence |
| `tableName` | string | `documents` | Table/collection name |

### Search Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | - | Natural language search query |
| `queryVector` | array | - | Pre-computed embedding vector |
| `topK` | integer | 10 | Number of results |
| `distanceMetric` | string | `cosine` | cosine, l2, inner_product, manhattan |
| `filter` | string | - | SQL WHERE clause |
| `minScore` | number | 0 | Minimum similarity score (0-1) |
| `maxDistance` | number | - | Maximum distance threshold |

### Embedding Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `embeddingModel` | string | `all-MiniLM-L6-v2` | AI embedding model |
| `generateEmbeddings` | boolean | true | Auto-generate embeddings |
| `dimensions` | integer | 384 | Vector dimensions |

### Index Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `indexType` | string | `hnsw` | hnsw, ivfflat, none |
| `hnswM` | integer | 16 | HNSW max connections |
| `hnswEfConstruction` | integer | 64 | HNSW build quality |
| `hnswEfSearch` | integer | 100 | HNSW search quality |
| `ivfLists` | integer | 100 | IVFFlat partitions |

### GNN Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enableLearning` | boolean | false | Enable self-learning |
| `learningRate` | number | 0.01 | GNN learning rate |
| `gnnLayers` | integer | 2 | GNN layer count |
| `trainEpochs` | integer | 10 | Training epochs |

### SONA / TRM Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `sonaEnabled` | boolean | true | Enable TRM/SONA self-learning |
| `ewcLambda` | number | 2000 | EWC++ anti-forgetting strength |
| `patternThreshold` | number | 0.7 | Pattern recognition confidence |
| `maxTrajectories` | integer | 100 | Max trajectory steps to track |
| `sonaLearningTiers` | array | ["instant", "background"] | Learning tiers to enable |

### Clustering Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `numClusters` | integer | 10 | K-means clusters |
| `similarityThreshold` | number | 0.95 | Duplicate detection threshold |

---

## Embedding Models

| Model | Dimensions | Speed | Quality | Best For |
|-------|------------|-------|---------|----------|
| `all-MiniLM-L6-v2` | 384 | Fast | Good | Prototyping |
| `bge-small-en-v1.5` | 384 | Fast | Excellent | Production |
| `bge-base-en-v1.5` | 768 | Medium | Better | High accuracy |
| `nomic-embed-text-v1` | 768 | Medium | Good | Long documents (8K) |
| `gte-small` | 384 | Fast | Good | General use |
| `e5-small-v2` | 384 | Fast | Good | Multilingual |

---

## Persistent Storage

### Hybrid Persistence Architecture

```
┌─────────────────────────────────────────────────────────┐
│                        Actor Run                        │
│  ┌──────────────┐    ┌──────────────┐    ┌───────────┐  │
│  │ Key-Value    │───▶│ Embedded     │───▶│ Key-Value │  │
│  │ Store (load) │    │ PostgreSQL   │    │ (save)    │  │
│  └──────────────┘    └──────────────┘    └───────────┘  │
│       START               WORK               END        │
└─────────────────────────────────────────────────────────┘
```

Flow:

1. On Start → Load documents from Key-Value Store into embedded PostgreSQL
2. During Run → Full vector search capabilities (HNSW, cosine, etc.)
3. On End → Export documents back to Key-Value Store

### Storage Options Comparison

| Feature | External PostgreSQL | Apify Key-Value Store |
|---------|---------------------|----------------------|
| Setup required | Yes | No |
| Cost | Separate billing | Included in Apify |
| Max size | Unlimited | ~9GB per store |
| Cold start | Fast | Slower (load data) |
| Best for | Large/production | Small-medium datasets |

### External PostgreSQL

For persistent storage with external database:

```json
{
  "connectionString": "postgresql://user:password@host:5432/database",
  "action": "search",
  "query": "Your query"
}
```

Supported:

- PostgreSQL 14+ with ruvector extension
- PostgreSQL with pgvector (compatibility mode)
- Supabase, Neon, AWS RDS, etc.

---

## API Integration

### Python

```python
from apify_client import ApifyClient

client = ApifyClient("your-api-token")

run = client.actor("ruv/self-learning-postgres-db").call(run_input={
    "action": "search",
    "query": "machine learning basics",
    "topK": 5
})

results = client.dataset(run["defaultDatasetId"]).list_items().items
```

### JavaScript

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'your-api-token' });

const run = await client.actor('ruv/self-learning-postgres-db').call({
  action: 'search',
  query: 'machine learning basics',
  topK: 5
});

const results = await client.dataset(run.defaultDatasetId).listItems();
```

### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/ruv~self-learning-postgres-db/runs" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "search",
    "query": "machine learning",
    "topK": 10
  }'
```

---

## Performance

Built on PostgreSQL 17.7 with AVX-512 SIMD acceleration:

| Dataset Size | Search Time | Accuracy |
|--------------|-------------|----------|
| 10,000 docs | ~0.3ms | 99%+ |
| 100,000 docs | ~0.5ms | 99%+ |
| 1,000,000 docs | ~1.2ms | 98%+ |

---

## Pricing (Apify Pay-per-event)

### Core Events

| Event | Price | Description |
|-------|-------|-------------|
| Actor Start | $0.001 | Per GB memory used |
| Document Insert | $0.001 | Per document stored |
| Vector Search | $0.001 | Per search query |
| Result | $0.0005 | Per result returned |

### Advanced Operations

| Event | Price | Description |
|-------|-------|-------------|
| Batch Operation | $0.002 | Per batch insert/search |
| RAG Query | $0.002 | Per RAG context build |
| GNN Training | $0.01 | Per training session |
| Clustering | $0.005 | Per cluster operation |
| Deduplication | $0.003 | Per dedupe run |
| Data Export | $0.002 | Per export |
| Data Import | $0.002 | Per import |
| Table Operation | $0.001 | Create/drop table |
| Index Operation | $0.002 | Create/optimize index |
| Similarity Check | $0.001 | Per comparison |
| Embedding Generation | $0.001 | Per embedding |

Volume Discounts:

- Bronze: -14% off results
- Silver: -26% off results
- Gold: -40% off results

---

## Development

### Local Testing

```bash
# Start ruvector-postgres
docker run -d --name ruvector-pg -e POSTGRES_PASSWORD=secret -p 5432:5432 ruvnet/ruvector-postgres:latest

# Run tests
DATABASE_URL="postgresql://postgres:secret@localhost:5432/postgres" npm test
```

### Deployment

```bash
# Set your API token in root .env
echo "APIFY_API_TOKEN=your_token" >> ../../../.env

# Deploy
npm run deploy
```

---

## Links

- GitHub Repository
- Apify Store
- Docker Image
- RuVector Documentation

---

## Support

- Open an Issue
- Apify Community

---

Built with RuVector - High-performance vector search with TRM/SONA self-learning for the AI era.
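The pay-per-event prices in the Pricing section can be combined into a rough per-run estimate. The sketch below copies a subset of the listed prices into a lookup table and sums them for a given set of event counts. The dictionary keys, the `estimate_cost` helper, and the assumption about which events a run triggers are all hypothetical; actual billing is determined by Apify and the actor, and volume discounts are not modeled.

```python
from decimal import Decimal

# Prices copied from the Pricing tables; the key names are made up for this sketch.
EVENT_PRICES_USD = {
    "actor_start": Decimal("0.001"),      # per GB of memory used
    "document_insert": Decimal("0.001"),  # per document stored
    "vector_search": Decimal("0.001"),    # per search query
    "result": Decimal("0.0005"),          # per result returned
    "batch_operation": Decimal("0.002"),  # per batch insert/search
    "rag_query": Decimal("0.002"),        # per RAG context build
    "gnn_training": Decimal("0.01"),      # per training session
}


def estimate_cost(event_counts):
    """Sum price * count over the given events; reject unknown event names."""
    total = Decimal("0")
    for event, count in event_counts.items():
        if event not in EVENT_PRICES_USD:
            raise KeyError(f"unknown event: {event}")
        total += EVENT_PRICES_USD[event] * count
    return total


# Assumed event mix: a 1 GB run that inserts 4 documents,
# runs 1 search, and returns 4 results.
example_counts = {
    "actor_start": 1,
    "document_insert": 4,
    "vector_search": 1,
    "result": 4,
}
cost = estimate_cost(example_counts)
print(f"Estimated cost: ${cost}")
```

`Decimal` is used instead of floats so that sub-cent prices add up exactly; the same function can be extended with the remaining events from the Advanced Operations table.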

Categories

AGENTS AUTOMATION

Actor Information

Developer: ruv
Pricing: Paid
Total Runs: 22
Active Users: 2
