PGVector Integration

Name: PGVector Integration
Author: apify

Effortlessly transfer data from Apify actors to a Postgres database with PGVector for vector search. Perfect for building RAG apps and similarity search pipelines.

103 runs

16 users

About PGVector Integration

Need to get your scraped web data into a Postgres database with vector search capabilities? This Actor moves data directly from your Apify Actors into a Postgres database that has the PGVector extension installed, acting as a bridge between your data collection pipelines and a searchable vector store. I use it when I've scraped product catalogs or document sets and need to run similarity searches or build a RAG application without a complicated setup. It handles the connection and data mapping, so you can focus on querying your data with SQL and vector operations. The main benefit is simplicity: you configure your dataset and database credentials, and it handles the transfer. It suits developers who want to combine Apify's scraping with the flexibility of Postgres for AI-driven data analysis while keeping their stack open-source and under their control.

What does this actor do?

PGVector Integration is an integration Actor available on the Apify platform. It takes the datasets produced by other Actors, computes embeddings for the text, and stores the results in a PostgreSQL database with the PGVector extension, all running in the cloud.

Key Features

Cloud-based execution - no local setup required
Scalable infrastructure for large-scale operations
API access for integration with your applications
Built-in proxy rotation and anti-blocking measures
Scheduled runs and webhooks for automation

How to Use

Click "Try This Actor" to open it on Apify
Create a free Apify account if you don't have one
Configure the input parameters as needed
Run the Actor and check your PostgreSQL database for the stored vectors

Documentation

PGVector Integration

Transfers data from Apify Actors to PostgreSQL with the PGVector extension. It processes datasets, optionally splits text into chunks, computes embeddings, and stores them for efficient search and retrieval, particularly useful for Retrieval Augmented Generation (RAG) applications.

Overview

This Actor is an integration tool designed to work with other Apify Actors. For example, you can connect it to the Website Content Crawler to automatically save crawled web content as vector embeddings in your PostgreSQL database. It uses LangChain for text processing and embedding computation.

Key Features

- Vector Storage: Computes text embeddings and stores them in a PostgreSQL database with the PGVector extension.
- Incremental Updates: Can be configured to update only changed data, reducing unnecessary compute and storage operations.
- Text Chunking: Optionally splits long text into smaller chunks using LangChain's RecursiveCharacterTextSplitter for better embedding quality.
- Embedding Provider Flexibility: Supports multiple providers like OpenAI and Cohere for generating embeddings.
- Dataset Field Mapping: Lets you specify which dataset fields to store as content and which to keep as metadata.
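
The incremental-update feature can be illustrated with content checksums. This is a minimal sketch under the assumption that changes are detected by comparing a hash of each item against a previously stored hash; the source does not describe the Actor's actual mechanism, and `item_checksum` and `select_changed` are hypothetical helpers.

```python
import hashlib
import json


def item_checksum(item: dict) -> str:
    """Return a stable SHA-256 digest of an item's content."""
    # sort_keys makes the digest independent of key order.
    payload = json.dumps(item, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def select_changed(items: list[dict], stored: dict[str, str], key: str = "url") -> list[dict]:
    """Keep only items that are new or whose checksum differs from the stored one."""
    changed = []
    for item in items:
        if stored.get(item[key]) != item_checksum(item):
            changed.append(item)
    return changed


# Checksums remembered from an earlier (hypothetical) run.
previous = [
    {"url": "https://example.com/a", "text": "old"},
    {"url": "https://example.com/b", "text": "same"},
]
stored_checksums = {it["url"]: item_checksum(it) for it in previous}

# Items from the current run: one modified, one unchanged, one new.
current = [
    {"url": "https://example.com/a", "text": "new"},
    {"url": "https://example.com/b", "text": "same"},
    {"url": "https://example.com/c", "text": "added"},
]

changed_items = select_changed(current, stored_checksums)
print([it["url"] for it in changed_items])
# → ['https://example.com/a', 'https://example.com/c']
```

Only the changed and new items would then need embeddings computed, which is where the savings in compute and storage come from.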

How to Use

This integration runs automatically when configured within another Actor's integration settings. You don't run it as a standalone task.

Prerequisites:
- A PostgreSQL database with the PGVector extension installed.
- Your database connection string (postgresSqlConnectionStr) and a target collection/table name (postgresCollectionName).
- An API key for your chosen embeddings provider (e.g., from OpenAI).

Configuration: In your source Actor (e.g., Website Content Crawler), activate the PGVector integration and provide the required input. The main configuration sections are:
- Database Connection: Your PostgreSQL credentials and target collection.
- Embeddings Provider: Your chosen provider (e.g., OpenAIEmbeddings) and its API key/model settings.
- Data Mapping: Which fields from the source dataset contain the text and metadata.

Process Flow:
- The integration fetches the dataset from the source Actor.
- (Optional) It splits the text data into chunks based on your chunkSize and chunkOverlap settings.
- (Optional) It identifies and processes only new or modified data if incremental updates are enabled.
- It sends the text to the configured embeddings API to compute vector representations.
- It saves the vectors, along with the original text and any metadata, to your PostgreSQL database.
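
The process flow above can be sketched in a few lines of Python. This is an illustration of the data flow only, not the Actor's code: `run_pipeline` and `toy_embed` are hypothetical names, the embedding function is a placeholder for a real embeddings API call, and the final database write is left out.

```python
from typing import Callable, Optional


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embeddings API call (not a real model)."""
    return [float(len(text)), float(text.count(" "))]


def run_pipeline(
    items: list[dict],
    text_field: str,
    embed: Callable[[str], list[float]],
    splitter: Optional[Callable[[str], list[str]]] = None,
) -> list[dict]:
    """Turn dataset items into rows of text, embedding, and metadata."""
    rows = []
    for item in items:                                   # 1. dataset items
        text = item[text_field]
        chunks = splitter(text) if splitter else [text]  # 2. optional chunking
        for chunk in chunks:
            rows.append({
                "text": chunk,
                "embedding": embed(chunk),               # 3. compute the embedding
                "metadata": {"url": item.get("url")},
            })
    return rows                                          # 4. rows ready to be stored


dataset = [{"url": "https://example.com", "text": "first line\nsecond line"}]
rows = run_pipeline(dataset, "text", toy_embed, splitter=str.splitlines)
for row in rows:
    print(row["text"], row["embedding"])
```

Passing the splitter and the embedding function as arguments mirrors how chunking is optional and the embeddings provider is swappable in the configuration.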

Input / Output

Input Schema

The integration is configured via input fields when setting it up on a source Actor. Full details are on the Input page.

Essential Configuration Example:

{
  "postgresSqlConnectionStr": "postgresql://user:password@host:5432/dbname",
  "postgresCollectionName": "my_docs",
  "embeddingsProvider": "OpenAIEmbeddings",
  "embeddingsApiKey": "your-openai-key",
  "embeddingsConfig": {"model": "text-embedding-3-small"},
  "datasetFields": ["text"],
  "metadataDatasetFields": {"url": "url", "title": "metadata.title"}
}

- postgresSqlConnectionStr: Your PostgreSQL connection string.
- postgresCollectionName: The table name where vectors will be stored.
- embeddingsProvider & embeddingsApiKey: Define which service computes your embeddings and the key used to authenticate with it.
- embeddingsConfig: Provider-specific settings, such as the model name.
- datasetFields: An array specifying which dataset field(s) contain the main text to embed (e.g., ["text"]).
- metadataDatasetFields: A mapping of metadata names to their source fields in the dataset. Nested source fields use dot notation (e.g., metadata.title).
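
To make the field mapping concrete, here is a small sketch of how datasetFields and metadataDatasetFields from the example above could be applied to a single dataset item. The helpers `get_nested` and `map_item` are hypothetical, and joining multiple text fields with a newline is an assumption rather than documented behavior.

```python
def get_nested(item: dict, path: str):
    """Resolve a dotted path such as 'metadata.title'; return None if missing."""
    current = item
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def map_item(item: dict, dataset_fields: list[str], metadata_fields: dict[str, str]) -> dict:
    """Split one dataset item into text content and metadata."""
    texts = []
    for field_path in dataset_fields:
        value = get_nested(item, field_path)
        if value is not None:
            texts.append(str(value))
    metadata = {name: get_nested(item, source) for name, source in metadata_fields.items()}
    # Assumption: multiple text fields are joined with a newline.
    return {"content": "\n".join(texts), "metadata": metadata}


item = {
    "url": "https://example.com/docs",
    "text": "Hello world",
    "metadata": {"title": "Docs"},
}
mapped = map_item(item, ["text"], {"url": "url", "title": "metadata.title"})
print(mapped)
# → {'content': 'Hello world', 'metadata': {'url': 'https://example.com/docs', 'title': 'Docs'}}
```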

Optional Settings:
- performChunking: Set to true to enable text splitting.
- chunkSize & chunkOverlap: Control the size and overlap of text chunks.
- dataUpdatesStrategy: Choose how to handle updates (e.g., incremental).
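
The effect of chunkSize and chunkOverlap is easiest to see with a fixed-size sliding window. Note that this is a deliberately simplified stand-in: LangChain's RecursiveCharacterTextSplitter, which the Actor uses, splits on separators such as paragraphs and sentences first, so its real output will differ. `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role overlap plays: context at a chunk boundary is not lost to either side.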

Output

The Actor does not produce a separate dataset. Its output is the populated (or updated) PostgreSQL table, which contains columns for the vector embeddings, the source text, and the mapped metadata. Ensure your PostgreSQL vector column dimension matches the output size of your chosen embedding model (e.g., 1536 for text-embedding-3-small).
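
The dimension requirement and a typical similarity query can be sketched as follows. The Python part checks vector length and computes cosine distance, which is what pgvector's `<=>` operator returns. The SQL string is illustrative only: the table name `my_docs_table` and the column names `document` and `embedding` are hypothetical, so inspect the schema the Actor actually creates in your database before querying.

```python
import math

EXPECTED_DIM = 1536  # output size of text-embedding-3-small, per the text above


def check_dimension(vector: list[float], expected: int = EXPECTED_DIM) -> None:
    """Raise if the vector length does not match the vector column dimension."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Hypothetical table and column names; %s is a driver-style query parameter
# that would receive the query embedding.
NEAREST_NEIGHBORS_SQL = """
SELECT document, embedding <=> %s::vector AS distance
FROM my_docs_table
ORDER BY distance
LIMIT 5;
"""

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Smaller distances mean more similar text, so ordering by distance ascending returns the closest matches first.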

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Actor Information

Developer: apify
Pricing: Paid
Total Runs: 103
Active Users: 16
