openGauss Integration


by wyswyz

Effortlessly transfer Apify datasets to openGauss. Perfect for building AI search, Q&A bots, or RAG systems with your scraped data.


About openGauss Integration

Need to get your scraped data from Apify into an openGauss database? This actor is the straightforward connector you've been looking for. It takes the datasets you've built with other actors and pipes them directly into your openGauss tables, automating what's usually a manual and error-prone process. I've used it to set up a foundation for several projects, and it just works.

The real value is what you build on top of that data. Once your information is structured and sitting in openGauss, you're perfectly set up to create intelligent search functions, build a question-answering bot, or implement a full Retrieval-Augmented Generation (RAG) pipeline. It handles the data transfer so you can focus on the actual AI and data analysis work.

Think of it as the essential plumbing that lets you turn raw, scraped data into a functional knowledge base for your applications. If you're working with open-source AI tools and need a reliable way to feed them fresh data, this integration is a practical first step.

What does this actor do?

openGauss Integration is a data integration Actor available on the Apify platform. Rather than scraping data itself, it takes datasets produced by other Actors and loads them into an openGauss database, running entirely in the cloud.

Key Features

  • Cloud-based execution - no local setup required
  • Scalable infrastructure for large datasets
  • API access for integration with your applications
  • Scheduled runs and webhooks for automation

How to Use

  1. Click "Try This Actor" to open it on Apify
  2. Create a free Apify account if you don't have one
  3. Configure the input parameters as needed
  4. Run the actor and download your results

Documentation

openGauss Integration

This Apify Actor transfers data from Apify datasets into an openGauss database. It processes text, optionally splits it into chunks, computes vector embeddings, and stores them in openGauss. It's designed for search and Retrieval-Augmented Generation (RAG) applications, with support for incremental updates to save on compute and storage.

Overview

The Actor takes a dataset from another Apify Actor (like the Website Content Crawler), processes the text, generates embeddings using providers like OpenAI or Cohere via LangChain, and saves the vectors to your openGauss database. You control chunking, embedding models, and update strategies.

Key Features

  • Vector Storage: Computes and stores text embeddings in openGauss for semantic search.
  • Incremental Updates: Updates only changed data, reducing unnecessary embedding recomputation.
  • Configurable Chunking: Uses LangChain's RecursiveCharacterTextSplitter to handle long documents (configurable chunk size/overlap).
  • Embedding Provider Flexibility: Supports multiple providers (e.g., OpenAI, Cohere).
  • Metadata Handling: Lets you specify which dataset fields to store as metadata alongside vectors.

How to Use

This Actor is typically configured via the "Integrations" section of another Actor's run (e.g., Website Content Crawler).

Prerequisites:
  • A running openGauss database (you'll need host, port, user, password, DB name, and table name).
  • An API key for your chosen embeddings provider (e.g., from OpenAI).

Basic Flow:
1. An upstream Actor (like a crawler) runs and produces a dataset.
2. This integration Actor is triggered, fetching that dataset.
3. (Optional) Text is split into chunks.
4. Embeddings are computed for the text/chunks.
5. Data and embeddings are saved to your specified openGauss table.
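The flow above can be sketched in a few lines of Python. This is a minimal, illustrative sketch, not the Actor's actual code: the `embed()` stub stands in for a real provider call (e.g., OpenAI via LangChain), and the field names mirror the Website Content Crawler's output.

```python
def embed(texts):
    # Stand-in for a real embeddings-provider call; returns fixed-size vectors.
    return [[float(len(t))] * 4 for t in texts]

def build_rows(dataset_items, text_field="text"):
    """Turn dataset items into (text, vector, metadata) rows ready for insertion."""
    texts = [item[text_field] for item in dataset_items]
    vectors = embed(texts)
    return [
        {"text": t, "vector": v, "metadata": {"url": item.get("url")}}
        for item, t, v in zip(dataset_items, texts, vectors)
    ]

items = [{"url": "https://example.com", "text": "openGauss stores vectors."}]
rows = build_rows(items)
```

In the real Actor, the resulting rows are written to your openGauss table in step 5.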

Input / Output

Input Configuration:

Your configuration needs three main parts: database connection, embeddings provider, and data mapping.

1. Database (openGauss):

{
  "opengaussHost": "your-host",
  "opengaussPort": "your-port",
  "opengaussUser": "your-user",
  "opengaussPassword": "your-password",
  "opengaussDBname": "your-database-name",
  "opengaussTableName": "apify_collection"
}
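openGauss is PostgreSQL-compatible, so these fields typically combine into a libpq-style connection string usable with drivers such as psycopg2. As a hypothetical helper (the function name is illustrative; the key names mirror the input fields above), you might assemble it like this:

```python
def opengauss_dsn(cfg):
    """Build a libpq-style DSN from the Actor's database input fields."""
    return (
        f"host={cfg['opengaussHost']} port={cfg['opengaussPort']} "
        f"user={cfg['opengaussUser']} password={cfg['opengaussPassword']} "
        f"dbname={cfg['opengaussDBname']}"
    )

cfg = {
    "opengaussHost": "your-host",
    "opengaussPort": "5432",
    "opengaussUser": "your-user",
    "opengaussPassword": "your-password",
    "opengaussDBname": "your-database-name",
}
dsn = opengauss_dsn(cfg)
```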

2. Embeddings Provider (Example: OpenAI):

{
  "embeddingsProvider": "OpenAIEmbeddings",
  "embeddingsApiKey": "your-openai-api-key",
  "embeddingsConfig": {"model": "text-embedding-3-large"}
}

Important: Ensure your openGauss table's vector column size matches the output dimensions of your chosen embedding model (e.g., 1536 for text-embedding-3-small).
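A small lookup can catch dimension mismatches before any data is written. The table below covers common OpenAI embedding models only; the helper and its names are illustrative, and you would extend the mapping for other providers.

```python
# Output dimensions for common OpenAI embedding models (illustrative lookup).
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "text-embedding-ada-002": 1536,
}

def check_vector_column(model, column_dims):
    """Return True if the table's vector column matches the model's output size."""
    expected = MODEL_DIMS.get(model)
    return expected is not None and expected == column_dims
```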

3. Data Mapping (Example using Website Content Crawler output):
The crawler's dataset contains fields like url, text, and metadata. You specify which fields to embed and which to store as metadata.

{
  "datasetFields": ["text"],
  "metadataDatasetFields": {"title": "metadata.title"}
}
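The dotted paths in metadataDatasetFields (like "metadata.title") resolve nested keys inside each dataset item. A simplified sketch of how such a mapping could be applied (function names are hypothetical, not the Actor's internals):

```python
def get_by_path(item, dotted_path):
    """Resolve a dotted field path like 'metadata.title' inside a dataset item."""
    value = item
    for key in dotted_path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

def map_metadata(item, mapping):
    """Apply a metadataDatasetFields-style mapping to one dataset item."""
    return {out_key: get_by_path(item, path) for out_key, path in mapping.items()}

item = {"url": "https://example.com", "metadata": {"title": "Docs"}}
meta = map_metadata(item, {"title": "metadata.title"})
```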

4. Optional Chunking:
To split long text before embedding, enable and configure chunking:

{
  "performChunking": true,
  "chunkSize": 1000,
  "chunkOverlap": 200
}
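Conceptually, chunking slides a window of chunkSize characters forward by chunkSize minus chunkOverlap at each step, so adjacent chunks share some context. The sketch below is a deliberately simplified stand-in for LangChain's RecursiveCharacterTextSplitter (it ignores separators and splits purely by character count):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping fixed-size windows (simplified chunker)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=3)
```

The real splitter additionally tries to break on natural boundaries (paragraphs, sentences, words) before falling back to raw character counts.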

Output:
The Actor does not produce a separate dataset. Its output is the processed data (text, embeddings, and metadata) written directly to your specified openGauss table. Success or errors are reported in the Actor run log.

Common Use Cases

Market Research

Gather competitive intelligence and market data

Lead Generation

Extract contact information for sales outreach

Price Monitoring

Track competitor pricing and product changes

Content Aggregation

Collect and organize content from multiple sources

Ready to Get Started?

Try openGauss Integration now on Apify. Free tier available with no credit card required.


Actor Information

Developer: wyswyz
Pricing: Paid
Total Runs: 11
Active Users: 2
Apify Platform

Apify provides a cloud platform for web scraping, data extraction, and automation. Build and run web scrapers in the cloud.

Learn more about Apify
