openGauss Integration
by wyswyz
Effortlessly transfer Apify datasets to openGauss. Perfect for building AI search, Q&A bots, or RAG systems with your scraped data.
About openGauss Integration
Need to get your scraped data from Apify into an openGauss database? This actor is the straightforward connector you've been looking for. It takes the datasets you've built with other actors and pipes them directly into your openGauss tables, automating what's usually a manual and error-prone process. I've used it to set up a foundation for several projects, and it just works.

The real value is what you build on top of that data. Once your information is structured and sitting in openGauss, you're perfectly set up to create intelligent search functions, build a question-answering bot, or implement a full Retrieval-Augmented Generation (RAG) pipeline. It handles the data transfer so you can focus on the actual AI and data analysis work.

Think of it as the essential plumbing that lets you turn raw, scraped data into a functional knowledge base for your applications. If you're working with open-source AI tools and need a reliable way to feed them fresh data, this integration is a practical first step.
What does this actor do?
openGauss Integration is a data-transfer Actor on the Apify platform. Rather than scraping data itself, it takes datasets produced by other Actors and loads them, along with computed vector embeddings, into your openGauss database.
Documentation
openGauss Integration
This Apify Actor transfers data from Apify datasets into an openGauss database. It processes text, optionally splits it into chunks, computes vector embeddings, and stores them in openGauss. It's designed for search and Retrieval-Augmented Generation (RAG) applications, with support for incremental updates to save on compute and storage.
Overview
The Actor takes a dataset from another Apify Actor (like the Website Content Crawler), processes the text, generates embeddings using providers like OpenAI or Cohere via LangChain, and saves the vectors to your openGauss database. You control chunking, embedding models, and update strategies.
Key Features
- Vector Storage: Computes and stores text embeddings in openGauss for semantic search.
- Incremental Updates: Updates only changed data, reducing unnecessary embedding recomputation.
- Configurable Chunking: Uses LangChain's `RecursiveCharacterTextSplitter` to handle long documents (configurable chunk size/overlap).
- Embedding Provider Flexibility: Supports multiple providers (e.g., OpenAI, Cohere).
- Metadata Handling: Lets you specify which dataset fields to store as metadata alongside vectors.
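The Actor's exact change-detection scheme isn't documented here, but incremental updates of this kind are commonly built on content fingerprints: re-embed an item only if a checksum of its text differs from what's already stored. A minimal sketch of that idea, with hypothetical field names:

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of an item's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_updates(items, stored_hashes):
    """Decide which dataset items need (re-)embedding.

    items: dicts like {"url": ..., "text": ...} from the Apify dataset.
    stored_hashes: {url: hash} already present in the openGauss table.
    Returns (urls_to_embed, urls_unchanged).
    """
    to_embed, unchanged = [], []
    for item in items:
        h = content_hash(item["text"])
        if stored_hashes.get(item["url"]) == h:
            unchanged.append(item["url"])  # skip: nothing changed
        else:
            to_embed.append(item["url"])   # new or edited content
    return to_embed, unchanged


items = [
    {"url": "https://a.example", "text": "old text, now edited"},
    {"url": "https://b.example", "text": "same as before"},
]
stored = {"https://b.example": content_hash("same as before")}
to_embed, unchanged = plan_updates(items, stored)
```

Only the changed page gets re-embedded, which is where the compute savings come from.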
How to Use
This Actor is typically configured via the "Integrations" section of another Actor's run (e.g., Website Content Crawler).
Prerequisites:
* A running openGauss database (you'll need host, port, user, password, DB name, and table name).
* An API key for your chosen embeddings provider (e.g., from OpenAI).
Basic Flow:
1. An upstream Actor (like a crawler) runs and produces a dataset.
2. This integration Actor is triggered, fetching that dataset.
3. (Optional) Text is split into chunks.
4. Embeddings are computed for the text/chunks.
5. Data and embeddings are saved to your specified openGauss table.
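The five steps above can be sketched end-to-end. The function names here are illustrative stand-ins, not the Actor's real internals; the embedding and storage callables are stubbed so the shape of the data is visible:

```python
def run_integration(dataset_items, embed, save_rows, chunk=None):
    """Illustrative pipeline: dataset items -> (chunks ->) embeddings -> rows.

    dataset_items: dicts from the upstream Actor's dataset.
    embed: callable mapping a list of texts to a list of vectors.
    save_rows: callable that writes rows to the openGauss table.
    chunk: optional callable splitting one text into chunk strings.
    """
    rows = []
    for item in dataset_items:
        # Step 3 (optional): split long text into chunks.
        pieces = chunk(item["text"]) if chunk else [item["text"]]
        # Step 4: compute embeddings for each piece.
        vectors = embed(pieces)
        # Step 5: pair text, vector, and source metadata into table rows.
        for text, vec in zip(pieces, vectors):
            rows.append({"url": item["url"], "text": text, "embedding": vec})
    save_rows(rows)
    return len(rows)


saved = []
n = run_integration(
    [{"url": "https://x.example", "text": "alpha beta"}],
    embed=lambda texts: [[0.0] * 3 for _ in texts],  # fake 3-dim vectors
    save_rows=saved.extend,
)
```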
Input / Output
Input Configuration:
Your configuration needs three main parts: database connection, embeddings provider, and data mapping.
1. Database (openGauss):
```json
{
  "opengaussHost": "your-host",
  "opengaussPort": "your-port",
  "opengaussUser": "your-user",
  "opengaussPassword": "your-password",
  "opengaussDBname": "your-database-name",
  "opengaussTableName": "apify_collection"
}
```
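openGauss is PostgreSQL-compatible, so these fields map naturally onto a libpq-style connection string. The helper below just assembles one from the Actor's input fields; whether your client library accepts this exact format depends on the driver you use:

```python
def opengauss_dsn(cfg: dict) -> str:
    """Build a libpq-style DSN from the Actor's database input fields."""
    return (
        f"host={cfg['opengaussHost']} port={cfg['opengaussPort']} "
        f"user={cfg['opengaussUser']} password={cfg['opengaussPassword']} "
        f"dbname={cfg['opengaussDBname']}"
    )


dsn = opengauss_dsn({
    "opengaussHost": "localhost",
    "opengaussPort": "5432",
    "opengaussUser": "gauss",
    "opengaussPassword": "secret",
    "opengaussDBname": "rag",
})
```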
2. Embeddings Provider (Example: OpenAI):
```json
{
  "embeddingsProvider": "OpenAIEmbeddings",
  "embeddingsApiKey": "your-openai-api-key",
  "embeddingsConfig": {"model": "text-embedding-3-large"}
}
```
Important: Ensure your openGauss table's vector column size matches the output dimensions of your chosen embedding model (e.g., 1536 for text-embedding-3-small).
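A cheap guard against this mismatch is to compare the model's output dimension with the table's vector column size before bulk-loading. The dimension map below covers only the two OpenAI models mentioned here (1536 for text-embedding-3-small, 3072 for text-embedding-3-large by default):

```python
# Default output dimensions for the OpenAI models mentioned above.
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dims(model: str, table_vector_size: int) -> bool:
    """Return True if the model's vectors fit the table's vector column."""
    dims = MODEL_DIMS.get(model)
    if dims is None:
        raise ValueError(f"unknown model: {model}")
    return dims == table_vector_size
```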
3. Data Mapping (Example using Website Content Crawler output):
The crawler's dataset contains fields like `url`, `text`, and `metadata`. You specify which fields to embed and which to store as metadata.
```json
{
  "datasetFields": ["text"],
  "metadataDatasetFields": {"title": "metadata.title"}
}
```
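The `metadata.title` value is a dotted path into each dataset item. A resolver for such paths might look like this (a sketch of the idea, not the Actor's actual mapping code):

```python
def resolve_path(item: dict, dotted: str):
    """Walk a dotted path like 'metadata.title' through nested dicts."""
    value = item
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # path doesn't exist in this item
        value = value[key]
    return value


item = {"url": "https://x.example", "text": "...", "metadata": {"title": "Home"}}
mapping = {"title": "metadata.title"}
meta = {out: resolve_path(item, path) for out, path in mapping.items()}
```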
4. Optional Chunking:
To split long text before embedding, enable and configure chunking:
```json
{
  "performChunking": true,
  "chunkSize": 1000,
  "chunkOverlap": 200
}
```
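To show what `chunkSize` and `chunkOverlap` mean without pulling in LangChain, here is a much-simplified sliding-window splitter. The real `RecursiveCharacterTextSplitter` additionally prefers to break at paragraph and sentence boundaries; this sketch only illustrates the size/overlap arithmetic:

```python
def naive_chunks(text: str, chunk_size: int, chunk_overlap: int):
    """Split text into fixed-size windows overlapping by chunk_overlap chars."""
    step = chunk_size - chunk_overlap  # how far each window advances
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


# 2500 characters with distinct digits so overlaps are visible.
text = "".join(str(i % 10) for i in range(2500))
chunks = naive_chunks(text, chunk_size=1000, chunk_overlap=200)
```

With chunkSize 1000 and chunkOverlap 200, each window starts 800 characters after the previous one, so the last 200 characters of one chunk repeat at the start of the next.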
Output:
The Actor does not produce a separate dataset. Its output is the processed data (text, embeddings, and metadata) written directly to your specified openGauss table. Success or errors are reported in the Actor run log.
Ready to Get Started?
Try openGauss Integration now on Apify. Free tier available with no credit card required.
Actor Information
- Developer
- wyswyz
- Pricing
- Paid
- Total Runs
- 11
- Active Users
- 2