Posted by Richy George on 30 July, 2024
This post was originally published on this site.
In this Linux tip, we will try out the watch command. It runs another command repeatedly, overwriting the previous output each time, until you stop it with ^c (Ctrl + “c”). It’s handy when you’re sitting and waiting for some change in a command’s output.
By default, a command that is run through watch will run every two seconds. You can change the interval with the -n option. If you, for example, use the command “watch who”, the output will not change except for the date/time in the upper right corner – at least not until someone logs in or out of the system.
Every 2.0s: who fedora: Sat May 25 15:11:22 2024
fedora seat0 2024-05-25 14:24 (login screen)
fedora tty2 2024-05-25 14:24 (tty2)
shs pts/1 2024-05-25 14:25 (192.168.0.11)
Once another person logs in or someone logs out, a line will be added or removed from the list of logged in users.
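Assuming watch (from procps-ng) and timeout (from GNU coreutils) are available, you can try the interval and change-highlighting options like this:

```shell
# -n 1 sets a one-second refresh interval; -d highlights what changed
# between refreshes. Interactively you would run `watch -n 1 -d who` and
# press Ctrl+C to stop; here `timeout 3` ends the run after 3 seconds so
# the example terminates on its own.
timeout 3 watch -n 1 -d who || true
```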
Closing: Well, that’s your Linux tip for the watch command. It can be useful when you’re waiting for some change to happen on your Linux system.
If you have questions or would like to suggest a topic, please add a comment below. And don’t forget to subscribe to the InfoWorld channel on YouTube.
If you like this video, please hit the like and share buttons. For more Linux tips, be sure to follow us on Facebook, YouTube and NetworkWorld.com.
Jul 30, 2024 2 mins
Open Source
Posted by Richy George on 4 July, 2024
Oracle celebrated the beginning of July with the general availability of three releases of its open source database, MySQL: MySQL 8.0.38; the first update of its long-term support (LTS) version, MySQL 8.4; and the first major version of its 9.x innovation release, MySQL 9.0.
While the v8 releases are bug fixes and security releases only, MySQL 9.0 Innovation is a shiny new version with additional features, as well as some changes that may require attention when upgrading from a previous version.
The new 9.0 versions of MySQL Clients, Tools, and Connectors are also live, and Oracle recommends that they be used with MySQL Server 8.0 and 8.4 LTS as well as with 9.0 Innovation.
This initial 9.x Innovation release, Oracle says, is preparation for new features in upcoming releases. But it still contains useful additions, and users can upgrade to it from MySQL 8.4 LTS; the MySQL Configurator performs the upgrade automatically, without user intervention, during MSI installations on Windows.
The major changes include:
The insecure, aging SHA-1-based mysql_native_password authentication plugin, deprecated in MySQL 8.0, is gone, and the server now rejects mysql_native_password authentication requests from older client programs that lack the CLIENT_PLUGIN_AUTH capability. Before upgrading to 9.0, Oracle says, user accounts in 8.0 and 8.4 must be altered from mysql_native_password to caching_sha2_password.
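A hypothetical sketch of that pre-upgrade step (the account name and password below are placeholders, not from the source):

```sql
-- Find accounts still using the removed plugin...
SELECT user, host FROM mysql.user WHERE plugin = 'mysql_native_password';

-- ...and switch each one to the current default authentication plugin.
ALTER USER 'app_user'@'%' IDENTIFIED WITH caching_sha2_password BY 'new_password';
```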
In the Optimizer, ER_SUBQUERY_NO_1_ROW has been removed from the list of errors which are ignored by statements which include the IGNORE keyword. This change can make an UPDATE, DELETE, or INSERT statement which includes the IGNORE keyword raise errors if it contains a SELECT statement with a scalar subquery that produces more than one row.
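A sketch of the kind of statement affected; the table and column names are invented for illustration:

```sql
-- Under MySQL 8.x, IGNORE suppressed ER_SUBQUERY_NO_1_ROW here. Under 9.0,
-- the statement raises that error if the scalar subquery returns more than
-- one row.
UPDATE IGNORE orders
SET status = (SELECT status FROM order_audit WHERE order_id = orders.id)
WHERE id = 42;
```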
MySQL is now on a three-month release cadence, with major LTS releases every two years. In October, Oracle says we can expect bug and security releases MySQL 8.4.2 LTS and MySQL 8.0.39, and the MySQL 9.1 Innovation release, with new features as well as bug and security fixes.
Posted by Richy George on 2 July, 2024
Open-source vector database provider Qdrant has launched BM42, a vector-based hybrid search algorithm intended to provide more accurate and efficient retrieval for retrieval-augmented generation (RAG) applications. BM42 combines the best of traditional text-based search and vector-based search to lower the costs for RAG and AI applications, Qdrant said.
Qdrant’s BM42 was announced July 2. Traditional keyword search engines, using algorithms such as BM25, have been around for more than 50 years and are not optimized for the precise retrieval needed in modern applications, according to Qdrant. As a result they struggle with specific RAG demands, particularly with short segments that require further context for successful search and retrieval. Moving from keyword-based search to a fully vector-based approach sets a new industry standard, Qdrant said.
“BM42, for short texts which are more prominent in RAG scenarios, provides the efficiency of traditional text search approaches, plus the context of vectors, so is more flexible, precise, and efficient,” Andrey Vasnetsov, Qdrant CTO and co-founder, said. This helps to make vector search more universally applicable, he added.
Unlike traditional keyword-based search suited for long-form content, BM42 integrates sparse and dense vectors to pinpoint relevant information within a document. A sparse vector handles exact term matching, while dense vectors handle semantic relevance and deep meaning, according to the company.
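The sparse-plus-dense idea can be sketched as follows. This is not Qdrant’s BM42 implementation, just a minimal illustration of fusing exact term matching with embedding similarity (the 50/50 weighting is an assumption):

```python
import math

def sparse_score(query_terms, doc_terms):
    """Exact term matching: fraction of query terms present in the document."""
    doc = set(doc_terms)
    return sum(1 for t in query_terms if t in doc) / len(query_terms)

def dense_score(q_vec, d_vec):
    """Semantic relevance: cosine similarity between embedding vectors."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    norm = math.sqrt(sum(a * a for a in q_vec)) * math.sqrt(sum(b * b for b in d_vec))
    return dot / norm

def hybrid_score(query_terms, doc_terms, q_vec, d_vec, alpha=0.5):
    """Blend exact matching and semantic similarity; alpha is an assumed weight."""
    return (alpha * sparse_score(query_terms, doc_terms)
            + (1 - alpha) * dense_score(q_vec, d_vec))
```

In a real system the sparse side would come from an inverted index and the dense side from a trained embedding model; this sketch only shows how the two signals combine into one ranking score.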
Posted by Richy George on 26 June, 2024
Buoyed by customer demand, SingleStore, the company behind the relational database SingleStoreDB, has decided to natively integrate Apache Iceberg into its offering to help its enterprise customers make use of data stored in data lakehouses.
“With this new integration, SingleStore aims to transform the dormant data inside lakehouses into a valuable real-time asset for enterprise applications. Apache Iceberg, a popular open standard for data lakehouses, provides CIOs with cost-efficient storage and querying of large datasets,” said Dion Hinchcliffe, senior analyst at The Futurum Group.
Hinchcliffe pointed out that SingleStore’s integration includes updates that help its customers bypass the challenges that they may typically face when adopting traditional methods to make the data in Iceberg tables more immediate.
These challenges include complex, extensive ETL (extract, transform, load) workflows and compute-intensive Spark jobs.
Some of the key features of the integration are low-latency ingestion, bi-directional data flow, and real-time performance at lower costs, the company said.
Explaining how SingleStore achieves low latency across queries and updates, IDC research vice president Carl Olofson said that the company, formerly known as MemSQL, offers a memory-optimized, high-performance version of the relational database management system that uses memory features as a sort of cache.
“By doing so, the company can dramatically improve the speed with which Iceberg tables can be queried and updated,” Olofson explained, adding that the company might be proactively loading data from Iceberg into their internal memory-optimized format.
Before the Iceberg integration, SingleStore held data in a format optimized for rapid swapping into memory, where all data processing took place, the analyst said.
Several other database vendors, notably Databricks, have made attempts to adopt the Apache Iceberg table format due to its rising popularity with enterprises.
Earlier this month, Databricks agreed to acquire Tabular, the storage platform vendor led by the creators of Apache Iceberg, in order to promote data interoperability in lakehouses.
Another data lakehouse format, Delta Lake, developed by Databricks and later open sourced via The Linux Foundation, competes with Iceberg tables.
Currently, Databricks is working on another format that allows enterprises to use both Iceberg and Delta Lake tables.
Both Olofson and Hinchcliffe pointed out that several vendors and offerings — such as Google’s BigQuery, Starburst, IBM’s Watsonx.data, SAP’s DataSphere, Teradata, Cloudera, Dremio, Presto, Hive, Impala, StarRocks, and Doris — have integrated Iceberg as an open source analytics table format for very large datasets.
The native integration of Iceberg into SingleStoreDB is currently in public preview.
As part of the updates to SingleStoreDB, the company is adding new capabilities to its full-text search feature that improve relevance scoring, phonetic similarity, fuzzy matching, and keyword proximity-based ranking.
The combination of these capabilities allows enterprises to eliminate the need for additional specialty databases to build generative AI-based applications, the company explained.
Additionally, the company has introduced an autoscaling feature in public preview that allows enterprises to manage workloads or applications by scaling compute resources up or down.
It also lets users define thresholds for CPU and memory usage for autoscaling, to avoid any unnecessary consumption.
Further, the company said it is introducing a new deployment option for the database, Helios BYOC, a managed version of the database that runs in a customer’s virtual private cloud.
This offering is now available in private preview on AWS, and enterprise customers can run SingleStore in their own tenants while complying with data residency and governance policies, the company said.
Posted by Richy George on 26 June, 2024
Oracle is adding new generative AI-focused features to its HeatWave data analytics cloud service, previously known as MySQL HeatWave.
The new name highlights how HeatWave offers more than just MySQL support, and also includes HeatWave Gen AI, HeatWave Lakehouse, and HeatWave AutoML, said Nipun Agarwal, senior vice president of HeatWave at Oracle.
At its annual CloudWorld conference in September 2023, Oracle previewed a series of generative AI-focused updates for what was then MySQL HeatWave.
These updates included an interface driven by a large language model (LLM) that enables enterprise users to interact with different aspects of the service in natural language, a new Vector Store, HeatWave Chat, and AutoML support for HeatWave Lakehouse.
Some of these updates, along with additional capabilities, have been combined to form the HeatWave Gen AI offering inside HeatWave, Oracle said, adding that all these capabilities and features are now generally available at no additional cost.
In a first among database vendors, Oracle has added support for LLMs inside a database, analysts said.
HeatWave Gen AI’s in-database LLM support, which leverages smaller LLMs with fewer parameters such as Mistral-7B and Meta’s Llama 3-8B running inside the database, is expected to reduce infrastructure cost for enterprises, they added.
“This approach not only reduces memory consumption but also enables the use of CPUs instead of GPUs, making it cost-effective, which given the cost of GPUs will become a trend at least in the short term until AMD and Intel catch up with Nvidia,” said Ron Westfall, research director at The Futurum Group.
Another reason to use smaller LLMs inside the database is the ability to have more influence on the model with fine tuning, said David Menninger, executive director at ISG’s Ventana Research.
“With a smaller model the context provided via retrieval augmented generation (RAG) techniques has a greater influence on the results,” Menninger explained.
Westfall also gave the example of IBM’s Granite models, saying that the approach to using smaller models, especially for enterprise use cases, was becoming a trend.
The in-database LLMs, according to Oracle, will allow enterprises to search data, generate or summarize content, and perform RAG with HeatWave’s Vector Store.
Separately, HeatWave Gen AI also comes integrated with the company’s OCI Generative AI service, providing enterprises with access to pre-trained and other foundation models from LLM providers.
A number of database vendors that didn’t already offer specialty vector databases have added vector capabilities to their wares over the last 12 months—MongoDB, DataStax, Pinecone, and CosmosDB for NoSQL among them — enabling customers to build AI and generative AI-based use cases over data stored in these databases without moving data to a separate vector store or database.
Oracle’s Vector Store, already showcased in September, automatically creates embeddings after ingesting data in order to process queries faster.
Another capability added to HeatWave Gen AI is scale-out vector processing that will allow HeatWave to support VECTOR as a data type and in turn help enterprises process queries faster.
“Simply put, this is like adding RAG to a standard relational database,” Menninger said. “You store some text in a table along with an embedding of that text as a VECTOR data type. Then when you query, the text of your query is converted to an embedding. The embedding is compared to those in the table and the ones with the shortest distance are the most similar.”
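As a rough illustration of that description (the schema, the embedding dimension, and the DISTANCE syntax below are assumptions for illustration, not HeatWave’s documented API):

```sql
-- Text stored alongside its embedding as a VECTOR column.
CREATE TABLE docs (
  id INT PRIMARY KEY,
  body TEXT,
  body_vec VECTOR(384)  -- embedding of `body`
);

-- Return the three rows whose embeddings are closest to the query embedding:
-- the shortest distances are the most similar documents.
SELECT id, body
FROM docs
ORDER BY DISTANCE(body_vec, @query_vec, 'COSINE')
LIMIT 3;
```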
Another new capability added to HeatWave Gen AI is HeatWave Chat, a Visual Studio Code plug-in for MySQL Shell that provides a graphical interface for HeatWave Gen AI and enables developers to ask questions in natural language or SQL.
The retention of chat history makes it easier for developers to refine search results iteratively, Menninger said.
HeatWave Chat comes with another feature, dubbed the Lakehouse Navigator, which allows enterprise users to select files from object storage to create a new vector store.
This integration is designed to enhance user experience and efficiency of developers and analysts building out a vector store, Westfall said.
Posted by Richy George on 25 June, 2024
DataStax is updating its tools for building generative AI-based applications in an effort to ease and accelerate application development for enterprises, databases, and service providers.
One of these tools is Langflow, which DataStax acquired in April. It is an open source, web-based no-code graphical user interface (GUI) that allows developers to visually prototype LangChain flows and iterate them to develop applications faster.
LangChain is a modular framework for Python and JavaScript that simplifies the development of applications powered by generative AI large language models (LLMs).
According to the company’s Chief Product Officer Ed Anuff, the update to Langflow is a new version dubbed Langflow 1.0, which is the official open source release that comes after months of community feedback on the preview.
“Langflow 1.0 adds more flexible, modular components and features to support complex AI pipelines required for more advanced retrieval augmented generation (RAG) techniques and multi-agent architectures,” Anuff said, adding that Langflow’s execution engine was now Turing complete.
Turing completeness is a term used in computer science to describe a programmable system that can carry out or solve any computational problem.
Langflow 1.0 also comes with LangSmith integration that will allow enterprise developers to monitor LLM-based applications and perform observability on them, the company said.
A managed version of Langflow is also being made available via DataStax in a public preview.
“Astra DB environment details will be available in Langflow and users will be able to access Langflow via the Astra Portal, and usage will be free,” Anuff explained.
DataStax has also released a new version of RAGStack, its curated stack of open-source software for implementing RAG in generative AI-based applications using Astra DB Serverless or Apache Cassandra as a vector store.
The new version, dubbed RAGStack 1.0, comes with new features such as Langflow, Knowledge Graph RAG, and ColBERT among others.
The Knowledge Graph RAG feature, according to the company, provides an alternative way to retrieve information using a graph-based representation. This alternative method can be more accurate than vector-based similarity search alone with Astra DB, it added.
Other features include the introduction of Text2SQL and Text2CQL (Cassandra Query Language) to bring all kinds of data into the generative AI flow for application development.
While DataStax offers a separate non-managed version of RAGStack 1.0 under the name Luna for RAGStack, Anuff said that the managed version offers more value for enterprises.
“RAGStack is based on open source components, and you could take all of those projects and stitch them together yourself. However, we think there is a huge amount of value for companies in getting their stack tested and integrated for them, so they can trust that it will deliver at scale in the way that they want,” the chief product officer explained.
The company has also partnered with several other companies such as Unstructured to help developers extract and transform data to be stored in AstraDB for building generative AI-based applications.
“The partnership with Unstructured provides DataStax customers with the ability to use the latter’s capabilities to extract and transform data in multiple formats – including HTML, PDF, CSV, PNG, PPTX – and convert it into JSON files for use in AI initiatives,” said Matt Aslett, director at ISG’s Ventana Research.
Other partnerships include collaboration with the top embedding providers, such as OpenAI, Hugging Face, Mistral AI, and Nvidia among others.
Posted by Richy George on 13 June, 2024
According to research conducted by EDB, 35% of enterprise leaders will consider Postgres for their next project, and out of this group, the great majority believe that AI is going mainstream in their organization. Add to this that, for the first time ever, analytical workloads have begun to surpass transactional workloads.
Enterprises see the potential of Postgres to fundamentally transform the way they use and manage data, and they see AI as a huge opportunity and advantage. But the diverse data teams within these organizations face increasing fragmentation and complexity when it comes to their data. To operationalize data for AI apps, they demand better observability and control across the data estate, not to mention a solution that works seamlessly across clouds.
It’s clear that Postgres has the right to play and deliver for the AI generation of apps, and EDB has taken recent strides to do just this with the release of EDB Postgres AI, an intelligent platform for transactional, analytical, and AI workloads.
The new platform product offers a unified approach to data management and is designed to streamline operations across hybrid cloud and multi-cloud environments, meeting enterprises wherever they are in their digital transformation journey.
EDB Postgres AI helps elevate data infrastructure to a strategic technology asset, by bringing analytical and AI systems closer to customers’ core operational and transactional data—all managed through the popular open source database, Postgres.
Let’s take a look at the key features and advantages of EDB Postgres AI.
Analysts and data scientists need to launch critical new projects, and they need access to up-to-the-second transactional and operational data within their core Postgres databases. Yet these teams are often forced to default to clunky ETL or ELT processes that result in latency, data inconsistency, and quality issues that hamper the efficient extraction of insights.
EDB Postgres AI introduces a simple platform for deploying new analytics and data science projects rapidly, without the need for operationally expensive data pipelines and multiple platforms. EDB Postgres AI’s Lakehouse capabilities allow for the rapid execution of analytical queries on transactional data without impacting performance, all using the same intuitive interface. By storing operational data in a columnar format, EDB Postgres AI boosts query speeds by up to 30x compared to standard Postgres and reduces storage costs, making real-time analytics more accessible.
Even if data teams have made Postgres their primary database, chances are their data estate is still sprawled across a diverse mix of fully-managed and self-managed Postgres deployments. Managing these systems becomes increasingly difficult and costly, particularly when it comes to ensuring uptime, security and compliance.
The new capabilities of the recent EDB release will help customers create and deliver value greater than the sum of all the data parts, no matter where it is. EDB Postgres AI provides comprehensive observability tools that offer a unified view of Postgres deployments across different environments. This means that users can monitor and tune their databases, with automatic suggestions on improving query performance, AI-driven event detection and log analysis, and smart alerting when metrics exceed configurable thresholds.
With the surge in AI advancements, EDB sees a significant opportunity to enhance data management for our customers through AI integration. The strategy of the new platform is twofold: integrate AI capabilities into Postgres, and simultaneously, optimize Postgres for AI workloads.
Firstly, this release includes an AI-driven migration copilot, which is trained on EDB documentation and knowledge bases and helps answer common questions about migration errors including command line and schema issues, with instant error resolution and guidance tailored to database needs.
In addition, EDB remains focused on optimizing Postgres for AI workloads through support for vector data. With the pgvector extension and EDB’s pgai extension, the platform provides a single place for storing vector embeddings and performing similarity search, which is crucial for AI applications and simplifies the AI application pipeline for builders. This support allows developers to build sophisticated AI models directly within the Postgres ecosystem. The platform also lets users leverage the mature data management features of PostgreSQL, such as reliability with high availability, security with Transparent Data Encryption (TDE), and scalability with on-premises, hybrid, and cloud deployments.
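A small, hypothetical pgvector sketch (the table, data, and dimension are invented for illustration):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(3)  -- real embeddings typically have hundreds of dimensions
);

INSERT INTO items (content, embedding) VALUES
  ('first document',  '[0.1, 0.2, 0.3]'),
  ('second document', '[0.9, 0.1, 0.0]');

-- Nearest neighbor by Euclidean distance (<->); pgvector also supports
-- cosine distance (<=>) and inner product (<#>).
SELECT content
FROM items
ORDER BY embedding <-> '[0.1, 0.2, 0.25]'
LIMIT 1;
```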
EDB Postgres AI transforms unstructured data management with its powerful new “retriever” functionality, which enables similarity search across vector data. The auto embedding feature automatically generates AI embeddings for data in Postgres tables, keeping them up to date via triggers. Coupled with the retriever’s ability to create embeddings for Amazon S3 data on demand, pgai provides a seamless way to make unstructured sources searchable by similarity. Users can also leverage a broad list of state-of-the-art encoder models from providers like Hugging Face and OpenAI. With just pgai.create_retriever() and pgai.retrieve(), developers gain vector similarity capabilities within their trusted Postgres database.
This dual approach ensures that Postgres becomes a comprehensive solution for both traditional and AI-driven data management needs.
EDB Postgres AI maintains the critical, enterprise-grade capabilities that EDB is known for. This includes the comprehensive Oracle Compatibility Mode, which helps customers break free from legacy systems while lowering TCO by up to 80% compared to legacy commercial databases. The product also supports EDB’s geo-distributed high-availability solutions, meaning customers can deploy multi-region clusters with five-nines availability to guarantee that data is consistent, timely, and complete—even during disruptions.
The release of EDB Postgres AI marks EDB’s 20th year as a leader of enterprise-grade Postgres and introduces the next evolution of the company—one even more proudly associated with Postgres. Why? Because we know that its flexibility and extensibility make Postgres uniquely positioned to solve the most complex and critical data challenges. Learn more about how EDB can help you use EDB Postgres AI for your most demanding applications.
Aislinn Shea Wright is VP of product management at EDB.
—
New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.
Posted by Richy George on 12 June, 2024
Oracle is connecting its cloud to Google’s to offer Google customers high-speed access to database services. The move comes just nine months after it struck a similar deal with Microsoft to offer its database services on Azure. Separately, Microsoft is extending its Azure platform into Oracle’s cloud to give OpenAI access to more computing capacity on which to train its models.
“What started as a simple interconnect is becoming a more defined multicloud strategy for Oracle. The announcement is the beginning of a new trend—cloud providers are willing to work together to serve the needs of shared customers,” said Dave McCarthy, Research Vice President at IDC.
The Oracle-Google partnership will see the companies create a series of points of interconnect enabling customers of one to access services in the other’s cloud. Customers will be able to deploy general-purpose workloads with no cross-cloud data transfer charges, the companies said.
The two clouds will initially interconnect in 11 regions: Sydney, Melbourne, São Paulo, Montreal, Frankfurt, Mumbai, Tokyo, Singapore, Madrid, London, and Ashburn.
Oracle also plans to colocate its database hardware and software in Google’s datacenters, initially in North America and Europe, making it possible for joint customers to deploy, manage, and use Oracle database instances on Google Cloud without having to retool applications.
The two companies will market that service under the catchy name of Oracle Database@Google Cloud. Oracle Exadata Database Service, Oracle Autonomous Database Service, and Oracle Real Application Clusters (RAC) will all launch later this year across four regions: US East, US West, UK South, and Germany Central, with more planned later.
Oracle Database@Google Cloud customers will have access to a unified support service, and will be able to make purchases via the Google Cloud Marketplace using their existing Google Cloud commitments and Oracle license benefits.
The partnership with Google Cloud can be seen as a continuation of Oracle’s multicloud strategy that it started executing with the Microsoft partnership, analysts said, adding that Oracle expects that the new offerings will help many of its customers fully migrate from on-premises infrastructure to the cloud.
By adopting a multicloud approach, Oracle avoids going head-to-head with “entrenched” cloud providers. Instead, McCarthy said, Oracle is leveraging its strengths in data management to solve problems that other cloud providers cannot.
Oracle may have been swayed by the experience of its partnership with Microsoft Azure, dbInsight’s chief analyst Tony Baer said. Although AWS may have been a more obvious target to partner with next due to its reach, Google Cloud was probably “more hungry” for a partnership, he said.
McCarthy expects AWS to soon start exploring a similar partnership with Oracle, as the Azure and Google Cloud partnerships will put pressure on the hyperscaler.
“AWS faces the same challenges as the other clouds when it comes to Oracle workloads. I expect this increased competition from Azure and Google Cloud will force them to explore a similar route,” he said, adding that migrating Oracle workloads has always been tricky and cloud providers need to offer the combination of Oracle’s hardware and software to allow enterprises to unlock top notch performance across workloads.
Separately, Oracle is partnering with Microsoft to provide additional capacity for OpenAI by extending Microsoft’s Azure AI platform to Oracle Cloud Infrastructure (OCI).
“OCI will extend Azure’s platform and enable OpenAI to continue to scale,” said OpenAI CEO Sam Altman in a statement.
This partnership, according to independent semiconductor technology analyst Mike Demler, is all about increasing compute capacity.
“OpenAI runs on Microsoft’s Azure AI platform, and the models they’re creating continue to grow in size exponentially from one generation to the next,” Demler said.
While GPT-3 uses 175 billion parameters, the latest GPT-MoE (Mixture of Experts) is 10 times that large, with 1.8 trillion parameters, the independent analyst said, adding that the latter needs a lot more GPUs than Microsoft alone can supply in its cloud platform.
Posted by Richy George on 29 May, 2024
Like every other programming environment, you need a place to store your data when coding in the browser with JavaScript. Beyond simple JavaScript variables, there are a variety of options ranging in sophistication, from using localStorage to cookies to IndexedDB and the service worker cache API. This article is a quick survey of the common mechanisms for storing data in your JavaScript programs.
You are probably already familiar with JavaScript’s set of highly flexible variable types. We don’t need to review them here; they are very powerful and capable of modeling any kind of data from the simplest numbers to intricate cyclical graphs and collections.
The downside of using variables to store data is that they are confined to the life of the running program. When the program exits, the variables are destroyed. Of course, they may be destroyed before the program ends, but the longest-lived global variable will vanish with the program. In the case of the web browser and its JavaScript programs, even a single click of the refresh button annihilates the program state. This fact drives the need for data persistence; that is, data that outlives the life of the program itself.
An additional complication with browser JavaScript is that it runs in a sandboxed environment. It doesn’t have direct access to the operating system the way an installed application does; a JavaScript program relies on the agency of the browser APIs it runs within.
The other end of the spectrum from using built-in variables to store JavaScript data is sending the data off to a server. You can do this readily with a fetch() POST request. Provided everything works out on the network and the back-end API, you can trust that the data will be stored and made available in the future with another GET request.
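As a minimal sketch of the POST half of that round trip: the helper below builds the options object for a JSON POST so it can be inspected separately from the network call. The /api/notes endpoint in the usage comment is a hypothetical example, not part of any real API.

```javascript
// Build the fetch() options for a JSON POST. Splitting this out keeps
// the serialization logic testable without hitting the network.
function buildPostOptions(data) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  };
}

// Usage in the browser (assumes a back-end route exists at /api/notes):
// fetch("/api/notes", buildPostOptions({ text: "hello" }))
//   .then((res) => res.json())
//   .then((saved) => console.log(saved));
```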
So far, we’re choosing between the transience of variables and the permanence of server-side persistence. Each approach has a particular profile in terms of longevity and simplicity. But a few other options are worth exploring.
There are two types of built-in “web storage” in modern browsers: localStorage and sessionStorage. These give you convenient access to longer-lived data. Both give you a key-value store, and each has its own lifecycle that governs how the data is handled:
localStorage saves a key-value pair that survives across page loads on the same domain.
sessionStorage operates similarly to localStorage, but the data lasts only as long as the page session.
In both cases, values are coerced to strings, meaning that a number will become a string version of itself and an object will become “[object Object]”. That’s obviously not what you want, so if you want to save an object, use JSON.stringify() when storing and JSON.parse() when retrieving.
Both localStorage and sessionStorage use setItem and getItem to set and retrieve values:
localStorage.setItem("foo", "bar");
localStorage.getItem("foo"); // returns "bar"
You can most clearly see the difference between the two by setting a value in each, closing the browser tab, then reopening a tab on the same domain and checking for your values. The value saved with localStorage will still exist, whereas the sessionStorage value will be null. You can use the devtools console to run this experiment:
localStorage.setItem("foo", "bar");
sessionStorage.setItem("foo", "bar");
// close the tab, reopen it
localStorage.getItem("foo"); // returns "bar"
sessionStorage.getItem("foo"); // returns null
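Because web storage coerces everything to a string, a small pair of JSON helpers keeps objects intact across the round trip. The names setJSON and getJSON below are hypothetical (they are not part of the Web Storage API); the helpers accept any Storage-like object with getItem/setItem, so the same code works for both localStorage and sessionStorage.

```javascript
// Store a value as JSON so objects survive string coercion.
function setJSON(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Read a value back, parsing the JSON; returns null for missing keys.
function getJSON(storage, key) {
  const raw = storage.getItem(key);
  return raw === null ? null : JSON.parse(raw);
}

// In the browser:
// setJSON(localStorage, "user", { name: "Pat", visits: 3 });
// getJSON(localStorage, "user"); // the object, not "[object Object]"
```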
Whereas localStorage and sessionStorage are tied to the page and domain, cookies give you a longer-lived option tied to the browser itself. They also use key-value pairs. Cookies have been around for a long time and are used for a wide range of cases, including ones that are not always welcome. Cookies are useful for tracking values across domains and sessions. They have specific expiration times, but the user can choose to delete them anytime by clearing their browser history.
Cookies are attached to requests and responses with the server, and can be modified (with restrictions governed by rules) by both the client and the server. Handy libraries like JavaScript Cookie simplify dealing with cookies.
Cookies are a bit funky when used directly, which is a legacy of their ancient origins. They are set for the domain on the document.cookie property, in a format that includes the value, the expiration time (an HTTP date string, like the one Date.prototype.toUTCString() produces), and the path. If no expiration is set, the cookie vanishes when the browser is closed. The path determines which paths on the domain the cookie is valid for.
Here’s an example of setting a cookie value:
document.cookie = "foo=bar; expires=Wed, 18 Dec 2024 12:00:00 GMT; path=/";
And to recover the value:
function getCookie(cname) {
  const name = cname + "=";
  const decodedCookie = decodeURIComponent(document.cookie);
  const ca = decodedCookie.split(';');
  for (let i = 0; i < ca.length; i++) {
    let c = ca[i];
    while (c.charAt(0) === ' ') {
      c = c.substring(1);
    }
    if (c.indexOf(name) === 0) {
      return c.substring(name.length, c.length);
    }
  }
  return "";
}
const cookieValue = getCookie("foo");
console.log("Cookie value for 'foo':", cookieValue);
In the above, we use decodeURIComponent to unpack the cookie string and then split it on its separator character, the semicolon (;), to access its component parts. To find the value, we match on the name of the cookie plus the equals sign.
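The same splitting logic can be generalized to return every cookie at once. The parseCookies() helper below is a hypothetical sketch (it takes the cookie string as an argument so it can be exercised outside the browser, and it assumes values don't themselves contain encoded semicolons):

```javascript
// Parse a document.cookie-style string into a key-value object.
function parseCookies(cookieString) {
  const result = {};
  if (!cookieString) return result;
  const parts = decodeURIComponent(cookieString).split(";");
  for (let part of parts) {
    part = part.trim(); // strip the space after each semicolon
    const eq = part.indexOf("=");
    if (eq > 0) {
      result[part.substring(0, eq)] = part.substring(eq + 1);
    }
  }
  return result;
}

// In the browser: parseCookies(document.cookie)
```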
An important consideration with cookies is security, specifically cross-site scripting (XSS) and cross-site request forgery (CSRF) attacks. (Setting HttpOnly on a cookie makes it only accessible on the server, which increases security but eliminates the cookie’s utility on the browser.)
IndexedDB is the most elaborate and capable in-browser data store. It's also the most complicated. IndexedDB uses asynchronous calls to manage operations. That's good because it lets you avoid blocking the thread, but it also makes for a somewhat clunky developer experience.
IndexedDB is really a full-blown object-oriented database. It can handle large amounts of data, modeled essentially like JSON. It supports sophisticated querying, sorting, and filtering. It's also available in service workers as a reliable persistence mechanism between thread restarts and between the main and worker threads.
When you create an object store in IndexedDB, it is associated with the domain and lasts until the user deletes it. It can be used as an offline datastore to handle offline functionality in progressive web apps, in the style of Google Docs.
To get a flavor of using IndexedDB, here's how you might create a new store:
let db = null; // A handle for the DB instance
let request = indexedDB.open("MyDB", 1); // Try to open the "MyDB" instance (async operation)
request.onupgradeneeded = function(event) { // Fires when MyDB is new or its schema version has changed
  db = event.target.result; // Set the DB handle to the result of the event
  if (!db.objectStoreNames.contains("myObjectStore")) { // If myObjectStore doesn't exist, create it
    let tasksObjectStore = db.createObjectStore("myObjectStore", { autoIncrement: true });
  }
};
The onsuccess handler fires when the database is successfully opened. It will fire without onupgradeneeded firing if the database and object store already exist. In either case, we save the db reference:
request.onsuccess = function(event) { db = event.target.result; };
If an error occurs, onerror fires instead:
request.onerror = function(event) { console.log("Error in db: " + event); };
The above IndexedDB code is simple (it just opens or creates a database and object store), but it gives you a sense of IndexedDB's asynchronous nature.
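Once the store exists, reads and writes go through transactions, which are also asynchronous. Here's a sketch (assuming the db handle and "myObjectStore" from the code above) that wraps the request events in promises so records can be added and read back with async/await:

```javascript
// Add a value; the autoIncrement store generates the key, which the
// promise resolves with.
function addRecord(db, value) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction("myObjectStore", "readwrite");
    const req = tx.objectStore("myObjectStore").add(value);
    req.onsuccess = () => resolve(req.result); // req.result is the new key
    req.onerror = () => reject(req.error);
  });
}

// Read a record back by key.
function getRecord(db, key) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction("myObjectStore", "readonly");
    const req = tx.objectStore("myObjectStore").get(key);
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Usage in the browser:
// const key = await addRecord(db, { task: "write article", done: false });
// const record = await getRecord(db, key);
```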
Service workers include a specialized data storage mechanism called cache. Cache makes it easy to intercept requests, save responses, and modify them if necessary. It’s primarily designed to cache responses (as the name implies) for offline use or to optimize response times. This is something like a customizable proxy cache in the browser that works transparently from the viewpoint of the main thread.
Here’s a look at caching a response using a cache-first strategy, wherein you try to get the response from the cache first, then fall back to the network (saving the response to the cache):
self.addEventListener('fetch', (event) => {
  const request = event.request;
  // Try serving assets from cache first
  event.respondWith(
    caches.match(request)
      .then((cachedResponse) => {
        // If found in cache, return the cached response
        if (cachedResponse) {
          return cachedResponse;
        }
        // If not in cache, fetch from network
        return fetch(request)
          .then((response) => {
            // Clone the response for potential caching
            const responseClone = response.clone();
            // Cache the new response for future requests
            caches.open('my-cache')
              .then((cache) => {
                cache.put(request, responseClone);
              });
            return response;
          });
      })
  );
});
This gives you a highly customizable approach because you have full access to the request and response objects.
We’ve looked at the commonly used options for persisting data in the browser, each with its own profile of longevity and complexity. When deciding which one to use, a useful rule of thumb is: what is the simplest option that meets my needs? Another concern is security, especially with cookies.
Other interesting possibilities are emerging around using WebAssembly for persistent storage. Wasm's near-native execution speed could give storage-heavy applications a performance boost. We'll look at using Wasm for data persistence another day.