All posts by Richy George

4 highlights from EDB Postgres AI

Posted by on 13 June, 2024

This post was originally published on this site

35% of enterprise leaders will consider Postgres for their next project, based on this research conducted by EDB, which also revealed that out of this group, the great majority believe that AI is going mainstream in their organization. Add to this, for the first time ever, analytical workloads have begun to surpass transactional workloads.

Enterprises see the potential of Postgres to fundamentally transform the way they use and manage data, and they see AI as a huge opportunity and advantage. But the diverse data teams within these organizations face increasing fragmentation and complexity when it comes to their data. To operationalize data for AI apps, they demand better observability and control across the data estate, not to mention a solution that works seamlessly across clouds.

It’s clear that Postgres has the right to play and deliver for the AI generation of apps, and EDB has taken recent strides to do just this with the release of EDB Postgres AI, an intelligent platform for transactional, analytical, and AI workloads.

The new platform product offers a unified approach to data management and is designed to streamline operations across hybrid cloud and multi-cloud environments, meeting enterprises wherever they are in their digital transformation journey.

EDB Postgres AI helps elevate data infrastructure to a strategic technology asset, by bringing analytical and AI systems closer to customers’ core operational and transactional data—all managed through the popular open source database, Postgres.

Let’s take a look at the key features and advantages of EDB Postgres AI.

Rapid analytics for transactional data

Analysts and data scientists need to launch critical new projects, and they need access to up-to-the-second transactional and operational data within their core Postgres databases. Yet these teams are often forced to default to clunky ETL or ELT processes that result in latency, data inconsistency, and quality issues that hamper efficiency-extracting insights.

EDB Postgres AI introduces a simple platform for deploying new analytics and data science projects rapidly, without the need for operationally expensive data pipelines and multiple platforms. EDB Postgres AI’s Lakehouse capabilities allow for the rapid execution of analytical queries on transactional data without impacting performance, all using the same intuitive interface. By storing operational data in a columnar format, EDB Postgres AI boosts query speeds by up to 30x faster compared to standard Postgres and reduces storage costs, making real-time analytics more accessible.

Enterprise observability and data estate management

Even if data teams have made Postgres their primary database, chances are their data estate is still sprawled across a diverse mix of fully-managed and self-managed Postgres deployments. Managing these systems becomes increasingly difficult and costly, particularly when it comes to ensuring uptime, security and compliance.

The new capabilities of the recent EDB release will help customers create and deliver value greater than the sum of all the data parts, no matter where it is. EDB Postgres AI provides comprehensive observability tools that offer a unified view of Postgres deployments across different environments. This means that users can monitor and tune their databases, with automatic suggestions on improving query performance, AI-driven event detection and log analysis, and smart alerting when metrics exceed configurable thresholds.

edb data plane diagram EDB

Support for vector databases

With the surge in AI advancements, EDB sees a significant opportunity to enhance data management for our customers through AI integration. The strategy of the new platforms is twofold: integrate AI capabilities into Postgres, and simultaneously, optimize Postgres for AI workloads.

Firstly, this release includes an AI-driven migration copilot, which is trained on EDB documentation and knowledge bases and helps answer common questions about migration errors including command line and schema issues, with instant error resolution and guidance tailored to database needs.

In addition, EDB remains focused on optimizing Postgres for AI workloads through support for vector databases and AI workloads. With capabilities like the pgvector extension and EDB’s pgai extension, the platform enables the storage and querying of vector embeddings, crucial for AI applications. This support allows developers to build sophisticated AI models directly within the Postgres ecosystem.

In addition, EDB remains focused on optimizing Postgres for AI workloads through support for vector databases and AI workloads. The EDB Postgres AI platform streamlines capabilities by providing a single place for storing vector embeddings and doing similarity search with both pgai and pgvector, which simplifies the AI application pipeline for builders. This support allows developers to build sophisticated AI models directly within the Postgres ecosystem. The platform also enables users to leverage the mature data management features of PostgreSQL such as reliability with high availability, security with Transparent Data Encryption (TDE), and scalability with on-premises, hybrid, and cloud deployments.

EDB Postgres AI transforms unstructured data management with its new powerful “retriever” functionality that enables similarity search across vector data. The auto embedding feature automatically generates AI embeddings for data in Postgres tables, keeping them up-to-date via triggers. Coupled with the retriever’s ability to create embeddings for Amazon S3 data on demand, pgai provides a seamless solution to making unstructured sources searchable by similarity. Users can also leverage a broad list of state-of-the-art encoder models like Hugging Face and OpenAI. With just pgai.create_retriever() and pgai.retrieve(), developers gain vector similarity capabilities within their trusted Postgres database.

This dual approach ensures that Postgres becomes a comprehensive solution for both traditional and AI-driven data management needs.

Continuous high availability and legacy modernization

EDB Postgres AI maintains the critical, enterprise-grade capabilities that EDB is known for. This includes the comprehensive Oracle Compatibility Mode, which helps customers break free from legacy systems while lowering TCO by up to 80% compared to legacy commercial databases. The product also supports EDB’s geo-distributed high-availability solutions, meaning customers can deploy multi-region clusters with five-nines availability to guarantee that data is consistent, timely, and complete—even during disruptions.

The release of EDB Postgres AI marks EDB’s 20th year as a leader of enterprise-grade Postgres and introduces the next evolution of the company—one even more proudly associated with Postgres. Why? Because we know that the flexibility and extensibility make Postgres uniquely positioned to solve for the most complex and critical data challenges. Learn more about how EDB can help you use EDB Postgres AI for your most demanding applications.

Aislinn Shea Wright is VP of product management at EDB.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Next read this:

Posted Under: Database
Multicloud: Oracle links database with Google, Microsoft to speed operations

Posted by on 12 June, 2024

This post was originally published on this site

Oracle is connecting its cloud to Google’s to offer Google customers high-speed access to database services. The move comes just nine months after it struck a similar deal with Microsoft to offer its database services on Azure. Separately, Microsoft is extending its Azure platform into Oracle’s cloud to give OpenAI access to more computing capacity on which to train its models.

“What started as a simple interconnect is becoming a more defined multicloud strategy for Oracle. The announcement is the beginning of a new trend—cloud providers are willing to work together to serve the needs of shared customers,” said Dave McCarthy, Research Vice President at IDC.

The Oracle-Google partnership will see the companies create a series of points of interconnect enabling customers of one to access services in the other’s cloud. Customers will be able to deploy general-purpose workloads with no cross-cloud data transfer charges, the companies said.

The two clouds will initially interconnect in 11 regions: Sydney, Melbourne, São Paulo, Montreal, Frankfurt, Mumbai, Tokyo, Singapore, Madrid, London, and Ashburn.

Oracle also plans to collocate its database hardware and software in Google’s datacenters, initially in North America and Europe, making it possible for joint customers to deploy, manage, and use Oracle database instances on Google Cloud without having to retool applications.

The two companies will market that service under the catchy name of Oracle Database@Google Cloud. Oracle Exadata Database Service, Oracle Autonomous Database Service, and Oracle Real Application Clusters (RAC) will  all launch later this year across four regions: US East, US West, UK South, and Germany Central, with more planned later.

Oracle Database@Google Cloud customers will have access to a unified support service, and will be able to make purchases via the Google Cloud Marketplace using their existing Google Cloud commitments and Oracle license benefits.

Oracle’s continued multicloud strategy

The partnership with Google Cloud can be seen as a continuation of Oracle’s multicloud strategy that it started executing with the Microsoft partnership, analysts said, adding that Oracle expects that the new offerings will help many of its customers fully migrate from on-premises infrastructure to the cloud.

By adopting a multicloud approach, Oracle avoids going head-to-head “entrenched” cloud providers. Instead, McCarthy said, Oracle is leveraging its strengths in data management to solve problems that other cloud providers cannot.

Oracle may have been swayed by the experience of its partnership with Microsoft Azure, dbInsight’s chief analyst Tony Baer said. Although AWS may have been a more obvious target to partner with next due to its reach, Google Cloud was probably “more hungry” for a partnership, he said.

McCarthy expected AWS to soon start exploring a similar partnership with Oracle as the Azure and Google Cloud partnerships will put pressure on the hyperscaler.

“AWS faces the same challenges as the other clouds when it comes to Oracle workloads. I expect this increased competition from Azure and Google Cloud will force them to explore a similar route,” he said, adding that migrating Oracle workloads has always been tricky and cloud providers need to offer the combination of Oracle’s hardware and software to allow enterprises to unlock top notch performance across workloads.

Open AI starts using OCI for extra capacity

Separately, Oracle is partnering with Microsoft to provide additional capacity for OpenAl by extending Microsoft’s Azure Al platform to Oracle Cloud Infrastructure (OCI).

“OCI will extend Azure’s platform and enable OpenAI to continue to scale,” said OpenAI CEO Sam Altman in a statement.  

This partnership, according to independent semiconductor technology analyst Mike Demler, is all about increasing compute capacity.

“OpenAI runs on Microsoft’s Azure AI platform, and the models they’re creating continue to grow in size exponentially from one generation to the next,” Demler said.

While GPT-3 uses 175 billion parameters, the latest GPT-MoE (Mixture of Experts) is 10 times that large, with 1.8 trillion parameters, the independent analyst said, adding that the latter needs a lot more GPUs than Microsoft alone can supply in its cloud platform.

Next read this:

Posted Under: Database
Data storage for front-end JavaScript

Posted by on 29 May, 2024

This post was originally published on this site

Like every other programming environment, you need a place to store your data when coding in the browser with JavaScript. Beyond simple JavaScript variables, there are a variety of options ranging in sophistication, from using localStorage to cookies to IndexedDB and the service worker cache API. This article is a quick survey of the common mechanisms for storing data in your JavaScript programs.

JavaScript variables

You are probably already familiar with JavaScript’s set of highly flexible variable types. We don’t need to review them here; they are very powerful and capable of modeling any kind of data from the simplest numbers to intricate cyclical graphs and collections. 

The downside of using variables to store data is that they are confined to the life of the running program. When the program exits, the variables are destroyed. Of course, they may be destroyed before the program ends, but the longest-lived global variable will vanish with the program. In the case of the web browser and its JavaScript programs, even a single click of the refresh button annihilates the program state. This fact drives the need for data persistence; that is, data that outlives the life of the program itself.

An additional complication with browser JavaScript is that it’s a sandboxed environment. It doesn’t have direct access to the operating system because it isn’t installed. A JavaScript program relies on the agency of the browser APIs it runs within.

Saving data on the server

The other end of the spectrum from using built-in variables to store JavaScript data objects is sending the data off to a server. You can do this readily with a fetch() POST request. Provided everything works out on the network and the back-end API, you can trust that the data will be stored and made available in the future with another GET request.

So far, we’re choosing between the transience of variables and the permanence of server-side persistence. Each approach has a particular profile in terms of longevity and simplicity. But a few other options are worth exploring.

Web storage API

There are two types of built-in “web storage” in modern browsers: localStorage and sessionStorage. These give you convenient access to longer-lived data. They both give you a key-value and each has its own lifecycle that governs how data is handled:

  • localStorage saves a key-value pair that survives across page loads on the same domain.
  • sessionStorage operates similarly to localStorage but the data only lasts as long as the page session.

In both cases, values are coerced to a string, meaning that a number will become a string version of itself and an object will become “[object Object].” That’s obviously not what you want, but if you want to save an object, you can always use JSON.stringify() and JSON.parse().

Both localStorage and sessionStorage use getItem and setItem to set and retrieve values:


localStorage.setItem("foo","bar");
sessionStorage.getItem("foo"); // returns “bar”

You can most clearly see the difference between the two by setting a value on them and then closing the browser tab, then reopening a tab on the same domain and checking for your value. Values saved using localStorage will still exist, whereas sessionStorage will be null. You can use the devtools console to run this experiment:


localStorage.setItem("foo",”bar”);
sessionStorage.setItem("foo","bar");
// close the tab, reopen it
localStorage.getItem('bar2'); // returns “bar”
sessionStorage.getItem("foo") // returns null

Cookies

Whereas localStorage and sessionStorage are tied to the page and domain, cookies give you a longer-lived option tied to the browser itself. They also use key-value pairs. Cookies have been around for a long time and are used for a wide range of cases, including ones that are not always welcome. Cookies are useful for tracking values across domains and sessions. They have specific expiration times, but the user can choose to delete them anytime by clearing their browser history.

Cookies are attached to requests and responses with the server, and can be modified (with restrictions governed by rules) by both the client and the server. Handy libraries like JavaScript Cookie simplify dealing with cookies.

Cookies are a bit funky when used directly, which is a legacy of their ancient origins. They are set for the domain on the document.cookie property, in a format that includes the value, the expiration time (in RFC 5322 format), and the path. If no expiration is set, the cookie will vanish after the browser is closed. The path sets what path on the domain is valid for the cookie.

Here’s an example of setting a cookie value:


document.cookie = "foo=bar; expires=Thu, 18 Dec 2024 12:00:00 UTC; path=/";

And to recover the value:


function getCookie(cname) {
  const name = cname + "=";
  const decodedCookie = decodeURIComponent(document.cookie);
  const ca = decodedCookie.split(';');
  for (let i = 0; i < ca.length; i++) {
    let c = ca[i];
    while (c.charAt(0) === ' ') {
      c = c.substring(1);
    }
    if (c.indexOf(name) === 0) {
      return c.substring(name.length, c.length);
    }
  }
  return "";
}
const cookieValue = getCookie("foo");
console.log("Cookie value for 'foo':", cookieValue);

In the above, we use decodeURIComponent to unpack the cookie and then break it along its separator character, the semicolon (;), to access its component parts. To get the value we match on the name of the cookie plus the equals sign.

An important consideration with cookies is security, specifically cross-site scripting (XSS) and cross-site request forgery (CSRF) attacks. (Setting HttpOnly on a cookie makes it only accessible on the server, which increases security but eliminates the cookie’s utility on the browser.)

IndexedDB

IndexedDB is the most elaborate and capable in-browser data store. It’s also the most complicated. IndexedDB uses asynchronous calls to manage operations. That’s good because it lets you avoid blocking the thread, but it also makes for a somewhat clunky developer experience.

IndexedDB is really a full-blown object-oriented database. It can handle large amounts of data, modeled essentially like JSON. It supports sophisticated querying, sorting, and filtering. It’s also available in service workers as a reliable persistence mechanism between thread restarts and between the main and workers threads.

When you create an object store in IndexedDB, it is associated with the domain and lasts until the user deletes it. It can be used as an offline datastore to handle offline functionality in progressive web apps, in the style of Google Docs.

To get a flavor of using IndexedDB, here’s how you might create a new store:


let db = null; // A handle for the DB instance

llet request = indexedDB.open("MyDB", 1); // Try to open the “MyDB” instance (async operation)
request.onupgradeneeded = function(event) { // onupgradeneeded is the event indicated the MyDB is either new or the schema has changed
  db = event.target.result; // set the DB handle to the result of the onupgradeneeded event
  if (!db.objectStoreNames.contains("myObjectStore")) { // Check for the existence of myObjectStore. If it doesn’t exist, create it in the next step
    let tasksObjectStore = db.createObjectStore("myObjectStore", { autoIncrement: true }); // create myObjectStore
  }
};

The call to request.onsuccess = function(event) { db = event.target.result; }; // onsuccess fires when the database is successfully opened. This will fire without onupgradeneeded firing if the DB and Object store already exist. In this case, we save the db reference:


request.onerror = function(event) { console.log("Error in db: " + event); }; // If an error occurs, onerror will fire

The above IndexedDB code is simple—it just opens or creates a database and object store—but the code gives you a sense of IndexedDB‘s asynchronous nature.

Service worker cache API

Service workers include a specialized data storage mechanism called cache. Cache makes it easy to intercept requests, save responses, and modify them if necessary. It’s primarily designed to cache responses (as the name implies) for offline use or to optimize response times. This is something like a customizable proxy cache in the browser that works transparently from the viewpoint of the main thread.

Here’s a look at caching a response using a cache-first strategy, wherein you try to get the response from the cache first, then fallback to the network (saving the response to the cache):


self.addEventListener('fetch', (event) => {
  const request = event.request;
  const url = new URL(request.url);
  // Try serving assets from cache first
  event.respondWith(
    caches.match(request)
      .then((cachedResponse) => {
        // If found in cache, return the cached response
        if (cachedResponse) {
          return cachedResponse;
        }
        // If not in cache, fetch from network
        return fetch(request)
          .then((response) => {
            // Clone the response for potential caching
            const responseClone = response.clone();
            // Cache the new response for future requests
            caches.open('my-cache')
              .then((cache) => {
                cache.put(request, responseClone);
              });
            return response;
          });
      })
  );
});

This gives you a highly customizable approach because you have full access to the request and response objects.

Conclusion

We’ve looked at the commonly used options for persisting data in the browser of varying profiles. When deciding which one to use, a useful algorithm is: What is the simplest option that meets my needs? Another concern is security, especially with cookies.

Other interesting possibilities are emerging with using WebAssembly for persistent storage. Wasm’s ability to run natively on the device could give performance boosts. We’ll look at using Wasm for data persistence another day.

Next read this:

Posted Under: Database
EDB unveils EDB Postgres AI

Posted by on 23 May, 2024

This post was originally published on this site

Relational database provider EnterpriseDB (EDB) on Thursday introduced EDB Postgres AI, a new database aimed at transactional, analytical, and AI workloads.

EDB Postgres AI, which was internally named Project Beacon during its development, started its life as a data lakehouse project with support for Delta Live Tables and later evolved into a product that combines EDB’s PostgreSQL software and other components such as data lakehouse analytics into a singular unified offering.  

PostgreSQL, which is an object-oriented relational database, has been gaining popularity because it is an open-source database with many potential providers and can be adapted to multiple workloads, said Matt Aslett, director at ISG’s Ventana Research.

“As a general-purpose database, PostgreSQL is suitable for both transactional and analytic applications,” Aslett explained.

The huge ecosystem of PostgreSQL-based databases that leverage the core technology and skills base makes PostgreSQL impossible to ignore, positioning it as a default standard enterprise-grade open-source database, experts said.

According to data from database knowledge base DB-Engines, PostgreSQL has been steadily rising in popularity and is currently the fourth most popular RDBMS (relational database management system) and fourth most popular database product overall in their rankings.

The constant rise in popularity has forced hyperscalers such as AWS, Google Cloud Platform, and Microsoft Azure to create database services built on PostgreSQL. Examples of these databases are AlloyDB, CitiusDB (PostgreSQL on Azure), Amazon Aurora, and Amazon RDS for PostgreSQL. Other rivals of EDB include YugabyteDB and CockroachDB.

What’s new in EDB Postgres AI?

The components of of EDB Postgres AI, which the company describes as an “intelligent data platform,” include a central management console with AI assistance, EDP Postgres databases, data lakehouse analytics, and AI/ML including vector database support.

The console, according to the company, provides a single pane to all EDB Postgres AI operations and helps manage the database landscape of an enterprise, in turn providing better observability. The console comes with an AI agent that can help enterprises manage on-premises databases.

EDB Postgres AI supports most databases that EDB offers including EDB Postgres, EDB Postgres Advanced Server, EDB Postgres Extended Server, and their respective distributed high availability versions. EDB Postgres Advanced Server also provides Oracle Database compatibility, the company said.

The lakehouse analytics module, according to EDB, brings structured and unstructured data together with the help of Nodes to be analyzed. Nodes support multiple formats, the company said, adding that it has built a custom store to support multiple data formats.

The AI/ML module includes vector support, which effectively gives the platform its capability to build AI-powered applications.

Additionally, Postgres AI comes with support for extensions such as Postgres Enterprise Manager, Barman, Query Advisor, and migration tools, such as the Migration Toolkit and the Migration Portal.

The Migration Portal, according to the company, is among the first EDB tools to include embedded AI via an AI copilot that can assist users in developing migration strategies.

The combination of these components or modules result in Postgres AI’s key capabilities such as rapid analytics, observability, vector support, high availability, and legacy modernization.

Explaining rapid analytics as a capability, EDB said that Postgres AI allows enterprises to spin up analytics clusters on demand.

With EDB Postgres Lakehouse capabilities, operational data can be stored in a columnar format, optimizing it for fast analytics,” the company said in a statement.

EDB added that its acquisition of Splitgraph, a startup that provides a PostgreSQL-compatible serverless SQL API for building data-driven applications from multiple data sources, last year played a foundational role in building out the analytics capability.

The release of EDB Postgres AI saw the company partner with the likes of Red Hat, Nutanix, and SADA.

EDB’s collaboration with Red Hat will enable enterprises to build AI models on Red Hat OpenShift AI and deliver enterprise-grade, day-two operations with EDB Postgres AI, the company said.

edb login screen EDB

EDB Postgres AI availability and pricing

EDB Postgres AI, according to Jozef de Vries, the chief engineering officer at EDB, is available as a managed service on AWS, GCP, and Azure.

“The analytics functionality is initially available only on AWS, with the other public clouds to follow soon,” Vries said, adding that Postgres AI can also be self-managed on a public cloud and private cloud environment of the customer’s choice.

EDB prices its EDB Postgres offerings on a per vCPU-hour basis. The company also provides a free tier across all of its database offerings.

The EDB Postgres offering costs $0.0856 per vCPU-hour, the company’s subscription listing showed. Other options, such as EDB Postgres Extended Server, EDB Postgres Advanced Server, and their distributed high availability versions cost $0.1655 per vCPU-hour, $0.2568 per vCPU-hour, $0.3424 per vCPU-hour, and $0.2511 per vCPU-hour respectively.

Why is EDB launching Postgres AI?

EDB Postgres AI, according to Aslett of Ventana Research, is being positioned to help enterprises bring AI capabilities to a variety of workloads regardless of deployment location.

“With EDB Postgres AI, EDB is addressing the requirement for data storage and processing both on-premises and in the cloud. This is important for AI-infused applications, which require high-performance AI inference at the point of interaction,” the research director explained.

Omdia’s chief analyst Bradley Shimmin sees the release of EDB Postgres AI as the repeat of the market branding mania from 1980s.

“It does seem like the 1980s with its ‘Turbo’ market branding mania, as we’re seeing a sudden and very pervasive influx of AI-branded products entering the market,” Shimmin said. Shimmin cited recent examples such as Red Hat Enterprise Linux AI and Oracle Database 23ai.

Shimmin said that he sees the EDB Postgres AI release as a mix of marketing hype and maturation of Postgres offerings.

“The Postgres database was certainly capable of supporting AI workloads before this release, plying vector data as a means of steering large language models away from confabulation and toward fact. What we are seeing from vendors like EDB with these AI branded releases is the vertical integration of functionality geared toward streamlining, simplifying, and accelerating AI-infused use cases,” the analyst explained.

The release of EDB Postgres AI coincides with EDB’s strategic move to a new corporate identity, EDB said.

Next read this:

Posted Under: Database
MariaDB plc: Shareholders speak, but execs are quiet

Posted by on 22 May, 2024

This post was originally published on this site

There appears to be many questions and few answers about MariaDB plc’s long-term strategy following an announcement that its shareholders have accepted an offer by California-based investment firm K1 Investment Management.

News that the company that provides database and SaaS services around the open-source database MariaDB had been acquired came on Monday, when it was announced that a trio of companies—K1; Meridian Bidco LLC, a K1 affiliate; and K5 Capital Advisors—“now have irrevocable shareholder support in respect of 68.51% of MariaDB shares.”

The company has had a litany of financial issues over the past 12 months, but when it released financial results for the quarter ended March 31 last week there was one bright spot: Its net loss had shrunk to $3.5 million, compared to a net Loss of $11.9 million a year earlier.

“We have demonstrated our ability to quickly turn our financial story around and are optimistic about the future performance of the business,” Paul O’Brien, CEO of MariaDB plc said in a statement accompanying the financial results.

As reported in InfoWorld in February, after going public in December 2022, the company saw its market capitalization plummet from $445 million to around $10 million by the end of 2023.

Carl Olofson, research vice president and database analyst with IDC, said that the key to determining what happens next is why the acquisition happened in the first place.

While executives at K1 and MariaDB plc have yet to comment on their future plans, Olofson said that “when you see something like this, there is one of two motivations. One is that you want to dismantle the company, and make a profit from the assets, which is not going to be the case here, because they do not really have assets.

“The other option is to really believe that with proper management, the right approach, the company can grow far beyond where it’s at now—make fabulous profits, sell it off and everybody walks away happy.”

In the open source database space, he said, there is a big difference when it comes to intellectual property (IP) between MariaDB and MySQL, the open-source database of which MariaDB is a fork. In the case of MySQL, the IP is owned by Oracle, but for MariaDB “it is owned by the MariaDB community, which is not part of the company. There is a clear distinction between MariaDB, the company, and MariaDB, the community.”

Olofson added that regardless of what happens to the corporate entity, the community will continue.

The open source project was created by Michael “Monty” Widenius, who was also a creator of MySQL. He set up the company Monty Program Ab which later became MariaDB, the company, and also the MariaDB Foundation, which is the custodian of the project’s open-source code.

Olofson spoke with Widenius briefly last year at a MariaDB user conference and described him as an “open source purist” who wants to “just put technology out there and let people do what they will with it.” But while that attitude might be great for users, it’s not good for MariaDB, the company: “That does not really help it in trying to survive commercially and meet payroll,” Olofson said.

As for the commercial side, while Olofson has no idea what K1’s corporate strategy might be moving forward, he said one option is to “go down the well-worn path of other open source software companies that are trying to make a living from the technology by what’s called an open core approach.”

That was already MariaDB’s strategy—it offers paid add-ons such as MaxScale, ColumnStore, or Galera Cluster, as well as consulting, migration and managed cloud services—“but they basically just ran out of money,” said Olofson.

“They were adding some really exciting and innovative features ,” he said, “and then they pulled back from some of that. I interpreted that as meaning they did not have the money to continue to support development.”

InfoWorld reached out to both MariaDB plc and K1 for comment, but at press time had not heard from either organization.

Next read this:

Posted Under: Database
Why you should use SQLite

Posted by on 22 May, 2024

This post was originally published on this site

Lift the hood on most business applications, and you’ll find they have some way to store and use structured data. Whether it’s a client-side app, an app with a web front end, or an edge-device app, chances are a business application needs a database. In many cases, an embedded database will do. Embedded databases are lightweight, compact, and portable—and for some applications, they are a better choice than a traditional server.

SQLite is an embeddable open source database, written in C and queryable with conventional SQL. SQLite is designed to be fast, portable, and reliable, whether you’re storing only kilobytes of data or multi-gigabyte blobs. We’ll take a look at SQLite, including where and when to use it and how it compares to alternatives such as MySQL, MariaDB, and other popular embedded databases.

What is SQLite used for?

The most common and obvious use case for SQLite is serving as a conventional, table-oriented relational database. SQLite supports transactions and atomic behaviors, so a program crash or even a power outage won’t leave you with a corrupted database. SQLite also has other features found in higher-end databases, such as full-text indexing, and support for large databases—up to 281 terabytes with row sizes up to 1GB.

SQLite also provides a fast and powerful way to store configuration data for a program. Instead of parsing a file format like JSON or YAML, a developer can use SQLite as an interface to those files—often far faster than operating on them manually. SQLite can work with in-memory data or external files (e.g., CSV files) as if they were native database tables, providing a handy way to query that data. It also natively supports JSON data, so data can be stored as JSON or queried in-place.

Advantages of SQLite

SQLite has many advantages, starting with its platform and language portability. Here are the main benefits of using SQLite:

  • It’s cross-platform: One of SQLite’s greatest advantages is that it can run nearly anywhere. SQLite has been ported to a wide variety of platforms: Windows, macOS, Linux, iOS, Android, and more. Windows users in particular can use precompiled binaries for regular Win32, UWP, WinRT, and .Net. Whatever your app’s deployment target is, odds are there’s an edition of SQLite available for it, or a way to port the C source code to that target.
  • It’s compatible with most programming languages: Applications that use SQLite don’t have to be written in any particular language, as long as there is some way to bind and work with external libraries written in C. SQLite’s binaries are self-contained, so they require no particular magic to deploy—you can simply drop them into the same directory as your application.
  • Yes, SQLite works with Python: Many languages have high-level bindings for SQLite as a library, and can use that in conjunction with other database access layers for the language. Python, for instance, bundles the SQLite library as a standard-issue element with the stock version of the Python interpreter. In addition, third parties have written a wide variety of ORMs and data layers that use SQLite, so you’re not stuck accessing SQLite via raw SQL strings (which is not only clumsy but also potentially dangerous).
  • Self-contained: Because SQLite is a single standalone binary, it’s easy to deploy with an app and then move with the app as needed. Each database created by SQLite also comprises a single file, which can be compacted or optimized using SQL commands.
  • Third-party extensions: Third-party binary extensions for SQLite add even more functionality. SQLCipher adds 256-bit AES encryption to SQLite database files. Another, sqlean, expands SQLite’s native functions to include many more not available by default, such as generating UUIDs or regular expression matching.
  • Extensive tooling: Many other third-party projects provide additional tooling for SQLite, such as the Visual Studio Code extension that allows browsing databases from within Visual Studio Code, or the LiteCLI interactive command-line for SQLite. A curated list of SQLite resources on GitHub includes many more options.
  • SQLite is open source: Finally, the source code for SQLite is public domain, so it can be reused in other programs with no practical restrictions.

SQLite vs. MySQL

SQLite is frequently compared to MySQL, the widely used open source database product that is a staple of today’s application stacks. As much as SQLite resembles MySQL, there are good reasons to favor one over the other, depending on the use case. The same is true for MariaDB, another popular database that is sometimes compared to SQLite.

Data types

SQLite has relatively few native data types—BLOB, NULL, INTEGER, REAL, and TEXT. Both MySQL and MariaDB, on the other hand, have dedicated data types for dates and times, various precisions of integers and floats, and much more.

If you’re storing relatively few data types, or you want to use your data layer to perform validation on the data, SQLite is useful. However, if you want your data layer to provide its own validation and normalization, go with MySQL or MariaDB.

Configuration and tuning

SQLite’s configuration and tuning options are minimal. Most of its internal or command-line flags deal with edge cases or backward compatibility. This fits with SQLite’s overall philosophy of simplicity: the default options are well-suited to most common use cases.

MySQL and MariaDB offer a veritable forest of database- and installation-specific configuration options—collations, indexing, performance tuning, storage engines, etc. The plethora of options is because these database products offer far more features. You may have to tweak them more, but it’s likely because you’re trying to do more in the first place.

Single-user vs. multi-user database

SQLite is best suited for applications with a single concurrent user, such as in desktop or mobile apps. MySQL and MariaDB are designed to handle multiple concurrent users. They can also provide clustered and scale-out solutions, whereas SQLite can’t.

Some projects add scaling features to SQLite, although not as a direct substitute for MySQL or MariaDB. Canonical has created its own variant of SQLite, dqlite, designed to scale out across a cluster. Data is kept consistent across nodes by way of a Raft algorithm, and deploying dqlite has only marginally more administrative overhead than SQLite.

SQLite vs. embedded databases

SQLite is far from the only embeddable database. Many others deliver similar features but emphasize different use cases or deployment models.

  • Apache Derby: An embeddable SQL engine, also repackaged by Oracle as Java DB. Since Apache Derby is written in Java and requires the JVM, it’s mainly designed for embedding in Java apps.
  • Firebird Embedded: The Firebird database, which runs cross-platform and sports many high-end features, is available as a library that can be embedded in a client application. Its feature set compares well to SQLite, but SQLite has a far larger user community and support base.
  • Realm: A high-performance relational database designed for mobile environments (mainly Android) that is also able to support desktop environments like Windows. Realm is object-based, however, and doesn’t use SQL queries—good if you’d rather not use SQL, but bad if SQL is familiar and comfortable. Realm is now a MongoDB project, and comes with the caveat that it is “not in itself an ‘end-user’ product with a publicly stable and supported API.”
  • VistaDB: An embedded database for the .Net runtime. VistaDB is available in versions specific to the various flavors and incarnations of .Net and with many enterprise features like full-database encryption. However, it’s a commercial product, not open source.
  • Berkeley DB: An Oracle project, nominally a key/value store, but one that uses SQLite in recent editions as a way to handle SQL queries. Berkeley DB’s underlying database engine has performance enhancements that SQLite can’t match, such as being able to handle multiple simultaneous write operations. Berkeley DB is dual-licensed, under either the GNU Affero GPL 3 or via a commercial license, depending on your use case.

Limitations of SQLite

SQLite’s design choices make it well-suited for some scenarios but poorly suited for others. Here are some places where SQLite doesn’t work well:

  • Apps that use features SQLite doesn’t support: SQLite does not support—and in many cases will not try to support—various relational database features. Many are corner cases, but even one of those can break the deal.
  • Apps that require scale-out designs: SQLite instances are singular and independent, with no native synchronization between them. They can’t be federated together or made into a cluster. Any software application that uses a scale-out design can’t use SQLite. As noted above, some third parties have extended SQLite to add those features, but they’re not native to SQLite’s design.
  • Apps with simultaneous write operations from multiple connections: SQLite locks the database for write operations, so anything involving multiple simultaneous write operations could result in performance issues. Apps with multiple simultaneous reads are generally fast, though. SQLite 3.7.0 and higher provide Write-Ahead Logging mode to make multiple writes work faster, but it comes with some limitations. For an alternative, considered Berkeley DB, mentioned above.
  • Apps that need strong data typing: SQLite has relatively few data types—no native datetime type, for instance. This means that the application must enforce most types. If you want the database, as opposed to the application, to normalize and constrain inputs for datetime values, SQLite may not work for you.

Next read this:

Posted Under: Database
Microsoft Build 2024: Cosmos DB for NoSQL gets vector search

Posted by on 21 May, 2024

This post was originally published on this site

Microsoft released multiple updates to its database offerings at its Build 2024 conference.

One of the major updates to its database offerings includes the addition of vector search to Azure Cosmos DB for NoSQL.

Azure Cosmos DB for NoSQL, which is a non-relational database service, is a part of the larger Azure Cosmos DB database offering, a distributed database which implements a set of different consistency models enabling users to trade off performance against latency in their applications.

Fundamentally, NoSQL databases do away with SQL databases’ constraints of data types and consistency in order to achieve more speed, flexibility, and scale.

Azure Cosmos DB supports working with different data models, including APIs for MongoDB and Apache Cassandra. The database also has a flavor of PostgreSQL.

Last year at Build, Microsoft had introduced vector search to Cosmos DB, building on the company’s Cosmos DB for MongoDB vCore service.

The vector search in Cosmos DB for NoSQL, according to the company, is powered by DiskANN—a suite of scalable approximate nearest neighbor search algorithms that support real-time changes.

In addition, the company also made the Azure Database for PostgreSQL extension for Azure AI generally available in order to bring AI capabilities to data in PostgreSQL.

“This enables developers who prefer PostgreSQL to plug data directly into Azure AI for a simplified path to leverage LLMs and build rich PostgreSQL generative AI experiences,” the company said in a statement.

Additionally, the company said it is working to add an embeddings generation feature inside its Azure Database for PostgreSQL offering. The new feature, which is currently in preview, can generate embeddings within the database.

Further, the company said it was adding more capabilities to Copilot inside databases to help developers, including adding the ability to provide summaries of technical documentation in response to user questions inside Azure Database for MySQL.

In March, the company announced a private preview of Copilot in Azure SQL Database in order to offer natural language to SQL conversion, along with self-help for database administration.

Next read this:

Posted Under: Database
SQL at 50: What’s next for the structured query language?

Posted by on 16 May, 2024

This post was originally published on this site

In May 1974, Donald Chamberlin and Raymond Boyce published a paper on SEQUEL, a structured query language that could be used to manage and sort data. After a change in title due to another company’s copyright on the word SEQUEL, Structured Query Language (SQL) was taken up by database companies like Oracle alongside their new-fangled relational database products later in the 1970s. The rest, as they say, is history.

SQL is now 50 years old. SQL was designed and then adopted around databases, and it has continued to grow and develop as a way to manage and interact with data. According to Stack Overflow, it is the third most popular language used by professional programmers on a regular basis. In 2023, the IEEE noted that SQL was the most popular language for developers to know when it came to getting a job, due to how it could be combined with other programming languages.

[ Also on InfoWorld: Why SQL still rules ]

When you look at other older languages being used today, the likes of COBOL (launched in 1959) and FORTRAN (first compiled in 1958) are still going, too. While they can lead to well-paying roles, they are linked to existing legacy deployments rather than new and exciting projects. SQL, on the other hand, is still being used as part of work around AI, analytics, and software development. It continues to be the standard for how we interact with data on a daily basis.

Why is SQL still so important?

When you look at SQL, you may ask why it has survived—even thrived—for so long. It is certainly not easy to learn, as it has a peculiar syntax that is very much of its time. The user experience around SQL can be challenging for new developers to pick up. Alongside this, every database vendor has to support SQL, but each also will have their own quirks or nuances in how they implement this support. Consequently, your approach for one database may not translate to another database easily, leading to both more work and more support requirements.

To make matters worse, it is easy to make mistakes in SQL that can have real and potentially catastrophic consequences. For example, missing a WHERE clause in your instructions can cause you to delete an entire table rather than carrying out the transaction you want, leading to lost data and recovery work. Checking your logic and knowing how things work in practice is a necessary requirement. 

So why is SQL still the leading way to work with data today, 50 years after it was first designed and released? SQL is based on strong mathematical theory, so it continues to perform effectively and support the use cases it was designed for. The truth is that when you combine SQL with relational databases, you can map the data that you create—and how you manage that data—to many business practices in a way that is reliable, effective, and scalable. Put simply, SQL works, and no replacement option has measured up in the same way.

As an example, SQL was the first programming language to return multiple rows per single request. This makes it easier to get data on what is taking place within a set of data—and consequently, within the business and its applications—and then turn it into something the business can use. Similarly, SQL made it easier to compartmentalize and segregate information into different tables, and then use the data in those tables for specific business tasks, such as putting customer data in one table and manufacturing data in another. The ability to perform transactions is the backbone of most processes today, and SQL made that possible at scale.

Another important reason for the success of SQL is that the language has always moved with the times. From its relational roots, SQL has added support for geographic information system (GIS) data, for JSON documents, and for XML and YAML over the years. This has kept SQL up to speed with how developers want to interact with data. Now, SQL can be combined with vector data, enabling developers to interact with data using SQL but carrying out vector searches for generative AI applications.

What is the future for SQL?

There have been attempts to replace SQL in the past. NoSQL (Not only SQL) databases were developed to replace relational databases and get away from the traditional models of working with and managing data at scale. However, rather than replacing SQL, these databases added their own SQL-like languages that replicated some of the methods and approaches that SQL has ingrained into how developers work.

In the past, natural language processing advocates have called for new methods that do away with SQL’s standardized and clunky approach. However, these attempts have ended up with methods that were just as clunky as what they tried to replace, which led to them being sidelined or ignored. Generative AI may take on more of the task of writing SQL for developers, as large language models have been exposed to large quantities of SQL code as part of their training. However, while this approach may develop and become more popular in time, it still relies on SQL for the actual interaction with those sets of data and to deliver the results back to the user. If anything, this will likely make SQL more important for the future, not less, even though it will be less visible to the developer.

Even if SQL ends up moving behind the curtain, it will continue to play a critical role in how we interact with and use data. With such a huge percentage of all our IT systems relying on data to function, SQL will not be going away any time soon. So, let’s celebrate SQL turning 50, and consider how we can continue to develop and use it in the future.

Charly Batista is PostgreSQL technical lead at Percona.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Next read this:

Posted Under: Database
DuckDB: The tiny but powerful analytics database

Posted by on 15 May, 2024

This post was originally published on this site

Most people assume that analytical databases, or OLAPs, are big, powerful beasts—and they are correct. Systems like Snowflake, Redshift, or Postgres involve a lot of setup and maintenance, even in their cloud-hosted incarnations. But what if all you want is “just enough” analytics for a dataset on your desktop? In that case, DuckDB is worth exploring.

Columnar data analytics on your laptop

DuckDB is a tiny but powerful analytics database engine—a single, self-contained executable, which can run standalone or as a loadable library inside a host process. There’s very little you need to set up or maintain with DuckDB. In this way, it is more like SQLite than the bigger analytical databases in its class.

DuckDB is designed for column-oriented data querying. It ingests data from sources like CSV, JSON, and Apache Parquet, and enables fast querying using familiar SQL syntax. DuckDB supports libraries for all the major programming languages, so you can work with it programmatically using the language of your choice. Or you can use DuckDB’s command-line interface, either on its own or as part of a shell pipeline.

Loading data into DuckDB

When you work with data in DuckDB, there are two modes you can use for that data. Persistent mode writes the data to disk so it can handle workloads bigger than system memory. This approach comes at the cost of some speed. In-memory mode keeps the data set entirely in memory, which is faster but retains nothing once the program ends. (SQLite can be used the same way.)

DuckDB can ingest data from a variety of formats. CSV, JSON, and Apache Parquet files are three of the most common. With CSV and JSON, DuckDB by default attempts to figure out the columns and data types on its own, but you can override that process as needed—for instance, to specify a format for a date column.

Other databases, like MySQL or Postgres, can also be used as data sources. You’ll need to load a DuckDB extension (more on this later) and provide a connection string to the database server; DuckDB doesn’t read the files for those databases directly. With SQLite, though, you connect to the SQLite database file as though it were just another data file.

To load data into DuckDB from an external source, you can use an SQL string, passed directly into DuckDB:


SELECT * FROM read_csv('data.csv');

You can also use methods in the DuckDB interface library for a given language. With the Python library for DuckDB, ingesting looks like this:


import duckdb
duckdb.read_csv("data.csv")

You can also query certain file formats directly, like Parquet:


SELECT * FROM 'test.parquet';

You can also issue file queries to create a persistent data view, which is usable as a table for multiple queries:


CREATE VIEW test_data AS SELECT * FROM read_parquet('test.parquet');

DuckDB has optimizations for working with Parquet files, so that it reads only what it needs from the file.

Other interfaces like ADBC and ODBC can also be used. ODBC serves as a connector for data visualization tools like Tableau.

Data imported into DuckDB can also be re-exported in many common formats: CSV, JSON, Parquet, Microsoft Excel, and others. This makes DuckDB useful as a data-conversion tool in a processing pipeline.

Querying data in DuckDB

Once you’ve loaded data into DuckDB, you can query it using SQL expressions. The format for such expressions is no different from regular SQL queries:


SELECT * FROM users WHERE ID>1000 ORDER BY Name DESC LIMIT 5;

If you’re using a client API to query DuckDB, you can pass SQL strings through the API, or you can use the client’s relational API to build up queries programmatically. In Python, reading from a JSON file and querying it might look like this:


import duckdb
file = duckdb.read_json("users.json")
file.select("*").filter("ID>1000").order("Name").limit(5)

If you use Python, you can use the PySpark API to query DuckDB directly, although DuckDB’s implementation of PySpark doesn’t yet support the full feature set.

DuckDB’s dialect of SQL closely follows most common SQL dialects, although it comes with a few gratuitous additions for the sake of analytics. For instance, placing the SAMPLE clause in a query lets you run a query using only a subset of the data in a table. The resulting query runs faster but it may be less accurate. DuckDB also supports the PIVOT keyword (for creating pivot tables), window functions and QUALIFY clauses to filter them, and many other analytics functions in its SQL dialect.

DuckDB extensions

DuckDB isn’t limited to the data formats and behaviors baked into it. Its extension API makes it possible to write third-party add-ons for DuckDB to support new data formats or other behaviors.

Some of the functionality included with DuckDB is implemented through first-party add-ons, like support for Parquet files. Others, like MySQL or Postgres connectivity, or vector similarity search, are also maintained by DuckDB’s team but provided separately.

Next read this:

Posted Under: Database
DuckDB: The tiny but powerful analytics database

Posted by on 15 May, 2024

This post was originally published on this site

Most people assume that analytical databases, or OLAPs, are big, powerful beasts—and they are correct. Systems like Snowflake, Redshift, or Postgres involve a lot of setup and maintenance, even in their cloud-hosted incarnations. But what if all you want is “just enough” analytics for a dataset on your desktop? In that case, DuckDB is worth exploring.

Columnar data analytics on your laptop

DuckDB is a tiny but powerful analytics database engine—a single, self-contained executable, which can run standalone or as a loadable library inside a host process. There’s very little you need to set up or maintain with DuckDB. In this way, it is more like SQLite than the bigger analytical databases in its class.

DuckDB is designed for column-oriented data querying. It ingests data from sources like CSV, JSON, and Apache Parquet, and enables fast querying using familiar SQL syntax. DuckDB supports libraries for all the major programming languages, so you can work with it programmatically using the language of your choice. Or you can use DuckDB’s command-line interface, either on its own or as part of a shell pipeline.

Loading data into DuckDB

When you work with data in DuckDB, there are two modes you can use for that data. Persistent mode writes the data to disk so it can handle workloads bigger than system memory. This approach comes at the cost of some speed. In-memory mode keeps the data set entirely in memory, which is faster but retains nothing once the program ends. (SQLite can be used the same way.)

DuckDB can ingest data from a variety of formats. CSV, JSON, and Apache Parquet files are three of the most common. With CSV and JSON, DuckDB by default attempts to figure out the columns and data types on its own, but you can override that process as needed—for instance, to specify a format for a date column.

Other databases, like MySQL or Postgres, can also be used as data sources. You’ll need to load a DuckDB extension (more on this later) and provide a connection string to the database server; DuckDB doesn’t read the files for those databases directly. With SQLite, though, you connect to the SQLite database file as though it were just another data file.

To load data into DuckDB from an external source, you can use an SQL string, passed directly into DuckDB:


SELECT * FROM read_csv('data.csv');

You can also use methods in the DuckDB interface library for a given language. With the Python library for DuckDB, ingesting looks like this:


import duckdb
duckdb.read_csv("data.csv")

You can also query certain file formats directly, like Parquet:


SELECT * FROM 'test.parquet';

You can also issue file queries to create a persistent data view, which is usable as a table for multiple queries:


CREATE VIEW test_data AS SELECT * FROM read_parquet('test.parquet');

DuckDB has optimizations for working with Parquet files, so that it reads only what it needs from the file.

Other interfaces like ADBC and ODBC can also be used. ODBC serves as a connector for data visualization tools like Tableau.

Data imported into DuckDB can also be re-exported in many common formats: CSV, JSON, Parquet, Microsoft Excel, and others. This makes DuckDB useful as a data-conversion tool in a processing pipeline.

Querying data in DuckDB

Once you’ve loaded data into DuckDB, you can query it using SQL expressions. The format for such expressions is no different from regular SQL queries:


SELECT * FROM users WHERE ID>1000 ORDER BY Name DESC LIMIT 5;

If you’re using a client API to query DuckDB, you can pass SQL strings through the API, or you can use the client’s relational API to build up queries programmatically. In Python, reading from a JSON file and querying it might look like this:


import duckdb
file = duckdb.read_json("users.json")
file.select("*").filter("ID>1000").order("Name").limit(5)

If you use Python, you can use the PySpark API to query DuckDB directly, although DuckDB’s implementation of PySpark doesn’t yet support the full feature set.

DuckDB’s dialect of SQL closely follows most common SQL dialects, although it comes with a few gratuitous additions for the sake of analytics. For instance, placing the SAMPLE clause in a query lets you run a query using only a subset of the data in a table. The resulting query runs faster but it may be less accurate. DuckDB also supports the PIVOT keyword (for creating pivot tables), window functions and QUALIFY clauses to filter them, and many other analytics functions in its SQL dialect.

DuckDB extensions

DuckDB isn’t limited to the data formats and behaviors baked into it. Its extension API makes it possible to write third-party add-ons for DuckDB to support new data formats or other behaviors.

Some of the functionality included with DuckDB is implemented through first-party add-ons, like support for Parquet files. Others, like MySQL or Postgres connectivity, or vector similarity search, are also maintained by DuckDB’s team but provided separately.

Next read this:

Posted Under: Tech Reviews
Page 2 of 812345...Last »

Social Media

Bulk Deals

Subscribe for exclusive Deals

Recent Post

Archives

Facebook

Twitter

Subscribe for exclusive Deals




Copyright 2015 - InnovatePC - All Rights Reserved

Site Design By Digital web avenue