All posts by Richy George

YugabyteDB Managed adds managed command line interface

Posted on 27 March, 2023


Yugabyte on Monday said that it was adding a new managed command line interface along with other features to the managed version of its open source distributed SQL database, dubbed YugabyteDB Managed.

The new Managed Command Line Interface (CLI), according to the company, allows developers to benefit from the advantages of automation while writing code without needing to learn new skills.

“Developers of all levels can easily create and manage clusters from their terminal or Integrated Development Environment (IDE) and make use of the most advanced set of tools available for optimizing database performance and driving the business forward,” Karthik Ranganathan, CTO and co-founder, Yugabyte, said in a statement.

This means that developers can create and manage clusters hosted in YugabyteDB Managed from their IDE or terminal without requiring REST API or Terraform skills, added Ranganathan.

In addition, the new Managed CLI can automate repetitive tasks and has an auto-completion feature that makes it easy for developers, database administrators, and DevOps engineers to discover new features, the company said.

The new CLI also comes with support for multiple platforms such as Mac, Windows and Linux, the company added. The Windows version can be downloaded from GitHub.

The latest update to YugabyteDB Managed also brings enhanced observability, with the company adding over 125 new SQL and storage layer metrics.

“With these new metrics, enterprises will gain even deeper insights into their database’s performance, making it easier to identify and resolve performance issues quickly,” the company wrote in a blog post.

The cloud-based user interface for observability inside YugabyteDB Managed includes new visualization options to reorder metrics for a custom dashboard and new synchronized tooltips in charts for easier troubleshooting, the company added.

Further, the company has added support for AWS PrivateLink, a service that provides private connectivity between virtual private clouds, supported AWS services, and on-premises networks.

“This feature, now in private preview, is available for dedicated clusters created in YugabyteDB Managed on AWS, as an alternative to VPC peering, for secure access to your databases over a private network,” the company said.

The support for AWS PrivateLink also provides more secure access to an enterprise’s databases, it added.

Enterprises that are already YugabyteDB users can get access to a free trial of YugabyteDB Managed, with all features, on request.

Posted Under: Database
Oracle adds machine learning features to MySQL HeatWave

Posted on 23 March, 2023


Oracle is adding new machine learning features to its data analytics cloud service MySQL HeatWave.

MySQL HeatWave combines OLAP (online analytical processing), OLTP (online transaction processing), machine learning, and AI-driven automation in a single MySQL database.

The new machine learning capabilities will be added to the service’s AutoML and MySQL Autopilot components, the company said when it announced the update on Thursday.

AutoML allows developers and data analysts to build, train, and deploy machine learning models within MySQL HeatWave without moving data to a separate machine learning service, while MySQL Autopilot provides machine learning-based automation for HeatWave and OLTP workloads, including auto provisioning, auto encoding, auto query plan, auto shape prediction, and auto data placement, among other features.

AutoML augments time series forecasting via machine learning

The new machine learning-based capabilities added to AutoML include multivariate time series forecasting, unsupervised anomaly detection, and recommender systems, Oracle said, adding that all the new features were generally available.

“Multivariate time series forecasting can predict multiple time-ordered variables, where each variable depends both on its past value and the past values of other dependent variables. For example, it is used to build forecasting models to predict electricity demand in the winter considering the various sources of energy used to generate electricity,” said Nipun Agarwal, senior vice president of research at Oracle.

In contrast to the usual practice of having a statistician trained in time-series analysis or forecasting select the right algorithm for the desired output, AutoML's multivariate time series forecasting automatically preprocesses the data, selects the best algorithm for the ML model, and tunes the model, the company said.

“The HeatWave AutoML automated forecasting pipeline uses a patented technique that consists of stages including advanced time-series preprocessing, algorithm selection and hyperparameter tuning,” said Agarwal, adding that this automation can help enterprises save time and effort as they don’t need to have trained statisticians on staff.
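Based on that description, training a forecasting model goes through the same ML_TRAIN routine that HeatWave AutoML exposes for other tasks, with forecasting-specific options passed as JSON. The following is a minimal sketch, not code from Oracle's announcement: the table, column names, and values are hypothetical, and option names should be checked against the HeatWave AutoML documentation for your release.

-- Train a multivariate forecasting model on a hypothetical daily demand table.
-- AutoML preprocesses the data, selects the algorithm, and tunes hyperparameters.
CALL sys.ML_TRAIN(
    'energy.daily_demand',               -- training table (hypothetical)
    'demand',                            -- column to forecast
    JSON_OBJECT(
        'task', 'forecasting',
        'datetime_index', 'report_date',             -- time-ordering column
        'endogenous_variables', JSON_ARRAY('demand', 'wind_gw', 'solar_gw')),
    @forecast_model);                    -- session variable that receives the model handle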

The multivariate time series forecasting feature, according to Constellation Research Principal Analyst Holger Muller, is unique to Oracle’s MySQL HeatWave.

“Time series forecasting, multivariate or otherwise, is not currently offered as part of a single database that offers machine learning-augmented analytics. AWS, for example, offers a separate database for time series,” Muller said.

HeatWave enhances anomaly detection

Along with multivariate time series forecasting, Oracle is adding machine-learning based “unsupervised” anomaly detection to MySQL HeatWave.

In contrast to the practice of using specific algorithms to detect specific anomalies in data, AutoML can detect different types of anomalies from unlabeled data sets, the company said, adding that this feature helps enterprise users when they don’t know what anomaly types are in the dataset.

“The model generated by HeatWave AutoML provides high accuracy for all types of anomalies — local, cluster, and global. The process is completely automated, eliminating the need for data analysts to manually determine which algorithm to use, which features to select, and the optimal values of the hyperparameters,” said Agarwal.
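Because unsupervised anomaly detection has no labeled target column, the target argument is presumably left out (passed as NULL) when training. A hedged sketch along the lines of the interface shown above, with a hypothetical table name and option value:

-- Train an unsupervised anomaly detection model on unlabeled sensor data.
CALL sys.ML_TRAIN(
    'telemetry.sensor_readings',
    NULL,                                        -- no target column: unsupervised task
    JSON_OBJECT('task', 'anomaly_detection'),
    @anomaly_model);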

In addition, AutoML has added a recommendation engine, which it calls recommender systems, that underpins automation for algorithm selection, feature selection, and hyperparameter optimization inside MySQL HeatWave.

“With MySQL HeatWave, users can invoke the ML_TRAIN procedure, which automatically trains the model that is then stored in the MODEL_CATALOG. To predict a recommendation, users can invoke ML_PREDICT_ROW or ML_PREDICT_TABLE,” said Agarwal.
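A sketch of that train-and-predict workflow for the recommender use case follows. The schema is invented, and the recommendation-specific options ('users', 'items') and exact argument lists are assumptions based on HeatWave's documented routines, which may differ slightly between versions.

-- Train a recommendation model from explicit user/item ratings (hypothetical table).
CALL sys.ML_TRAIN(
    'shop.ratings', 'rating',
    JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id'),
    @rec_model);

-- Load the trained model from the MODEL_CATALOG, then score a table of user/item pairs.
CALL sys.ML_MODEL_LOAD(@rec_model, NULL);
CALL sys.ML_PREDICT_TABLE('shop.candidate_pairs', @rec_model, 'shop.predicted_ratings');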

Business users get MySQL HeatWave AutoML console

In addition, Oracle is adding an interactive console for business users inside HeatWave.

“The new interactive console lets business analysts build, train, run, and explain ML models using the visual interface — without using SQL commands or any coding,” Agarwal said, adding that the console makes it easier for business users to explore conditional scenarios for their enterprise.

“The addition of the interactive console is in line with enterprises trying to make machine learning accountable. The console will help business users dive into the deeper end of the pool as they want to evolve into ‘citizen data scientists’ to avoid getting into too much hot water,” said Tony Baer, principal analyst at dbInsight.

The console has been made initially available for MySQL HeatWave on AWS.

Oracle also said that it would be adding support for storage on Amazon S3 for HeatWave on AWS to reduce cost as well as improve the availability of the service.

“When data is loaded from MySQL (InnoDB storage engine) into HeatWave, a copy is made to the scale-out data management layer built on S3. When an operation requires reloading of data to HeatWave, such as during error recovery, data can be accessed in parallel by multiple HeatWave nodes and the data can be directly loaded into HeatWave without the need for any transformation,” said Agarwal.

MySQL Autopilot updates

The new features added to MySQL HeatWave include two additions to MySQL Autopilot: integration of the auto shape prediction advisor with the interactive console, and auto unload.

“Within the interactive console, database users can now access the MySQL Autopilot Auto shape prediction advisor that continuously monitors the OLTP workload to recommend with an explanation the right compute shape at any given time — allowing customers to always get the best price-performance,” Agarwal said.

The auto unload feature, according to the company, can recommend which tables to unload based on workload history.

“Freeing up memory reduces the size of the cluster required to run a workload and saves cost,” Agarwal said, adding that both the features were in general availability.

HeatWave targets smaller data volumes

Oracle is offering a smaller HeatWave shape to attract customers with smaller data volumes.

In contrast to the earlier size of 512GB for a standard HeatWave node, the smaller shape will have a size of 32GB with the ability to process up to 50GB for a price of $16 per month, the company said.

In addition, the company said that data processing capability for its standard 512GB HeatWave Node has been increased from 800GB to 1TB.

“With this increase and other query performance improvements, the price performance benefit of HeatWave has further increased by 15%,” said Agarwal.

Posted Under: Database
Tailscale: Fast and easy VPNs for developers

Posted on 15 March, 2023


Networking can be an annoying problem for software developers. I’m not talking about local area networking or browsing the web, but the much harder problem of ad hoc, inbound, wide area networking.

Suppose you create a dazzling website on your laptop and you want to share it with your friends or customers. You could modify the firewall on your router to permit incoming web access on the port your website uses and let your users know the current IP address and port, but that could create a potential security vulnerability. Plus, it would only work if you have control over the router and you know how to configure firewalls for port redirection.

Alternatively, you could upload your website to a server, but that’s an extra step that can often become time-consuming, and maintaining dedicated servers can be a burden, both in time and money. You could spin up a small cloud instance and upload your site there, but that is also an extra step that can often become time-consuming, even though it’s often fairly cheap.

Another potential solution is Universal Plug and Play (UPnP), which enables devices to set port forwarding rules by themselves. UPnP needs to be enabled on your router, but it’s only safe if the modem and router are updated and secure. If not, it creates serious security risks on your whole network. The usual advice from security vendors is not to enable it, since the UPnP implementations on many routers are still dangerous, even in 2023. On the other hand, if you have an Xbox in the house, UPnP is what it uses to set up your router for multiplayer gaming and chat.

A simpler and safer way is Tailscale, which allows you to create an encrypted, peer-to-peer virtual network using the secure WireGuard protocol without generating public keys or constantly typing passwords. It can traverse NAT and firewalls, span subnets, use UPnP to create direct connections if it’s available, and connect via its own network of encrypted TCP relay servers if UPnP is not available.

In some sense, all VPNs (virtual private networks) compete with Tailscale. Most other VPNs, however, route traffic through their own servers, which tends to increase the network latency. One major use case for server-based VPNs is to make your traffic look like it’s coming from the country where the server is located; Tailscale doesn’t help much with this. Another use case is to penetrate corporate firewalls by using a VPN server inside the firewall. Tailscale competes for this use case, and usually has a simpler setup.

Besides Tailscale, the only other peer-to-peer VPN is the free open source WireGuard, on which Tailscale builds. WireGuard doesn’t handle key distribution or pushed configurations; Tailscale takes care of all of that.

What is Tailscale?

Tailscale is an encrypted point-to-point VPN service based on the open source WireGuard protocol. Compared to traditional VPNs based on central servers, Tailscale often offers higher speeds and lower latency, and it is usually easier and cheaper to set up and use.

Tailscale is useful for software developers who need to set up ad hoc networking and don’t want to fuss with firewalls or subnets. It’s also useful for businesses that need to set up VPN access to their internal networks without installing a VPN server, which can often be a significant expense.

Installing and using Tailscale

Signing up for a Tailscale Personal plan was free and quick; I chose to use my GitHub ID for authentication. Installing Tailscale took a few minutes on each machine I tried: an M1 MacBook Pro, where I installed it from the macOS App Store; an iPad Pro, installed from the iOS App Store; and a Pixel 6 Pro, installed from the Google Play Store. Installing on Windows starts with a download from the Tailscale website, and installing on Linux can be done using a curl command and shell script, or a distribution-specific series of commands.


You can install Tailscale on macOS, iOS, Windows, Linux, and Android. This tab shows the instructions for macOS.

Tailscale uses IP addresses in the 100.x.x.x range and automatically assigns DNS names, which you can customize if you wish. You can see your whole “tailnet” from the Tailscale site and from each machine that is active on the tailnet.

In addition to viewing your machines, you can view and edit the services available, the users of your tailnet, your access controls (ACL), your logs, your tailnet DNS, and your tailnet settings.


Once the three devices were running Tailscale, I could see them all on my Tailscale login page. I chose to use my GitHub ID for authentication, as I was testing just for myself. If I were setting up Tailscale for a team I would use my team email address.


Tailscale pricing.

Tailscale installs a CLI on desktop and laptop computers. It’s not absolutely necessary to use this command line, but many software developers will find it convenient.

How Tailscale works

Tailscale, unlike most VPNs, sets up peer-to-peer connections, aka a mesh network, rather than a hub-and-spoke network. It uses the open source WireGuard package (specifically the userspace Go variant, wireguard-go) as its base layer.

For public key distribution, Tailscale does use a hub-and-spoke configuration. The coordination server is at login.tailscale.com. Fortunately, public key distribution takes very little bandwidth. Private keys, of course, are never distributed.

You may be familiar with generating public-private key pairs manually to use with ssh, and including a link to the private key file as part of your ssh command line. Tailscale does all of that transparently for its network, and ties the keys to whatever login or 2FA credentials you choose.

The key pair steps are:

  1. Each node generates a random public/private key pair for itself, and associates the public key with its identity.
  2. The node contacts the coordination server and leaves its public key and a note about where that node can currently be found, and what domain it’s in.
  3. The node downloads a list of public keys and addresses in its domain, which have been left on the coordination server by other nodes.
  4. The node configures its WireGuard instance with the appropriate set of public keys.

Tailscale doesn’t handle user authentication itself. Instead, it always outsources authentication to an OAuth2, OIDC (OpenID Connect), or SAML provider, including Gmail, G Suite, and Office 365. This avoids the need to maintain a separate set of user accounts or certificates for your VPN.


Tailscale CLI help. On macOS, the CLI executable lives inside the app package. A soft link to this executable doesn’t seem to work on my M1 MacBook Pro, possibly because Tailscale runs in a sandbox.

NAT traversal is a complicated process, one that I personally tried unsuccessfully to overcome a decade ago. NAT (network address translation) is one of the ways firewalls work: Your computer’s local address of, say, 192.168.1.191, gets translated in the firewall, as a packet goes from your computer to the internet, to your current public IP address and a random port number, say 173.76.179.155:9876, and the firewall remembers that port number as yours. When a site returns a response to your request, your firewall recognizes the port and translates it back to your local address before passing you the response.


Tailscale status, Tailscale pings to two devices, and plain pings to the same devices using the native network. Notice that the Tailscale ping to the Pixel device first routes via a DERP server (see below) in NYC, and then manages to find the LAN connection.

Where’s the problem? Suppose you have two firewall clients trying to communicate peer-to-peer. Neither can succeed until someone or something tells both ends what port to use.

This arbitrator will be a server when you use the STUN (Session Traversal Utilities for NAT) protocol; while STUN works on most home routers, it unfortunately doesn’t work on most corporate routers. One alternative is the TURN (Traversal Using Relays around NAT) protocol, which uses relays to get around the NAT deadlock issue; the trouble with that is that TURN is a pain in the neck to implement, and there aren’t many existing TURN relay servers.

Tailscale implements a protocol of its own for this, called DERP (Designated Encrypted Relay for Packets). This use of the term DERP has nothing to do with being goofy, but it does suggest that someone at Tailscale has a sense of humor.

Tailscale has DERP servers around the world to keep latency low; these include nine servers in the US. If, for example, you are trying to use Tailscale to connect your smartphone from a park to your desktop at your office, the chances are good that the connection will route via the nearest DERP server. If you’re lucky, the DERP server will only be used as a side channel to establish the connection. If you’re not, the DERP server will carry the encrypted WireGuard traffic between your nodes.

Tailscale vs. other VPNs

Tailscale offers a reviewer’s guide. I often look at such documents and then do my own thing because I’ve been around the block a couple of times and recognize when a company is putting up straw men and knocking them down, but this one is somewhat helpful. Here are some key differentiators to consider.

With most VPNs, when you are disconnected you have to log in again. It can be even worse when your company has two internet providers and has two VPN servers to handle them, because you usually have to figure out what’s going on by trial and error or by attempting to call the network administrator, who is probably up to his or her elbows in crises. With Tailscale (and WireGuard), the connection just resumes. Similarly, many VPN servers have trouble with flakey connections such as LTE. Tailscale and WireGuard take the flakiness in stride.

With most VPNs, getting a naive user connected for the first time is an exercise in patience for the network administrator and possibly scary for the user who has to “punch a hole” in her home firewall to enable the connection. With Tailscale it’s a five-minute process that isn’t scary at all.

Most VPNs want to be exclusive. Connecting to two VPN concentrators at once is considered a cardinal sin and a potential security vulnerability, especially if they are at different companies. Tailscale doesn’t care. WireGuard can handle this situation just fine even with hub-and-spoke topologies, and with Tailscale point-to-point connections there is a Zero Trust configuration that exposes no vulnerability.

Tailscale solutions

Tailscale has documented about a dozen solutions to common use cases that can be addressed with its ad hoc networking. These range from wanting to code from your iPad to running a private Minecraft server without paying for hosting or opening up your firewall.

As we’ve seen, Tailscale is simple to use, but also sophisticated under the hood. It’s an easy choice for ad hoc networking, and a reasonable alternative to traditional hub-and-spoke VPNs for companies. The only common VPN function that I can think of that it won’t do is spoof your location so that you can watch geographically restricted video content—but there are free VPNs that handle that.

Cost: Personal, open source, and “friends and family” plans, free. Personal Pro, $48 per year. Team, $5 per user per month (free trial available). Business, $15 per user per month (free trial available). Custom plans, contact sales.

Platform: macOS 10.13 or later, Windows 7 SP1 or later, Linux (most major distros), iOS 15 or later, Android 6 or later, Raspberry Pi, Synology.

Posted Under: Tech Reviews
Tibco’s Spotfire 12.2 release adds streaming and data science tools

Posted on 14 March, 2023


Enterprise software provider Tibco is releasing a new version of its data visualization and analytics platform, Spotfire 12.2, with new features that focus on aiding developers and bolstering the software’s ability to act as an end-to-end system combining data science, streaming, and data management tools.  

“With the new release of Spotfire, we are able to combine databases, data science, streaming, real-time analytics and data management, giving Tibco Cloud the capability of an end-to-end platform,” said Michael O’Connell, chief analytics officer at Tibco.

The update, released Tuesday, comes with Tibco’s Cloud Actions, which enable business users to take actions directly from within the business insights window, according to O’Connell.

“New Tibco Spotfire Cloud Actions bridge the gap between insight and decision. This no-code capability for writing transactions to any cloud or on-premise application allows you to take action across countless operational systems, spanning all of today’s top enterprise business applications and databases,” the company said in a blog post.

This is made possible by Tibco Cloud Integration, an integration platform-as-a-service offered by the company in the cloud. Cloud Integration supports all traditional iPaaS use cases and is optimized for REST-based and API use cases, Tibco said, adding that it offers over 800 connectors and works with applications and databases such as Dynamics 365, Amazon Redshift, Google Analytics, Magento, and MySQL.

Business users can also use Tibco Cloud Live Apps, a no-code interface for creating and automating manual workflows, to enable Cloud Actions, the company said.

Spotfire triggers actions, automation in other apps

Spotfire’s Cloud Actions, according to Constellation Research principal analyst Doug Henschen, enables Spotfire users to harness insights and set criteria that, when met, trigger actions and automation within other apps.

“Customers don’t want insights to end with reports and dashboards that are disconnected from the apps and platforms where people take action and get work done,” Henschen said, adding that leading vendors have been pushing to drive insights into action with workflow and automation options, whereby alerts as well as human and automated triggers can be used to kick off actions and business processes within external systems.

In addition to Cloud Actions, the company is offering data visualization modifications, dubbed mods, which are developed by Tibco and its community of users.

These modifications offer nuanced and different views to generate more insights, the company said, with O’Connell adding that they can be downloaded from the company website and other community sites.

In addition, Tibco Community, according to the company, provides hands-on enablement, along with galleries of prebuilt mods visualizations, data functions, and Cloud Actions grab-and-go templates, offering point-and-click deployment.

Tibco Streaming, Data Science offer growth opportunities

As part of the 12.2 update, Tibco is offering new features as part of its Tibco Streaming and Tibco Data Science’s Team Studio.

Tibco Streaming now comes with dynamic learning, which analyzes streaming data to automate data management and analytics calculations for real-time events, merging historical and streaming data as part of the same analysis, the company said.

This, according to Tibco, enables business intelligence to expand into low-latency operational use cases, such as IoT and edge sensors, with Spotfire serving as the control and decision hub.

On the data science side, Tibco has updated its Team Studio to include a new Apache Spark 3 workflow engine to improve performance.

The performance improvement is made possible by a new operator framework that merges core and custom operators, enabling workflows to execute as a single Spark application, the company said.

Data Virtualization enables AI model training

In addition, the company has updated its Tibco Data Virtualization offering, allowing users to control Team Studio data preparation and do AI and analytics model training and inferencing at scale from within the Spotfire interface.

“End user applications can train models, make predictions, summarize data, and apply data science techniques, in context of the business problem at hand,” the company said.

Tibco Data Science’s Team Studio and Tibco Streaming will not only allow the company to offer end-to-end services with Tibco Cloud but also unfurl growth opportunities for the company, analysts said.

Tibco Data Science is about developing and deploying predictive models and managing their complete life cycle, according to Henschen.

“The Team Studio component of Data Science and the integrations with Spotfire and other tools are about making those predictive capabilities accessible to non-data-scientists so they can take proactive action,” Henschen said.

Demand for Tibco’s data science and streaming tools, according to Ventana Research’s David Menninger, will increase as more and more business processes involve real-time analysis.

“The only way to keep up with real time processes is with AI and machine learning. You can’t expect someone to be monitoring a dashboard in real time to determine what the best action is for the current situation. These decisions need to be made algorithmically and that’s where data science comes in,” Menninger said.

Tibco, according to market research firm IDC, competes with companies including  Microsoft, Tableau, Qlik, IBM and Oracle in the business intelligence market.

Tibco has captured just 1.22% of the market, with installations in 8,160 companies, according to market research firm Enlyft.

The research firm lists Tableau and Microsoft Power BI as the market leaders, with 17% and 14% market share, respectively.

Posted Under: Database
What’s new in Apache Cassandra 4.1

Posted on 9 March, 2023


Apache Cassandra 4.1 was a massive effort by the Cassandra community to build on what was released in 4.0, and it is the first of what we intend to be yearly releases. If you are using Cassandra and you want to know what’s new, or if you haven’t looked at Cassandra in a while and you wonder what the community is up to, then here’s what you need to know.

First off, let’s address why the Cassandra community is growing. Cassandra was built from the start to be a distributed database that could run across dispersed geographic locations, across different platforms, and to be continuously available despite whatever the world might throw at the service. If you asked ChatGPT to describe a database that today’s developer might need—and we did—the response would sound an awful lot like Cassandra.

Cassandra meets what developers need in availability, scalability, and reliability, which are things you just can’t bolt on afterward, however much you might try. The community has put a focused effort into producing tools that would define and validate the most stable and reliable database that they could, because it is what supports their businesses at scale. This effort supports everyone who wants to run Cassandra for their applications.

Guardrails for new Cassandra users

One of the new features in Cassandra 4.1 that should interest those new to the project is Guardrails, a new framework that makes it easier to set up and maintain a Cassandra cluster. Guardrails provide guidance on the best implementation settings for Cassandra. More importantly, Guardrails prevent anyone from selecting parameters or performing actions that would degrade performance or availability.

An example of this is secondary indexing. A good secondary index helps you improve performance, so having multiple secondary indexes should be even more beneficial, right? Wrong. Having too many can degrade performance. Similarly, you can design queries that might run across too many partitions and touch data across all of the nodes in a cluster, or use queries alongside replica-side filtering, which can lead to reading all the memory on all nodes in a cluster. For those experienced with Cassandra, these are known issues that you can avoid, but Guardrails make it easy for operators to prevent new users from making the same mistakes.

Guardrails are set up in the Cassandra YAML configuration files, based on settings including table warnings, secondary indexes per table, partition key selections, collection sizes, and more. You can set warning thresholds that can trigger alerts, and fail conditions that will prevent potentially harmful operations from happening.

Guardrails are intended to make managing Cassandra easier, and the community is already adding more options to this so that others can make use of them. Some of the newcomers to the community have already created their own Guardrails, and offered suggestions for others, which indicates how easy Guardrails are to work with.

To make things even easier to get right, the Cassandra project has spent time simplifying the configuration format with standardized names and units, while still supporting backwards compatibility. This provides an easier and more uniform way to add new parameters for Cassandra, while also reducing the risk of introducing any bugs. 

Improving Cassandra performance

Alongside making things easier for those getting started, Cassandra 4.1 has also seen many improvements in performance and extensibility. The biggest change here is pluggability. Cassandra 4.1 now enables feature plug-ins for the database, allowing you to add capabilities and features without changing the core code.

In practice, this allows you to make decisions on areas like data storage without affecting other services like networking or node coordination. One of the first examples of this came at Instagram, where the team added support for RocksDB as a storage engine for more efficient storage. This worked really well as a one-off, but the team at Instagram had to support it themselves. The community decided that this idea of supporting a choice in storage engines should be built into Cassandra itself.

By supporting different storage or memtable options, Cassandra allows users to tune their database to the types of queries they want to run and how they want to implement their storage as part of Cassandra. This can also support more long-lived or persistent storage options. Another area of choice given to operators is how Cassandra 4.1 now supports pluggable schema. Previously, cluster schema was stored in system tables alone. In order to support more global coordination in deployments like Kubernetes, the community added external schema storage such as etcd.

Cassandra also now supports more options for network encryption and authentication. Cassandra 4.1 removes the need to have SSL certificates co-located on the same node, and instead you can use external key providers like HashiCorp Vault. This makes it easier to manage large deployments with lots of developers. Similarly, adding more options for authentication makes it easier to manage at scale.

There are some other new features, like new SSTable identifiers, which will make managing and backing up multiple SSTables easier, and partition denylists, which will make it easier for operators either to keep full access to entire datasets or to reduce the availability of data in designated areas to ensure performance is not affected.

The future for Cassandra is full ACID

One of the things that has always counted against Cassandra in the past is that it did not fully support ACID (atomic, consistent, isolated, durable) transactions. The reason for this is that it was hard to get consistent transactions in a fully distributed environment and still maintain performance. From version 2.0, Cassandra used the Paxos protocol for managing consistency with lightweight transactions, which provided transactions for a single partition of data. What was needed was a new consensus protocol to align better with how Cassandra works.

Cassandra has filled this gap using Accord (PDF), a protocol that can complete consensus in a single round trip rather than requiring multiple, and that can achieve this without leader failover mechanisms. Heading toward Cassandra 5.0, the aim is to deliver ACID-compliant transactions without sacrificing any of the capabilities that make Cassandra what it is today. To make this work in practice, Cassandra will support both lightweight transactions and Accord, and make more options available to users based on the modular approach that is in place for other features.

Cassandra was built to meet the needs of internet companies. Today, every company has similarly large-scale data volumes to deal with, the same challenges around distributing their applications for resilience and availability, and the same desire to keep growing their services quickly. At the same time, Cassandra must be easier to use and meet the needs of today’s developers. The community’s work for this update has helped to make that happen. We hope to see you at the upcoming Cassandra Summit where all of these topics will be discussed and more!

Patrick McFadin is vice president of developer relations at DataStax.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Posted Under: Database
Ballerina: A programming language for the cloud

Posted on 8 March, 2023


Ballerina, which is developed and supported by WSO2, is billed as “a statically typed, open-source, cloud-native programming language.” What is a cloud-native programming language? In the case of Ballerina, it is one that supports networking and common internet data structures and that includes interfaces to a large number of databases and internet services. Ballerina was designed to simplify the development of distributed microservices by making it easier to integrate APIs, and to do so in a way that will feel familiar to C, C++, C#, and Java programmers.

Essentially, Ballerina is a C-like compiled language that has features for JSON, XML, and tabular data with SQL-like language-integrated queries, concurrency with sequence diagrams and language-managed threads, live sequence diagrams synched to the source code, flexible types for use both inside programs and in service interfaces, explicit error handling and concurrency safety, and network primitives built into the language.

There are two implementations of Ballerina. The currently available version, jBallerina, has a toolchain implemented in Java, compiles to Java bytecode, runs on a Java virtual machine, and interoperates with Java programs. A newer, unreleased (and incomplete) version, nBallerina, cross-compiles to native binaries using LLVM and provides a C foreign function interface. jBallerina can currently generate GraalVM native images on an experimental basis from its CLI, and can also generate cloud artifacts for Docker and Kubernetes. Ballerina has interface modules for PostgreSQL, MySQL, Microsoft SQL Server, Redis, DynamoDB, Azure Cosmos DB, MongoDB, Snowflake, Oracle Database, and JDBC databases.

For development, Ballerina offers a Visual Studio Code plug-in for source and graphical editing and debugging; a command-line utility with several useful features; a web-based sandbox; and a REPL (read-evaluate-print loop) shell. Ballerina can work with OpenAPI, GraphQL schemas, and gRPC schemas. It has a module-sharing platform called Ballerina Central, and a large library of examples. The command-line utility provides a build system and a package manager, along with code generators and the interactive REPL shell.

Finally, Ballerina offers integration with Choreo, WSO2’s cloud-hosted API management and integration solution, for observability, CI/CD, and devops, for a small fee. Ballerina itself is free open source.

Ballerina Language

The Ballerina Language combines familiar elements from C-like languages with unique features. For an example using familiar elements, here’s a “Hello, World” program with variables:

import ballerina/io;
string greeting = "Hello";
public function main() {
    string name = "Ballerina";
    io:println(greeting, " ", name);
}

Both int and float types are signed 64-bit in Ballerina. Strings and identifiers are Unicode, so they can accommodate many languages. Strings are immutable. The language supports methods as well as functions, for example:

// You can have Unicode identifiers.
function พิมพ์ชื่อ(string ชื่อ) {
    // Use \u{H} to specify a character using its Unicode code point in hex.
    io:println(ชื่\u{E2D});
}
string s = "abc".substring(1, 2);
int n = s.length();

In Ballerina, nil is the name for what is normally called null. A question mark after the type makes it nullable, as in C#. An empty pair of parentheses means nil.

int? v = ();

Arrays in Ballerina use square brackets:

int[] v = [1, 2, 3];

Ballerina maps are associative key-value structures, similar to Python dictionaries:

map<int> m = {
    "x": 1,
    "y": 2
};

Ballerina records are similar to C structs:

record { int x; int y; } r = {
    x: 1,
    y: 2
};

You can define named types and records in Ballerina, similar to C typedefs:

type MapArray map<string>[];
MapArray arr = [
    {"x": "foo"},
    {"y": "bar"}
];
type Coord record {
    int x;
    int y;
};

You can create a union of multiple types using the | character:

type flexType string|int;
flexType a = 1;
flexType b = "Hello";

Ballerina doesn’t support exceptions, but it does support errors. The check keyword is a shorthand for returning if the type is error:

function intFromBytes(byte[] bytes) returns int|error {
    string|error ret = string:fromBytes(bytes);
    if ret is error {
        return ret;
    } else {
        return int:fromString(ret);
    }
}

This is the same function using check instead of if ret is error { return ret; }:

function intFromBytes(byte[] bytes) returns int|error {
    string str = check string:fromBytes(bytes);
    return int:fromString(str);
}

You can handle abnormal errors and make them fatal with the panic keyword. You can ignore return values and errors using the Python-like underscore _ character.

Ballerina has an any type, classes, and objects. Object creation uses the new keyword, like Java. Ballerina’s enum types are shortcuts for unions of string constants, unlike C. The match statement is like the switch case statement in C, only more flexible. Ballerina allows type inference to a var keyword. Functions in Ballerina are first-class types, so Ballerina can be used as a functional programming language. Ballerina supports asynchronous programming with the start, future, wait, and cancel keywords; these run in strands, which are logical threads.

Ballerina provides distinctive network services, tables and XML types, concurrency and transactions, and various advanced features. These are all worth exploring carefully; there’s too much for me to summarize here. The program in the image below should give you a feel for some of them.


This example on the Ballerina home page shows the code and sequence diagram for a program that pulls GitHub issues from a repository and adds each issue as a new row to a Google Sheet. The code and diagram are linked; a change to one will update the other. The access tokens need to be filled in at the question marks before the program can run, and the ballerinax/googleapis.sheets package needs to be pulled from Ballerina Central, either using the “Pull unresolved modules” code action in VS Code or using the bal pull command from the CLI.

Ballerina standard libraries and extensions

There are more than a thousand packages in the Ballerina Central repository. They include the Ballerina Standard Library (ballerina/*), Ballerina-written extensions (ballerinax/*), and a few third-party demos and extensions.

The standard library is documented here. The Ballerina-written extensions tend to be connectors to third-party products such as databases, observability systems, event streams, and common web APIs, for example GitHub, Slack, and Salesforce.

Anyone can create an organization and publish (push) a package to Ballerina Central. Note that all packages in this repository are public. You can of course commit your code to GitHub or another source code repository, and control access to that.

Installing Ballerina

You can install Ballerina by downloading the appropriate package for your Windows, Linux, or macOS system and then running the installer. There are additional installation options, including building it from the source code. Then run bal version from the command line to verify a successful installation.

In addition, you should install the Ballerina extension for Visual Studio Code. You can double-check that the extension installed correctly in VS Code by running View -> Command Palette -> Ballerina. You should see about 20 commands.

The bal command line

The bal command line is a tool for managing Ballerina source code: it helps you manage Ballerina packages and modules, and test, build, and run programs. It also enables you to easily install, update, and switch among Ballerina distributions. See the screenshot below, which shows part of the output from bal help, or refer to the documentation.


bal help shows the various subcommands available from the Ballerina command line. The commands include compilation, packaging, scaffolding and code generation, and documentation generation.

Ballerina Examples

Ballerina has, well, a lot of examples. You can find them in the Ballerina by Example learning page, and also in VS Code by running the Ballerina: Show Examples command. Going through the examples is an alternate way to learn Ballerina programming; it’s a good supplement to the tutorials and documentation, and supports unstructured discovery as well as deliberate searches.

One caution about the examples: Not all of them are self-explanatory, as though an intern who knew the product wrote them without thinking about learners or having any review by naive users. On the other hand, many are self-explanatory and/or include links to the relevant documentation and source code.

For instance, in browsing the examples I discovered that Ballerina has a testing framework, Testarina, which is defined in the module ballerina/test. The test module defines the necessary annotations to construct a test suite, such as @test:Config {}, and the assertions you might expect if you’re familiar with JUnit, Rails unit tests, or any similar testing frameworks, for example the assertion test:assertEquals(). The test module also defines ways to specify setup and teardown functions, specify mock functions, and establish test dependencies.


Ballerina Examples, as viewed from VS Code’s Ballerina: Show Examples command. Similar functionality is available online.

Overall, Ballerina is a useful and feature-rich programming language for its intended purpose, which is cloud-oriented programming, and it is free open source. It doesn’t produce the speediest runtime modules I’ve ever used, but that problem is being addressed, both by experimental GraalVM native images and the planned nBallerina project, which will compile to native code.

At this point, Ballerina might be worth adopting for internal projects that integrate internet services and don’t need to run fast or be beautiful. Certainly, the price is right.

Cost: Ballerina Platform and Ballerina Language: Free open source under the Apache License 2.0. Choreo hosting: $150 per component per month after five free components, plus infrastructure costs.

Platform: Windows, Linux, macOS; Visual Studio Code.

Posted Under: Tech Reviews
Dremio adds new Apache Iceberg features to its data lakehouse

Posted on 2 March, 2023


Dremio is adding new features to its data lakehouse including the ability to copy data into Apache Iceberg tables and roll back changes made to these tables.  

Apache Iceberg is an open-source table format used by Dremio to store analytic data sets.  

In order to copy data into Iceberg tables, enterprises and developers have to use the new “COPY INTO” SQL command, the company said.

“With one command, customers can now copy data from CSV and JSON file formats stored in Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and other supported data sources into Apache Iceberg tables using the columnar Parquet file format for performance,” Dremio said in an announcement Wednesday.

The copy operation is distributed across the entire underlying lakehouse engine to load more data quickly, it added.
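As a rough illustration of what that looks like in Dremio SQL, a COPY INTO statement might resemble the sketch below; the table name, source location, and file format are placeholders, and the exact clause syntax should be verified against Dremio's documentation.

-- Load JSON files from a configured S3 landing folder into an existing Iceberg table.
COPY INTO sales.transactions
  FROM '@s3_landing/2023/03/'
  FILE_FORMAT 'json';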

The company has also introduced a table rollback feature for enterprises, akin to a Windows system restore backup or a Mac Time Machine backup.

The tables can be backed up either to a specific time or a snapshot ID, the company said, adding that developers will have to make use of the “rollback” command to access the feature.

“The rollback feature makes it easy to revert a table back to a previous state with a single command. When rolling back a table, Dremio will create a new Apache Iceberg snapshot from the prior state and use it as the new current table state,” Dremio said.
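In SQL terms, the rollback takes either a snapshot ID or a timestamp. A minimal sketch, with an invented table name and snapshot ID:

-- Revert the table to a known-good snapshot by ID...
ROLLBACK TABLE sales.transactions TO SNAPSHOT '5334233123887043102';
-- ...or to its state as of a point in time.
ROLLBACK TABLE sales.transactions TO TIMESTAMP '2023-02-28 09:00:00.000';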

Optimize command boosts Iceberg performance

In an effort to increase the performance of Iceberg tables, Dremio has introduced the “optimize” command to consolidate and optimize sizes of small files that are created when data manipulation commands such as insert, update, or delete are used.

“Often, customers will have many small files as a result of DML operations, which can impact read and write performance on that table and utilize excess storage,” the company said, adding that the “optimize” command can be used inside Dremio Sonar at regular intervals to maintain performance.
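A minimal example of that maintenance step, assuming default file-size targets and an invented table name; it would typically be scheduled to run after heavy DML activity:

-- Rewrite accumulated small files into fewer, better-sized files for the Iceberg table.
OPTIMIZE TABLE sales.transactions;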

Dremio Sonar is a SQL engine that provides data warehousing capabilities to the company’s lakehouse.

The new features are expected to improve the productivity of data engineers and system administrators, bringing real utility to this class of users, said Doug Henschen, principal analyst at Constellation Research.

Dremio, which was an early proponent of Apache Iceberg tables in lakehouses, competes with the likes of Ahana and Starburst, both of which introduced support for Iceberg in 2021.

Other vendors such as Snowflake and Cloudera added support for Iceberg in 2022.

Dremio features new database, BI connectors

In addition to the new features, Dremio said that it was launching new connectors for Microsoft PowerBI, Snowflake and IBM Db2.

“Customers using Dremio and PowerBI can now use single sign-on (SSO) to access their Dremio Cloud and Dremio Software engines from PowerBI, simplifying access control and user management across their data architecture,” the company said.

The Snowflake and IBM DB2 connectors will allow enterprises to add Snowflake data warehouses and IBM DB2 databases as data sources for Dremio, it added.

This makes it easy to include data in these systems as part of the Dremio semantic layer, enabling customers to explore this data in their Dremio queries and views.

The launch of these connectors, according to Henschen, brings more plug-and-play options to analytics professionals from Dremio’s stable.

Posted Under: Database
Next-gen data engines transform metadata performance

Posted on 2 March, 2023


The rapid growth of data-intensive use cases such as simulations, streaming applications (like IoT and sensor feeds), and unstructured data has elevated the importance of performing fast database operations such as writing and reading data—especially when those applications begin to scale. Almost any component in a system can potentially become a bottleneck, from the storage and network layers through the CPU to the application GUI.

As we discussed in “Optimizing metadata performance for web-scale applications,” one of the main reasons for data bottlenecks is the way data operations are handled by the data engine, also called the storage engine—the deepest part of the software stack that sorts and indexes data. Data engines were originally created to store metadata, the critical “data about the data” that companies utilize for recommending movies to watch or products to buy. This metadata also tells us when the data was created, where exactly it’s stored, and much more.

Inefficiencies with metadata often surface in the form of random read patterns, slow query performance, inconsistent query behavior, I/O hangs, and write stalls. As these problems worsen, issues originating in this layer can begin to trickle up the stack and become visible to the end user, where they can appear as slow reads, slow writes, write amplification, space amplification, inability to scale, and more.

New architectures remove bottlenecks

Next-generation data engines have emerged in response to the demands of low-latency, data-intensive workloads that require significant scalability and performance. They enable finer-grained performance tuning by adjusting three types of amplification, or writing and re-writing of data, that are performed by the engines: write amplification, read amplification, and space amplification. They also go further with additional tweaks to how the engine finds and stores data.

Speedb, our company, architected one such data engine as a drop-in replacement for the de facto industry standard, RocksDB. We open sourced Speedb to the developer community based on technology delivered in an enterprise edition for the past two years.

Many developers are familiar with RocksDB, a ubiquitous and appealing data engine that is optimized to exploit many CPUs for IO-bound workloads. Its use of an LSM (log-structured merge) tree-based data structure, as detailed in the previous article, is great for handling write-intensive use cases efficiently. However, LSM read performance can be poor if data is accessed in small, random chunks, and the issue is exacerbated as applications scale, particularly in applications with large volumes of small files, as with metadata.

Speedb optimizations

Speedb has developed three techniques to optimize data and metadata scalability—techniques that advance the state of the art from when RocksDB and other data engines were developed a decade ago.

Compaction

Like other LSM tree-based engines, RocksDB uses compaction to reclaim disk space, and to remove the old version of data from logs. Extra writes eat up data resources and slow down metadata processing, and to mitigate this, data engines perform the compaction. However, the two main compaction methods, leveled and universal, impact the ability of these engines to effectively handle data-intensive workloads.

A brief description of each method illustrates the challenge. Leveled compaction incurs very small disk space overhead (the default is about 11%). However, for large databases it comes with a huge I/O amplification penalty. Leveled compaction uses a “merge with” operation. Namely, each level is merged with the next level, which is usually much larger. As a result, each level adds a read and write amplification that is proportional to the ratio between the sizes of the two levels.

Universal compaction has a smaller write amplification, but eventually the database needs full compaction. This full compaction requires space equal to or larger than the whole database size and may stall the processing of new updates. Hence universal compaction cannot be used in most real-time, high-performance applications.

Speedb’s architecture introduces hybrid compaction, which reduces write amplification for very large databases without blocking updates and with small overhead in additional space. The hybrid compaction method works like universal compaction on all the higher levels, where the size of the data is small relative to the size of the entire database, and works like leveled compaction only in the lowest level, where a significant portion of the updated data is kept.

Memtable testing (Figure 1 below) shows a 17% gain in overwrite and a 13% gain in mixed read and write workloads (90% reads, 10% writes). Separate bloom filter test results show a 130% improvement in read misses in a read random workload (Figure 2) and a 26% reduction in memory usage (Figure 3).

Tests run by Redis demonstrate increased performance when Speedb replaced RocksDB in the Redis on Flash implementation. Its testing with Speedb was also agnostic to the application’s read/write ratio, indicating that performance is predictable across multiple different applications, or in applications where the access pattern varies over time.


Figure 1. Memtable testing with Speedb.


Figure 2. Bloom filter testing using a read random workload with Speedb.


Figure 3. Bloom filter testing showing reduction in memory usage with Speedb.

Memory management

The memory management of embedded libraries plays a crucial role in application performance. Current solutions are complex and have too many intertwined parameters, making it difficult for users to optimize them for their needs. The challenge increases as the environment or workload changes.

Speedb took a holistic approach when redesigning the memory management in order to simplify the use and enhance resource utilization.

A dirty data manager allows for an improved flush scheduler, one that takes a proactive approach and improves the overall memory efficiency and system utilization, without requiring any user intervention.

Working from the ground up, Speedb is making additional features self-tunable to achieve performance, scale, and ease of use for a variety of use cases.

Flow control

Speedb redesigns RocksDB’s flow control mechanism to eliminate spikes in user latency. Its new flow control mechanism changes the rate in a manner that is far more moderate and more accurately adjusted for the system’s state than the old mechanism. It slows down when necessary and speeds up when it can. By doing so, stalls are eliminated, and the write performance is stable.

When the root cause of data engine inefficiencies is buried deep in the system, finding it might be a challenge. At the same time, the deeper the root cause, the greater the impact on the system. As the old saying goes, a chain is only as strong as its weakest link.

Next-generation data engine architectures such as Speedb can boost metadata performance, reduce latency, accelerate search time, and optimize CPU consumption. As teams expand their hyperscale applications, new data engine technology will be a critical element to enabling modern-day architectures that are agile, scalable, and performant.

Hilik Yochai is chief science officer and co-founder of Speedb, the company behind the Speedb data engine, a drop-in replacement for RocksDB, and the Hive, Speedb’s open-source community where developers can interact, improve, and share knowledge and best practices on Speedb and RocksDB. Speedb’s technology helps developers evolve their hyperscale data operations with limitless scale and performance without compromising functionality, all while constantly striving to improve the usability and ease of use.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Posted Under: Database
EDB’s Postgres Distributed 5.0 boosts availability, performance

Posted on 1 March, 2023


Database-as-a-service provider EnterpriseDB (EDB) has released the next generation of its popular distributed open source PostgreSQL database, dubbed EDB Postgres Distributed 5.0, designed to offer high availability, optimized performance and protection against data loss.

In contrast to its PostgreSQL 14 offering, EDB’s Postgres Distributed 5.0 (PGD 5.0) offers a distributed architecture along with features such as logical replication.

In the PGD 5.0 architecture, a node (database) is a member of at least one node group, and the most basic system would have a single node group for the entire cluster.

“Each node (database) participating in a PGD group both receives changes from other members and can be written to directly by the user,” the company said in a blog post.

“This is distinct from hot or warm standby, where only one master server accepts writes, and all the other nodes are standbys that replicate either from the master or from another standby,” the company added.

In order to enable high availability, enterprises can set up a PGD 5.0 system in such a way that each master node or database or server can be protected by one or more standby nodes, the company said.

“The group is the basic building block consisting of 2+ nodes (servers). In a group, each node is in a different availability zone, with dedicated router and backup, giving immediate switchover and high availability. Each group has a dedicated replication set defined on it. If the group loses a node, you can easily repair or replace it by copying an existing node from the group,” the company said.
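PGD clusters are typically assembled with SQL functions from the BDR extension that underlies the product. The sequence below is a hedged sketch based on earlier BDR/PGD releases rather than commands from EDB's announcement; node names, group names, and connection strings are placeholders.

-- On the first node: register it and create the top-level node group.
SELECT bdr.create_node('node-a1', 'host=node-a1 dbname=appdb');
SELECT bdr.create_node_group('group_az1');

-- On each additional node: register it, then join it to the existing group.
SELECT bdr.create_node('node-a2', 'host=node-a2 dbname=appdb');
SELECT bdr.join_node_group('host=node-a1 dbname=appdb');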

This means that one node is the target for the main application and the other nodes are in shadow mode, meaning they are performing the read-write replica function.

This architectural setup allows faster performance as the main write function is occurring in one node, the company said, adding that “secondary applications might execute against the shadow nodes, although these are reduced or interrupted if the main application begins using that node.”

“In the future, one node will be elected as the main replicator to other groups, limiting CPU overhead of replication as the cluster grows and minimizing the bandwidth to other groups,” the company said. 

Data protection is key

As enterprises generate an increasing amount of data, downtime of IT infrastructure can cause serious damage. In addition, data center outages are becoming more commonplace: Uptime Institute’s 2022 Outage Analysis Report showed that 80% of data centers have experienced an outage in the past two years.

A separate report from IBM showed that data breaches have become very costly to deal with.

The distributed version of EDB’s object-relational database system, which competes with the likes of Azure Cosmos DB with Citus integration, is available as an add-on, dubbed EDB Extreme High Availability, for EDB Enterprise and Standard Plans, the company said.

In addition, EDB said that it will release the distributed version to all its managed database-as-a-service offerings including the Oracle-compatible BigAnimal and the AWS-compatible EDB Postgres Cloud Database Service.  

The company expects to offer a 60-day, self-guided trial for PGD 5.0 soon. The distributed version supports PostgreSQL, EDB Postgres Extended Server and EDB Postgres Advanced Server along with other version combinations.

Posted Under: Database