All posts by Richy George

10 best practices for every MongoDB deployment

Posted by on 5 April, 2023

This post was originally published on this site

MongoDB is a non-relational document database that stores data in flexible, JSON-like documents, which makes it easy to store unstructured data. First released in 2009, it is the most commonly used NoSQL database, having been downloaded more than 325 million times.

MongoDB is popular with developers because it is easy to get started with. Over the years, MongoDB has introduced many features that have turned the database into a robust solution able to store terabytes of data for applications.

As with any database, developers and DBAs working with MongoDB should look at how to optimize the performance of their database, especially nowadays with cloud services, where each byte processed, transmitted, and stored costs money. The ability to get started so quickly with MongoDB means that it is easy to overlook potential problems or miss out on simple performance improvements.

In this article, we’ll look at 10 essential techniques you can apply to make the most of MongoDB for your applications. 

MongoDB best practice #1: Enable authorization and authentication on your database right from the start

The bigger the database, the bigger the damage from a leak. There have been numerous data leaks caused by the simple fact that authorization and authentication are disabled by default when MongoDB is deployed for the first time. While this is not a performance tip, enabling authorization and authentication right from the start is essential, as it will spare you a great deal of pain later from unauthorized access or data leakage.

When you deploy a new instance of MongoDB, the instance has no user, password, or access control by default. In recent MongoDB versions, the default IP binding changed to 127.0.0.1 and a localhost exception was added, which reduced the potential for database exposure when installing the database.

However, this is still not ideal from a security perspective. The first piece of advice is to create the admin user and restart the instance again with the authorization option enabled. This prevents any unauthorized access to the instance.

To create the admin user:

> use admin
switched to db admin
> db.createUser({
...   user: "zelmar",
...   pwd: "password",
...   roles : [ "root" ]
... })
Successfully added user: { "user" : "zelmar", "roles" : [ "root" ] }

Then, you need to enable authorization and restart the instance. If you are deploying MongoDB from the command line:

mongod --port 27017 --dbpath /data/db --auth

Or if you are deploying MongoDB using a config file, you need to include:

security:
    authorization: "enabled"

MongoDB best practice #2: Don’t use ‘not recommended versions’ or ‘end-of-life versions’ in production instances and stay updated

It should seem obvious, but one of the most common issues we see with production instances is due to developers running a MongoDB version that is actually not suitable for production in the first place. This might be due to the version being out of date, such as with a retired version that should be updated to a newer iteration that contains all the necessary bug fixes.

Or it might be due to the version being too early and not yet tested enough for production use. As developers, we are normally keen to use our tools’ latest and greatest versions. We also want to be consistent over all the stages of development, from initial build and test through to production, as this decreases the number of variables we have to support, the potential for issues, and the cost to manage all of our instances.

For some, this could mean using versions that are not signed off for production deployment yet. For others, it could mean sticking with a specific version that is tried and trusted. This is a problem from a troubleshooting perspective when an issue is fixed in a later version of MongoDB that is approved for production but has not been deployed yet. Alternatively, you might forget about that database instance that is “just working” in the background, and miss when you need to implement a patch.

In response to this, you should regularly check if your version is suitable for production using the release notes of each version. For example, MongoDB 5.0 provides the following guidance in its release notes: https://www.mongodb.com/docs/upcoming/release-notes/5.0/

[Screenshot: upgrade warning from the MongoDB 5.0 release notes]

The guidance here would be to use MongoDB 5.0.11 as this version has the required updates in place. If you don’t update to this version, you will run the risk of losing data.

While it might be tempting to stick with one version, keeping up with upgrades is essential to preventing problems in production. You may want to take advantage of newly added features, but you should put these features through your test process first. You want to see if they pose any problems that might affect your overall performance before moving them into production.

Lastly, you should check the MongoDB Software Lifecycle Schedules and anticipate the upgrades of your clusters before the end of life of each version: https://www.mongodb.com/support-policy/lifecycles

End-of-life versions do not receive patches, bug fixes, or any kind of improvements. This could leave your database instances exposed and vulnerable.

From a performance perspective, getting the right version of MongoDB for your production applications involves being “just right” — not so near the bleeding edge that you will encounter bugs or other problems, but also not so far behind that you will miss out on vital updates.
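
As a quick sanity check, you can confirm which version an instance is actually running directly from the mongo shell (the output below is illustrative):

> db.version()
5.0.11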

MongoDB best practice #3: Use MongoDB replication to ensure HA and check the status of your replica often

A replica set is a group of MongoDB processes that maintains the same data on all of the nodes used for an application. It provides redundancy and data availability for your data. When you have multiple copies of your data on different database servers—or even better, in different data centers around the world—replication provides a high level of fault tolerance in case of a disaster.

MongoDB replica sets work with one writer node (also called the primary server). The best practice recommendation is to always have an odd number of members. Traditionally, replica sets have at least three instances:

  • Primary (writer node)
  • Secondary (reader node)
  • Secondary (reader node)

All of the nodes of the replica set will work together, as the primary node will receive the writes from the app server, and then the data will be copied to the secondaries. If something happens to the primary node, the replica set will elect a secondary as the new primary. To make this process work more efficiently and ensure a smooth failover, it is important for all the nodes of the replica set to have the same hardware configuration. Another advantage of the replica set is that it is possible to send read operations to the secondary servers, increasing the read scalability of the database.
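
As a minimal sketch, a three-member replica set can be initiated from the mongo shell once three mongod processes have been started with --replSet rs0 (the hostnames below are placeholders):

> rs.initiate({
...   _id: "rs0",
...   members: [
...     { _id: 0, host: "mongodb0.example.net:27017" },
...     { _id: 1, host: "mongodb1.example.net:27017" },
...     { _id: 2, host: "mongodb2.example.net:27017" }
...   ]
... })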

After you deploy a replica set to production, it is important to check the health of the replica and the nodes. MongoDB has two important commands for this purpose:

  • rs.status() provides information on the current status of the replica set, using data derived from the heartbeat packets sent by the other members of the replica set. It’s a very useful tool for checking the status of all the nodes in a replica set.
  • rs.printSecondaryReplicationInfo() provides a formatted report of the status of the replica set. It’s very useful to check if any of the secondaries are behind the primary on data replication, as this would affect your ability to recover all your data in the event of something going wrong. If secondaries are too far behind the primary, then you could end up losing a lot more data than you are comfortable with.

However, note that these commands provide point-in-time information rather than continuous monitoring for the health of your replica set. In a real production environment, or if you have many clusters to check, running these commands could become time-consuming and annoying. Therefore we recommend using a monitoring system like Percona PMM to keep an eye on your clusters.

MongoDB best practice #4: Use $regex queries only when necessary and choose text search instead where you can

Sometimes the simplest way to search for something in a database is to use a regular expression, or $regex operation. Many developers choose this option, but using regular expressions can harm your search operations at scale. Avoid $regex queries, especially when your database is big.

A $regex query consumes a lot of CPU time and it will normally be extremely slow and inefficient. Creating an index doesn’t help much and sometimes the performance is worse with indexes than without them.

For example, let’s run a $regex query on a collection of 10 million documents and use .explain(true) to view how many milliseconds the query takes.

Without an index:

> db.people.find({"name":{$regex: "Zelmar"}}).explain(true)
- -   Output omitted  - -
"executionStats" : {
                "nReturned" : 19851,
                "executionTimeMillis" : 4171,
                "totalKeysExamined" : 0,
                "totalDocsExamined" : 10000000,
- -   Output omitted  - -

And if we created an index on “name”:

db.people.find({"name":{$regex: "Zelmar"}}).explain(true)
- -   Output omitted  - -
  "executionStats" : {
                "nReturned" : 19851,
                "executionTimeMillis" : 4283,
                "totalKeysExamined" : 10000000,
                "totalDocsExamined" : 19851,
- -   Output omitted  - -

We can see in this example that the index didn’t help to improve the $regex performance.

It’s common to see a new application using $regex operations for search requests. This is because neither the developers nor the DBAs notice any performance issues at first, when the collections are small and the application has only a few users.

However, as the collections grow and the application gains more users, the $regex operations start to slow down the cluster and become a nightmare for the team, and the level of performance can drop significantly.

Rather than using $regex queries, use text indexes to support your text search. Text search is more efficient than $regex but requires you to add text indexes to your data sets in advance. The indexes can include any field whose value is a string or an array of string elements. A collection can have only one text search index, but that index can cover multiple fields.
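
For example, a text index covering the "name" field used in the examples below can be created like this:

> db.people.createIndex({ name: "text" })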

Using the same collection as the example above, we can test the execution time of the same query using text search:

> db.people.find({$text:{$search: "Zelmar"}}).explain(true)
- -   Output omitted  - -
"executionStages" : {
                         "nReturned" : 19851,
                        "executionTimeMillisEstimate" : 445,
                        "works" : 19852,
                        "advanced" : 19851,
- -   Output omitted  - - 

In practice, the same query took four seconds less using text search than using $regex. Four seconds in “database time,” let alone online application time, is an eternity.

To conclude, if you can solve the query using text search, do so. Restrict $regex queries to those use cases where they are really necessary.

MongoDB best practice #5: Think wisely about your index strategy

Putting some thought into your queries at the start can have a massive impact on performance over time. First, you need to understand your application and the kinds of queries that you expect to process as part of your service. Based on this, you can create an index that supports them.

Indexing can help to speed up read queries, but indexes carry an extra storage cost and slow down write operations. Consequently, you will need to think about which fields should be indexed so you can avoid creating too many indexes.

For example, if you are creating a compound index, following the ESR (Equality, Sort, Range) rule is a must, and using an index to sort the results improves the speed of the query.
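
As an illustration (the collection and field names here are hypothetical), a query with an equality filter on status, a sort on orderDate, and a range filter on amount would be served by a compound index that lists the fields in that ESR order:

> db.orders.createIndex({ status: 1, orderDate: -1, amount: 1 })
> db.orders.find({ status: "shipped", amount: { $gt: 100 } }).sort({ orderDate: -1 })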

Similarly, you can always check if your queries are really using the indexes that you have created with .explain(). Sometimes we see a collection with indexes created, but the queries either don’t use the indexes or instead use the wrong index entirely. It’s important to create only the indexes that will actually be used for the read queries. Having indexes that will never be used is a waste of storage and will slow down write operations.

When you look at the .explain() output, there are three main fields that are important to observe. For example:

keysExamined:0
docsExamined:207254
nreturned:0

In this example, no indexes are being used. We know this because the number of keys examined is 0 while the number of documents examined is 207254. Ideally, the query should have the ratio nreturned/keysExamined=1. For example:

keysExamined:5
docsExamined: 0
nreturned:5

Finally, if .explain() shows you that a particular query is using the wrong index, you can force the query to use a particular index with .hint(). Calling the .hint() method on a query overrides MongoDB’s default index selection and query optimization process, allowing you to specify the index that is used, or to carry out a forward or reverse collection scan.
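
For example, to force the query from the earlier example to use the index on "name":

> db.people.find({ name: "Zelmar" }).hint({ name: 1 })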

MongoDB best practice #6: Check your queries and indexes frequently

Every database is unique and particular to its application, and so is the way it grows and changes over time. Nobody knows how an application will grow over months and years or how the queries will change. Whatever assumptions you make, your prediction will inevitably be wrong, so it is essential to check your database and indexes regularly.

For example, you might plan a specific query optimization approach and a particular index, but realize after one year that few queries are using that index and it’s no longer necessary. Continuing with this approach will cost you more in storage while not providing any improvements in application performance.

For this reason, it’s necessary to carry out query optimizations and look at the indexes for each collection frequently.

MongoDB has some tools to do query optimization such as the database profiler or the .explain() method. We recommend using them to find which queries are slow, how the indexes are being used by the queries, and where you may need to improve your optimizations. In addition to removing indexes that are not used efficiently, look out for duplicate indexes that you don’t need to run.
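
For example, on recent MongoDB versions you can enable the profiler to record operations slower than a chosen threshold and then inspect them (the 100 ms threshold is just an example):

> db.setProfilingLevel(1, { slowms: 100 })
> db.system.profile.find().sort({ ts: -1 }).limit(5)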

At Percona, we use scripts to check if there are duplicate indexes or if there are any indexes that are not being used. You can find them in our repository on GitHub: https://github.com/percona/support-snippets/tree/master/mongodb/scripts

Similarly, you might consider how many results you want to get from a query, as providing too many results can impact performance. Sometimes you only need the first five results of a query, rather than tens or hundreds of responses. In those cases, you can limit the number of query results with .limit().

Another useful approach is to use projections to get only the necessary data. If you need only one field of the document, use a projection instead of retrieving the entire document, and then filter on the app side.
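
For example, a query like the following (the city field is hypothetical) returns only five names instead of entire documents:

> db.people.find({ city: "Boston" }, { name: 1, _id: 0 }).limit(5)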

Lastly, if you need to order the results of a query, be sure that you are using an index and taking advantage of it to improve your efficiency.

MongoDB best practice #7: Don’t run multiple mongod or mongos instances on the same server

Although it is possible to run multiple mongod or mongos instances on the same server, using different processes and ports, we strongly recommend not doing this.

When you run multiple mongod or mongos processes on the same server, it becomes very difficult to monitor them and the resources they are consuming (CPU, RAM, network, etc.). Consequently, when there is a problem, it becomes extremely difficult to find out what is going on and get to the root cause of the issue.

We see a lot of cases where customers have experienced a resource problem on the server, but because they are running multiple instances of mongod or mongos, even discovering which specific process has the problem is difficult. This makes troubleshooting the problem extremely challenging.

Similarly, in some cases where developers have implemented a sharded cluster to scale up their application data, we have seen multiple shards running on the same server. In these circumstances, the router will send a lot of queries to the same node, overloading the node and leading to poor performance—the exact opposite of what the sharding strategy wants to achieve.

The worst case scenario here involves replica sets. Imagine running a replica set for resiliency and availability, and then discovering that two or more members of the replica set are running on the same server. This is a recipe for disaster and data loss. Rather than architecting your application for resiliency, you will have made the whole deployment more likely to fail.

MongoDB best practice #8: Back up frequently

So, you have a cluster with replication, but do you want to sleep better? Run backups of your data frequently. Frequent backups allow you to restore the data from an earlier moment if you need to recover from an unplanned event.

There are a number of different options for backing up your MongoDB data:

Mongodump / Mongorestore

Mongodump reads data from MongoDB and creates a BSON file that Mongorestore can use to populate a MongoDB database. These tools provide an efficient way to back up small MongoDB deployments. On the plus side, you can select a specific database or collection to back up, and this approach doesn’t require stopping writes on the node. However, it doesn’t back up the indexes you have created, so you would need to re-create them when restoring. Logical backups are, in general, slow and time-consuming, so you would have to factor that time into your restore process. Lastly, this approach is not recommended for sharded clusters, which are more complex deployments.
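
A minimal sketch of a logical backup and restore with these tools (the database name and paths are placeholders):

mongodump --db mydb --gzip --out /backups/mydb-dump
mongorestore --gzip /backups/mydb-dump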

Percona Backup for MongoDB

Percona Backup for MongoDB is an open source, distributed, low-impact solution for consistent backups of MongoDB sharded clusters and replica sets. It supports backups of MongoDB servers, replica sets, and sharded clusters, can perform logical, physical, and point-in-time recovery backups, and can back up to a range of storage targets, including AWS S3, Azure, and filesystem storage.

However, it does require initial setup and configuration on all the nodes that you would want to protect.

Physical / file system backups

You can create a backup of a MongoDB deployment by making a copy of MongoDB’s underlying data files. You can use different methods for this type of backup, from manually copying the data files, to Logical Volume Management (LVM) snapshots, to cloud-based snapshots. These are usually faster than logical backups and they can be copied or shared to remote servers. This approach is especially recommended for large data sets, and it is convenient when building a new node on the same cluster.

On the downside, you cannot select a specific database or collection when restoring, and you cannot do incremental backups. Further, running a dedicated node is recommended for taking the backup as it requires halting writes, which impacts application performance.

MongoDB best practice #9: Know when to shard your replica set and choose a shard key carefully

Sharding is the most complex architecture you can deploy with MongoDB.

As your database grows, you will need to add more capacity to your server. This can involve adding more RAM, more I/O capacity, or even more powerful CPUs to handle processing. This is called vertical scaling.

However, if your database grows so much that it outstrips the capacity of a single machine, then you may have to split the workload up. There are several reasons that might lead to this. For instance, there may not be a physical server large enough to handle the workload, or the server instance would cost so much that it would be unaffordable to run. In these circumstances, you need to start thinking about horizontal scaling.

Horizontal scaling involves dividing the database over multiple servers and adding additional servers to increase capacity as required. For MongoDB, this process is called sharding and it relies on a sharding key to manage how workloads are split up across machines. 

Choosing a sharding key may be the most difficult task you will face when managing MongoDB. It’s necessary to study the datasets and queries and plan ahead before choosing the key, because it’s very difficult to revert the shard once it has been carried out. For MongoDB 4.2 and earlier versions, assigning a shard key is a one-way process that cannot be undone. For MongoDB 4.4 and later, it is possible to refine a shard key, while MongoDB 5.0 and above allow you to change the shard key with the reshardCollection command.

If you choose a bad shard key, then a large percentage of documents may go to one of the shards and only a few to another. This will make the sharded cluster unbalanced, which will affect performance over time. An unbalanced cluster typically happens when a monotonically growing key is chosen to shard a collection, as all the documents above a given value would go to one shard rather than being distributed evenly.

Alongside looking at the value used to shard data, you will also need to think about the queries that will take place across the shard. The queries must use the shard key so that the mongos process distributes the queries across the sharded cluster. If the query doesn’t use the shard key, then mongos will send the query to every shard of the cluster, affecting performance and making the sharding strategy inefficient.
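
As an illustrative sketch (the database, collection, and shard key are hypothetical), sharding a collection on a hashed key is one common way to avoid the monotonically increasing key problem described above; run these against a mongos:

> sh.enableSharding("mydb")
> sh.shardCollection("mydb.orders", { customerId: "hashed" })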

MongoDB best practice #10: Don’t throw money at the problem

Last but not least, it’s common to see teams throwing money at the problems they have with their databases. However, instead of immediately reaching for the credit card, first try to think laterally and imagine a better solution.

Adding more RAM, adding more CPU, or moving to a larger instance or a bigger machine can overcome a performance problem. However, doing so without first analyzing the underlying problem or the bottleneck can lead to more of the same kinds of problems in the future. In most cases, the answer is not spending more money on resources, but looking at how to optimize your implementation for better performance at the same level.

Although cloud services make it easy to scale up instances, the costs of inefficiency can quickly mount up. Worse, this is an ongoing expense that will carry on over time. By looking at areas like query optimization and performance first, it’s possible to avoid additional spend. For some of the customers we have worked with, the ability to downgrade their EC2 instances saved their companies a lot of money in monthly charges.

As a general recommendation, adopt a cost-saving mindset and, before adding hardware or beefing up cloud instances, take your time to analyze the problem and think of a better solution for the long term.

Zelmar Michelini is a database engineer at Percona, a provider of open source database software, support, and services for MongoDB, MySQL, and PostgreSQL databases.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Posted Under: Database
Databricks launches lakehouse for manufacturing sector

Posted by on 4 April, 2023

This post was originally published on this site

Databricks on Tuesday announced an industry-specific data lakehouse for the manufacturing sector, in an effort to surpass its data lake and data warehouse rivals.

A data lakehouse is a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in native format, and data warehouses, which store structured data (often in SQL format).

Dubbed Databricks Lakehouse for Manufacturing, the new service offers capabilities for predictive maintenance, digital twins, supply chain optimization, demand forecasting, real-time IoT analytics, computer vision, and AI, along with data governance and data sharing tools.

“The Lakehouse for Manufacturing includes access to packaged use case accelerators that are designed to jumpstart the analytics process and offer a blueprint to help organizations tackle critical, high-value industry challenges,” the company said in a statement.

To help assist users of the new manufacturing lakehouse, Databricks is providing partner-supported services and tools such as database migration, data management, data intelligence, revenue growth management, financial services, and cloud data migration under the aegis of what the company calls Brickbuilder Solutions.

These partners include Accenture, Avanade, L&T Mindtree, Wipro, Infosys, Capgemini, Deloitte, Tredence, Lovelytics, and Cognizant.

Databricks’ Lakehouse for Manufacturing has been adopted by enterprises such as DuPont, Honeywell, Rolls-Royce, Shell, and Tata Steel, the company said.

Industry-specific lakehouse to aid data managers

Databricks’ new Lakehouse for Manufacturing is expected to have a positive impact on data managers or data engineers, according to IDC Research Vice President Carl Olofson.

The lakehouse offering will make it easy for data managers to coordinate data across data lake and data warehouse environments, ensuring data consistency, timeliness, and trustworthiness, Olofson said.

Other analysts feel the offering will also help data science teams across enterprises.

“It helps data science teams skip a step by having preconfigured analytics rather than a blank slate to start from,” said Tony Baer, principal analyst at dbInsights.

Databricks is in a better position to deliver advanced data science capabilities when compared to other offerings from rivals, according to Doug Henschen, principal analyst at Constellation Research.

“That’s certainly evident in this Databricks Lakehouse for Manufacturing, which includes support for digital twins, predictive maintenance, part-level forecasting and computer vision,” Henschen said.

Lakehouse for Manufacturing aimed at accelerating adoption

The Lakehouse for Manufacturing offering from Databricks is aimed at accelerating the adoption of the company’s lakehouse offerings and increasing the “stickiness” of other services, according to Olofson.

“Lakehouse is still a new and somewhat amorphous concept. Databricks is trying to accelerate adoption by offering industry-specific lakehouses. These are really what you might call ‘starter kits’ since the guts of any lakehouse are specific to what data the company has and how it is to be put together,” Olofson said.

Providing such kits, or what IBM used to call “patterns,” is meant to jumpstart the use of lakehouses by offering enterprises a partially complete set of functionality that users can finish with company-specific definitions and rules, according to Olofson.

“This is a well-worn approach in software when seeking to sell products that are complex or multifunctional, since customers often don’t know how to get started. If Databricks can win over customers with these lakehouse offerings, they will get a measure of stickiness that should ensure that the customer will remain loyal for a while,” Olofson added.

The launch of industry-specific warehouses was prompted by a mix of the company’s internal priorities, which include factors such as considering which sectors have the biggest potential for Databricks’ offerings, and industry-specific demand, Constellation Research’s Henschen said.

“I suspect that the company launched a lakehouse for the manufacturing sector as the next one in line after having already introduced similar offerings for retail, financial services, healthcare and life sciences, and media and entertainment last year,” Henschen said.

The launch of the industry-specific lakehouse is aimed at lowering the barrier to lakehouse adoption by adding capabilities such as prebuilt analytic patterns that would help enterprises jumpstart their journeys, Baer said.

Databricks versus Snowflake

Databricks, which competes with Snowflake, Starburst, Dremio, Google Cloud, AWS, Oracle, and HPE, has timed its industry-specific lakehouse announcements to be competitive with Snowflake, experts said.

“The announcements are very similar to that of Snowflake and there is an element of competitive gamesmanship in the timing of announcements as well,” Henschen said, adding that Snowflake might have a head start as it kicked off its industry cloud announcements in 2021 with media and financial services cloud offerings.

However, there seems to be a difference in the approach between Snowflake and Databricks in terms of how they speak about their product offerings.

“Snowflake does not use the term ‘lakehouse’ in their materials although they say that data lake workloads are supported by them. Their core technology is a cloud-based data warehouse relational database management system (RDBMS), with extensions that support semistructured and unstructured data as well as data in common storage formats such as Apache Iceberg,” Olofson said, adding that Snowflake too offers industry-specific configurations.

Analysts said it was too early to gauge any changes in market share arising from these industry-specific offerings.

“I’d say it’s still early days for combined lakehouses to be displacing incumbents. Databricks customers may be running more SQL analytic workloads on Databricks than in the past, but I don’t see it displacing incumbents in support of high-scale, mission-critical workloads,” Henschen said.

“Similarly, it’s early days for Snowflake Snowpark, and I don’t see customers choosing Snowflake as a platform for hardcore data science needs. Best-of-breed is still winning for each respective need,” Henschen added.

Posted Under: Database
MariaDB SkySQL adds serverless analytics, cost management features

Posted by on 30 March, 2023

This post was originally published on this site

MariaDB is adding features such as serverless analytics and cost management to the new release of its managed database-as-a-service (DBaaS) SkySQL, it said Thursday.

SkySQL, which is a managed instance of the MariaDB platform, offers OLAP (online analytical processing) and OLTP (online transaction processing) along with enterprise features like sharding, load balancing, and auto-failover via a combination of MariaDB Xpand, MariaDB Enterprise Server, and MariaDB ColumnStore.

In order to help enterprises bring down the cost of databases and to manage expenditure better, MariaDB has introduced an autoscaling feature for both compute and storage.

“Rules specify when autoscaling is triggered, for example, when CPU utilization is above 75% over all replicas sustained for 30 minutes, then a new replica or node will be added to handle the increase,” the company said in a statement.

“Similarly, when CPU utilization is less than 50% over all replicas for an hour, nodes or a replica is removed. Users always specify the top and bottom threshold so there are never any cost surprises,” it explained, adding that enterprises only pay for the resources used.

In addition to autoscaling, the company has added serverless analytics capabilities, eliminating the need to run extract, transform, load (ETL) tasks.

“SkySQL enables operational analytics on active transactional data as well as external data sources using a serverless analytics layer powered by Apache Spark SQL,” the company said, adding that this approach removes inconsistencies between an analytical view and a transactional view.

Further, it said that enterprises will pay for the compute used for analytics without the need to provision for processing power.

Additional features in the new release include access for data scientists to a specific version of Apache Zeppelin notebooks.

“The notebook is pre-loaded with examples that demonstrate ways to run analytics on data stored in SkySQL. It can also be used to discover database schemas, running queries on data stored in Amazon S3 and federating queries to join data across SkySQL databases and S3 object storage,” the company said.

The new release of SkySQL has been made generally available on AWS and Google Cloud. It includes updated MariaDB Xpand 6.1.1, Enterprise Server 10.6.12, and ColumnStore 6.3.1.

New customers signing up for the DBaaS can claim $500 in credits, MariaDB said.

Posted Under: Database
Google ambushes on-prem PostgreSQL with AlloyDB Omni

Posted by on 29 March, 2023

This post was originally published on this site

Google is developing a self-managed and downloadable version of its PostgreSQL-compatible AlloyDB fully managed database-as-a-service (DBaaS) in order to further help enterprises to modernize their legacy databases. It is now inviting applications for the private preview, it said Wednesday.

Dubbed AlloyDB Omni, the new offering uses the same underlying engine as AlloyDB and can be downloaded and run on premises, at the edge, across clouds, or even on developer laptops, Andi Gutmans, general manager of databases at Google Cloud, wrote in a blog post.

This means that enterprises using AlloyDB Omni will get AlloyDB’s improved transactional processing performance and memory management compared with standard PostgreSQL, and an index advisor to optimize frequently run queries.

“The AlloyDB Omni index advisor helps alleviate the guesswork of tuning query performance by conducting a deep analysis of the different parts of a query including subqueries, joins, and filters,” Gutmans said, adding that it periodically analyzes the database workload to identify queries that can benefit from indexes, and recommends new indexes that can increase query performance.

In order to reduce latency for query results, Omni uses AlloyDB’s columnar engine that keeps frequently queried data in an in-memory columnar format for faster scans, joins, and aggregations, the company said, adding that AlloyDB Omni uses machine learning to automatically organize data between row-based and columnar formats, convert the data when needed, and choose between columnar and row-based execution plans.

“This delivers excellent performance for a wide range of queries, with minimal management overhead,” Gutmans said.

How does AlloyDB Omni help enterprises?

Self-managed AlloyDB Omni provides a pathway to modernize legacy databases on-premises before moving to the cloud, analysts said.

“Database migrations can be complex and costly, especially when combined with migration from on-premises infrastructure to cloud. AlloyDB Omni provides a pathway for organizations to modernize those workloads in-place by migrating to AlloyDB Omni on-premises,” said Matt Aslett, research director at Ventana Research.

“This move can be seen as one step prior to a potential move to the AlloyDB managed service, or with a view to retaining the workloads in on-premises data centers or on edge infrastructure due to sovereignty or performance requirements,” he added.

According to Omdia’s Chief Analyst Bradley Shimmin and dbInsight’s Principal Analyst Tony Baer, AlloyDB Omni combines the best of open-source PostgreSQL and Google Cloud’s architecture, making it more appealing than rival services such as AWS Aurora for PostgreSQL and Microsoft’s Citus, among others.

Shimmin said that for larger customers or those looking to modernize and transform sizable, mission-critical databases, “Sticking with an open-source solution like PostgreSQL can be limiting in terms of providing modern data architectures or features, especially in supporting multi or hybrid-deployment requirements.” AlloyDB Omni could overcome those limitations, he said.

For Baer, “The appeal of AlloyDB Omni is that it is one of the few PostgreSQL implementations optimized for both scale and mixed transaction or analytic workloads that is not solely tethered to a specific hyperscaler.”

What is Google’s strategy with AlloyDB Omni?

Google plans to use AlloyDB Omni as another offering in its push to gain more share of the PostgreSQL-led legacy database migration market at a time when PostgreSQL has seen a rise in popularity, the analysts said.

Shimmin noted that, “For many customers, PostgreSQL is a relational lingua-franca and therefore a means of modernizing legacy databases by porting them to a cloud-native rendition on AWS, GCP or any other hyperscaler.”

According to data from relational databases knowledge platform db-engines.com, PostgreSQL has been steadily rising in popularity and is currently the fourth-most-popular RDBMS (relational database management system) and fourth-most-popular product cited among all databases in their rankings.

Another reason for PostgreSQL’s rise in popularity is that the database management system offers better transactional and analytical capabilities than MySQL along with other features such as extended support for spatial data, broader SQL support, enhanced security and governance, and expanded support for programming languages.

Google’s Gutmans said the company has received “huge” interest from customers for database modernization since the launch of AlloyDB.

And according to Aslett, AlloyDB Omni builds on AlloyDB’s momentum for Google to gain share in the PostgreSQL market.

“AlloyDB was launched to enable organizations to modernize applications with high-end performance and reliability requirements that have previously been deployed on-premises on enterprise operational databases including Oracle, IBM and Microsoft, as well as PostgreSQL,” he said.

“By 2025, two-thirds of organizations will re-examine their current operational database suppliers with a view to improving fault tolerance and supporting the development of new intelligent operational applications,” he added.

According to a report from market research firm Gartner, the race to modernize databases is accelerating due to enterprises’ need to run analytics for business strategy and growth.

How to access AlloyDB Omni?

Google is currently offering the free developer version of AlloyDB Omni for non-production use, which can be downloaded on developers’ laptops.

“When it’s time to move an application to a production-ready environment, it will run unchanged on AlloyDB Omni in any environment, or on the AlloyDB for PostgreSQL service in Google Cloud,” Gutmans said.

“If needed, you can use standard open-source PostgreSQL tools to migrate or replicate their data. You can also use standard open-source PostgreSQL tools for database operations like backup and replication,” he added.

Google said AlloyDB Omni supports existing PostgreSQL applications as it uses standard PostgreSQL drivers. In addition, the software provides compatibility with PostgreSQL extensions and configuration flags.

Further, Google said that it will provide full enterprise support, including 24/7 technical support and software updates for security patches and features, once AlloyDB Omni is made generally available.

Although Google hasn’t yet set a date for that, enterprises can already get access to the technical preview of the offering by submitting a request to the search giant.

Posted Under: Database
Working with Azure’s Data API builder

Posted by on 29 March, 2023

This post was originally published on this site

Microsoft’s platform-based approach to cloud development has allowed it to offer managed versions of many familiar elements of the tech stack, especially within its data platform. As well as its own SQL Server (as Azure SQL) and the no-SQL Cosmos DB, it has managed versions of familiar open source databases, including PostgreSQL and MySQL.

Using these familiar databases and APIs makes it easy to migrate data from on premises to Azure, or to build new cloud-native applications without a steep learning curve. Once your data is stored on Azure, you can use familiar tools and techniques to use it from your code, especially if you’re working with .NET and Java, which have plenty of official and unofficial data SDKs. But what if you’re taking advantage of newer development models like Jamstack and using tools like Azure Static Web Apps to add API-driven web front ends to your applications?

Although you could use tools such as Azure Functions or App Service to build your own data API layer, it adds inefficiencies and increases your maintenance and testing requirements. Instead, you can now use Microsoft’s own Data API builder tool. It’s simple to configure and gives a database either REST or GraphQL endpoints that can quickly be consumed by JavaScript or any other REST-aware language. It’s also possibly the fastest way to start turning Azure-hosted databases into applications.

Introducing Data API builder

Designed to run on premises, at the edge, and in the cloud, Data API builder is an open source tool targeting five different databases: Azure SQL, SQL Server, PostgreSQL, MySQL, and Cosmos DB. You can work with your own installations as well as with Microsoft’s own managed services, so you can develop and run in your own data center and migrate code to the cloud as needed.

If you’re using Data API builder as part of your own code, it’s a .NET tool that’s available as a Nuget package. You need .NET 6 or 7 to run it, and it runs on any .NET-compatible system, including Linux. Once it’s installed, you can use its CLI to build the appropriate endpoints for your databases, ready for use in your applications. Alternatively, you can use a ready-to-run container image from Microsoft’s container registry. This approach works well if you’re targeting edge container environments, such as the new Azure Kubernetes Service (AKS) Edge Essentials, which gives you a limited managed Kubernetes platform.

Installation is quick and you can use the tool with the dab command from any command line. Help is relatively basic, but as this is a very focused tool, you shouldn’t find it hard to use. Single-purpose command-line tools like this are an increasingly important part of the .NET ecosystem, and it’s worth being familiar with them as they can save a lot of work and time.

Building APIs at the command line

It’s a good idea to be familiar with ADO.NET to use Data API builder. That’s not surprising; it’s the standard way of accessing data services in .NET and, at heart, this is a .NET tool, even if you’re using it to build web applications.

To make a connection, you’ll need to know the structure of your database and which elements you want to expose. At the same time, you also need any ADO connection strings so you can make the initial connection to your database. For Azure resources, these can be found in the Azure Portal as part of your resource properties. You don’t need to store the connection data in the clear; you have the option of using environment variables to hold data outside your code at runtime, so you can use tools like Azure Key Vault to keep any secrets safe.

Data API builder uses a JSON configuration file to store details of any APIs you build. Create this by defining the database type, along with a connection string. Be sure to use an account with appropriate permissions for your application. The configuration file details the supported API types, so you can enable either REST, GraphQL, or both. Other parts of the configuration file specify the mode, whether cross-origin scripting is allowed, and the authentication type used for the connection. While the CLI tool creates and updates configuration data, you can edit it yourself using the GitHub-hosted documentation.

Once you have defined a connection, you can set up the APIs for your data. Using familiar database entities, give the API a name and tie it to a source, like a table or a query, and give it permissions associated with users and database operations. The name is used to build the API path for both REST and GraphQL.
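
As a rough sketch of that flow (the entity, table, and connection string are placeholders, and exact flags can vary between Data API builder versions), the CLI steps look something like this:

dab init --database-type "mssql" --connection-string "@env('SQL_CONNECTION_STRING')"
dab add Book --source "dbo.Books" --permissions "anonymous:*"
dab start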

With a connection defined and entities added to the configuration file, you’re now ready to build and serve the API. The Data API builder is perhaps best thought of as a simple broker that takes REST and GraphQL connections, maps them to prebuilt ADO statements, and runs them on the source before returning results and remapping them into the appropriate format. The REST API supports common verbs that map to standard CRUD (create, read, update, delete) operations; for example, GET will retrieve data and POST will write it.

Each REST verb has additional query parameters to help manage your data. You can filter data, order it, and apply select statements. Unfortunately, although you can limit results to the first N items, there doesn’t seem to be a way to paginate data at present. Hopefully, this will be added in a future release, as it would simplify building web content from the query data.

Using GraphQL with Data API builder

If you’re planning to use GraphQL, it’s worth using a tool such as Postman to help build and test requests. GraphQL can do a lot more than a basic REST query, but it can be hard to build queries by hand. Having a tool to explore the API and test queries can save a lot of time. For more complex GraphQL queries, you will need to build relationships into your configuration. Here it helps to have an entity diagram of your data source with defined relationships that you can describe by the type of relationship, the target entity for the query, and how the relationship is stored in your database.

The process of making an API is the same for all the supported databases, with one difference for Cosmos DB. As it already has a REST API, there’s no need to generate another. However, you can still use it to create a GraphQL API.

If you’re using this approach with Azure Static Web Apps, first use the Azure Portal to add your source database to your site configuration. You then need to import an existing Data API builder configuration file. You can use both the Database API builder and the Azure Static Web Apps CLI to create the files needed. The Static Web Apps CLI creates a stub file for the configuration, which you can either edit by hand or paste in the contents of a Database API builder file.

Being able to add GraphQL support to any database is important; it’s a much more efficient way to query data than traditional APIs, simplifying complex queries. By supporting both REST and GraphQL APIs, Data API builder can help migrate between API types, allowing you to continue using familiar queries at the same time as you learn how to structure GraphQL. As an added bonus, while this is a tool that works for any application framework, it’s well worth using with Azure Static Web Apps to build data-connected Jamstack apps.

Posted Under: Database
YugabyteDB Managed adds managed command line interface

Posted by on 27 March, 2023

This post was originally published on this site

Yugabyte on Monday said that it was adding a new managed command line interface along with other features to the managed version of its open source distributed SQL database, dubbed YugabyteDB Managed.

The new Managed Command Line Interface (CLI), according to the company, allows developers to benefit from automation while writing code, without needing to learn new skills.

“Developers of all levels can easily create and manage clusters from their terminal or Integrated Development Environment (IDE) and make use of the most advanced set of tools available for optimizing database performance and driving the business forward,” Karthik Ranganathan, CTO and co-founder, Yugabyte, said in a statement.

This means that developers can create and manage clusters hosted in YugabyteDB Managed from their IDE or terminal without requiring REST API or Terraform skills, added Ranganathan.

In addition, the new Managed CLI can automate repetitive tasks and has an auto-completion feature that makes it easy for developers, database administrators, and DevOps engineers to discover new features, the company said.

The new CLI also supports multiple platforms, including macOS, Windows, and Linux, the company added. The Windows version can be downloaded from GitHub.
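
Yugabyte’s announcement doesn’t include commands, but a first session with the ybm CLI might look roughly like this; the command and flag names are assumptions based on the tool’s naming and should be checked against Yugabyte’s documentation:

# Store an API key for the CLI, then inspect and create clusters
ybm auth
ybm cluster list
ybm cluster create --cluster-name demo-cluster   # region, credentials, and other flags omitted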

The latest update to YugabyteDB Managed also enhances observability, with the company adding over 125 new SQL and storage-layer metrics.

“With these new metrics, enterprises will gain even deeper insights into their database’s performance, making it easier to identify and resolve performance issues quickly,” the company wrote in a blog post.

The cloud-based user interface for observability inside YugabyteDB Managed includes new visualization options to reorder metrics for a custom dashboard and new synchronized tooltips in charts for easier troubleshooting, the company added.

Further, the company has added support for AWS PrivateLink, a service that provides private connectivity between virtual private clouds, supported AWS services, and on-premises networks.

“This feature, now in private preview, is available for dedicated clusters created in YugabyteDB Managed on AWS, as an alternative to VPC peering, for secure access to your databases over a private network,” the company said.

The support for AWS PrivateLink also provides more secure access to an enterprise’s databases, it added.

Enterprises that already use YugabyteDB can request a free trial of YugabyteDB Managed with all features included.

Posted Under: Database
Oracle adds machine learning features to MySQL HeatWave

Posted by on 23 March, 2023

This post was originally published on this site

Oracle is adding new machine learning features to its data analytics cloud service MySQL HeatWave.

MySQL HeatWave combines OLAP (online analytical processing), OLTP (online transaction processing), machine learning, and AI-driven automation in a single MySQL database.

The new machine learning capabilities will be added to the service’s AutoML and MySQL Autopilot components, the company said when it announced the update on Thursday.

While AutoML allows developers and data analysts to build, train, and deploy machine learning models within MySQL HeatWave without moving to a separate machine learning service, MySQL Autopilot provides machine learning-based automation for HeatWave and OLTP workloads, including auto provisioning, auto encoding, auto query plan, auto shape prediction, and auto data placement, among other features.

AutoML augments time series forecasting via machine learning

The new machine learning-based capabilities added to AutoML include multivariate time series forecasting, unsupervised anomaly detection, and recommender systems, Oracle said, adding that all the new features were generally available.

“Multivariate time series forecasting can predict multiple time-ordered variables, where each variable depends both on its past value and the past values of other dependent variables. For example, it is used to build forecasting models to predict electricity demand in the winter considering the various sources of energy used to generate electricity,” said Nipun Agarwal, senior vice president of research at Oracle.

In contrast to the usual practice of having a statistician trained in time-series analysis or forecasting select the right algorithm for the desired output, AutoML’s multivariate time series forecasting automatically preprocesses the data, selects the best algorithm for the model, and tunes it, the company said.

“The HeatWave AutoML automated forecasting pipeline uses a patented technique that consists of stages including advanced time-series preprocessing, algorithm selection and hyperparameter tuning,” said Agarwal, adding that this automation can help enterprises save time and effort as they don’t need to have trained statisticians on staff.
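
Oracle didn’t publish code alongside the announcement, but in HeatWave AutoML training is exposed as a stored procedure call. The sketch below is illustrative only; the schema, columns, and option names are assumptions to be checked against the HeatWave AutoML reference:

-- Train a multivariate forecasting model; the options name the task,
-- the time index, and the dependent (endogenous) series
CALL sys.ML_TRAIN('demand.electricity_history', 'demand_mw',
     JSON_OBJECT('task', 'forecasting',
                 'datetime_index', 'reading_time',
                 'endogenous_variables', JSON_ARRAY('demand_mw', 'wind_gw', 'solar_gw')),
     @forecast_model);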

The multivariate time series forecasting feature, according to Constellation Research Principal Analyst Holger Muller, is unique to Oracle’s MySQL HeatWave.

“Time series forecasting, multivariate or otherwise, is not currently offered as part of a single database that offers machine learning-augmented analytics. AWS, for example, offers a separate database for time series,” Muller said.

HeatWave enhances anomaly detection

Along with multivariate time series forecasting, Oracle is adding machine-learning based “unsupervised” anomaly detection to MySQL HeatWave.

In contrast to the practice of using specific algorithms to detect specific anomalies in data, AutoML can detect different types of anomalies from unlabeled data sets, the company said, adding that this feature helps enterprise users when they don’t know what anomaly types are in the dataset.

“The model generated by HeatWave AutoML provides high accuracy for all types of anomalies — local, cluster, and global. The process is completely automated, eliminating the need for data analysts to manually determine which algorithm to use, which features to select, and the optimal values of the hyperparameters,” said Agarwal.

In addition, AutoML has added a recommendation engine, which Oracle calls recommender systems, that relies on the same automation for algorithm selection, feature selection, and hyperparameter optimization inside MySQL HeatWave.

“With MySQL HeatWave, users can invoke the ML_TRAIN procedure, which automatically trains the model that is then stored in the MODEL_CATALOG. To predict a recommendation, users can invoke ML_PREDICT_ROW or ML_PREDICT_TABLE,” said Agarwal.
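
Pieced together, the flow Agarwal describes might look something like the following sketch; the schema, table, and column names are placeholders, and the exact argument lists should be confirmed against the HeatWave AutoML reference for your version:

-- Train a recommendation model; the handle is recorded in the model catalog
CALL sys.ML_TRAIN('store.ratings', 'rating',
     JSON_OBJECT('task', 'recommendation', 'users', 'user_id', 'items', 'item_id'),
     @rec_model);

-- Load the model, then score a table of user/item pairs in one call
CALL sys.ML_MODEL_LOAD(@rec_model, NULL);
CALL sys.ML_PREDICT_TABLE('store.candidate_pairs', @rec_model, 'store.predicted_ratings');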

Business users get MySQL HeatWave AutoML console

In addition, Oracle is adding an interactive console for business users inside HeatWave.

“The new interactive console lets business analysts build, train, run, and explain ML models using the visual interface — without using SQL commands or any coding,” Agarwal said, adding that the console makes it easier for business users to explore conditional scenarios for their enterprise.

“The addition of the interactive console is in line with enterprises trying to make machine learning accountable. The console will help business users dive into the deeper end of the pool as they want to evolve into ‘citizen data scientists’ to avoid getting into too much hot water,” said Tony Baer, principal analyst at dbInsight.

The console is initially available for MySQL HeatWave on AWS.

Oracle also said that it would be adding support for storage on Amazon S3 for HeatWave on AWS to reduce cost as well as improve the availability of the service.

“When data is loaded from MySQL (InnoDB storage engine) into HeatWave, a copy is made to the scale-out data management layer built on S3. When an operation requires reloading of data to HeatWave, such as during error recovery, data can be accessed in parallel by multiple HeatWave nodes and the data can be directly loaded into HeatWave without the need for any transformation,” said Agarwal.

MySQL Autopilot updates

The new features in MySQL HeatWave also include two additions to MySQL Autopilot: integration of the auto shape prediction advisor with the interactive console, and auto unload.

“Within the interactive console, database users can now access the MySQL Autopilot Auto shape prediction advisor that continuously monitors the OLTP workload to recommend with an explanation the right compute shape at any given time — allowing customers to always get the best price-performance,” Agarwal said.

The auto unload feature, according to the company, can recommend which tables to unload based on workload history.

“Freeing up memory reduces the size of the cluster required to run a workload and saves cost,” Agarwal said, adding that both features are generally available.

HeatWave targets smaller data volumes

Oracle is also offering a smaller HeatWave shape to attract customers with smaller data volumes.

In contrast to the 512GB standard HeatWave node, the smaller shape is 32GB in size, can process up to 50GB of data, and is priced at $16 per month, the company said.

In addition, the company said that data processing capability for its standard 512GB HeatWave Node has been increased from 800GB to 1TB.

“With this increase and other query performance improvements, the price performance benefit of HeatWave has further increased by 15%,” said Agarwal.

Posted Under: Database
Tailscale: Fast and easy VPNs for developers

Posted by on 15 March, 2023

This post was originally published on this site

Networking can be an annoying problem for software developers. I’m not talking about local area networking or browsing the web, but the much harder problem of ad hoc, inbound, wide area networking.

Suppose you create a dazzling website on your laptop and you want to share it with your friends or customers. You could modify the firewall on your router to permit incoming web access on the port your website uses and let your users know the current IP address and port, but that could create a potential security vulnerability. Plus, it would only work if you have control over the router and you know how to configure firewalls for port redirection.

Alternatively, you could upload your website to a server, but that’s an extra step that can become time-consuming, and maintaining dedicated servers is a burden in both time and money. You could instead spin up a small cloud instance and upload your site there; that’s often fairly cheap, but it’s still another step that takes time.

Another potential solution is Universal Plug and Play (UPnP), which enables devices to set port forwarding rules by themselves. UPnP needs to be enabled on your router, but it’s only safe if the modem and router are updated and secure. If not, it creates serious security risks on your whole network. The usual advice from security vendors is not to enable it, since the UPnP implementations on many routers are still dangerous, even in 2023. On the other hand, if you have an Xbox in the house, UPnP is what it uses to set up your router for multiplayer gaming and chat.

A simpler and safer way is Tailscale, which allows you to create an encrypted, peer-to-peer virtual network using the secure WireGuard protocol without generating public keys or constantly typing passwords. It can traverse NAT and firewalls, span subnets, use UPnP to create direct connections if it’s available, and connect via its own network of encrypted TCP relay servers if UPnP is not available.

In some sense, all VPNs (virtual private networks) compete with Tailscale. Most other VPNs, however, route traffic through their own servers, which tends to increase the network latency. One major use case for server-based VPNs is to make your traffic look like it’s coming from the country where the server is located; Tailscale doesn’t help much with this. Another use case is to penetrate corporate firewalls by using a VPN server inside the firewall. Tailscale competes for this use case, and usually has a simpler setup.

Besides Tailscale, the only other peer-to-peer VPN is the free, open source WireGuard, on which Tailscale builds. WireGuard doesn’t handle key distribution or pushed configurations; Tailscale takes care of all of that.

What is Tailscale?

Tailscale is an encrypted point-to-point VPN service based on the open source WireGuard protocol. Compared to traditional VPNs based on central servers, Tailscale often offers higher speeds and lower latency, and it is usually easier and cheaper to set up and use.

Tailscale is useful for software developers who need to set up ad hoc networking and don’t want to fuss with firewalls or subnets. It’s also useful for businesses that need to set up VPN access to their internal networks without installing a VPN server, which can often be a significant expense.

Installing and using Tailscale

Signing up for a Tailscale Personal plan was free and quick; I chose to use my GitHub ID for authentication. Installing Tailscale took a few minutes on each machine I tried: an M1 MacBook Pro, where I installed it from the macOS App Store; an iPad Pro, installed from the iOS App Store; and a Pixel 6 Pro, installed from the Google Play Store. Installing on Windows starts with a download from the Tailscale website, and installing on Linux can be done using a curl command and shell script, or a distribution-specific series of commands.
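
On Linux, the whole process is the one-line installer from the Tailscale docs followed by bringing the node up (it’s worth reading the script before piping anything to a shell):

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up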

[Screenshot: You can install Tailscale on macOS, iOS, Windows, Linux, and Android. This tab shows the instructions for macOS.]

Tailscale uses IP addresses in the 100.x.x.x range and automatically assigns DNS names, which you can customize if you wish. You can see your whole “tailnet” from the Tailscale site and from each machine that is active on the tailnet.

In addition to viewing your machines, you can view and edit the services available, the users of your tailnet, your access controls (ACL), your logs, your tailnet DNS, and your tailnet settings.

[Screenshot: Once the three devices were running Tailscale, I could see them all on my Tailscale login page. I chose to use my GitHub ID for authentication, as I was testing just for myself. If I were setting up Tailscale for a team I would use my team email address.]

[Screenshot: Tailscale pricing.]

Tailscale installs a CLI on desktop and laptop computers. It’s not absolutely necessary to use this command line, but many software developers will find it convenient.
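
A few representative commands, all part of the standard tailscale CLI; the device name passed to ping is a placeholder for whatever appears in your own status output:

tailscale status              # list the devices on your tailnet and how they are connected
tailscale ip -4               # show this machine's 100.x.x.x address
tailscale ping pixel-6-pro    # check whether a peer is reached directly or via a relay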

How Tailscale works

Tailscale, unlike most VPNs, sets up peer-to-peer connections, aka a mesh network, rather than a hub-and-spoke network. It uses the open source WireGuard package (specifically the userspace Go variant, wireguard-go) as its base layer.

For public key distribution, Tailscale does use a hub-and-spoke configuration. The coordination server is at login.tailscale.com. Fortunately, public key distribution takes very little bandwidth. Private keys, of course, are never distributed.

You may be familiar with generating public-private key pairs manually to use with ssh, and passing the path to the private key file as part of your ssh command line. Tailscale does all of that transparently for its network, and ties the keys to whatever login or 2FA credentials you choose.

The key pair steps are:

  1. Each node generates a random public/private key pair for itself, and associates the public key with its identity.
  2. The node contacts the coordination server and leaves its public key and a note about where that node can currently be found, and what domain it’s in.
  3. The node downloads a list of public keys and addresses in its domain, which have been left on the coordination server by other nodes.
  4. The node configures its WireGuard instance with the appropriate set of public keys.

Tailscale doesn’t handle user authentication itself. Instead, it always outsources authentication to an OAuth2, OIDC (OpenID Connect), or SAML provider, including Gmail, G Suite, and Office 365. This avoids the need to maintain a separate set of user accounts or certificates for your VPN.

[Screenshot: Tailscale CLI help. On macOS, the CLI executable lives inside the app package. A soft link to this executable doesn’t seem to work on my M1 MacBook Pro, possibly because Tailscale runs in a sandbox.]

NAT traversal is a complicated process, one that I personally tried unsuccessfully to overcome a decade ago. NAT (network address translation) is one of the ways firewalls work: Your computer’s local address of, say, 192.168.1.191 gets translated in the firewall, as a packet goes from your computer to the internet, to your current public IP address and a random port number, say 173.76.179.155:9876, and the firewall remembers that port number as yours. When a site returns a response to your request, your firewall recognizes the port and translates it back to your local address before passing you the response.

[Screenshot: Tailscale status, Tailscale pings to two devices, and plain pings to the same devices using the native network. Notice that the Tailscale ping to the Pixel device first routes via a DERP server (see below) in NYC, and then manages to find the LAN connection.]

Where’s the problem? Suppose you have two firewall clients trying to communicate peer-to-peer. Neither can succeed until someone or something tells both ends what port to use.

This arbitrator will be a server when you use the STUN (Session Traversal Utilities for NAT) protocol; while STUN works on most home routers, it unfortunately doesn’t work on most corporate routers. One alternative is the TURN (Traversal Using Relays around NAT) protocol, which uses relays to get around the NAT deadlock issue; the trouble with that is that TURN is a pain in the neck to implement, and there aren’t many existing TURN relay servers.

Tailscale implements a protocol of its own for this, called DERP (Designated Encrypted Relay for Packets). This use of the term DERP has nothing to do with being goofy, but it does suggest that someone at Tailscale has a sense of humor.

Tailscale has DERP servers around the world to keep latency low; these include nine servers in the US. If, for example, you are trying to use Tailscale to connect your smartphone from a park to your desktop at your office, the chances are good that the connection will route via the nearest DERP server. If you’re lucky, the DERP server will only be used as a side channel to establish the connection. If you’re not, the DERP server will carry the encrypted WireGuard traffic between your nodes.
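
You can watch this from the command line: tailscale netcheck reports the DERP regions it can see and the measured latency to each, and tailscale ping shows whether a peer is currently reached through a relay or directly (the device name below is a placeholder):

tailscale netcheck
tailscale ping pixel-6-pro    # early replies may report a DERP region before a direct path is found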

Tailscale vs. other VPNs

Tailscale offers a reviewer’s guide. I often look at such documents and then do my own thing because I’ve been around the block a couple of times and recognize when a company is putting up straw men and knocking them down, but this one is somewhat helpful. Here are some key differentiators to consider.

With most VPNs, when you are disconnected you have to log in again. It can be even worse when your company has two internet providers and has two VPN servers to handle them, because you usually have to figure out what’s going on by trial and error or by attempting to call the network administrator, who is probably up to his or her elbows in crises. With Tailscale (and WireGuard), the connection just resumes. Similarly, many VPN servers have trouble with flakey connections such as LTE. Tailscale and WireGuard take the flakiness in stride.

With most VPNs, getting a naive user connected for the first time is an exercise in patience for the network administrator and possibly scary for the user who has to “punch a hole” in her home firewall to enable the connection. With Tailscale it’s a five-minute process that isn’t scary at all.

Most VPNs want to be exclusive. Connecting to two VPN concentrators at once is considered a cardinal sin and a potential security vulnerability, especially if they are at different companies. Tailscale doesn’t care. WireGuard can handle this situation just fine even with hub-and-spoke topologies, and with Tailscale point-to-point connections there is a Zero Trust configuration that exposes no vulnerability.

Tailscale solutions

Tailscale has documented about a dozen solutions to common use cases that can be addressed with its ad hoc networking. These range from wanting to code from your iPad to running a private Minecraft server without paying for hosting or opening up your firewall.

As we’ve seen, Tailscale is simple to use, but also sophisticated under the hood. It’s an easy choice for ad hoc networking, and a reasonable alternative to traditional hub-and-spoke VPNs for companies. The only common VPN function that I can think of that it won’t do is spoof your location so that you can watch geographically restricted video content—but there are free VPNs that handle that.

Cost: Personal, open source, and “friends and family” plans, free. Personal Pro, $48 per year. Team, $5 per user per month (free trial available). Business, $15 per user per month (free trial available). Custom plans, contact sales.

Platform: macOS 10.13 or later, Windows 7 SP1 or later, Linux (most major distros), iOS 15 or later, Android 6 or later, Raspberry Pi, Synology.

Posted Under: Tech Reviews