Why your cloud database performance sucks

Posted by on 8 December, 2023

“My cloud application is slow,” is a common complaint. However, nine times out of ten the cause lies not with the application processing but with the database’s inability to serve the application at the required performance level.

It’s almost 2024. Why are we still having these issues with cloud-based database performance? What are the most common causes? How can we fix them? I have a few ideas. 

Did you choose the right cloud service? 

Cloud providers offer many database services, such as Amazon RDS, Azure SQL Database, and Google Cloud SQL. Sometimes the database you chose based on your application’s requirements, scalability, and performance expectations must be adjusted to ensure a more appropriate fit.

In many cases, databases are selected for the wrong reasons. For instance, a team anticipates having to store and manage binaries, which leads it to select an object database, when a relational database is actually the right choice for that specific use case. Consider all factors, including managed services, geographic locations, and compatibility.

Also, consider performance when selecting a database type and brand. The assumption is that it’s on the cloud, and the cloud is “infinitely scalable,” so any database will perform well. The type of database you select should depend on the type of data you’re looking to store and how you’ll use it, such as columnar, hierarchical, relational, or object. The most popular database and the one that works for your specific use case are rarely the same.

How’s your database design and indexing? 

This is huge. Efficient database design and proper indexing significantly impact performance. Most underperforming database problems trace their roots to database design issues, especially overly complex database structures and misapplied indexing.

Make sure to establish appropriate indexes to speed up data retrieval, and regularly review and optimize queries to eliminate bottlenecks. Make sure that your database schema is optimized. Also, normalize the database where necessary, but know that over-normalizing can be just as bad. For those who didn’t take Database 101 in the 1990s, normalization means organizing data into separate, interrelated tables or other native database containers.

Normalization aims to minimize redundancy and dependency by eliminating duplicate data and breaking down larger tables into smaller, more manageable ones. I’ve found that normalizing a database with performance in mind is often overlooked, and that oversight causes many performance issues.
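
To make the indexing advice concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names are hypothetical, but the pattern of checking the query plan before and after adding an index applies to any relational database:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

# Without an index, filtering on customer_id scans the whole table.
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall())

# Add an index on the column the query filters on, then check the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42"
).fetchall())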

Are you scaling resources appropriately? 

Although public cloud providers offer highly scalable resources to adapt to varying workloads, those resources often are not used effectively. Investigate auto-scaling features that adjust resources dynamically based on demand. Horizontal scaling (adding more instances) and vertical scaling (increasing instance size) can be used strategically to meet high-performance requirements.

However, be careful about allowing the cloud provider to allocate resources automatically on your behalf. In many instances it allocates too many, and you’ll get a big bill at the end of the month. Find a balance rather than simply selecting the auto-scale button.
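
One way to keep auto-scaling bounded is to set explicit capacity limits. The sketch below uses boto3 and AWS Application Auto Scaling to cap Aurora read-replica scaling; the cluster name and limits are placeholders, and the same idea applies to whichever scaling knobs your provider exposes:

import boto3

# Cap Aurora read-replica auto scaling so "auto-scale" cannot quietly grow the bill.
autoscaling = boto3.client("application-autoscaling")

autoscaling.register_scalable_target(
    ServiceNamespace="rds",
    ResourceId="cluster:my-aurora-cluster",          # placeholder cluster name
    ScalableDimension="rds:cluster:ReadReplicaCount",
    MinCapacity=1,   # keep at least one reader for availability
    MaxCapacity=3,   # hard ceiling: demand beyond this waits rather than spends
)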

Is your storage configuration a disaster?

It’s best to optimize storage configurations based on the workload characteristics, not the best practices you saw in a cloud certification course. For instance, utilize SSDs for I/O-intensive workloads, but understand that they are often more expensive. Also, choose the right storage tier and implement caching mechanisms to reduce the need for frequent disk I/O operations. Caching, too, has become largely automated, and you may need more granular control to strike the optimum balance between performance and cost.
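
As a simple illustration of the caching idea, here is a minimal read-through cache sketch using the redis-py client; the key format, TTL, and db_lookup callback are assumptions for the example:

import json
import redis  # assumes the redis-py client and a reachable Redis instance

cache = redis.Redis(host="localhost", port=6379)

def get_customer(customer_id, db_lookup):
    """Read-through cache: serve from Redis when possible, fall back to the database."""
    key = f"customer:{customer_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    row = db_lookup(customer_id)            # hits the database only on a cache miss
    cache.setex(key, 300, json.dumps(row))  # expire after 5 minutes to bound staleness
    return row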

Cloud architects and database engineers need to do better at database performance. In some cases, it means getting back to the basics of good database design, configuration, and deployment. This is becoming a lost art, as those charged with cloud systems seem to prefer tossing money at the problem. That is not the way you solve problems.

Posted Under: Database
Don’t make Apache Kafka your database

Posted by on 5 December, 2023

It’s a tale as old as time. An enterprise is struggling against the performance and scalability limitations of its incumbent relational database. Teams tasked with finding a newer solution land on an event-driven architecture, take one look at Apache Kafka, and say, “Aha! Here’s our new database solution.” It’s fast. It’s scalable. It’s highly available. It’s the superhero they hoped for!

Those teams set up Kafka as their database and expect it to serve as their single source of truth, storing and fetching all the data they could ever need. Except, that’s when the problems begin. The core issue is that Kafka isn’t actually a database, and using it as a database won’t solve the scalability and performance issues they’re experiencing.

What is and isn’t a database?

When developers conceptualize a database, they generally think of a data store with a secondary index and tables, like most SQL and NoSQL solutions. Another traditional requirement is ACID compliance: atomicity, consistency, isolation, and durability. However, the traditional thinking around what is or isn’t a database is being challenged regularly. For example, Redis does not have tables, and RocksDB does not have secondary indexes. And neither is ACID compliant. However, both are commonly referred to as a database. Similarly, Apache Cassandra is known as a NoSQL database, but it is not ACID compliant.

I draw the line at Kafka, which I will argue is not a database and, largely, should not be used as a database. I’d venture to say the open-source Kafka community at large holds the same perspective.

Kafka doesn’t have a query language. You can access specific records for a specific time frame, but you’re accessing a write-ahead log. Kafka does have offsets and topics, but they aren’t a substitute for indexes and tables. Crucially, Kafka isn’t ACID compliant. Although it’s possible to use Kafka as a data store or to create your own version of a database, Kafka isn’t a database in and of itself.
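
To illustrate what accessing records for a specific time frame looks like in practice, here is a sketch using the confluent-kafka Python client; the topic, partition, and timestamp are placeholders. Note that you are still replaying a log from an offset, not running a query:

from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-replay",
    "auto.offset.reset": "earliest",
})

# Map a wall-clock timestamp (ms) to the earliest offset at or after that time.
start_ms = 1701388800000  # example: 2023-12-01 00:00:00 UTC
offsets = consumer.offsets_for_times([TopicPartition("orders", 0, start_ms)], timeout=10)
consumer.assign(offsets)

# Replay the log forward from that offset; there is no WHERE clause, only sequential reads.
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break
    if msg.error():
        continue
    print(msg.offset(), msg.value())

consumer.close()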

That begs the question, does it ever make sense to pursue using Kafka as a database anyway? Does your use case demand it? Do you have the expertise to absorb the mounting technical debt of forcing Kafka to act like a database in the long term? For most users and use cases, my answer is a firm no.

Kafka is best as a team player

Selecting the right technology for, well, any use case comes down to matching a solution to the problem you’re trying to solve. Kafka is intended to function as a distributed event streaming platform, full stop. While it can be used as a long-term data store (technically), doing so means major tradeoffs when it comes to accessing those data. Tools in Kafka’s ecosystem like ksqlDB can make Kafka feel more like a database, but that approach only functions up to medium-scale use cases. Most enterprises that choose to implement Apache Kafka have high-velocity data, and ksqlDB doesn’t keep up with their needs.

The right strategy is to let Kafka do what it does best, namely ingest and distribute your events in a fast and reliable way. For example, consider an ecommerce website with an API that would traditionally save all data directly to a relational database with massive tables—with poor performance, scalability, and availability as the result. Introducing Kafka, we can design a superior event-driven ecosystem and instead push that data from the API to Kafka as events.
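
A minimal sketch of that producer side, again using the confluent-kafka Python client, with a hypothetical orders topic and payload:

import json
from confluent_kafka import Producer

# The API layer publishes each request as an event instead of writing directly
# to the relational database.
producer = Producer({"bootstrap.servers": "localhost:9092"})

order_event = {"order_id": "o-1001", "customer_id": "c-42", "total": 59.99}
producer.produce(
    "orders",
    key=order_event["order_id"],
    value=json.dumps(order_event).encode("utf-8"),
)
producer.flush()  # block until the brokers acknowledge the event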

This event-driven approach separates processing into separate components. One event might consist of customer data, another may have order data, and so on—enabling multiple jobs to process events simultaneously and independently. This approach is the next evolution in enterprise architecture. We’ve gone from monolith to microservices and now event-driven architecture, which reaps many of the same benefits of microservices with higher availability and more speed.

Once events are sitting in Kafka, you have tremendous flexibility in what you do with them. If it makes sense for the raw events to be stored in a relational database, use an ecosystem tool like Kafka Connect to make that easy. Relational databases are still a critical tool in the modern enterprise architecture, especially when you consider the advantages of working with familiar tools and a mature ecosystem. Kafka isn’t a replacement for the tools we know and love. It simply enables us to handle the massive influx of data we’re seeing.
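
For example, a JDBC sink can be registered through the Kafka Connect REST API. The sketch below assumes the Confluent JDBC sink connector and placeholder connection details; property names may vary with your connector version:

import requests

# Register a JDBC sink that copies the "orders" topic into a relational table.
connector = {
    "name": "orders-jdbc-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "orders",
        "connection.url": "jdbc:postgresql://db-host:5432/shop",  # placeholder database
        "connection.user": "shop_writer",
        "connection.password": "********",
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "pk.fields": "order_id",
        "auto.create": "true",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()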

Pluggable and versatile, but not a database

Kafka provides its greatest value in enabling use cases such as data aggregation and real-time metrics. Using Kafka and Apache ecosystem tools like Spark, Flink, or KStreams, developers can perform aggregations and transformations of streaming data and then push that data to the desired database. Some of these tools can also aggregate data in a time-series or windowed fashion and push it to a reporting engine for real-time metrics.

If developers wish to save certain data to a cache—perhaps to support a website or CRM systems—it’s simple to tap into the Kafka data stream and push data to Redis or a compacted Kafka topic. Data streaming from Kafka allows teams to add various components as they see fit without worrying about any degradation in service, because Kafka is so gosh-darn scalable, reliable, and available. That includes feeding data into any data store, whether that’s Apache Cassandra, big data platforms, data lakes, or almost any other option.

If data is the lifeblood of a modern enterprise, Kafka should be the heart of your data ecosystem. With Kafka, users can pipe data wherever it needs to go. In this way, Kafka is complementary to your database, but should not be your database. The right prescription for Kafka should include the direction “use as intended,” meaning as a powerful message broker and the central data pipeline of your organization.

Andrew Mills is a senior solutions architect at Instaclustr, part of Spot by NetApp, which provides a managed platform and support around open-source technologies. In 2016 Andrew began his data streaming journey, developing deep, specialized knowledge of Apache Kafka and the surrounding ecosystem. He has designed and implemented several big data pipelines with Kafka at the core.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Posted Under: Database
AWS re:Invent 2023: 7 takeaways from the big annual event

Posted by on 4 December, 2023

At the AWS re:Invent conference last week, the spotlight was focused on artificial intelligence, with the new generative AI assistant, Amazon Q, debuting as the star of the show. But there was plenty of other news to spark the interest of database managers, data scientists, data engineers, and developers, including new extract, transform, load (ETL) services, a new Cost Optimization Hub, and a revamped enterprise pricing tier for AWS’ cloud-based development tool, dubbed Amazon CodeCatalyst.

Here are seven key takeaways from the conference:

The cloud services provider, which has been adding infrastructure capabilities and chips over the past year to support high-performance computing with enhanced energy efficiency, announced the latest iterations of its Graviton and Trainium chips.

The Graviton4 processor, according to AWS, provides up to 30% better compute performance, 50% more cores, and 75% more memory bandwidth than the current generation Graviton3 processors.

Trainium2, on the other hand, is designed to deliver up to four times faster training than first-generation Trainium chips.

At re:Invent, AWS also extended its partnership with Nvidia, including support for the DGX Cloud, a new GPU project named Ceiba, and new instances for supporting generative AI workloads.

Nvidia also shared plans to integrate its NeMo Retriever microservice into AWS to help users with the development of generative AI tools like chatbots. NeMo Retriever is a generative AI microservice that enables enterprises to connect custom large language models (LLMs) to enterprise data, so the company can generate proper AI responses based on their own data.

Further, AWS said that it will be the first cloud provider to bring Nvidia’s GH200 Grace Hopper Superchips to the cloud.

Updated models added to Bedrock include Anthropic’s Claude 2.1 and Meta Llama 2 70B, both of which have been made generally available. Amazon also has added its proprietary Titan Text Lite and Titan Text Express foundation models to Bedrock.

In addition, the cloud services provider has added a model in preview, Amazon Titan Image Generator, to the AI app-building service.

AWS also has released a new feature within Bedrock that allows enterprises to evaluate, compare, and select the best foundational model for their use case and business needs.

Dubbed Model Evaluation on Amazon Bedrock and currently in preview, the feature is aimed at simplifying several tasks such as identifying benchmarks, setting up evaluation tools, and running assessments, the company said, adding that this saves time and cost.

In order to help enterprises train and deploy large language models efficiently, AWS introduced two new offerings — SageMaker HyperPod and SageMaker Inference — within its Amazon SageMaker AI and machine learning service.

In contrast to the manual model training process — which is prone to delays, unnecessary expenditure and other complications — HyperPod removes the heavy lifting involved in building and optimizing machine learning infrastructure for training models, reducing training time by up to 40%, the company said.

SageMaker Inference, on the other hand, is targeted at helping enterprises reduce model deployment costs and decrease latency in model responses. In order to do so, Inference allows enterprises to deploy multiple models to the same cloud instance to better utilize the underlying accelerators.

AWS has also updated its low code machine learning platform targeted at business analysts,  SageMaker Canvas.

Analysts can use natural language to prepare data inside Canvas in order to generate machine learning models, said Swami Sivasubramanian, head of database, analytics and machine learning services for AWS. The no code platform supports LLMs from Anthropic, Cohere, and AI21 Labs.

SageMaker also now features the Model Evaluation capability, now called SageMaker Clarify, which can be accessed from within the SageMaker Studio.

Last Tuesday, AWS CEO Adam Selipsky premiered the star of the cloud giant’s re:Invent 2023 conference: Amazon Q, the company’s answer to Microsoft’s GPT-driven Copilot generative AI assistant.    

Amazon Q can be used by enterprises across a variety of functions including developing applications, transforming code, generating business intelligence, acting as a generative AI assistant for business applications, and helping customer service agents via the Amazon Connect offering.

The cloud services provider has announced a new program, dubbed Amazon Braket Direct, to offer researchers direct, private access to quantum computers.

The program is part of AWS’ managed quantum computing service, named Amazon Braket, which was introduced in 2020. 

Amazon Braket Direct allows researchers across enterprises to get private access to the full capacity of various quantum processing units (QPUs) without any wait time and also provides the option to receive expert guidance for their workloads from AWS’ team of quantum computing specialists, AWS said.

Currently, the Direct program supports the reservation of IonQ Aria, QuEra Aquila, and Rigetti Aspen-M-3 quantum computers.

IonQ Aria is priced at $7,000 per hour and the QuEra Aquila is priced at $2,500 per hour. The Aspen-M-3 is priced slightly higher at $3,000 per hour.

The updates announced at re:Invent include a new AWS Billing and Cost Management feature, dubbed AWS Cost Optimization Hub, which makes it easy for enterprises to identify, filter, aggregate, and quantify savings for AWS cost optimization recommendations.

The new Hub, according to the cloud services provider, gathers all cost-optimizing recommended actions across AWS Cloud Financial Management (CFM) services, including AWS Cost Explorer and AWS Compute Optimizer, in one place.

It incorporates customer-specific pricing and discounts into these recommendations, and it deduplicates findings and savings to give a consolidated view of an enterprise’s cost optimization opportunities, AWS added.

The feature is likely to help FinOps or infrastructure management teams understand cost optimization opportunities.

Continuing to build on its efforts toward zero-ETL for data warehousing services, AWS announced new Amazon RedShift integrations with Amazon Aurora PostgreSQL, Amazon DynamoDB, and Amazon RDS for MySQL.

Enterprises, typically, use extract, transform, load (ETL) to integrate data from multiple sources into a single consistent data store to be loaded into a data warehouse for analysis.

However, most data engineers claim that transforming data from disparate sources could be a difficult and time-consuming task as the process involves steps such as cleaning, filtering, reshaping, and summarizing the raw data. Another issue is the added cost of maintaining teams that prepare data pipelines for running analytics, AWS said.

In contrast, the new zero-ETL integrations, according to the company, eliminate the need to perform ETL between Aurora PostgreSQL, DynamoDB, RDS for MySQL, and RedShift as transactional data in these databases can be replicated into RedShift almost immediately and is ready for running analysis.

Other generative AI-related updates at re:Invent include updated support for vector databases for Amazon Bedrock. These databases include Amazon Aurora and MongoDB. Other supported databases include Pinecone, Redis Enterprise Cloud, and Vector Engine for Amazon OpenSearch Serverless.  

The company also added a new enterprise pricing tier to its cloud-based development tool, dubbed Amazon CodeCatalyst.

Posted Under: Database
Key new features and innovations in EDB Postgres 16

Posted by on 1 December, 2023

PostgreSQL 16, the latest major release of your favorite open source RDBMS, set new standards for database management, data replication, system monitoring, and performance optimization. Like clockwork, EnterpriseDB (EDB), a leading contributor to PostgreSQL code and leading provider of the Postgres database to enterprises, has unveiled its latest portfolio release for Postgres 16.1.

The milestone EDB Postgres 16 portfolio release integrates the core advancements of PostgreSQL 16, reaffirming EDB’s dedication to the Postgres community and driving innovation in this technology. Let’s take a look at the key features added to the EDB Postgres 16 portfolio release.

Performance and scalability enhancements

The new release boasts significant improvements in parallel processing and faster query execution, elevating Postgres’s status as a sophisticated open-source database. These enhancements are poised to benefit enterprises by facilitating more efficient data processing and quicker response times, crucial in today’s fast-paced business environments​​.

Advanced security features

Security takes a front seat in EDB Postgres 16, with the introduction of flexible cryptographic key support and enhancements to Transparent Data Encryption (TDE), which has been updated to offer options for both AES-128 and AES-256 encryption. This allows customers to select AES-128 for scenarios where performance and energy efficiency are priorities, and AES-256 for instances where compliance with regulatory standards or achieving the highest level of security is essential.

The addition of privilege analysis further strengthens the database by adhering to the principle of least privilege, which involves tracing and documenting all active and inactive privileges assigned to a role. This approach allows customers to tighten their database security by methodically revoking unnecessary privileges, thereby preventing both deliberate and accidental data access or alterations. Additionally, this system facilitates the provision of comprehensive reports on database privileges for each role to auditors.

Oracle compatibility and easier migration

Acknowledging the challenges of migrating from Oracle databases, EDB has enhanced its Oracle compatibility features, prioritizing the most common incompatibilities found in the EDB Migration Portal. The results led EDB to expand coverage in Oracle packages such as DBMS_SESSION, DBMS_SQL, and UTL_FILE. This additional coverage is a significant boon for organizations migrating from legacy systems while maintaining familiar workflows and minimizing disruption.

EDB also has introduced SPL Check, which aims to transform the experience of developers working with stored procedures. Instead of writing a stored procedure and then running a full application test suite to uncover problems, developers can use SPL Check to catch errors that otherwise would not surface until runtime, even after a successful CREATE PROCEDURE/FUNCTION command.

Additional features compatible with Oracle have been incorporated into the SQL MERGE command, aiming to minimize the discrepancies encountered during runtime between Oracle’s MERGE and PostgreSQL’s MERGE.
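
For reference, a standard PostgreSQL MERGE (available since PostgreSQL 15) looks like the following sketch, shown here via psycopg2 with placeholder connection details, table names, and columns:

import psycopg2  # assumes a PostgreSQL 15+ / EDB Postgres server

conn = psycopg2.connect("dbname=shop user=app password=secret host=localhost")
with conn, conn.cursor() as cur:
    # Upsert staged rows into the target table in a single statement.
    cur.execute("""
        MERGE INTO customers AS t
        USING staged_customers AS s
          ON t.customer_id = s.customer_id
        WHEN MATCHED THEN
          UPDATE SET name = s.name, email = s.email
        WHEN NOT MATCHED THEN
          INSERT (customer_id, name, email)
          VALUES (s.customer_id, s.name, s.email)
    """)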

Lastly, the update also introduces new NLS Charset functions, namely NLS_CHARSET_ID, NLS_CHARSET_NAME, and NLS_CHARSET_DECL_LEN. 

Enhanced management and administrative control

EDB Postgres 16 introduces sophisticated role membership controls, providing administrators with greater oversight of user activities. This update is crucial for managing complex enterprise databases, ensuring optimal performance even under high-intensity workloads. Additionally, enhanced visibility into table and index usage paves the way for more informed decision-making and efficient database management.

EDB’s latest offering is a testament to its enduring commitment to advancing Postgres. Improved scalability, enhanced security features, and better management tools make EDB Postgres 16 a premier choice for enterprises worldwide. This release not only underscores EDB’s innovation but also solidifies its role in addressing the dynamic needs of modern businesses​​​​.

Adam Wright is the senior product manager of core database, extensions, and backup/restore at EDB.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Posted Under: Database
AWS updates Bedrock, SageMaker to boost generative AI offerings

Posted by on 29 November, 2023

At its ongoing re:Invent 2023 conference, AWS unveiled several updates to its SageMaker, Bedrock and database services in order to boost its generative AI offerings.

Taking to the stage on Wednesday, AWS vice president of data and AI, Swami Sivasubramanian, unveiled updates to existing foundation models inside its generative AI application-building service, Amazon Bedrock.

The updated models added to Bedrock include Anthropic’s Claude 2.1 and Meta Llama 2 70B, both of which have been made generally available. Amazon also has added its proprietary Titan Text Lite and Titan Text Express foundation models to Bedrock.

In addition, the cloud services provider has added a model in preview, Amazon Titan Image Generator, to the AI app-building service.

The model, which can be used to rapidly generate and iterate images at low cost, can understand complex prompts and generate relevant images with accurate object composition and limited distortions, AWS said.

Enterprises can use the model in the Amazon Bedrock console either by submitting a natural language prompt to generate an image or by uploading an image for automatic editing, before configuring the dimensions and specifying the number of variations the model should generate.
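
Outside the console, the same model can be invoked programmatically through the Bedrock runtime API. The sketch below uses boto3’s invoke_model call; the request-body fields are assumptions about the Titan image schema, so check the current model documentation before relying on them:

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# The payload layout below is an assumption for illustration; the schema is model-specific.
body = {
    "taskType": "TEXT_IMAGE",
    "textToImageParams": {"text": "a minimalist logo of a blue hummingbird"},
    "imageGenerationConfig": {"numberOfImages": 1, "height": 512, "width": 512},
}

response = bedrock.invoke_model(
    modelId="amazon.titan-image-generator-v1",
    body=json.dumps(body),
)
payload = json.loads(response["body"].read())  # images come back base64-encoded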

Invisible watermark identifies AI images

The images generated by Titan have an invisible watermark to help reduce the spread of disinformation by providing a discreet mechanism to identify AI-generated images.

Foundation models that are currently available in Bedrock include large language models (LLMs) from the stables of AI21 Labs, Cohere Command, Meta, Anthropic, and Stability AI.

These models, with the exception of Anthropic’s Claude 2, can be fine-tuned inside Bedrock, the company said, adding that support for fine-tuning Claude 2 was expected to be released soon.

In order to help enterprises generate embeddings for training or prompting foundation models, AWS is also making its Amazon Titan Multimodal Embeddings generally available.

“The model converts images and short text into embeddings — numerical representations that allow the model to easily understand semantic meanings and relationships among data — which are stored in a customer’s vector database,” the company said in a statement.

Evaluating the best foundational model for generative AI apps

Further, AWS has released a new feature within Bedrock that allows enterprises to evaluate, compare, and select the best foundational model for their use case and business needs.

Dubbed Model Evaluation on Amazon Bedrock and currently in preview, the feature is aimed at simplifying several tasks such as identifying benchmarks, setting up evaluation tools, and running assessments, the company said, adding that this saves time and cost.

“In the Amazon Bedrock console, enterprises choose the models they want to compare for a given task, such as question-answering or content summarization,” Sivasubramanian said, explaining that for automatic evaluations, enterprises select predefined evaluation criteria (e.g., accuracy, robustness, and toxicity) and upload their own testing data set or select from built-in, publicly available data sets.

For subjective criteria or nuanced content requiring sophisticated judgment, enterprises can set up human-based evaluation workflows — which leverage an enterprise’s in-house workforce — or use a managed workforce provided by AWS to evaluate model responses, Sivasubramanian said.

Other updates to Bedrock include Guardrails, currently in preview, targeted at helping enterprises adhere to responsible AI principles. AWS has also made Knowledge Bases and Amazon Agents for Bedrock generally available.

SageMaker capabilities to scale large language models

In order to help enterprises train and deploy large language models efficiently, AWS introduced two new offerings — SageMaker HyperPod and SageMaker Inference — within its Amazon SageMaker AI and machine learning service.

In contrast to the manual model training process — which is prone to delays, unnecessary expenditure and other complications — HyperPod removes the heavy lifting involved in building and optimizing machine learning infrastructure for training models, reducing training time by up to 40%, the company said.

The new offering is preconfigured with SageMaker’s distributed training libraries, designed to let users automatically split training workloads across thousands of accelerators, so workloads can be processed in parallel for improved model performance.

HyperPod, according to Sivasubramanian, also ensures customers can continue model training uninterrupted by periodically saving checkpoints.

Helping enterprises reduce AI model deployment cost

SageMaker Inference, on the other hand, is targeted at helping enterprises reduce model deployment costs and decrease latency in model responses. In order to do so, Inference allows enterprises to deploy multiple models to the same cloud instance to better utilize the underlying accelerators.

“Enterprises can also control scaling policies for each model separately, making it easier to adapt to model usage patterns while optimizing infrastructure costs,” the company said, adding that SageMaker actively monitors instances that are processing inference requests and intelligently routes requests based on which instances are available.

AWS has also updated its low code machine learning platform targeted at business analysts,  SageMaker Canvas.

Analysts can use natural language to prepare data inside Canvas in order to generate machine learning models, Sivasubramanian said. The no code platform supports LLMs from Anthropic, Cohere, and AI21 Labs.

SageMaker also now features the Model Evaluation capability, now called SageMaker Clarify, which can be accessed from within the SageMaker Studio.

Other generative AI-related updates include updated support for vector databases for Amazon Bedrock. These databases include Amazon Aurora and MongoDB. Other supported databases include Pinecone, Redis Enterprise Cloud, and Vector Engine for Amazon OpenSearch Serverless.

Posted Under: Database
AWS adds more zero-ETL integrations to Amazon RedShift

Posted by on 29 November, 2023

Continuing to build on its efforts toward zero-ETL for data warehousing services, AWS at its ongoing re:Invent 2023 conference, announced new Amazon RedShift integrations with Amazon Aurora PostgreSQL, Amazon DynamoDB, and Amazon RDS for MySQL.

Enterprises, typically, use extract, transform, load (ETL) to integrate data from multiple sources into a single consistent data store to be loaded into a data warehouse for analysis.

However, most data engineers claim that transforming data from disparate sources could be a difficult and time-consuming task as the process involves steps such as cleaning, filtering, reshaping, and summarizing the raw data.

Another issue is the added cost of maintaining teams that prepare data pipelines for running analytics, AWS said.

In contrast, the new zero-ETL integrations, according to the company, eliminate the need to perform ETL between Aurora PostgreSQL, DynamoDB, RDS for MySQL, and RedShift as transactional data in these databases can be replicated into RedShift almost immediately and is ready for running analysis.

Currently, all three integrations are in preview.

Last year, AWS announced two new capabilities—Amazon Aurora zero-ETL integration with Amazon Redshift and Amazon Redshift integration for Apache Spark.

In addition, the cloud services provider made the Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service generally available.

This integration will allow data professionals across enterprises to perform a search on their DynamoDB data by automatically replicating and transforming it without custom code or infrastructure, AWS said.

Amazon DynamoDB zero-ETL integration with Amazon OpenSearch Service is available in any AWS Region where OpenSearch Ingestion is currently offered, AWS added.

Posted Under: Database
Happy Hacking Keyboard Studio review: Mouse and keyboard in one tiny package

Posted by on 21 November, 2023

The Happy Hacking Keyboard line from PFU America is aimed at users who want a compact, but powerful and customizable keyboard with a great typing feel. The latest version of the HHKB (as it’s abbreviated) is the HHKB Studio, designed to compress both keyboard and mouse functionality into the most compact footprint possible. Like its predecessors, this keyboard isn’t cheap—its list price is $385—but it offers a mix of features you won’t find in other keyboards.

Let’s take a look.

HHKB Studio test drive

The HHKB Studio uses USB-C or Bluetooth and battery-powered connections, with both cabling and batteries included. Bluetooth pairing works with up to four distinct devices, and it can be used to command both Mac and Windows systems interchangeably.

I was fond of the soft-touch, smooth-sliding linear mechanical key switch mechanisms used in the HHKB Hybrid Type-S model I previously reviewed. The Studio uses the same switches, but you can swap in your own standard MX-stem switches—for instance, to give the non-alphanumeric keys a little more click, or to make the Esc key harder to actuate. The keycaps shipped with my unit used a gloss-black over matte-black color scheme that you’ll either find classy and stylish or next to impossible to make out. There is no key backlighting, but a brightly lit room helps.

[Image: The HHKB Studio keyboard (IDG). The HHKB Studio features a super-compact keyboard layout with a pointing stick mouse and gesture pads.]

The super-compact 60-key layout means no dedicated cursor controls or number pad. Key controls for the arrows are accessed by way of function key combos. Also, the left Control key now sits where Caps Lock usually does; you use FN + Tab to access Caps Lock if needed. Each FN key combo is printed on the bottom front of each keycap, but again the black keycap colors on my unit made them tough to read without direct lighting.

For cursor control, the HHKB Studio adds two other features. One is the pointing stick mouse, as popularized by the original IBM ThinkPad. It’s set between the G/H/B key cluster, and complemented with thumb-reachable mouse buttons set below the space bar. It takes some practice to work with, but for basic mousing about it’s convenient, and the keyboard comes with four replacement caps for the stick mouse.

The other cursor control feature is four “gesture pads” along the front edges and sides of the unit. Slide your fingers along the left side and left front edges to move the cursor; slide them along the right side and right front edges to scroll the current window or tab between windows. You can also freely reassign the corresponding key actions for these movements.

The gesture pads are powerful and useful enough that I rarely relied on the arrow-diamond key cluster or even the pointing stick to move the cursor. However, you can trigger the gesture pads by accident. A couple of times I innocently bumped the side of the unit when moving it, and ended up sending keystrokes to a different window.

Many hackable keyboard models use the VIA standard, meaning you can change your keyboard’s layout or behaviors through a web browser app. HHKB does not support VIA, unfortunately; the keymapping and control tool provided for it runs as an installable desktop application.

Bottom line

Like its predecessor, the Happy Hacking Keyboard Studio packs functionality and a great typing feel into a small form factor. This version ramps up the functionality even further by letting you do away with a mouse. But you’ll have to decide if $385 is a worthy price.

Posted Under: Tech Reviews
How Apache Arrow accelerates InfluxDB

Posted by on 20 November, 2023

Historically, working with big data has been quite a challenge. Companies that wanted to tap big data sets faced significant performance overhead relating to data processing. Specifically, moving data between different tools and systems required leveraging different programming languages, network protocols, and file formats. Converting this data at each step in the data pipeline was costly and inefficient.

Enter Apache Arrow, an open-source framework that defines an in-memory columnar data format that every analytical processing engine can use.

Developed by open source leaders from Impala, Spark, Calcite, and others, Apache Arrow was designed to be the language-agnostic standard for efficient columnar memory representation to facilitate interoperability. Arrow provides zero-copy reads, reducing both memory requirements and CPU cycles, and because it was designed for modern CPUs and GPUs, Arrow can process data in parallel and leverage single-instruction/multiple data (SIMD) and vectorized processing and querying.
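
A quick sketch of what that looks like from Python, using pyarrow to build an in-memory columnar table (here from a pandas DataFrame) and persist it as Parquet; the column names are illustrative:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Arrow is the in-memory columnar format; Parquet is the on-disk columnar format.
df = pd.DataFrame({"sensor": ["a", "a", "b"], "value": [1.0, 2.5, 0.3]})
table = pa.Table.from_pandas(df)

pq.write_table(table, "measurements.parquet")   # columnar file on disk
roundtrip = pq.read_table("measurements.parquet")
print(roundtrip.schema)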

So far, Arrow has enjoyed widespread adoption.

Who’s using Apache Arrow?

Apache Arrow is the power behind many projects for data analytics and storage solutions, including:

  • Apache Spark, a large-scale parallel processing data engine that uses Arrow to convert Pandas DataFrames to Spark DataFrames. This enables data scientists to port over POC models developed on small data sets to large data sets.
  • Apache Parquet, an extremely efficient columnar storage format. Parquet uses Arrow for vectorized reads, which make columnar storage even more efficient by batching multiple rows in a columnar format.
  • InfluxDB, a time series data platform that uses Arrow to support near-unlimited cardinality use cases, querying in multiple query languages (including Flux, InfluxQL, SQL and more to come), and offering interoperability with BI and data analytics tools.
  • Pandas, a data analytics toolkit built on top of Python. Pandas uses Arrow to offer read and write support for Parquet.

The InfluxData-Apache Arrow effect

Earlier this year, InfluxData debuted a new database engine built on the Apache ecosystem. Developers wrote the new engine in Rust on top of Apache Arrow, Apache DataFusion, and Apache Parquet. With Apache Arrow, InfluxDB can support near-unlimited cardinality or dimensionality use cases by providing efficient columnar data exchange. To illustrate, imagine that we write the following data to InfluxDB:

field1  field2  tag1       tag2       tag3
1i      null    tagvalue1  null       null
2i      null    tagvalue2  null       null
3i      null    null       tagvalue3  null
4i      true    tagvalue1  tagvalue3  tagvalue4

However, the engine stores the data in a columnar format like this:

1i 2i 3i 4i
null null null true
tagvalue1 tagvalue2 null tagvalue1
null null tagvalue3 tagvalue3
null null null tagvalue4
timestamp1 timestamp2 timestamp3 timestamp4

Or, in other words, the engine stores the data like this:

1i, 2i, 3i, 4i;
null, null, null, true;
tagvalue1, tagvalue2, null, tagvalue1;
null, null, tagvalue3, tagvalue3; 
null, null, null, tagvalue4;
timestamp1, timestamp2, timestamp3, timestamp4; 

By storing data in a columnar format, the database can group like data together for cheap compression. Specifically, Apache Arrow defines an inter-process communication mechanism to transfer a collection of Arrow columnar arrays (called a “record batch”) as described in this FAQ. This can be done synchronously between processes or asynchronously by first persisting the data in storage.

Additionally, time series data is unique because it usually has two dependent variables. The value of your time series is dependent on time, and values have some correlation with the values that preceded them. This attribute of time series means that InfluxDB can take advantage of the record batch compression to a greater extent through dictionary encoding. Dictionary encoding allows InfluxDB to eliminate storage of duplicate values, which frequently exist in time series data. InfluxDB also enables vectorized query instruction using SIMD instructions.
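
Here is a small pyarrow sketch of both ideas, dictionary-encoding a repetitive tag column and grouping columnar arrays into a record batch; the column names echo the example above:

import pyarrow as pa

# Tag columns in time series data repeat heavily, so dictionary encoding stores each
# distinct value once plus small integer indices.
tags = pa.array(["tagvalue1", "tagvalue2", "tagvalue1", "tagvalue1"])
encoded = tags.dictionary_encode()
print(encoded.type)   # dictionary<values=string, indices=int32, ...>

# A record batch groups columnar arrays that travel together over Arrow IPC.
batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3, 4]), encoded],
    names=["field1", "tag1"],
)
print(batch.num_rows, batch.schema)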

Apache Arrow contributions and the commitment to open source

In addition to a free tier of InfluxDB Cloud, InfluxData offers open-source versions of InfluxDB under a permissive MIT license. Open-source offerings provide the community with the freedom to build their own solutions on top of the code and the ability to evolve the code, which creates opportunities for real impact.

The true power of open source becomes apparent when developers not only provide open source code but also contribute to popular projects. Cross-organizational collaboration generates some of the most popular open source projects like TensorFlow, Kubernetes, Ansible, and Flutter. InfluxDB’s database engineers have contributed greatly to Apache Arrow, including the weekly releases of https://crates.io/crates/arrow and https://crates.io/crates/parquet. They also help author DataFusion blog posts, and InfluxData has made a number of other contributions to Arrow as well.

Apache Arrow is proving to be a critical component in the architecture of many companies. Its in-memory columnar format supports the needs of analytical database systems, data frame libraries, and more. By taking advantage of Apache Arrow, developers will save time while also gaining access to new tools that also support Arrow.

Anais Dotis-Georgiou is a developer advocate for InfluxData with a passion for making data beautiful with the use of data analytics, AI, and machine learning. She takes the data that she collects and applies a mix of research, exploration, and engineering to translate the data into something of function, value, and beauty. When she is not behind a screen, you can find her outside drawing, stretching, boarding, or chasing after a soccer ball.

New Tech Forum provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.

Posted Under: Database
The best ORMs for database-powered Python apps

Posted by on 15 November, 2023

When you want to work with a relational database in Python, or most any other programming language, it’s common to write database queries “by hand,” using the SQL syntax supported by most databases.

This approach has its downsides, however. Hand-authored SQL queries can be clumsy to use, since databases and software applications tend to live in separate conceptual worlds. It’s hard to model how your app and your data work together.

Another approach is to use a library called an ORM, or object-relational mapping tool. ORMs let you describe how your database works through your application’s code—what tables look like, how queries work, and how to maintain the database across its lifetime. The ORM handles all the heavy lifting for your database, and you can concentrate on how your application uses the data.

This article introduces six ORMs for the Python ecosystem. All provide programmatic ways to create, access, and manage databases in your applications, and each one embodies a slightly different philosophy of how an ORM should work. Additionally, all of the ORMs profiled here will let you manually issue SQL statements if you so choose, for those times when you need to make a query without the ORM’s help.

6 of the best ORMs for Python

  • Django ORM
  • Peewee
  • PonyORM
  • SQLAlchemy
  • SQLObject
  • Tortoise ORM

Django

The Django web framework comes with most everything you need to build professional-grade websites, including its own ORM and database management tools. Most people will only use Django’s ORM with Django, but it is possible to use the ORM on its own. Also, Django’s ORM has massively influenced the design of other Python ORMs, so it’s a good starting point for understanding Python ORMs generally.

Models for a Django-managed database follow a pattern similar to other ORMs in Python. Tables are described with Python classes, and Django’s custom types are used to describe the fields and their behaviors. This includes things like one-to-many or many-to-many references with other tables, but also types commonly found in web applications like uploaded files. It’s also possible to create custom field types by subclassing existing ones and using Django’s library of generic field class methods to alter their behaviors.
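
A minimal sketch of what a Django model looks like; the Author and Book classes are hypothetical, and the snippet assumes a configured Django project:

from django.db import models

# Field types, relations, and constraints are declared in Python; Django generates
# the schema (and migration scripts) from these classes.
class Author(models.Model):
    name = models.CharField(max_length=100)
    created = models.DateTimeField(auto_now_add=True)

class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="books")
    published = models.DateField(null=True, blank=True)

# A typical query once the app is configured:
# Book.objects.filter(author__name="Ursula K. Le Guin").order_by("-published")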

Django’s command-line management tooling for working with sites includes powerful tools for managing a project’s data layer. The most useful ones automatically create migration scripts for your data, when you want to alter your models and migrate the underlying data to use the new models. Each change set is saved as its own migration script, so all migrations for a database are retained across the lifetime of your application. This makes it easier to maintain data-backed apps where the schema might change over time.

Peewee

Peewee has two big claims to fame. One, it’s a small but powerful library, around 6,600 lines of code in a single module. Two, it’s expressive without being verbose. While Peewee natively handles only a few databases, they’re among the most common ones: SQLite, PostgreSQL, MySQL/MariaDB, and CockroachDB.

Defining models and relationships in Peewee is a good deal simpler than in some other ORMs. One uses Python classes to create tables and their fields, but Peewee requires minimal boilerplate to do this, and the results are highly readable and easy to maintain. Peewee also has elegant ways to handle situations like foreign key references to tables that are defined later in code, or self-referential foreign keys.

Queries in Peewee use a syntax that hearkens back to SQL itself; for example, Person.select(Person.name, Person.id).where(Person.age>20). Peewee also lets you return the results as rich Python objects, as named tuples or dictionaries, or as a simple tuple for maximum performance. The results can also be returned as a generator, for efficient iteration over a large rowset. Window functions and CTEs (Common Table Expressions) also have first-class support.
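
A short, self-contained sketch of the Peewee style described above, using an in-memory SQLite database and a hypothetical Person model:

from peewee import SqliteDatabase, Model, CharField, IntegerField

db = SqliteDatabase(":memory:")

class Person(Model):
    name = CharField()
    age = IntegerField()

    class Meta:
        database = db   # minimal boilerplate: bind the model to a database

db.connect()
db.create_tables([Person])
Person.create(name="Davis", age=34)

# Query with the SQL-like syntax; results come back as model instances by default.
adults = (Person
          .select(Person.name, Person.id)
          .where(Person.age > 20)
          .order_by(Person.name))
for person in adults:
    print(person.id, person.name)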

Peewee uses many common Python metaphors beyond classes. For instance, transactions can be expressed by way of a context manager, as in with db.atomic():. You can’t use keywords like and or not with queries, but Peewee lets you use operators like & and ~ instead.

Sophisticated behaviors like optimistic locking and top n objects per group aren’t supported natively, but the Peewee documentation has a useful collection of tricks to implement such things. Schema migration is not natively supported, but Peewee includes a SchemaManager API for creating migrations along with other schema-management operations.

PonyORM

PonyORM‘s standout feature is the way it uses Python’s native syntax and language features to compose queries. For instance, PonyORM lets you express a SELECT query as a generator expression: query = select (u for u in User if u.name == "Davis").order_by(User.name). You can also use lambdas as parts of queries for filtering, as in query.filter(lambda user: user.is_approved is True). The generated SQL is also always accessible.

When you create database tables with Python objects, you use a class to declare the behavior of each field first, then its type. For instance, a mandatory, distinct name field would be name = Required(str, unique=True). Most common field types map directly to existing Python types, such as int/float/Decimal, datetime, bytes (for BLOB data), and so on. One potential point of confusion is that large text fields use PonyORM’s LongStr type; the Python str type is basically the underlying database’s CHAR.
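
A small sketch pulling those pieces together with an in-memory SQLite database; the User entity and query are illustrative:

from pony.orm import Database, Required, db_session, select

db = Database()

class User(db.Entity):
    name = Required(str, unique=True)   # behavior first, then the Python type
    age = Required(int)

db.bind(provider="sqlite", filename=":memory:")
db.generate_mapping(create_tables=True)

with db_session:
    User(name="Davis", age=34)
    query = select(u for u in User if u.age > 20).order_by(User.name)
    print(query.get_sql())   # the generated SQL is always accessible
    for user in query:
        print(user.name)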

PonyORM automatically supports JSON and PostgreSQL-style Array data types, as more databases now support both types natively. Where there isn’t native support, PonyORM can often shim things up—for example, SQLite versions earlier than 3.9 can use TEXT to store JSON, but more recent versions can work natively via an extension module.

Some parts of PonyORM hew less closely to Python’s objects and syntax. To describe one-to-many and many-to-many relationships in PonyORM, you use Set(), a custom PonyORM object. For one-to-one relationships, there are Optional() and Required() objects.

PonyORM has some opinionated behaviors worth knowing about before you build with it. Generated queries typically have the DISTINCT keyword added automatically, under the rationale that most queries shouldn’t return duplicates anyway. You can override this behavior with the .without_distinct() method on a query.

A major omission from PonyORM’s core is that there’s no tooling for schema migrations yet, although it’s planned for a future release. On the other hand, the makers of PonyORM offer a convenient online database schema editor as a service, with basic access for free and more advanced feature sets for $9/month.

SQLAlchemy

SQLAlchemy is one of the best-known and most widely used ORMs. It provides powerful and explicit control over just about every facet of the database’s models and behavior. SQLAlchemy 2.0, released early in 2023, introduced a new API and data modeling system that plays well with Python’s type hinting and dataclass features.

SQLAlchemy uses a two-level internal architecture consisting of Core and ORM. Core is for interaction with database APIs and rendering of SQL statements. ORM is the abstraction layer, providing the object model for your databases. This decoupled architecture means SQLAlchemy can, in theory, use any number or variety of abstraction layers, though there is a slight performance penalty. To counter this, some of SQLAlchemy’s components are written in C (now Cython) for speed.

SQLAlchemy lets you describe database schemas in two ways, so you can choose what’s most appropriate for your application. You can construct Table() objects directly, supplying column names and types as arguments, or you can declare mapped classes, using a system reminiscent of the way dataclasses work. The former is easier, but may not play as nicely with type checkers and linting tools; the latter is more explicit and type-friendly, but requires more ceremony and boilerplate.
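A sketch of the two styles side by side, using an invented users table (the Mapped/mapped_column form assumes SQLAlchemy 2.0):

from sqlalchemy import Table, Column, Integer, String, MetaData
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

# Table() objects: columns supplied as arguments
metadata = MetaData()
users_table = Table(
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)

# Mapped classes: more boilerplate, but friendlier to type checkers
class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users_orm"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50))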

SQLAlchemy values correctness over convenience. For instance, when bulk-inserting values from a file, date values have to be rendered as Python date objects to be handled as unambiguously as possible.

Querying with SQLAlchemy uses a syntax reminiscent of actual SQL queries—for example, select(User).where(User.name == "Davis"). SQLAlchemy queries can also be rendered as raw SQL for inspection, along with any changes needed for a specific dialect of SQL supported by SQLAlchemy (for instance, PostgreSQL versus MySQL). The expression construction tools can also be used on their own to render SQL statements for use elsewhere, not just as part of the ORM. For debugging, a handy echo=True option lets you see SQL statements in the console as they are executed.
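A rough sketch, reusing the Base and User classes from the previous example:

from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///:memory:", echo=True)  # echo=True logs SQL to the console
Base.metadata.create_all(engine)

stmt = select(User).where(User.name == "Davis")
print(stmt)  # renders the generated SQL for inspection

with Session(engine) as session:
    users = session.scalars(stmt).all()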

Various SQLAlchemy extensions add powerful features not found in the core or ORM. For instance, the “horizontal sharding” add-on transparently distributes queries across multiple instances of a database. For migrations, the Alembic project lets you generate change scripts with a good deal of flexibility and configuration.


SQLObject

SQLObject is easily the oldest project in this collection, originally created in 2002, but still being actively developed and released. It supports a very wide range of databases, and early in its lifetime supported many common Python ORM behaviors we might take for granted now—like using Python classes and objects to describe database tables and fields, and providing high levels of abstraction for those activities.

With most ORMs, by default, changes to objects are only reflected in the underlying database when you save or sync. SQLObject reflects object changes immediately in the database, unless you alter that behavior in the table object’s definition.

Table definitions in SQLObject use custom types to describe fields—for example, StringCol() to define a string field, and ForeignKey() for a reference to another table. For joins, you can use a MultipleJoin() attribute to get a table’s one-to-many back references, and RelatedJoin() for many-to-many relationships.

A handy sqlmeta class gives you more control over a given table’s programmatic behaviors—for instance, if you want to provide your own custom algorithm for how Python class names are translated into database table names, or a table’s default ordering.
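A minimal sketch pulling these pieces together, assuming an in-memory SQLite connection and invented User and Pet tables:

from sqlobject import SQLObject, StringCol, ForeignKey, MultipleJoin
from sqlobject import connectionForURI, sqlhub

sqlhub.processConnection = connectionForURI("sqlite:/:memory:")

class User(SQLObject):
    class sqlmeta:
        table = "users"         # override the generated table name
        defaultOrder = "name"   # default ordering for selects
    name = StringCol()
    rank = StringCol()
    status = StringCol()
    pets = MultipleJoin("Pet")  # one-to-many back references

class Pet(SQLObject):
    name = StringCol()
    user = ForeignKey("User")

User.createTable()
Pet.createTable()

u = User(name="Davis", rank="Admin", status="Active")
u.status = "Disabled"   # written to the database immediately; no explicit save step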

The querying syntax is similar to other ORMs, but not always as elegant. For instance, an OR query across two fields would look like this:

from sqlobject.sqlbuilder import OR
User.select(OR(User.q.status == "Active", User.q.rank == "Admin"))

A whole slew of custom query builder methods are available for performing different kinds of join operations, which is useful if you explicitly want, say, a FULLOUTERJOIN instead of a NATURALRIGHTJOIN.

SQLObject has little in the way of utilities. Its biggest offering there is the ability to dump and load database tables to and from CSV. However, with some additional manual work, its native admin tool lets you record versions of your database’s schema and perform migrations; the upgrade process is not automatic.

Tortoise ORM

Tortoise ORM is the youngest project profiled here, and the only one that is asynchronous by default. That makes it an ideal companion for async web frameworks like FastAPI, or for applications built on asynchronous principles generally.

Creating models with Tortoise follows roughly the same pattern as other Python ORMs. You subclass Tortoise’s Model class, and use field classes like IntField, ForeignKeyField, or ManyToManyField to define fields and their relationships. Models can also have a Meta inner class to define additional details about the model, such as indexes or the name of the created table. For relationship fields, such as OneToOne, the field definition can also specify delete behaviors such as a cascading delete.
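A minimal sketch of Tortoise models, with invented Team and User tables:

from tortoise import fields
from tortoise.models import Model

class Team(Model):
    id = fields.IntField(pk=True)
    name = fields.CharField(max_length=100)

    class Meta:
        table = "teams"  # override the generated table name

class User(Model):
    id = fields.IntField(pk=True)
    rank = fields.CharField(max_length=20)
    status = fields.CharField(max_length=20)
    # "models.Team" assumes this module is registered under the "models" app
    # label when Tortoise.init() is called at application startup
    team = fields.ForeignKeyField(
        "models.Team", related_name="members", on_delete=fields.CASCADE
    )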

Queries in Tortoise do not track as closely to SQL syntax as some other ORMs. For instance, User.filter(rank="Admin") is used to express a SELECT/WHERE query. An .exclude() clause can be used to further refine results; for example, User.filter(rank="Admin").exclude(status="Disabled"). This approach does provide a slightly more compact way to express common queries than the .select().where() approach used elsewhere.

The Signals feature lets you specify behaviors before or after actions like saving or deleting a record. In other ORMs this would be done by, say, subclassing a model and overriding .save(). With Tortoise, you can wrap a function with a decorator to specify a signal action, outside of the model definition. Tortoise also has a “router” mechanism for allowing reads and writes to be applied to different databases if needed. A very useful function not commonly seen in ORMs is .explain(), which executes the database’s plan explainer on the supplied query.
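For instance, a hypothetical audit hook using the signals decorator might look something like this, reusing the User model sketched above:

from tortoise.signals import post_save

@post_save(User)
async def log_user_save(sender, instance, created, using_db, update_fields):
    # Runs after any User is saved; "created" distinguishes inserts from updates
    action = "created" if created else "updated"
    print(f"User {instance.id} was {action}")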

Async is still a relatively new presence in Python’s ecosystem. To get a handle on how to use Tortoise with async web frameworks, the documentation provides examples for FastAPI, Quart, Sanic, Starlette, aiohttp, and others. For those who want to use type annotations (also relatively new to the Python ecosystem), a Pydantic plugin can generate Pydantic models from Tortoise models, although it only supports serialization and not deserialization of those models. An external tool, Aerich, generates migration scripts, and supports both migrating to newer and downgrading to older versions of a schema.

Conclusion

The most widely used of the Python ORMs, SQLAlchemy, is almost always a safe default choice, even if newer and more elegant tools exist. Peewee is compact and expressive, with less boilerplate needed for many operations, but it lacks more advanced ORM features like a native mechanism for schema migrations.

Django’s ORM is mainly for use with the Django web framework, but its power and feature set, especially its migration management system, are a strong reason to consider Django as a whole. PonyORM’s use of native Python metaphors makes it easy to grasp conceptually, but be aware of its opinionated defaults.

SQLObject, the oldest of the ORMs profiled here, has powerful features for evoking exact behaviors (e.g., joins), but it’s not always elegant to use and has few native utilities. And the newest, Tortoise ORM, is async by default, so it complements the new generation of async-first web frameworks.
