Bringing observability to the modern data stack

Posted on 1 June, 2023

You can’t manage what you can’t measure. Just as software engineers need a comprehensive picture of the performance of applications and infrastructure, data engineers need a comprehensive picture of the performance of data systems. In other words, data engineers need data observability.

Data observability can help data engineers and their organizations ensure the reliability of their data pipelines, gain visibility into their data stacks (including infrastructure, applications, and users), and identify, investigate, prevent, and remediate data issues. Data observability can help solve all kinds of common enterprise data issues.

Data observability can help resolve data and analytics platform scaling, optimization, and performance issues by identifying operational bottlenecks. It can help avoid cost and resource overruns by providing operational visibility, guardrails, and proactive alerts. And it can help prevent data quality issues and data outages by monitoring data reliability across pipelines and frequent transformations.

Acceldata Data Observability Platform

Acceldata Data Observability Platform is an enterprise data observability platform for the modern data stack. The platform provides comprehensive visibility, giving data teams the real-time information they need to identify and prevent issues and make data stacks reliable.

Acceldata Data Observability Platform supports data sources such as Snowflake, Databricks, Hadoop, Amazon Athena, Amazon Redshift, Azure Data Lake, Google BigQuery, MySQL, and PostgreSQL. The Acceldata platform provides insights into:

  • Compute – Optimize compute, capacity, resources, costs, and performance of your data infrastructure.
  • Reliability – Improve data quality and reconciliation, and detect schema drift and data drift.
  • Pipelines – Identify issues with transformation, events, applications, and deliver alerts and insights.
  • Users – Deliver real-time insights to data engineers, data scientists, data administrators, platform engineers, data officers, and platform leads.

The Acceldata Data Observability Platform is built as a collection of microservices that work together to manage various business outcomes. It gathers various metrics by reading and processing raw data as well as meta information from underlying data sources. It allows data engineers and data scientists to monitor compute performance and validate data quality policies defined within the system.

Acceldata’s data reliability monitoring platform allows you to set various types of policies to ensure that the data in your pipelines and databases meets the required quality levels and is reliable. Acceldata’s compute performance platform displays all of the computation costs incurred on customer infrastructure, and allows you to set budgets and configure alerts when expenditures reach the budget.
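
To make the idea concrete, here is a minimal sketch of the kind of reliability policy such a platform evaluates. The policy shape, threshold, and data are illustrative only, not Acceldata's actual API:

```python
# Illustrative data reliability policy (not Acceldata's API): fail the check
# when the share of nulls in a column exceeds a configured threshold.
import pandas as pd

def null_rate_policy(df: pd.DataFrame, column: str, max_null_rate: float) -> bool:
    """Pass if the share of nulls in `column` stays at or under the threshold."""
    null_rate = df[column].isna().mean()
    print(f"{column}: null rate {null_rate:.2%} (limit {max_null_rate:.2%})")
    return null_rate <= max_null_rate

orders = pd.DataFrame({"order_id": [1, 2, 3, 4],
                       "amount": [10.0, None, 12.5, 9.99]})
assert null_rate_policy(orders, "amount", max_null_rate=0.30)  # 25% nulls: passes
```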

The Acceldata Data Observability Platform architecture is divided into a data plane and a control plane.

Data plane

The data plane of the Acceldata platform connects to the underlying databases or data sources. It never stores any data; it returns metadata and execution results to the control plane, which stores them. The data analyzer, query analyzer, crawlers, and Spark infrastructure are part of the data plane.

Each data source integration includes a microservice that crawls the data source's metadata from its underlying metastore. Profiling, policy execution, and sample data tasks are converted into Spark jobs by the analyzer, and the Spark clusters manage the execution of those jobs.
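
As a rough illustration of what such a generated Spark job might do, the sketch below profiles per-column null counts; the source path and table are hypothetical:

```python
# Minimal PySpark profiling job of the kind an analyzer might generate;
# the source path is hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("profile-orders").getOrCreate()
df = spark.read.parquet("s3://warehouse/orders")  # hypothetical landing path

# Count nulls per column in a single pass over the data.
null_counts = df.select(
    [F.count(F.when(F.col(c).isNull(), c)).alias(f"{c}_nulls") for c in df.columns]
)
print("rows:", df.count())
null_counts.show()
```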

Control plane

The control plane is the platform’s orchestrator, and is accessible via UI and API interfaces. The control plane stores all metadata, profiling data, job results, and other data in the database layer. It manages the data plane and, as needed, sends requests for job execution and other tasks.

The platform’s data computation monitoring section obtains the metadata from external sources via REST APIs, collects it on the data collector server, and then publishes it to the data ingestion module. The agents deployed near the data sources collect metrics regularly before publishing them to the data ingestion module.

The database layer, which includes databases like Postgres, Elasticsearch, and VictoriaMetrics, stores the data collected from the agents and data control server. The data processing server facilitates the correlation of data collected by the agents and the data collector service. The dashboard server, agent control server, and management server are the data computation monitoring infrastructure services.

When a major event (an error or warning) occurs in the system or subsystems monitored by the platform, it is either displayed in the UI or sent to the user through notification channels such as Slack or email by the platform's alert and notification server.
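
As a simple illustration of that last hop (this is not Acceldata's implementation), an alert server might push a major event to Slack through an incoming webhook:

```python
# Illustrative alert dispatch: post a major event to a Slack incoming webhook.
# The webhook URL is hypothetical; Slack's webhook payload is {"text": ...}.
import json
import urllib.request

def notify_slack(webhook_url: str, event: dict) -> None:
    body = json.dumps({"text": f"[{event['severity']}] {event['message']}"})
    req = urllib.request.Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

event = {"severity": "error", "message": "Pipeline orders_daily missed its SLA"}
notify_slack("https://hooks.slack.com/services/T000/B000/XXXX", event)  # hypothetical URL
```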

Key capabilities

Detect problems at the beginning of data pipelines to isolate them before they hit the warehouse and affect downstream analytics:

  • Shift left to files and streams: Run reliability analysis in the “raw landing zone” and “enriched zone” before data hits the “consumption zone” to avoid wasting costly cloud credits and making bad decisions due to bad data.
  • Data reliability powered by Spark: Fully inspect and identify issues at petabyte scale, with the power of open-source Apache Spark.
  • Cross-data-source reconciliation: Run reliability checks that join disparate streams, databases, and files to ensure correctness in migrations and complex pipelines (see the sketch below).
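
A cross-data-source reconciliation check might look something like the following PySpark sketch, which compares per-key sums between a source database and its landed copy; the connection details and paths are hypothetical:

```python
# Hedged reconciliation sketch: compare per-key amount sums between a source
# database table and its file-based copy. Hosts and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("reconcile-orders").getOrCreate()

source = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://db-host/sales")  # hypothetical
          .option("dbtable", "public.orders")
          .load())
landed = spark.read.parquet("s3://landing-zone/orders")      # hypothetical

src = source.groupBy("order_id").agg(F.sum("amount").alias("src_amount"))
dst = landed.groupBy("order_id").agg(F.sum("amount").alias("dst_amount"))

# Full outer join keeps keys missing on either side; <=> is null-safe equality.
diff = (src.join(dst, "order_id", "full_outer")
        .filter("NOT (src_amount <=> dst_amount)"))
print("mismatched keys:", diff.count())
```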

Get multi-layer operational insights to solve data problems quickly:

  • Know why, not just when: Debug data delays at their root by correlating data and compute spikes.
  • Discover the true cost of bad data: Pinpoint the money wasted computing on unreliable data.
  • Optimize data pipelines: Whether drag-and-drop or code-based, single platform or polyglot, you can diagnose data pipeline failures in one place, at all layers of the stack.

Maintain a constant, comprehensive view of workloads and quickly identify and remediate issues through the operational control center: 

  • Built by data experts for data teams: Tailored alerts, audits, and reports for today’s leading cloud data platforms.
  • Accurate spend intelligence: Predict costs and control usage to maximize ROI even as platforms and pricing evolve.
  • Single pane of glass: Budget and monitor all of your cloud data platforms in one view.

Complete data coverage with flexible automation:

  • Fully automated reliability checks: Immediately know about missing, late, or erroneous data on thousands of tables. Add advanced data drift alerting with one click.
  • Reusable SQL and user-defined functions (UDFs): Express domain-centric reusable reliability checks in five programming languages. Apply segmentation to understand reliability across dimensions (see the sketch after this list).
  • Broad data source coverage: Apply enterprise data reliability standards across your company, from modern cloud data platforms to traditional databases to complex files.
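
As a sketch of what a reusable, segmented reliability check can look like (the interface here is illustrative, not Acceldata's), consider a freshness check applied per region:

```python
# Illustrative reusable check with segmentation (not Acceldata's UDF interface):
# verify per-segment data freshness across a dimension such as region.
import pandas as pd

def freshness_check(max_age_hours: float):
    def check(segment: pd.DataFrame) -> bool:
        age = pd.Timestamp.now(tz="UTC") - segment["updated_at"].max()
        return age <= pd.Timedelta(hours=max_age_hours)
    return check

events = pd.DataFrame({
    "region": ["us", "us", "eu"],
    "updated_at": pd.to_datetime(
        ["2023-06-01T00:00Z", "2023-06-01T06:00Z", "2023-05-20T00:00Z"], utc=True),
})

is_fresh = freshness_check(max_age_hours=24)
for region, segment in events.groupby("region"):
    print(region, "fresh" if is_fresh(segment) else "stale")
```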

Acceldata's Data Observability Platform works across diverse technologies and environments and provides enterprise data observability for modern data stacks. For Snowflake and Databricks, Acceldata can help maximize return on investment by delivering insight into performance, data quality, cost, and much more. For more information visit www.acceldata.io.

Ashwin Rajeeva is co-founder and CTO at Acceldata.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Posted Under: Database
PostgreSQL 16 advances query parallelism

Posted on 26 May, 2023

PostgreSQL 16, the next major update of the open source relational database, has arrived in a beta release, highlighted by enhancements in query execution, logical replication, developer experience, and security.

PostgreSQL 16 Beta 1 was published on May 25. The new release improves query execution with more query parallelism, allowing parallel execution of FULL and RIGHT joins and parallel execution of the string_agg and array_agg aggregate functions. PostgreSQL 16 can use incremental sorts in SELECT DISTINCT queries, and improves performance of concurrent bulk loading of data using COPY by as much as 300%, the PostgreSQL Development Group said.
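
One quick way to see the new join parallelism is to EXPLAIN a FULL join with a parallelizable aggregate and look for a Gather node with planned workers. This sketch assumes a running PostgreSQL 16 server and existing tables a and b; the DSN is hypothetical:

```python
# Inspect the plan for a FULL join with string_agg on PostgreSQL 16.
# Assumes tables a(k, v) and b(k, v) exist; adjust the DSN to your setup.
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    EXPLAIN
    SELECT a.k, string_agg(b.v, ',')
    FROM a FULL JOIN b ON a.k = b.k
    GROUP BY a.k
""")
# On large enough tables, the plan can now include "Gather" with
# "Workers Planned" above the full join and the aggregate.
for (line,) in cur.fetchall():
    print(line)
conn.close()
```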

The PostgreSQL 16 release debuts support for CPU acceleration using SIMD for both x86 and Arm architectures, including optimizations for processing ASCII and JSON strings and array and subtransaction searches. Load balancing is introduced for libpq, the PostgreSQL client library.
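
For the libpq change, connection load balancing is controlled from the connection string. If my reading of the new parameter is right, listing several hosts and setting load_balance_hosts=random makes libpq try them in random order rather than strictly left to right; this requires a PostgreSQL 16 libpq on the client, and the host names below are hypothetical:

```python
# Connection-level load balancing via libpq 16 (hypothetical hosts).
import psycopg2

conn = psycopg2.connect(
    "host=pg1,pg2,pg3 port=5432 dbname=test user=app load_balance_hosts=random"
)
# Shows which host this particular connection ended up on.
print(conn.info.host)
conn.close()
```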

With logical replication, PostgreSQL 16 can perform logical decoding on a standby instance, providing more options to distribute workloads. Logical replication lets PostgreSQL users stream data in real time to other PostgreSQL instances or other external systems that implement the logical protocol. The performance of logical replication has also been improved.

For developers, PostgreSQL 16 continues to implement the SQL/JSON standard for manipulating JSON data, including support for SQL/JSON constructors. The release adds the SQL standard ANY_VALUE aggregate function, which returns an arbitrary value from the aggregate set. Developers can specify non-decimal integers such as 0xff and 0o777. And support for the extended query protocol has been added to the psql client.
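
These additions are easy to try from any client once connected to a PostgreSQL 16 server (the DSN below is hypothetical):

```python
# Trying PostgreSQL 16's developer-facing additions (hypothetical DSN).
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()

# Non-decimal integer literals.
cur.execute("SELECT 0xff, 0o777")
print(cur.fetchone())  # (255, 511)

# ANY_VALUE picks an arbitrary value from each aggregate set.
cur.execute("SELECT any_value(x) FROM (VALUES (1), (2), (3)) AS t(x)")
print(cur.fetchone())

# SQL/JSON constructors.
cur.execute("SELECT json_object('a': 1), json_array(1, 2, 3)")
print(cur.fetchone())
conn.close()
```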

PostgreSQL can be downloaded from the project web page for the Linux, Windows, macOS, BSD, and Solaris operating platforms. Additional betas are expected as required for testing, with the final release of PostgreSQL 16 due in late 2023.

Also in PostgreSQL 16:

  • Support has been added for Kerberos credential delegation, allowing extensions such as postgres_fdw and dblink to use the authenticated credentials to connect to other services. New security-oriented connection parameters have been added for clients. And regular expressions can now be used in the pg_hba.conf and pg_ident.conf files for matching user and database names. PostgreSQL 16 supports the SQL standard SYSTEM_USER keyword, which returns the username and authentication method used for establishing a session.
  • PostgreSQL 16 introduces the Meson build system, which will ultimately replace Autoconf.
  • Monitoring features have been added, including a pg_stat_io view that provides I/O statistics. The page freezing strategy has been improved to help the performance of vacuuming and other maintenance operations. General support for text collations has been improved as well.

Posted Under: Database
Snowflake acquires Neeva to add generative AI-based search to Data Cloud

Posted on 24 May, 2023

Cloud-based data warehouse company Snowflake on Wednesday said that it was acquiring Neeva, a startup based in Mountain View, California, for an undisclosed sum in an effort to add generative AI-based search to its Data Cloud platform.

“Snowflake is acquiring Neeva, a search company founded to make search even more intelligent at scale. Neeva created a unique and transformative search experience that leverages generative AI and other innovations to allow users to query and discover data in new ways,” Snowflake co-founder Benoit Dageville said in a blog post.

“Search is fundamental to how businesses interact with data, and the search experience is evolving rapidly with new conversational paradigms emerging in the way we ask questions and retrieve information, enabled by generative AI. The ability for teams to discover precisely the right data point, data asset, or data insight is critical to maximizing the value of data,” Dageville added.

Snowflake has been on an acquisition spree lately, with the company acquiring LeapYear in February to boost its data clean room abilities.

The LeapYear acquisition came just a month after Snowflake agreed to purchase artificial intelligence-based time series forecasting platform provider Myst AI, taking the company’s acquisition count to seven companies in three years.

In August 2022 it bought AI-based document analysis platform Applica, based in Poland, to help enterprises handle unstructured data.

Other acquisitions included Streamlit (March 2022), Polish custom software company Pragmatists (January 2022), Polish digital products development studio Polidea (February 2021), and Canadian data anonymization company CryptoNumerics (July 2020).

Neeva, which has raised over $77 million in funding to date from firms such as Greylock and Sequoia, was founded in 2019 by Sridhar Ramaswamy and Vivek Raghunathan.

Posted Under: Database
DataStax taps ThirdAI to bring generative AI to its database offerings

Posted on 24 May, 2023

DataStax on Wednesday said that it was partnering with Houston-based startup ThirdAI to bring large language models (LLMs) to its database offerings, such as its on-premises offering DataStax Enterprise and its NoSQL database-as-a-service AstraDB.

The partnership, according to DataStax’s chief product officer, Ed Anuff, is part of the company’s strategy to bring artificial intelligence to where data is residing.

ThirdAI can be installed in the same cluster where DataStax is running, whether on-premises or in the cloud, because it ships as a small library that can be installed with Python.

“The benefit is that the data does not have to move from DataStax to another environment, it is just passed to ThirdAI which is adjacent to it. This guarantees full privacy and also speed because no time is lost in transferring data over a network,” a DataStax spokesperson said. 

“ThirdAI can be run as a Python package or be accessed via an API, depending on the customer preference,” the spokesperson added.

Enterprises running DataStax Enterprise or AstraDB can use the data residing in those databases and ThirdAI’s tech and LLMs to spin up their own generative AI applications. The foundation models from ThirdAI can be trained to understand data and answer queries, such as which product recommendation would likely result in a sale, based on a customer’s history.
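
Neither company details the integration's interfaces here, so the following is only a shape sketch of the pattern described: read the data where it lives and hand it to an adjacent model. The keyspace, table, and model class are entirely hypothetical, and the model is a stub rather than ThirdAI's real API:

```python
# Hypothetical sketch of the pattern described above; none of these names are
# real DataStax or ThirdAI APIs. The model is a stand-in stub.
from cassandra.cluster import Cluster  # DataStax Python driver

class StubRecommender:
    """Placeholder for a locally installed model; ThirdAI's interface differs."""
    def predict(self, history: list) -> str:
        return history[-1] if history else "no-history"

def recommend_for(customer_id: str) -> str:
    cluster = Cluster(["127.0.0.1"])   # the cluster DataStax is running in
    session = cluster.connect("shop")  # hypothetical keyspace
    rows = session.execute(
        "SELECT product FROM purchase_history WHERE customer_id = %s",
        (customer_id,),
    )
    history = [row.product for row in rows]
    # Data is handed to the adjacent model; nothing leaves the cluster.
    return StubRecommender().predict(history)
```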

The integration of ThirdAI's LLMs will see DataStax adopt the startup's Bolt technology, which can achieve better AI training performance on CPUs than on GPUs for relatively smaller models. The advantage is that CPUs are generally priced lower than GPUs, which are typically used for AI and machine learning workloads.

“The Bolt engine, which is an algorithmic accelerator for training deep learning models, can reduce computations exponentially. The algorithm achieves neural network training in 1% or fewer floating point operations per second (FLOPS), unlike standard tricks like quantization, pruning, and structured sparsity, which only offer a slight constant factor improvement,” ThirdAI said in a blog post.

“The speedups are naturally observed on any CPU, be it Intel, AMD, or ARM. Even older versions of commodity CPUs can be made equally capable of training billion parameter models faster than A100 GPUs,” it added.

Bolt can also be invoked by “just a few” line changes in existing Python machine learning pipelines, according to ThirdAI.

The ThirdAI announcement is the first in a new partnership program that DataStax is setting up to bring in more technology from AI startups, with the aim of helping enterprises develop generative AI applications using data residing in DataStax databases.

Posted Under: Database
Yugabyte adds multiregion Kubernetes support to YugabyteDB 2.18

Posted on 24 May, 2023

Yugabyte has added multiregion Kubernetes support, along with other features, in YugabyteDB 2.18, the latest update to its open source distributed SQL database.

The update, which is already generally available, adds multiregion Kubernetes support to the company's self-managed database-as-a-service, YugabyteDB Anywhere.

To help enterprises eliminate points of friction while deploying Kubernetes, the company has added support for shared namespaces, incremental backups, and up to five times faster backups, Yugabyte said.

“Multiregion, multicluster Kubernetes deployments are made simpler through the combination of YugabyteDB’s native synchronous replication and Kubernetes Multicluster Service (MCS) APIs,” Yugabyte said.

Enterprises that need a standard two-datacenter configuration can configure and leverage the power of xCluster asynchronous replication for Kubernetes, Yugabyte said, adding that it is also adding support for Mirantis Kubernetes Engine (MKE) to offer a wider variety of orchestration platforms.

The update also includes an intelligent performance advisor for YugabyteDB Anywhere, which optimizes indexes, queries, and schema. Other updates to the self-managed database-as-a-service include security features and granular recovery via the point-in-time recovery feature.

The YugabyteDB 2.18 update also comes with the general availability of collocated tables, new query pushdowns, and scheduled full compactions that improve performance on diverse workloads, the company said.

YugabyteDB supports "colocating" SQL tables, which allows closely related data in colocated tables to reside together in a single parent tablet called the "colocation tablet," according to the company.

“Colocation helps to optimize for low-latency, high-performance data access by reducing the need for additional trips across the network. It also reduces the overhead of creating a tablet for every relation (tables, indexes, and so on) and the storage for these per node,” the company added in a blog post.
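
In YSQL (YugabyteDB's PostgreSQL-compatible API), colocation is opted into at the database level. The sketch below reflects the documented shape of the feature, but option names can vary by release, so check the docs for your version:

```python
# Hedged YSQL sketch: create a colocated database, in which tables share the
# colocation tablet by default. Connection details are the local defaults.
import psycopg2

conn = psycopg2.connect("host=127.0.0.1 port=5433 dbname=yugabyte user=yugabyte")
conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
cur = conn.cursor()

cur.execute("CREATE DATABASE appdb WITH COLOCATION = true")

# After reconnecting to appdb, a large or hot table can opt out and get its
# own tablets instead:
#   CREATE TABLE big_events (id bigint PRIMARY KEY) WITH (COLOCATION = false);
conn.close()
```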

Scheduled full compactions, according to Yugabyte, improve the performance of data access and minimize space amplification.

In March, the company added a new managed command line interface and other features to YugabyteDB Managed.

In its last update, which was made generally available in June last year, the company added a new migration service dubbed Voyager.

In May last year, the company launched Yugabyte 2.13, which included new features such as materialized views, local reads for performance, and region-local backups.

Posted Under: Database
Refurbished Computers Making a Difference

Posted on 15 June, 2015

Each day we all have an opportunity to make a difference in the lives of other people. Each of us has unique ways to make that happen: donating through a charity, offering a small kindness to a stranger, helping out special people in our own lives; the list can go on and on.

If you take a moment to look at charities, you'll find a variety of ways in which you can help, as there are so many charities available. Whether your choice is donating money to a charity, donating food to a food pantry, or donating clothes to a shelter, each of us can make a difference. The key is finding a charity that means something to you and where you can make a difference.

At Innovative Computer Products we are so pleased to have found a charity where we can make a difference, and that organization is CFY (www.cfy.org). Through CFY and our One for One Program we are able to reach out to the neediest students who have no means of obtaining home technology.

Donating refurbished computers is the key to our One for One Program. For every refurbished desktop computer we sell at Innovative Computer Products, we donate one refurbished desktop computer to CFY. CFY is such a worthy organization, and through their own means along with our One for One Program, the technology is truly getting out there to the families that need it. Last year we were able to donate over two thousand refurbished computers to CFY through our One for One Program.

CFY has many different ways you can make a difference in their organization, and we urge you to do so. While our method is donating computers, maybe your method will be with your money, time or talent, or possibly computer donation as well. Your contribution will make a difference in the lives of children.

Please consider helping CFY – you can make a difference.

Posted Under: General, Refurbished IT Hardware
7 Reasons to Consider Refurbished IT Hardware

Posted on 26 January, 2015

The IT hardware procurement process can be a challenging one for any organization. If you are an IT professional or a business owner, there are various options that must be sorted through to meet key priorities and requirements. When it comes to buying IT hardware, refurbished equipment is a viable option to consider seriously. It provides an array of undeniable benefits, including performance, quality, and flexibility at great price points. The following are seven notable benefits your organization can rely on when opting for refurbished IT equipment.

Cost

Companies can procure refurbished IT equipment at a mere fraction of OEM pricing. Opting for refurbished IT hardware can help stretch budgets, afford larger projects, and even keep extra hardware on hand in case disaster recovery or backup is necessary.

The latest and highest-end technology is not always an affordable option for small businesses, schools, and nonprofits. However, by choosing refurbished IT hardware, one can gain access to the latest technology regardless of budget.

Refurbished hardware is an excellent way for organizations to increase buying power while benefiting from substantial cost savings.

Quality

IT refurbishers go above and beyond when it comes to quality control. Experienced, trained, and certified technicians rigorously test, diagnose, and refurbish all IT hardware to ensure that it rivals any brand-new computer, both functionally and cosmetically.

Microsoft Registered Refurbishers (MRR) are an elite group of refurbishers who take quality to a whole new level by following Microsoft's certified refurbishing processes. The MRR certification enables refurbishers to legally load and authenticate Windows OS on any Windows-based machine.

Sustainability

Refurbished IT hardware is very eco-friendly. If “going green” is a priority in your technology choices, buying refurbished IT hardware is an ideal decision. Refurbishing and reusing not only prevents electronics from ending up in landfills, but also eliminates the need to manufacture new electronics.

Buying and using refurbished equipment is a form of electronic recycling that offers numerous benefits to both the organization using it and the environment.

Flexibility

IT hardware refurbishers will work according to a customer's needs and requirements as well as their limitations. Typically, this much flexibility is not available when buying directly from traditional retailers.

Refurbishers can customize specs to meet exact technology hardware requirements and offer a variety of prices to meet virtually any budget. They also offer flexible warranties, extended coverage options, and payment options and terms such as PayPal, net terms, and more.

Warranty

IT refurbishers can offer some of the best warranties available today. In many cases, they provide hassle-free advance replacements, which means a replacement product will be shipped out before the returned product is received. This offers a level of convenience and customer service that simply cannot be found when buying directly from OEMs. IT refurbishers offer flexible warranty options and extended warranty coverage as well.

Obtain Hard to Find or Obsolete Equipment

Sometimes, finding legacy equipment can be very challenging. Refurbishers are well-equipped sources of OEM-discontinued hardware, which is helpful for companies running proprietary software that requires older hardware.

Selection

When compared to OEMs, you’ll find many IT hardware refurbishers offer a much larger inventory pool, including brands such as Apple, Dell, HP, Lenovo and more.

Clearly, these advantages point to one undeniable conclusion: refurbished IT hardware can provide customers with substantial flexibility, service, and savings. Whether you are a small business, educational institution, nonprofit, or part of any organization that requires IT equipment to function, an IT refurbisher can provide one-stop shopping for all of your IT needs.

Posted Under: Refurbished IT Hardware