Posted by Richy George on 28 June, 2023
Databricks on Wednesday introduced a new version of its data lakehouse offering, dubbed Delta Lake 3.0, in order to take on the rising popularity of Apache Iceberg tables used by rival Snowflake.
As part of Delta Lake 3.0, the company has introduced a new universal table format, dubbed UniForm, that will allow enterprises to use the data lakehouse with other table formats such as Apache Iceberg and Apache Hudi, the company said.
A data lakehouse is a data architecture that offers both storage and analytics capabilities, in contrast to data lakes, which store data in its native format, and data warehouses, which store structured, often SQL-formatted, data.
UniForm eliminates the need for manually converting files from different data lakes and data warehouses while conducting analytics or building AI models, Databricks said.
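As a minimal sketch of what enabling UniForm might look like from PySpark, consider the snippet below. The table and column names are hypothetical, and the table property name reflects the Delta Lake 3.0 announcement, so verify it against the current documentation before relying on it.

# Sketch only: enable UniForm (Iceberg-readable metadata) on a Delta table.
# Assumes a Spark session with Delta Lake 3.0; table and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_uniform (
        order_id BIGINT,
        amount   DOUBLE
    )
    USING DELTA
    TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
""")

# Once UniForm is enabled, Iceberg-compatible engines can read the same table
# through the Iceberg metadata that Delta writes alongside its own.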
The new table format, according to analysts, is Databricks’ strategy to connect its data lakehouse with the rest of the world and take on rival Snowflake, especially against the backdrop of Apache Iceberg garnering broader multivendor support over the past few years.
“With UniForm, Databricks is essentially saying, if you can’t beat them, join them,” said Tony Baer, principal analyst at dbInsight, likening the battle between the table formats to the one between Apple’s iOS and Google’s Android operating system.
However, Baer believes that the adoption of lakehouses will depend on the ecosystem they provide and not just table formats.
“Adoption of data lakehouses is still very preliminary as the ecosystems have only recently crystallized, and most enterprises are still learning what lakehouses are,” Baer said, adding that lakehouses may see meaningful adoption a year from now.
Countering Baer’s caution, Databricks said Delta Lake has seen nearly one billion downloads in a year. The company open sourced Delta Lake last year, a move it says has drawn contributions from engineers at AWS, Adobe, Twilio, eBay, and Uber.
As part of Delta Lake 3.0, the company has also introduced two other features — Delta Kernel and a liquid clustering feature.
According to Databricks, Delta Kernel addresses connector fragmentation by ensuring that all connectors are built using a core Delta library that implements Delta specifications.
This alleviates the need for enterprise users to update Delta connectors with each new version or protocol change, the company said.
Delta Kernel, according to SanjMo principal analyst Sanjeev Mohan, is like a connector development kit that abstracts many of the underlying details and instead provides a set of stable APIs.
“This reduces the complexity and time to build and deploy connectors. We expect that the system integrators will now be able to accelerate development and deployment of connectors, in turn further expanding Databricks’ partner ecosystem,” Mohan said.
Liquid clustering has been introduced to address performance issues around data read and write operations, Databricks said.
In contrast to traditional methods such as Hive-style partitioning, which increase data management complexity by relying on a fixed data layout to improve read and write performance, liquid clustering offers a flexible data layout that Databricks claims will deliver cost-efficient clustering as data grows.
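To make the contrast concrete, here is a hedged sketch of both approaches in Spark SQL. The table and column names are hypothetical, and the CLUSTER BY syntax is as described in Databricks’ liquid clustering announcement, so check the Delta Lake 3.0 documentation for the exact form.

# Sketch: fixed Hive-style partitioning versus liquid clustering.
# Assumes a Spark session wired up to Delta Lake 3.0; names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clustering-demo").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS events_partitioned (
        event_date DATE, user_id BIGINT, payload STRING
    )
    USING DELTA
    PARTITIONED BY (event_date)   -- fixed layout chosen up front
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS events_clustered (
        event_date DATE, user_id BIGINT, payload STRING
    )
    USING DELTA
    CLUSTER BY (event_date, user_id)  -- layout the engine can adapt as data grows
""")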
Posted by Richy George on 27 June, 2023
Cloud-based data warehouse company Snowflake is shifting its attention toward large language models and generative AI. Launched in 2014 with a focus on disrupting the traditional data warehouse market and big-data analytics, the company has continued to add new features, such as its Native Application Framework, to target different sets of enterprise users.
At its annual Snowflake Summit Tuesday, the company announced Snowpark Container Services, a partnership with Nvidia, and updates to its Streamlit Python library designed to help enterprise users manage large language models (LLMs) and build applications using them from within its Data Cloud Platform.
Snowpark Container Services, currently in private preview, will allow enterprises to bring more diverse workloads, including LLMs, to the Data Cloud Platform, said Christian Kleinerman, senior vice president of product at Snowflake, adding that it also allows developers to build applications in any programming language.
The new container service acts as a linchpin, connecting enterprise data stored in Snowflake with LLMs, model training interfaces, model governance frameworks, third-party data-augmenting applications, machine learning models, APIs, and Snowflake’s Native Application Framework.
“Snowpark Containerized Services will help companies to move workloads, such as machine learning models or LLMs, between public and private cloud based on the client’s preferences,” said Hyoun Park, lead analyst at Amalgam Insights.
The process of moving workloads securely will become increasingly important as enterprises discover that the massive data entry and usage associated with training LLMs and other machine learning models are potential compliance risks, causing them to move these models to governed and isolated systems, Park added.
Container Services will also help reduce the burden on Snowflake’s data warehousing engine as it will run in an abstracted Kubernetes environment, according to Doug Henschen, principal analyst at Constellation Research.
“Simply put, it is a way to run an array of application services directly on Snowflake data but without burdening the data warehouses and performance sensitive analytical applications that run on them,” Henschen said.
In order to help enterprises train LLMs with data they have stored in Snowflake, the company has partnered with Nvidia to gain access to its AI Platform, which combines hardware and software capabilities. Snowflake will run Nvidia NeMo, a part of the AI Platform, from within the Data Cloud, the company said, adding that NeMo can be used for developing generative AI-based applications such as chatbots and intelligent search engines.
In addition, Snowpark Container Services will allow enterprises to gain access to third-party generative AI model providers such as Reka AI, said Sanjeev Mohan, principal analyst at SanjMo.
Other LLMs, such as those from OpenAI, Cohere and Anthropic, also can be accessed via APIs, Mohan said.
Snowflake’s updates reveal a strategy that is aimed at taking on Databricks, analysts said.
“Databricks is currently offering far more capabilities for building native AI, ML [machine learning] models than Snowflake, especially with the MosaicML acquisition that promises abilities to train models cheaper and faster,” said Andy Thurai, principal analyst at Constellation Research.
The difference in strategy between the two companies, according to dbInsight principal analyst Tony Baer, lies in how each is expanding its user base.
“Snowflake is seeking to extend from its base of data and BI developers to data scientists and data engineers, while Databricks is approaching from the opposite side,” Baer said.
The new Container Services will allow enterprises to access data-augmenting and machine learning tools, such as Hex’s notebooks for analytics and data science, AI tools from Alteryx, Dataiku, and SAS, along with a data workflow management tool from Astronomer that is based on Apache Airflow, the company said. Third-party software from Amplitude, CARTO, H2O.ai, Kumo AI, Pinecone, RelationalAI, and Weights & Biases is also available.
Snowflake also said that it was releasing a self-developed LLM, dubbed Document AI, designed to generate insights from documents.
Document AI, which is built on technology from Snowflake’s acquisition of Applica last year, is targeted at helping enterprises make more use of unstructured data, the company said, adding that the new LLM can help enhance enterprise productivity.
dbInsight’s Baer believes that the addition of the new LLM is a step to keep pace with rival offerings from the stables of AWS, Oracle, and Microsoft.
In order to help enterprises with machine learning model operations (MLOps), Snowflake has introduced the Snowpark Model Registry.
The registry, according to the company, is a unified repository for an enterprise’s machine learning models. It’s designed to enable users to centralize the publishing and discovery of models, thereby streamlining collaboration between data scientists and machine learning engineers.
Although rivals such as AWS, Databricks, Google Cloud and Microsoft offer MLOps tools already, analysts see the new Model Registry as an important update.
“Model registries and repositories are one of the new great battlefields in data as companies choose where to place their treasured proprietary or commercial models and ensure that the storage, metadata, and versioning are appropriately governed,” Park said.
In addition, Snowflake is also advancing the integration of Streamlit into its Data Cloud Platform, bringing it into public preview for a final fine-tuning before its general release.
Further, the company said that it was extending the use of Apache Iceberg tables to an enterprise’s own storage.
Other updates, mostly targeted at developers, include the integration of Git and a new command line interface (CLI) inside the Data Cloud Platform, both of which are in private preview.
While the native Git integration is expected to support CI/CD workflows, the new CLI will aid in application development and testing within Snowflake, the company said.
In order to help developers ingest streaming data and eliminate the boundaries between batch and streaming pipelines, Snowflake also unveiled new features in the form of Dynamic Tables and Snowpipe Streaming.
While Snowpipe Streaming is expected to be in general availability soon, Dynamic Tables is currently in public preview.
Snowflake also said that its Native Application Framework is now in public preview on AWS.
Posted by Richy George on 27 June, 2023
Much of the data accumulated in today’s world is in JSON (JavaScript Object Notation) format. However, many of the databases designed with a JSON-first mindset have not been able to provide the sort of in-app analytics available in classic SQL systems, leaving a huge gap in the amount of the world’s data that is able to be analyzed in real time. In an era when even millisecond lags are too slow, this is a gap in the market that needs to be addressed.
SingleStore Kai API for MongoDB is intended to solve this problem, and to do so in a way that is simple and straightforward. Let’s take a closer look at the key features of SingleStore Kai.
With SingleStore Kai, you can perform complex analytics on JSON data for MongoDB applications faster and more efficiently. On some benchmarks, SingleStoreDB was able to drive 100x faster analytical performance for most queries. How is this speed boost achieved? The SingleStore MongoDB API proxy translates MongoDB queries into SQL statements that are executed by SingleStoreDB to power real-time analytics for your applications.
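Because Kai speaks the MongoDB wire protocol, an ordinary MongoDB driver is all a client needs. The sketch below uses PyMongo; the connection string, database, and collection names are hypothetical placeholders for a SingleStore Kai endpoint.

# Sketch: point a standard PyMongo client at a SingleStore Kai endpoint.
# The URI, database, and collection names below are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://admin:secret@kai-endpoint.example.com:27017")
orders = client["shop"]["orders"]

# A normal MongoDB-style aggregation; Kai translates it to SQL that
# SingleStoreDB executes, which is where the claimed analytics speedup comes from.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$region", "revenue": {"$sum": "$total"}}},
    {"$sort": {"revenue": -1}},
]
for row in orders.aggregate(pipeline):
    print(row)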
The new era of generative AI requires real-time analytics on all data, including JSON collections. SingleStoreDB supports vectors and fast vector similarity search using the $dotProduct and $euclideanDistance functions. With SingleStore Kai, developers can harness the vector and AI capabilities on JSON collections within MongoDB, enabling use cases like semantic search, image recognition, and similarity matching.
Developers can continue to use existing MongoDB queries. They don’t have to normalize or flatten data, or do extensive schema migrations to power fast analytics for their applications. SingleStore Kai requires no code changes, data transformations, schema migrations, or changes to existing queries.
SingleStore Kai supports the MongoDB wire protocol and allows MongoDB clients to communicate with a SingleStoreDB cluster. Developers can take advantage of fast analytics on SingleStoreDB without having to learn a new set of tools or APIs. And they can continue to use the same MongoDB tools and drivers their customers are most familiar with.
SingleStoreDB was already MySQL wire protocol compatible. With the addition of SingleStore Kai for MongoDB, the database gives developers essentially the best of both worlds—the schema flexibility and simplicity of a JSON document store and the speed, efficiency, and complex analytical capabilities of a relational SQL database.
As part of this MongoDB API offering, SingleStoreDB includes a fast and efficient replication service (in private preview) that copies MongoDB collections into SingleStoreDB. This service is natively integrated into SingleStoreDB and leverages one of SingleStore’s most widely used features, SingleStore Pipelines, to drive speedy replication and real-time change data capture, enabling customers to get started quickly and easily.
Real-time data and real-time analytics play a crucial role in modern business. With SingleStore Kai, regardless of whether you traditionally work in SQL or NoSQL, you now have the ability to do real-time analytics on the majority of data in our fast-moving world.
Jason Thorsness is a principal software engineer at SingleStore.
—
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
Posted by Richy George on 27 June, 2023
When I wrote about the GitHub Copilot preview in 2021, I noted that the AI pair programmer didn’t always generate good, correct, or even running code, but was still somewhat useful. At the time, I concluded that future versions could be real time-savers. Two years later, Copilot is improving. These days, it costs money even for individuals, and it has some competition. In addition, the scope of coding assistants has expanded beyond code generation to code explanations, pull request summaries, security scanning, and related tasks.
Let’s start with a quick overview of the tools under review, then we’ll dive in for a closer look at each one.
Using one of these code generators is not the only way to generate code. To begin with, you can access general-purpose transformers like GPT-4 and its predecessors, including ChatGPT, BingGPT/Bing Chat (available in the Edge browser), and Boo.ai. There are also other code-specific AI tools, such as StarCoder, Tabnine, Cody, AlphaCode, Polycoder, and Replit Ghostwriter. In every case I’ve mentioned, it is vital to use discretion and carefully test and review the generated code before using it.
In my previous article about code generation, I evaluated the AI code generators based on the rather easy task of writing a program to determine the number of days between two dates. Most did okay, although some needed more guidance than others. For this review, I tried the code generators on the more difficult task of scraping InfoWorld.com for a list of articles. I gave them an outline but no additional help. None generated correct code, although some came closer than others. As an additional task, I asked the tools that support code explanation to explain a Python code example from an MIT Open Courseware introductory programming course.
For reference, the outline I gave to the code generators is:
Scrape infoworld.com front page:
Open https://www.infoworld.com/
Find all articles by looking for links with ‘article’ in the href; extract title, author, date from each
List all articles alphabetically by title; eliminate duplicates
List all articles alphabetically by author last name
List all articles latest first
In general, I tried to act like a more naive programmer than I am, just to see what the tools would do.
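For comparison with the generated attempts discussed below, here is a rough, hand-written sketch of how the task could be approached with requests and Beautiful Soup. The markup details (how InfoWorld tags bylines and dates) are assumptions rather than verified selectors, so treat it as an outline of the logic, not a drop-in solution.

# Hand-written sketch of the scraping outline above; the byline/time selectors are guesses.
import requests
from bs4 import BeautifulSoup

def scrape_infoworld():
    resp = requests.get("https://www.infoworld.com/", timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    seen, articles = set(), []
    # Attribute filters receive the attribute value (or None), hence the guard.
    for link in soup.find_all("a", href=lambda h: h and "article" in h):
        title = link.get_text(strip=True)
        if not title or title in seen:
            continue  # eliminate duplicates by title
        seen.add(title)
        parent = link.find_parent()
        author = parent.find("span", class_="byline") if parent else None
        date = parent.find("time") if parent else None
        articles.append({
            "title": title,
            "author": author.get_text(strip=True) if author else "",
            "date": date.get("datetime", "") if date else "",
        })
    return articles

if __name__ == "__main__":
    articles = scrape_infoworld()
    for a in sorted(articles, key=lambda a: a["title"].lower()):
        print(a["title"])
    for a in sorted(articles, key=lambda a: a["author"].split()[-1] if a["author"] else ""):
        print(a["author"], "-", a["title"])
    for a in sorted(articles, key=lambda a: a["date"], reverse=True):
        print(a["date"], "-", a["title"])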
Now, let’s look more closely at each of our code generators.
Within your IDE, Amazon CodeWhisperer analyzes your English language comments and surrounding code to infer what code it should generate to complete what you are typing. Then, it offers code as a suggestion that you can either accept or reject, or you can ask CodeWhisperer for alternate code, or ignore and continue writing your own code. CodeWhisperer’s large language model (LLM) was trained on billions of lines of code, including Amazon and open source code. Any given suggestion is based not only on your comments and immediate code context, but also on the contents of other files open in the IDE.
In addition to code generation, CodeWhisperer can scan your Python, Java, and JavaScript code for security vulnerabilities and suggest fixes for them. The vulnerability lists it consults include Open Web Application Security Project (OWASP), crypto library best practices, AWS API best practices, and other API best practices. Security scans occur on-demand, unlike code completion, which is offered continuously as you code unless you turn off suggestions.
CodeWhisperer’s best programming languages for code generation (the most prevalent languages in the training corpus) are Java, Python, JavaScript, TypeScript, and C#. It has been trained to a lesser extent on Ruby, Go, PHP, C++, C, Shell, Scala, Rust, Kotlin, and SQL.
There are CodeWhisperer plugins for Visual Studio Code and JetBrains IDEs. You can also activate CodeWhisperer for use inside AWS Cloud9 and AWS Lambda; in both cases, you must edit your IAM permissions as well as checking the Enable CodeWhisperer box.
I installed CodeWhisperer in Visual Studio Code. There are four steps:
Figure 1. Once you have installed and authorized CodeWhisperer (at left), you can see the usage instructions, shown here on the right.
It’s worth watching some of the videos demonstrating CodeWhisperer’s capabilities, listed on the CodeWhisperer resource page. They’ll tell you more than I can in words and screenshots. While watching them, it became clear to me that CodeWhisperer shines on code that calls AWS APIs.
Nevertheless, here’s a screenshot of the generated code from CodeWhisperer:
Figure 2. CodeWhisperer generating code.
Note that the generated code did not work, and also that CodeWhisperer generated multiple functions for more or less the same purpose. It was fairly close, however, and CodeWhisperer kind of read my mind about using Beautiful Soup to scrape the web page.
Figure 3. Trying the CodeWhisperer-generated program.
I ran the CodeWhisperer security scan on my project, as well. Unsurprisingly, no issues were detected in this file, although the CodeWhisperer scan found a resource leak in another file generated by GitHub Copilot X. The CodeWhisperer security scan is a differentiating feature that hasn’t yet surfaced in the competing AI code generators.
This file did not trigger any open source references, but when I opened the CodeWhisperer Reference Log, I found a link to the setting that controls whether CodeWhisperer will suggest code that has such references. In the Amazon examples I viewed, code suggestions with references displayed the license type at the top.
CodeWhisperer generates code in an IDE and does security reviews, but it lacks a chat window and code explanations. The tool works more or less as advertised, and the price (free) is right for individual developers. For corporate developers, the $19 per month price for CodeWhisperer is the same as for Copilot. I’d favor CodeWhisperer over Copilot for shops that use AWS heavily and shops that need to know when code suggestions refer to open source.
Bard support for programming was announced April 21, 2023. The blog post mentioned “more than 20 programming languages including C++, Go, Java, Javascript, Python and Typescript.”
When I asked Bard for the full list, I received the following response:
Bard supports over 20 programming languages, including:
- C++
- Go
- Java
- JavaScript
- Python
- TypeScript
- C#
- PHP
- Ruby
- Swift
- Kotlin
- Rust
- Julia
- R
- Haskell
- Erlang
- Scala
- Lua
- OCaml
- Scheme
Bard can generate code in these languages, debug code, and explain code. It can also translate from one programming language to another.
Please note that Bard is still under development, so its capabilities may not be perfect. It is always a good idea to review AI-generated code carefully before using it.
I tested Bard on my code explanation and code generation tasks, which are shown below. The code explanation was quite good—so good that I wondered whether Bard had been trained on that MIT course code sample, which dates from 2008 and has been available on the web since then.
The code generation for the site scraper looked fairly credible at first glance but had several obvious problems on further inspection. It was a matter of clicking twice to open the code in Colab, but it threw a type error when I tried to run it, and it didn’t even get as far as the code I had identified as wrong.
Rather than fixing the problems, I tried running the code generation again, and the result made a bit more sense. It threw an attribute error when I tried to run it in Colab, however. I fed that traceback back to Bard, it generated a fix, I ran the “fixed” code in Colab, and there was a type error in a different spot. Enough, already! If I was actually a naive programmer, I would have thrown up my hands.
What about citing web pages and open source code? The Bard FAQ had this to say:
Bard, like some other standalone LLM experiences, is intended to generate original content and not replicate existing content at length. We’ve designed our systems to limit the chances of this occurring, and we will continue to improve how these systems function. If Bard does directly quote at length from a webpage, it cites that page.
Sometimes the same content may be found on multiple webpages and Bard attempts to point to a popular source. In the case of citations to code repositories, the citation may also reference an applicable open source license.
Given that Bard is completely free at this time, and requires no setup, it’s certainly worth trying when you need to generate, debug, and explain code in any of these 20 languages. Don’t forget to review, test, and debug any generated code.
Here is the instruction I gave to Bard and the resulting dialog:
I'm about to give you some code to explain. Please wait until the code is complete.
Sure, I will wait until you are finished giving me the code. Please feel free to take your time. I am here to help you in any way that I can.
Here are some tips for explaining code:
• Start by explaining the overall purpose of the code. What does it do? What problem is it trying to solve?
• Then, break down the code into smaller chunks and explain each one in turn.
• Use clear and concise language. Avoid jargon and technical terms.
• Use examples to illustrate your points.
• Be patient and don't be afraid to ask questions.
I am confident that I can help you explain any code that you give me. Just let me know what you need and I will do my best to help.
And here is the code I provided:
import pylab, random

class Stock(object):
    def __init__(self, price, distribution, vol):
        self.price = price
        self.history = [price]
        self.distribution = distribution
        self.vol = vol
        self.lastChangeInfluence = 0.0
    def setPrice(self, price):
        self.price = price
        self.history.append(price)
    def getPrice(self):
        return self.price
    def makeMove(self, bias, mo):
        oldPrice = self.price
        baseMove = self.distribution(self.vol) + bias
        self.price = self.price * (1.0 + baseMove)
        self.price += mo*random.choice([0.0, 1.0])*self.lastChangeInfluence
        self.history.append(self.price)
        change = self.price - oldPrice
        if change >= 0:
            self.lastChangeInfluence = min(change, oldPrice*0.01)
        else:
            self.lastChangeInfluence = max(change, -oldPrice*0.01)
    def showHistory(self, fig, test):
        pylab.figure(fig)
        pylab.plot(self.history)
        pylab.title('Closing Prices, Test ' + test)
        pylab.xlabel('Day')
        pylab.ylabel('Price')

class SimpleMarket(object):
    def __init__(self, numStks, volUB):
        self.stks = []
        self.bias = 0.0
        for n in range(numStks):
            volatility = random.uniform(0, volUB)
            distribution = lambda vol: random.gauss(0.0, vol)
            stk = Stock(100.0, distribution, volatility)
            self.addStock(stk)
    def addStock(self, stk):
        self.stks.append(stk)
    def setBias(self, bias):
        self.bias = bias
    def getBias(self):
        return self.bias
    def getStocks(self):
        return self.stks[:]
    def move(self, mo):
        prices = []
        for s in self.stks:
            s.makeMove(self.bias, mo)
            prices.append(s.getPrice())
        return prices

class Market(SimpleMarket):
    def __init__(self, numStks, volUB, dailyBiasRange):
        SimpleMarket.__init__(self, numStks, volUB)
        self.dailyBiasRange = dailyBiasRange
    def move(self, mo):
        prices = []
        dailyBias = random.gauss(self.dailyBiasRange[0], self.dailyBiasRange[1])
        for s in self.stks:
            s.makeMove(self.bias + dailyBias, mo)
            prices.append(s.getPrice())
        return prices

def simMkt(mkt, numDays, mo):
    endPrices = []
    for i in range(numDays):
        vals = mkt.move(mo)
        vals = pylab.array(vals)
        mean = vals.sum()/float(len(vals))
        endPrices.append(mean)
    return endPrices

def plotAverageOverTime(endPrices, title):
    pylab.plot(endPrices)
    pylab.title(title)
    pylab.xlabel('Days')
    pylab.ylabel('Price')

def plotDistributionAtEnd(mkt, title, color):
    prices = []
    sumSoFar = 0
    for s in mkt.getStocks():
        prices.append(s.getPrice())
        sumSoFar += s.getPrice()
    mean = sumSoFar/float(len(prices))
    prices.sort()
    pylab.plot(prices, color)
    pylab.axhline(mean, color = color)
    pylab.title(title)
    pylab.xlabel('Stock')
    pylab.ylabel('Last Sale')
    pylab.semilogy()

def runTrial(showHistory, test, p):
    colors = ['b','g','r','c','m','y','k']
    mkt = Market(p['numStocks'], p['volUB'], p['dailyBiasRange'])
    mkt.setBias(p['bias'])
    endPrices = simMkt(mkt, p['numDays'], p['mo'])
    pylab.figure(1)
    plotAverageOverTime(endPrices, 'Average Closing Prices')
    pylab.figure(2)
    plotDistributionAtEnd(mkt, 'Distribution of Prices', colors[test%len(colors)])
    if showHistory:
        for s in mkt.getStocks():
            s.showHistory(test+2, str(test))

def runTest(numTrials):
    #Constants used in testing
    numDaysPerYear = 200.0
    params = {}
    params['numDays'] = 200
    params['numStocks'] = 500
    params['bias'] = 0.1/numDaysPerYear #General market bias
    params['volUB'] = 12.0/numDaysPerYear #Upper bound on volatility for a stock
    params['mo'] = 1.1/numDaysPerYear #Momentum factor
    params['dailyBiasRange'] = (0.0, 4.0/200.0)
    for t in range(1, numTrials+1):
        runTrial(True, t, params)

runTest(3)
pylab.show()
Here is some of Bard’s response, which was quite in depth:
The code you have provided is a Python program that simulates a stock market. The program defines three classes:
Stock: This class represents a single stock. It has properties for the stock's price, history, distribution, volatility, and last change influence.
SimpleMarket: This class represents a simple stock market. It has a list of stocks, a bias, and a volatility upper bound.
Market: This class inherits from SimpleMarket and adds a daily bias range property. The program also defines several functions:
• simMkt: This function simulates the market for a given number of days and momentum factor.
• plotAverageOverTime: This function plots the average closing price of the stocks over time.
• plotDistributionAtEnd: This function plots the distribution of prices at the end of the simulation.
• runTrial: This function runs a single simulation and plots the results.
• runTest: This function runs a number of simulations and plots the results.
The program is run by calling the runTest function with the desired number of simulations. The program will then generate three plots:
• A plot of the average closing price of the stocks over time.
• A plot of the distribution of prices at the end of the simulation.
• A plot of the history of prices for each stock.
The program can be used to simulate a variety of different stock markets. By changing the values of the parameters, you can change the volatility of the market, the bias of the market, and the momentum factor. You can also use the program to simulate different trading strategies. For example, you could simulate a strategy that buys stocks when they are undervalued and sells them when they are overvalued.
Bard also provided an explanation of how each part of the code worked:
• The Stock class represents a single stock. It has properties for the stock's price, history, distribution, volatility, and last change influence. The price property is the current price of the stock. The history property is a list of the stock's prices over time. The distribution property is a function that generates random prices for the stock. The volatility property is a measure of how much the stock's price can fluctuate. The lastChangeInfluence property is a measure of how much the stock's price is affected by its previous change.
• The SimpleMarket class represents a simple stock market. It has a list of stocks, a bias, and a volatility upper bound. The list of stocks property is a list of the stocks in the market. The bias property is a measure of how much the market is biased towards rising prices. The volatility upper bound property is a measure of how much the market can fluctuate.
• The Market class inherits from SimpleMarket and adds a daily bias range property. The daily bias range property is a measure of how much the market's bias can change from day to day.
• The simMkt function simulates the market for a given number of days and momentum factor. The number of days parameter is the number of days to simulate. The momentum factor parameter is a measure of how much the prices of stocks are affected by their momentum.
• The plotAverageOverTime function plots the average closing price of the stocks over time. The plotDistributionAtEnd function plots the distribution of prices at the end of the simulation.
• The runTrial function runs a single simulation and plots the results. The showHistory parameter is a boolean value that determines whether the history of prices for each stock is plotted. The test parameter is a unique identifier for the simulation.
• The runTest function runs a number of simulations and plots the results. The numTrials parameter is the number of simulations to run.
I hope this explanation is helpful. Please let me know if you have any other questions.
When I followed up, typing in, “where is the pylab module?” Bard responded with the location of the module, how to import it, a description of features, and a pointer to the documentation.
Additionally, here is the Bard-generated code for scraping InfoWorld.com for a list of articles:
import requests
from bs4 import BeautifulSoup

def scrape_infoworld():
    """Scrape the InfoWorld front page and return a list of articles."""
    # Get the HTML from the InfoWorld front page.
    response = requests.get("https://www.infoworld.com/")
    soup = BeautifulSoup(response.content, "html.parser")

    # Find all articles by looking for links with 'article' in the href.
    articles = soup.find_all("a", href=lambda x: "article" in x["href"])

    # Extract the title, author, and date from each article.
    article_data = []
    for article in articles:
        title = article.find("h3").text
        author = article.find("span", class_="byline").text
        date = article.find("time").text
        article_data.append({
            "title": title,
            "author": author,
            "date": date,
        })

    # Sort the articles alphabetically by title.
    article_data.sort(key=lambda x: x["title"])

    # Remove any duplicate articles.
    article_data = list(set(article_data))

    # Sort the articles alphabetically by author last name.
    article_data.sort(key=lambda x: x["author"].split()[-1])

    # Sort the articles latest first.
    article_data.sort(key=lambda x: x["date"], reverse=True)

    return article_data

if __name__ == "__main__":
    articles = scrape_infoworld()

    # Print the articles.
    for article in articles:
        print(article)
Explanation:
This code will first get the HTML from the InfoWorld front page. Then, it will find all articles by looking for links with 'article' in the href. For each article, it will extract the title, author, and date. The articles will then be sorted alphabetically by title, removed of any duplicates, sorted alphabetically by author last name, and sorted latest first. Finally, the articles will be printed.
And here is the runtime result:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-afbe7808577e> in <cell line: 40>()
39
40 if __name__ == "__main__":
---> 41 articles = scrape_infoworld()
42
43 # Print the articles.
6 frames
<ipython-input-1-afbe7808577e> in <lambda>(x)
10
11 # Find all articles by looking for links with 'article' in the href.
---> 12 articles = soup.find_all("a", href=lambda x: "article" in x["href"])
13
14 # Extract the title, author, and date from each article.
TypeError: string indices must be integers
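For what it’s worth, the failure is easy to pin down: Beautiful Soup passes the value of the href attribute (a string, or None) to an attribute filter function, so indexing it with x["href"] is what raises the “string indices must be integers” error. A minimal fix, which Bard never arrived at, would look like this:

# The filter receives the href value itself (or None), not the tag,
# so test the string directly and guard against missing attributes.
articles = soup.find_all("a", href=lambda x: x and "article" in x)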
Bard has a chat interface and both generates and explains code, but it doesn’t have an interactive IDE integration. Given that Bard is completely free at this time, and requires no setup, it’s certainly worth trying when you need to generate, debug, and explain code in any of the 20 supported languages.
GitHub Copilot X is greatly improved over the original GitHub Copilot, and can sometimes generate a correct function and set of tests without much human help. It still makes mistakes and hallucinates (generates false information), but not nearly as much as it once did.
In addition to generating code within a programming editor, currently supporting only the most current versions of Visual Studio and the latest insider version of Visual Studio Code, Copilot X adds a GPT-4 chat panel to the editor. It also adds a terminal interface, support for generating unit tests and pull request descriptions, and the ability to extract explanations from documentation.
I asked the Copilot X chat what programming languages it supports, and it answered “I support a wide range of programming languages, including but not limited to: Python, JavaScript, TypeScript, Ruby, Java, C++, C#, PHP, Go, Swift, Kotlin, Rust, and many more.” I did my testing primarily in Python.
When I used the Copilot Chat facility to ask Copilot X to explain the MIT market simulation code, it gave a partially correct answer. I had to metaphorically pull its teeth to get it to explain the rest of the code.
Figure 4. Copilot X did a decent but incomplete job of explaining the market simulator.
Copilot X’s most notable failure was the web-scraping code generation task. The tool generated a bunch of superficially credible-looking code that didn’t use Beautiful Soup, but it was clear from reviewing the code that it would never work. I kept bringing the problems to Copilot Chat, but it just dug itself a deeper hole. I could probably have started over and given it better hints, including handing it an import from bs4
and adding some comments showing the HTML and directory structure of the InfoWorld home page. I didn’t do it because that would not be in character for the naive coder persona I had adopted for this round of tests.
Figure 5. Copilot X tried to generate the web scraping code without using Beautiful Soup (bs4). Later when I chatted about the solution it generated, it first claimed that it was using Beautiful Soup, but then admitted that it could not find an import.
As with all AI helpers, you have to take the code generated by Copilot X with a huge grain of salt, just as you would for a pull request from an unknown programmer.
Copilot X costs $10 per month for individuals and $19 per user per month for businesses.
GitHub Copilot X works decently on simple problems, but not necessarily better than the combination of Amazon CodeWhisperer in a code editor and Google Bard in a browser. It’s too bad that CodeWhisperer doesn’t yet have a chat capability or the facility for explaining code, and it’s too bad that Bard doesn’t exactly integrate with an editor or IDE.
I’d be tempted to recommend Copilot X if it hadn’t gone off the rails on my advanced code generation task—mainly because it integrates chat and code generation in an editor. At this point, however, Copilot X isn’t quite ready. Overall, none of the code generation products are really up to snuff, although both Bard and Copilot X do a decent job of code explanation.
All of these products are in active development, so my recommendation is to keep watching them and experimenting, but don’t put your faith in any of them just yet.
Posted by Richy George on 26 June, 2023
Data lakehouse provider Databricks on Monday said that it was acquiring large language model (LLM) and model-training software provider MosaicML for $1.3 billion in order to boost its generative AI offerings.
Databricks, which already offers an LLM named Dolly, is expected to add MosaicML’s models, training and inference capabilities to its lakehouse platform for enterprises to develop generative AI applications, the company said, underlining its open source LLM policy.
Dolly was developed on open data sets in order to cater to enterprises’ demand to control LLMs used to develop new applications, in contrast to closed-loop trained models, such as ChatGPT, that put constraints on commercial usage.
MosaicML’s models, namely MPT-7B and the recently released MPT-30B, are open source, putting them in line with Databricks’ existing policy.
Another advantage of these models, according to MosaicML, is the “zero human intervention” feature that allows the training systems to be automated.
“We trained MPT-7B with zero human intervention from start to finish: over 9.5 days on 440 GPUs, the MosaicML platform detected and addressed 4 hardware failures and resumed the training run automatically, and — due to architecture and optimization improvements we made — there were no catastrophic loss spikes,” MosaicML wrote in a blog post.
The deal calls for MosaicML’s entire team of over 60 employees, including co-founder and CEO Naveen Rao, to move to Databricks, where they will continue to work on developing more foundation models, the companies said.
MosaicML’s existing customers, according to a company post, will still be able to access their LLMs and inference offerings. Existing customers include Allen Institute for AI, Generally Intelligent, Hippocratic AI, Replit and Scatter Labs. The San Francisco-based startup, which was founded in 2021, has raised nearly $64 million to date from investors including Lux Capital, DCVC, Future Ventures, Maverick Ventures, and Playground.
The $1.3 billion deal includes retention packages for MosaicML employees, Databricks said.
In May, the company acquired AI-centric data governance platform provider Okera for an undisclosed sum.
Databricks’ acquisition of MosaicML also comes just weeks after a rival, Snowflake, acquired Mountain View-based AI startup Neeva in an effort to add generative AI-based search to its Data Cloud platform.
Posted by Richy George on 22 June, 2023
MongoDB on Thursday introduced new language support, easier installation of Atlas’ Kubernetes Operator, and a new Kotlin driver for its NoSQL Atlas database-as-a-service — all designed to streamline developer tasks, including work related to infrastructure management.
The new features were launched along with vector search and stream processing capabilities geared toward support for development of generative AI applications.
Noting that many developers want to use programming languages other than JavaScript and TypeScript to deploy Atlas on AWS, the company said that it was adding support for C#, Go, Java, and Python in order to help developers reduce the amount of time needed to manage infrastructure.
Typically, MongoDB developers have managed infrastructure-as-code (IaC) on AWS via the public cloud provider’s CloudFormation Public Registry, Partner Solution Deployments, and its Cloud Development Kit (CDK).
The company has also added support for Kotlin for developers building server-side applications. Previously, developers could use the MongoDB Realm Kotlin software development kit (SDK) for client-side development, but server-side developers relied on a community-created driver without official MongoDB support, or had to write extensive custom code, the company said.
“As a result, developers faced longer software development cycles to build server-side Kotlin applications on MongoDB and risked application reliability without a fully supported MongoDB Kotlin driver,” it added.
MongoDB is also providing an easier way to install the Atlas Kubernetes Operator — a tool that developers use to manage projects and database clusters.
“Using the MongoDB Atlas command line interface (CLI), developers can now install the MongoDB Atlas Kubernetes Operator and generate security credentials quickly in order to reduce operational overhead,” the company said, adding that developers will now have the option to import existing MongoDB Atlas projects and deployments with a single command.
The update, according to the company, is expected to provide greater agility for developers while working with containers.
While the company did not immediately provide information on the availability of the new features, it said that it was making the open source PyMongoArrow library generally available.
The library, according to the company, can be used to convert data stored in MongoDB into formats used by popular analytics tools, such as Apache Arrow tables, Pandas DataFrames, and NumPy arrays.
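As a rough sketch of what that looks like in practice (the connection string, collection, and schema below are hypothetical), PyMongoArrow exposes helpers such as find_pandas_all and find_arrow_all:

# Sketch: pull a MongoDB collection into Pandas / Arrow via PyMongoArrow.
# Connection string, database, collection, and schema are hypothetical.
from pymongo import MongoClient
from pymongoarrow.api import Schema, find_pandas_all, find_arrow_all

client = MongoClient("mongodb+srv://user:pass@cluster.example.mongodb.net")
coll = client["metrics"]["readings"]

schema = Schema({"sensor_id": int, "value": float})
df = find_pandas_all(coll, {"sensor_id": {"$gte": 0}}, schema=schema)    # Pandas DataFrame
table = find_arrow_all(coll, {"sensor_id": {"$gte": 0}}, schema=schema)  # Arrow Table
print(df.head(), table.num_rows)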
Posted by Richy George on 22 June, 2023
After trying to broaden its user base to include traditional database professionals last year, MongoDB is switching gears, adding features to turn its NoSQL Atlas database-as-a-service (DBaaS) into a more complete data platform for developers, including capabilities that support building generative AI applications.
In addition to introducing vector search for Atlas and integrating Google Cloud’s Vertex AI foundation models, the company announced a variety of new capabilities for the DBaaS at its MongoDB.local conference in New York Thursday, including new Atlas Search, data streaming, and querying capabilities.
“Everything that MongoDB has announced can be seen as a move to make Atlas a more comprehensive and complete data platform for developers,” said Doug Henschen, principal analyst at Constellation Research. “The more that MongoDB can provide to enable developers with all the tools that they need, the stickier the platform becomes for those developers and the enterprises they work for.”
Henschen’s perspective seems reasonable, given that the company has been competing with cloud data platform suppliers such as Snowflake, which offers a Native Application Framework, and Databricks, which recently launched Lakehouse Apps.
In an effort to help enterprises build generative AI applications on data stored in MongoDB, the company has introduced a vector search capability inside Atlas, dubbed Atlas Vector Search.
This new search capability, according to the company, will help support a new range of workloads, including semantic search with text, image search, and highly personalized product recommendations.
The search runs on vectors — multidimensional mathematical representations of features or attributes of raw data that could include text, images, audio or video, said Matt Aslett, research director at Ventana Research.
“Vector search utilizes vectors to perform similarity searches by enabling rapid identification and retrieval of similar or related data,” Aslett said, adding that vector search can also be used to complement large language models (LLMs) to reduce concerns about accuracy and trust through the incorporation of approved enterprise content and data.
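The underlying idea is simple enough to show with plain NumPy: embed items as vectors, then rank them by a similarity measure such as cosine similarity. Atlas Vector Search does this at scale over indexed embeddings; the toy snippet below is only meant to illustrate the math.

# Toy illustration of vector similarity search with NumPy.
# Real systems index millions of embeddings; the math is the same.
import numpy as np

# Pretend these are embeddings of four documents and one query (dimension 4).
docs = np.array([
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.1, 0.1, 0.0],
    [0.1, 0.8, 0.1, 0.3],
    [0.0, 0.2, 0.9, 0.4],
])
query = np.array([0.1, 0.85, 0.05, 0.25])

def normalize(v):
    # Cosine similarity is the dot product of L2-normalized vectors.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(docs) @ normalize(query)
ranked = np.argsort(scores)[::-1]   # most similar first
print(list(ranked), scores[ranked])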
MongoDB Atlas’ Vector Search will also allow enterprises to augment the capabilities of pretrained models such as GPT-4 with their own data via the use of open source frameworks such as LangChain and LlamaIndex, the company said.
These frameworks can be used to access LLMs from MongoDB partners and model providers, such as AWS, Databricks, Google Cloud, Microsoft Azure, MindsDB, Anthropic, Hugging Face and OpenAI, to generate vector embeddings and build AI-powered applications on Atlas, it added.
MongoDB’s partnership with Google Cloud to integrate Vertex AI capabilities is meant to accelerate the development of generative AI-based applications. Vertex AI, according to the company, will provide the text embedding API required to generate embeddings from enterprise data stored in MongoDB Atlas.
These embeddings can be later combined with the PaLM text models to create advanced functionality like semantic search, classification, outlier detection, AI-powered chatbots, and text summarization.
The partnership will also allow enterprises to get hands-on assistance from MongoDB and Google Cloud service teams on data schema and indexing design, query structuring, and fine-tuning AI models.
Databases from Dremio, DataStax and Kinetica are also adding generative AI capabilities.
MongoDB’s move to add vector search to Atlas is not unique but it will enhance the company’s competitiveness, Aslett said. “There is a growing list of specialist vector database providers, while multiple vendors of existing databases are working to add support to bring vector search to data already stored in their data platforms,” Aslett said.
In order to help enterprises manage real-time streaming data from multiple sources in a single interface, MongoDB has added a stream processing interface to Atlas.
Dubbed Atlas Stream Processing, the new interface, which can process any kind of data and has a flexible data model, will allow enterprises to analyze data in real-time and adjust application behavior to suit end customer needs, the company said.
Atlas Stream Processing bypasses the need for developers to use multiple specialized programming languages, libraries, application programming interfaces (APIs), and drivers, while avoiding the complexity of using these multiple tools, MongoDB claimed.
The new interface, according to Aslett, helps developers to work with both streaming and historical data using the document model.
“Processing data as it is ingested enables data to be queried continuously as new data is added, providing a constantly updated, real-time view that is triggered by the ingestion of new data,” Aslett said.
A report from Ventana Research claims that more than seven in 10 enterprises’ standard information architectures will include streaming data and event processing by 2025, so that they can provide better customer experiences.
Atlas Stream Processing, according to SanjMo’s principal analyst Sanjeev Mohan, can also be used by developers to perform aggregations, filtering, and anomaly detection on data in Kafka topics, Amazon Kinesis, or even MongoDB change data capture streams.
The flexible data model inside Atlas Stream Processing can also be modified over time to suit needs, the company said.
The addition of the new interface to Atlas can be seen as a move to play catchup with rival data cloud providers such as Snowflake and Databricks, which have already introduced features for processing real-time data, noted Constellation’s Henschen.
In order to help enterprises to maintain database and search performance on Atlas, the company has introduced a new feature, dubbed Atlas Search Nodes, that isolates search workloads from database workloads.
Targeted at enterprises that have already scaled their search workloads on MongoDB, Atlas Search Nodes provides dedicated resources and optimizes resource utilization to support performance of these specific workloads, including vector search, the company said.
“Enterprises may find that dedicating nodes in a cluster, specifically to search, can support operational efficiency by avoiding performance degradation on other workloads,” Aslett said, adding that this is a capability that was being adopted by multiple providers of distributed databases.
MongoDB’s updates to Atlas also include a new time-series data editing feature that the company claims is usually not allowed in most time-series databases.
The company’s Time Series Collections feature will now allow enterprises to modify time-series data, resulting in better storage efficiency, more accurate results, and better query performance, the company said.
The feature to modify time-series data will help most enterprises, according to Mohan.
Other updates to MongoDB Atlas include the ability to tier and query databases on Microsoft Azure using the Atlas Online Archive and Atlas Data Federation features, the company said, adding that Atlas already supported tiering and querying on AWS.
As part of the updates announced at its MongoDB.local conference, the company said that it will be launching a new industry-specific Atlas database program for financial services, followed by other industry sectors such as retail, healthcare, insurance, manufacturing and automotive.
These industry-specific programs will see the company offer expert-led architectural design reviews, technology partnerships via workshops, and other instruments for enterprises to build vertical-specific solutions. The company will also offer tailored MongoDB University courses and learning materials to prepare developers for their enterprises’ projects.
While the company did not immediately provide information on the availability and pricing of the new features, it said that it was making its Relational Migrator tool generally available.
The tool is designed to help enterprises move their legacy databases to modern document-based databases.
Posted by Richy George on 22 June, 2023
MySQL is the most popular open source database in the world, according to DB-Engines, and it has ranked as the second most popular database overall for more than a decade. MySQL fueled the rise of the LAMP stack and has been a trusty companion to many a developer and DBA over the years. In October 2023, version 5.7 will reach end of life status, meaning that this version will no longer receive updates or security patches.
This is significant because, with four months to go, more than half of those running MySQL servers are currently on v5.7, according to data from users of Percona Monitoring and Management who have elected to share telemetry with us. As this is a representative sample of database installations, that means a lot of database servers out there are just four months away from end of life.
To prepare for the move, what should you be aware of? I’ve sketched out the costs and benefits of the various options below.
To start with, you should look at what is involved in the move from MySQL 5.7 to MySQL 8.0, which will be the only supported version of MySQL in the future. MySQL 8.0 has been on the market since 2016, so it is a very stable option, but there are some significant changes compared to the previous edition.
One major change is the set of enhancements to SQL (Structured Query Language) support in MySQL 8.0 that make it easier for developers and DBAs to write queries. For example, if you have trouble writing subqueries, you will rejoice in the support for lateral derived tables and common table expressions (CTEs). There is also a new INTERSECT clause to aid with set operations.
MySQL 8.0 also supports new commands that are not included in MySQL 5.7. One example is EXPLAIN ANALYZE, which is a big boon to query tuning. The EXPLAIN command gives you the server’s estimated analysis of the performance of your query. Adding ANALYZE causes the query to execute, and the numbers returned report the query’s real performance. This provides more insight into how queries run in practice, and makes it easier to find improvements. Alongside this, the INVISIBLE INDEX option helps you test the efficiency of an index without risking a disastrous rebuild after a delete.
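To keep that concrete, here is a small sketch using the mysql-connector-python driver. The connection settings and the orders table are hypothetical stand-ins, but the CTE and EXPLAIN ANALYZE usage illustrate the MySQL 8.0 features described above.

# Sketch: exercising MySQL 8.0 features (CTE, EXPLAIN ANALYZE) from Python.
# Connection settings and the orders table are hypothetical.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app", password="secret", database="shop")
cur = conn.cursor()

query = """
    WITH regional_totals AS (          -- common table expression, new in 8.0
        SELECT region, SUM(total) AS revenue
        FROM orders
        GROUP BY region
    )
    SELECT region, revenue
    FROM regional_totals
    WHERE revenue > 10000
"""

cur.execute(query)
print(cur.fetchall())

# EXPLAIN ANALYZE actually runs the query and reports measured costs and row counts.
cur.execute("EXPLAIN ANALYZE " + query)
for (line,) in cur.fetchall():
    print(line)

cur.close()
conn.close()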
Alongside these changes, the updated default character set UTF8MB4 provides Unicode version 9.0 support, meaning that you can support international characters. This is especially useful if you have to support global operations.
Migrating to MySQL 8.0 is a one-way street, so you will have to determine whether your application and database will support the move. One efficient way to check this is by using MySQL Shell’s util.checkForServerUpgrade() utility, which carries out 21 different tests to find any potential problems that might come up once you start the migration process. This includes checking for any tables with names that conflict with new reserved keywords, for partitioned tables that use engines with non-native partitioning, for circular directory references in tablespace data file paths, and for usage of removed functions. Similarly, the utility will look for issues around system variables that have been removed or changed to new default values.
Depending on your existing MySQL implementation, you may only need to make some minor changes to be ready. However, if your application comes back with multiple issues and updates, then you will have more work to carry out.
Alongside checking your systems for potential migration problems, you should also investigate your options overall. For example, is MySQL still the best database for you and your team, or should you consider alternatives? If you will have to put significant amounts of work into your application to bring it up to scratch, should you put that effort into a migration to a different platform? Equally, will you continue to run your database infrastructure in the same way, or should you use a different approach such as database as a service (DBaaS)?
There are three choices you could make. The first option is to do nothing. You might decide that the cost of moving an application to a new database version is too high and choose to continue running on database versions that are out of support. This is not ideal, but there may be circumstances when it is the best option. One company I work with had a similar situation when looking at MySQL, and decided that they would leave their systems as they were, because the application was not directly connected to the public internet and was due for a refresh in two years.
The amount of work to get the application migrated was higher than the cost of mitigating the potential security risks and buying extended support, so they decided to stay on their current version of MySQL. This was an active decision backed by a real business case and a risk management approach, rather than burying their heads in the sand and ignoring the problem.
The second option is to make the move, but change where you host your databases. For example, MySQL-compatible cloud services and hosting providers can manage these databases on your behalf rather than you having to run your own infrastructure. DBaaS options can take away some of the infrastructure management headaches, but they still have to be managed and updated in their own right.
The third option is to migrate to a different database. If your application and database installation need significant updates anyway, that same effort could be put toward moving to a different database. This can be useful if you want to move your systems as a whole, but it can require additional planning to look at your business logic as well as your infrastructure.
The most common external option for MySQL migrations is PostgreSQL, as it is a similarly popular open source database with a significant community around it. PostgreSQL 15 added support for the SQL command MERGE, which is commonly used in Microsoft SQL Server and Oracle Database, precisely to make it easier to migrate to PostgreSQL without significant rewrites. A migration from MySQL may still require some rewriting, but if you are already having to make changes to move to MySQL 8.0, it is worth asking whether that effort should go toward the bigger move instead.
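For illustration, a MERGE in PostgreSQL 15 looks like the following, with hypothetical inventory and incoming_stock tables; in MySQL, similar upsert logic is usually written with INSERT ... ON DUPLICATE KEY UPDATE, so statements like this are among the things you would translate during a migration:

    -- PostgreSQL 15+: update existing rows and insert new ones in a single statement
    MERGE INTO inventory AS i
    USING incoming_stock AS s
    ON i.sku = s.sku
    WHEN MATCHED THEN
        UPDATE SET quantity = i.quantity + s.quantity
    WHEN NOT MATCHED THEN
        INSERT (sku, quantity) VALUES (s.sku, s.quantity);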
Moving to PostgreSQL can also mean using a commercial or DBaaS version of the database. There are many database services based on PostgreSQL, thanks to its permissive open source license, and many companies tout their ability to support it. However, it is worth checking whether these options are fully compatible with, and genuinely track, open source PostgreSQL, rather than being their own specific variants. Otherwise, this could be a one-way street similar to the MySQL migration, but with fewer options once you have made the move.
Migrating from MySQL 5.7 to MySQL 8.0 or beyond will be a task that many developers and DBAs will have to support over the next few months. Start by planning ahead and understanding your options. By looking at your existing applications, how much work you will have to put in, and what you want or need from your application infrastructure in the future, you can evaluate the costs and benefits of the different paths ahead.
An in-place MySQL update, a full migration to a new platform, or even staying in place are all options that you can consider. However, rather than sitting back or putting your head in the sand, you can get ahead of the issues and make the most of your opportunities.
Dave Stokes is technology evangelist at Percona.
—
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
Posted by Richy George on 20 June, 2023
Databricks on Tuesday said that developers will be able to build applications on enterprise data stored in the company’s data lakehouse and list them on the Databricks Marketplace.
Dubbed Lakehouse Apps, these new applications will run on an enterprise’s Databricks instance and use company data along with security and governance features provided by Databricks’ platform, the company said, adding that the new capabilities are aimed at reducing time and effort to adopt, integrate, and manage data for artificial intelligence use cases.
“This avoids the overhead of data movement, while taking advantage of all the security, management, and governance features of the lakehouse platform,” according to dbInsight principal analyst Tony Baer.
Databricks’ new Lakehouse Apps can be seen as an answer to Snowflake’s Native Application Framework, launched last year to allow developers to build and run applications from within the Snowflake Data Cloud platform, analysts said.
Snowflake and MongoDB, according to Constellation Research’s principal analyst Doug Henschen, are also encouraging customers to think of and use their products as platforms for building applications.
“So last year Snowflake acquired Streamlit, a company that offered a framework for building data applications, and it introduced lightweight transactional capabilities, which had been a bit of a gap,” Henschen said, adding that MongoDB, which is already popular with developers, has also increased its analytical capabilities significantly.
In a move that is similar to what Snowflake has done, Databricks has partnered with several companies, such as Retool, Posit, Kumo.ai, and Lamini, to help with the development of Lakehouse Apps.
During the launch of the Native Application Framework, Snowflake had partnered with companies including Capital One, Informatica, and LiveRamp to develop applications for data management, cloud cost management, identity resolution, and data integration.
While Databricks’ partnership with Retool will enable enterprises to build and deploy internal apps powered by their data, the integration with Posit will provide data professionals with tools for data science.
“With the help of Retool, developers can assemble UIs with drag-and-drop building blocks like tables and forms and write queries to interact with data using SQL and JavaScript,” Databricks said in a statement.
The company’s partnership with Lamini will allow developers to build customized, private large language models, the company added.
Lakehouse Apps, just like Snowflake applications developed using the Native Application Framework, can be shared on the Databricks Marketplace.
The company has not provided details about revenue sharing or how agreements for these applications will work between the two parties.
Snowflake charges 10% of the total transaction value for any applications sold through its marketplace. The company had earlier said that it would put a grading scale in place for higher value transactions.
Databricks’ new Lakehouse Apps, according to Henschen, are aimed at increasing the “stickiness” of the company’s product offerings, especially at a time when most applications are driven by data and machine learning.
These new apps can be seen as a strategy to convince developers that Databricks’ platform can handle the transactional capabilities required to build a modern application, Henschen said.
The Lakehouse Apps are expected to be in preview in the coming year, the company said, adding that Databricks Marketplace will be made generally available later this month.
Databricks will also offer AI model sharing in the Databricks Marketplace in an effort to help its enterprise customers accelerate the development of AI applications and also help the model providers monetize them.
The company said that it will also curate and publish open source models across common use cases, such as instruction-following and text summarization, and optimize the tuning and deployment of these models on its platform.
“Databricks’ move to allow AI model sharing on the marketplace echoes what Snowflake is doing in its marketplace, which last year expanded from just data sets to include native applications and models as well,” Baer said.
Additionally, the marketplace will host new data providers including S&P Global, Experian, London Stock Exchange Group, Nasdaq, CoreLogic, and YipitData, the company said. Healthcare companies such as Datavant and IQVIA, as well as companies dealing with geospatial data, such as Divirod, SafeGraph, and AccuWeather, will also provide data sets on the marketplace.
Other data providers include LiveRamp, LexisNexis and ZoomInfo.
The AI model sharing capability is expected to be in preview next year.
The company also said that it was expanding its Delta Sharing partnership footprint by tying up with companies such as Dell, Twilio, Cloudflare and Oracle.
Delta Sharing is an open source protocol designed to allow users to transmit data from within Databricks to any other computing platform in a secure manner.