All posts by Richy George

6 performance tips for Entity Framework Core 7

Posted by on 27 July, 2023

This post was originally published on this site

Entity Framework Core (EF Core) is an open source ORM (object-relational mapping) framework that bridges the gap between the object model of your application and the data model of your database. EF Core makes life simpler by allowing you to work with the database using .NET objects, instead of having to write arcane data access code. 

In an earlier post here, we discussed five best practices to improve data access performance in EF Core. In this article, we’ll examine six more ways to improve EF Core performance. To work with the code examples provided below, you should have Visual Studio 2022 installed on your system. If you don’t already have a copy, you can download Visual Studio 2022 here.

Create a console application project in Visual Studio

First off, let’s create a .NET Core console application project in Visual Studio. Assuming Visual Studio 2022 is installed on your system, follow the steps outlined below.

  1. Launch the Visual Studio IDE.
  2. Click on “Create new project.”
  3. In the “Create new project” window, select “Console App (.NET Core)” from the list of templates displayed.
  4. Click Next.
  5. In the “Configure your new project” window, specify the name and location for the new project.
  6. Click Next.
  7. In the “Additional information” window shown next, choose “.NET 7.0 (Standard Term Support)” as the Framework version you want to use.
  8. Click Create.

We’ll use this project to examine six ways to improve EF Core performance in the sections below.

Use eager loading instead of lazy loading

EF Core supports lazy loading, but it is not enabled by default; you opt in, for example by installing the Microsoft.EntityFrameworkCore.Proxies package and calling UseLazyLoadingProxies. With lazy loading, related entities are loaded into memory only when they are accessed. The benefit is that data is not loaded unless it is needed. However, lazy loading can be costly in terms of performance because multiple database queries may be required to load the data.

To solve this problem, you should use eager loading in EF Core. Eager loading fetches your entities and related entities in a single query, reducing the number of round trips to the database. The following code snippet shows how eager loading can be used.

public class DataContext : DbContext
{
    public List<Author> GetEntitiesWithEagerLoading()
    {
        // Include loads each author's books as part of the same query.
        List<Author> entities = this.Set<Author>()
            .Include(e => e.Books)
            .ToList();
        return entities;
    }
}

Use asynchronous instead of synchronous code

You should use async code to improve the performance and responsiveness of your application. Below I’ll share a code example that shows how you can execute queries asynchronously in EF Core. First, consider the following two model classes:

public class Author
{
    public int Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<Book> Books { get; set; }
}

public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public Author Author { get; set; }
}

In the code snippet that follows, we’ll create a custom data context class by extending the DbContext class of the EF Core library.

public class DataContext : DbContext
{
    protected readonly IConfiguration Configuration;

    public DataContext(IConfiguration configuration)
    {
        Configuration = configuration;
    }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
    {
        // In-memory database; the database name here is illustrative.
        options.UseInMemoryDatabase("AuthorDb");
    }

    public DbSet<Author> Authors { get; set; }
    public DbSet<Book> Books { get; set; }
}

Note that we’re using an in-memory database here for simplicity. The following code snippet illustrates how you can use async code to update an entity in the database using EF Core.

public async Task<int> Update(Author author)
{
    var dbModel = await this._context.Authors
        .FirstOrDefaultAsync(e => e.Id == author.Id);

    if (dbModel == null)
    {
        return 0; // No matching author, so nothing to update.
    }

    // The key already matches, so only the other properties are copied.
    dbModel.FirstName = author.FirstName;
    dbModel.LastName = author.LastName;
    dbModel.Books = author.Books;

    return await this._context.SaveChangesAsync();
}

Avoid the N+1 selects problem

The N+1 problem has been around since the early days of ORMs. In EF Core, this can occur when you’re trying to load data from two tables having a one-to-many or many-to-many relationship. For example, let’s say you’re loading author data from the Authors table and also book data from the Books table.

Consider the following piece of code.

foreach (var author in this._context.Authors)
{
    author.Books.ForEach(b => b.Title.ToUpper());
}

Note that the outer foreach loop fetches all authors using one query. This is the “1” in your N+1 queries. With lazy loading enabled, the inner ForEach call that reads the books represents the “N” in your N+1 problem, because it triggers one additional query for each of the N authors.

To solve this problem, you should fetch the related data in advance (using eager loading) as part of the “1” query. In other words, you should include the book data in your initial query for the author data, as shown in the code snippet given below.

var entitiesQuery = this._context.Authors
    .Include(a => a.Books);

foreach (var entity in entitiesQuery)
{
    entity.Books.ForEach(b => b.Title.ToUpper());
}

By doing so, you reduce the number of round trips to the database from N+1 to just one. This is because by using Include, we enable eager loading. The outer query, i.e., the entitiesQuery, executes just once to load all the author records together with the related book data. Instead of making round trips to the database, the two foreach loops work on the available data in the memory.

Use IQueryable instead of IEnumerable

When you’re querying data in EF Core, use IQueryable in lieu of IEnumerable. When you compose a query on IQueryable, the filtering and sorting are translated to SQL and executed on the database server, where the data is stored. By contrast, if you use IEnumerable, those operations are performed in the memory of the application server, which requires all of the data to be retrieved first.

The following code snippet shows how you can use IQueryable to query data.

IQueryable<Author> query = _context.Authors;

query = query.Where(e => e.Id == 5);

query = query.OrderBy(e => e.Id);

List<Author> entities = query.ToList();

Disable query tracking for read-only queries

The default behavior of EF Core is to track objects retrieved from the database. Tracking is required when you want to update an entity with new data, but it is a costly operation when you’re dealing with large data sets. Hence, you can improve performance by disabling tracking when you won’t be modifying the entities.

For read-only queries, i.e. when you want to retrieve entities without modifying them, you should use AsNoTracking to improve performance. The following code snippet illustrates how AsNoTracking can be used to disable tracking for an individual query in EF Core.

var dbModel = await this._context.Authors.AsNoTracking()

    .FirstOrDefaultAsync(e => e.Id == author.Id);

The code snippet given below returns a query whose results are not tracked by the context, so EF Core skips the change-tracking overhead for every entity it retrieves.

public class DataContext : DbContext
{
    public IQueryable<Author> GetAuthors()
    {
        return Set<Author>().AsNoTracking();
    }
}

Use batch updates for large numbers of entities

When you save many modified entities, the cost that matters most is the number of round trips to the database. EF Core groups the statements generated by a single SaveChanges call into batches where the provider supports it, so calling SaveChanges once per entity entails a significant performance overhead. To update a large number of entities, you can mark them all as modified with the UpdateRange() method and then call SaveChanges once, as shown in the code snippet given below.

public class DataContext : DbContext
{
    public void BatchUpdateAuthors(List<Author> authors)
    {
        // Mark every author in the list as modified, then save once.
        this.UpdateRange(authors);
        this.SaveChanges();
    }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
    {
        // Provider configuration goes here (omitted in this snippet).
    }

    public DbSet<Author> Authors { get; set; }
    public DbSet<Book> Books { get; set; }
}

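If you’re using EF Core 7 or later, you can use the ExecuteUpdate and ExecuteDelete methods to perform bulk updates and deletes in a single SQL statement, without loading the entities into memory. These methods require a relational database provider. For example, the following call sets the same property on every matching row (the property and value are illustrative):

_context.Authors.Where(a => a.Id > 10)
    .ExecuteUpdate(s => s.SetProperty(a => a.LastName, "Unknown"));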

Performance should be a feature

We’ve examined several key strategies you can adopt to improve data access performance using EF Core. You should use a benchmarking tool such as BenchmarkDotNet to measure the performance of your queries after applying the changes described in this article. (See my article on BenchmarkDotNet here.) Additionally, you should fine-tune your database design, indexes, queries, and stored procedures to get maximum benefits.

Performance should be a feature of your application. It is imperative that you keep performance in mind from the outset whenever you are building applications that use a lot of data.

Why Raft-native systems are the future of streaming data

Posted by on 11 July, 2023

Consensus is fundamental to consistent, distributed systems. In order to guarantee system availability in the event of inevitable crashes, systems need a way to ensure that each node in the cluster is in alignment, such that work can seamlessly transition between nodes in the case of failures. Consensus protocols such as Paxos, Raft, and Viewstamped Replication (VSR) help to drive resiliency for distributed systems by providing the logic for processes like leader election, atomic configuration changes, synchronization, and more.

As with all design elements, the different approaches to distributed consensus offer different trade-offs. Paxos is the oldest consensus protocol around and is used in many systems like Google Cloud Spanner, Apache Cassandra, Amazon DynamoDB, and Neo4j. Paxos achieves consensus in a three-phased, leaderless, majority-wins protocol. While Paxos is effective in driving correctness, it is notoriously difficult to understand, implement, and reason about. This is partly because it obscures many of the challenges in reaching consensus (e.g. leader election, reconfiguration), making it difficult to decompose into subproblems.

Raft (for reliable, replicated, redundant, and fault-tolerant) can be thought of as an evolution of Paxos focused on understandability. Raft can achieve the same correctness as Paxos but is easier to understand and implement in the real world, so it can often provide greater reliability guarantees. For example, Raft uses a stable form of leadership, which simplifies replication log management, and its leader election process is more efficient.

And because Raft decomposes the consensus problem into distinct logical components, for example by making leader election a separate step before replication, it is a flexible protocol to adapt for complex, modern distributed systems. Such systems need to maintain correctness and performance while scaling to petabytes of throughput, and they benefit from a protocol that is simpler for new engineers working on the codebase to understand.

For these reasons, Raft has been rapidly adopted for today’s distributed and cloud-native systems like MongoDB, CockroachDB, TiDB, and Redpanda in order to achieve greater performance and transactional efficiency.

How Redpanda implements Raft

When Redpanda founder Alex Gallego determined that the world needed a new streaming data platform to support the kind of GBps+ workloads that can cause Apache Kafka to fall over, he decided to rewrite Kafka from the ground up.

The requirements for what would become Redpanda were 1) it needed to be simple and lightweight in order to reduce the complexity and inefficiency of running Kafka clusters reliably at scale; 2) it needed to maximize the performance of modern hardware in order to provide low latency for large workloads; and 3) it needed to guarantee data safety even for very large throughputs.

Implementing Raft provided a solid foundation for all three requirements:

  1. Simplicity. Every Redpanda partition is a Raft group, so everything in the platform is reasoning around Raft, including both metadata management and partition replication. This contrasts with the complexity of Kafka, where data replication is handled by ISR (in-sync replicas) and metadata management is handled by ZooKeeper (or KRaft), and you have two systems that must reason with one another.
  2. Performance. The Redpanda Raft implementation can tolerate disturbances to a minority of replicas, so long as the leader and a majority of replicas are stable. In cases when a minority of replicas have a delayed response, the leader does not have to wait for their responses to progress, mitigating impact on latency. Redpanda is therefore more fault-tolerant and can deliver predictable performance at scale.
  3. Reliability. When Redpanda ingests events, they are written to a topic partition and appended to a log file on disk. Every topic partition then forms a Raft consensus group, consisting of a leader plus a number of followers, as specified by the topic’s replication factor. A Redpanda Raft group can tolerate ƒ failures given 2ƒ+1 nodes; for example, in a cluster with five nodes and a topic with a replication factor of five, two nodes can fail and the topic will remain operational. Redpanda leverages the Raft joint consensus protocol to provide consistency even during reconfiguration.
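The 2ƒ+1 rule above can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the function names are illustrative and are not part of any Redpanda API.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority (quorum)."""
    if replicas < 1:
        raise ValueError("a Raft group needs at least one replica")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can fail while a majority is still available."""
    return replicas - majority(replicas)


# Replication factors 1, 3, and 5: quorum size and failures tolerated.
for rf in (1, 3, 5):
    print(rf, majority(rf), tolerated_failures(rf))
```

With a replication factor of five, the quorum is three and two replicas can fail, which matches the example above. An even replication factor adds no tolerance: four replicas still tolerate only one failure.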

Redpanda also extends core Raft functionality in some critical ways in order to achieve the scalability, reliability, and speed required of a modern, cloud-native solution. Its innovations on top of Raft include changes to the election process, heartbeat generation, and, critically, support for Apache Kafka ACKS. These innovations ensure the best possible performance in all scenarios, which is what enables Redpanda to be significantly faster than Kafka while still guaranteeing data safety. In fact, Jepsen testing has verified that Redpanda is a safe system without known consistency problems, and a solid Raft-based consensus layer.

But what about KRaft?

While Redpanda takes a Raft-native approach, the legacy streaming data platforms have been laggards in adopting modern approaches to consensus. Kafka itself is a replicated distributed log, but it has historically relied on yet another replicated distributed log—Apache ZooKeeper—for metadata management and controller election. This has been problematic for a few reasons:

  1. Managing multiple systems introduces administrative burden;
  2. Scalability is limited due to inefficient metadata handling and double caching;
  3. Clusters can become very bloated and resource intensive; in fact, it is not uncommon to see clusters with equal numbers of ZooKeeper and Kafka nodes.

These limitations have not gone unacknowledged by Apache Kafka’s committers and maintainers, who are in the process of replacing ZooKeeper with a self-managed metadata quorum: Kafka Raft (KRaft). This event-based flavor of Raft reduces the administrative challenges of Kafka metadata management, and is a promising sign that the Kafka ecosystem is moving in the direction of modern approaches to consensus and reliability.

Unfortunately, KRaft does not solve the problem of having two different systems for consensus in a Kafka cluster. In the new KRaft paradigm, KRaft partitions handle metadata and cluster management, but replication is handled by the brokers, so you still have these two distinct platforms and the inefficiencies that arise from that inherent complexity.

Combining Raft with performance engineering

As data industry leaders like CockroachDB, MongoDB, Neo4j, and TiDB have demonstrated, Raft-based systems deliver simpler, faster, and more reliable distributed data environments. Raft is becoming the standard consensus protocol for today’s distributed data systems because it marries particularly well with performance engineering to further boost the throughput of data processing.

For example, Redpanda combines Raft with speedy architectural ingredients to perform at least 10x faster than Kafka at tail latencies (p99.99) when processing a 1GBps workload, on one-third the hardware, without compromising data safety. Traditionally, GBps+ workloads have been a burden for Apache Kafka, but Redpanda can support them with double-digit millisecond latencies, while retaining Jepsen-verified reliability.

How is this achieved? Redpanda is written in C++, and uses a thread-per-core architecture to squeeze maximum performance out of modern chips and network cards. These elements work together to elevate the value of Raft for a distributed streaming data platform.

For example, because Redpanda bypasses the page cache and the Java Virtual Machine (JVM) dependency of Kafka, it can embed hardware-level knowledge into its Raft implementation. Typically, every time you write in Raft you have to flush in order to guarantee the durability of writes on disk. In Redpanda’s optimistic approach to Raft, smaller intermittent flushes are dropped in favor of a larger flush at the end of a call. While this introduces some additional latency per call, it reduces overall system latency and increases overall throughput, because it reduces the total number of flush operations.
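To make the flush trade-off concrete, here is a toy Python sketch that only counts flush operations. It is not Redpanda's implementation; the CountingLog class and both helper functions are hypothetical, and the "disk" is just a list in memory.

```python
class CountingLog:
    """Toy append-only log that counts how often it is flushed."""

    def __init__(self):
        self.pending = []      # appended but not yet durable
        self.durable = []      # stands in for data on disk
        self.flush_count = 0

    def append(self, record):
        self.pending.append(record)

    def flush(self):
        # Only count flushes that actually move data.
        if self.pending:
            self.durable.extend(self.pending)
            self.pending.clear()
            self.flush_count += 1


def write_flush_each(records):
    """Flush after every append."""
    log = CountingLog()
    for record in records:
        log.append(record)
        log.flush()
    return log


def write_flush_once(records):
    """Append everything, then flush a single time at the end of the call."""
    log = CountingLog()
    for record in records:
        log.append(record)
    log.flush()
    return log
```

Both functions leave the same records durable, but the second performs one flush per call instead of one per record. The price is that no record in the call is durable until that final flush completes.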

While there are many effective ways to ensure consistency and safety in distributed systems (blockchains do it very well with proof-of-work and proof-of-stake protocols), Raft is a proven approach and flexible enough that it can be enhanced, as with Redpanda, to adapt to new challenges. As we enter a new world of data-driven possibilities, driven in part by AI and machine learning use cases, the future is in the hands of developers who can harness real-time data streams.

Raft-based systems, combined with performance-engineered elements like C++ and thread-per-core architecture, are driving the future of data streaming for mission-critical applications.

Doug Flora is head of product marketing at Redpanda Data.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to

How to use GPT-4 with streaming data for real-time generative AI

Posted by on 10 July, 2023

By this point, just about everybody has had a go playing with ChatGPT, making it do all sorts of wonderful and strange things. But how do you go beyond just messing around and use it to build a real-world, production application? A big part of that is bringing together the general capabilities of ChatGPT with your unique data and needs.

What do I mean by that? Let me give you an example of a scenario every company is thinking about right now. Imagine you’re an airline, and you want to have an AI support agent help your customers if a human isn’t available.

Your customer might have a question about how much it costs to bring skis on the plane. Well, if that’s a general policy of the airline, that information is probably available on the internet, and ChatGPT might be able to answer it correctly.

But what about more personal questions, like these?

Is my flight delayed?
Can I upgrade to first class?
Am I still on the standby list for my flight tomorrow?

It depends! First of all, who are you? Where and when are you flying? What airline are you booked with?

ChatGPT can’t help here because it doesn’t know the answer to these questions. This isn’t something that can be “fixed” by more innovation at OpenAI. Your personal data is (thankfully) not available on the public internet, so even Bing’s implementation that connects ChatGPT with the open web wouldn’t work.

The fundamental obstacle is that the airline (you, in our scenario) must safely provide timely data from its internal data stores to ChatGPT. Surprisingly, how you do this doesn’t follow the standard playbook for machine learning infrastructure. Large language models have changed the relationship between data engineering and model creation. Let me explain with a quick diagram.

[Diagram omitted; image credit: Confluent]

In traditional machine learning, most of the data engineering work happens at model creation time. You take a specific training data set and use feature engineering to get the model right. Once the training is complete, you have a one-off model that can do the task at hand, but nothing else. Most of the problem-specific smarts are baked in at training time. Since training is usually done in batch, the data flow is also batch and fed out of a data lake, data warehouse, or other batch-oriented system.

With large language models, the relationship is inverted. Here, the model is built by taking a huge general data set and letting deep learning algorithms do end-to-end learning once, producing a model that is broadly capable and reusable. This means that services like those provided by OpenAI and Google mostly provide functionality off reusable pre-trained models rather than requiring they be recreated for each problem. And it is why ChatGPT is helpful for so many things out of the box. In this paradigm, when you want to teach the model something specific, you do it at each prompt. That means that data engineering now has to happen at prompt time, so the data flow problem shifts from batch to real-time.

What is the right tool for the job here? Event streaming is arguably the best because its strength is circulating feeds of data around a company in real time.

In this post, I’ll show how streaming and ChatGPT work together. I’ll walk through how to build a real-time support agent, discuss the architecture that makes it work, and note a few pitfalls.

How ChatGPT works

While there’s no shortage of in-depth discussion about how ChatGPT works, I’ll start by describing just enough of its internals to make sense of this post.

ChatGPT, or really GPT, the model, is basically a very large neural network trained on text from the internet. By training on an enormous corpus of data, GPT has been able to learn how to converse like a human and appear intelligent.

When you prompt ChatGPT, your text is broken down into a sequence of tokens as input into the neural network. One token at a time, it figures out what is the next logical thing it should output.

Human: Hello.

AI: How

AI: How can

AI: How can I

AI: How can I help

AI: How can I help you

AI: How can I help you today?

One of the most fascinating aspects of ChatGPT is that it can remember earlier parts of your conversation. For example, if you ask it “What is the capital of Italy?”, it correctly responds “Rome”. If you then ask “How long has it been the capital?”, it’s able to infer that “it” means Rome as the capital, and correctly responds with 1871. How is it able to do that?

ChatGPT has something called a context window, which is like a form of working memory. Each of OpenAI’s models has different window sizes, bounded by the sum of input and output tokens. When the number of tokens exceeds the window size, the oldest tokens get dropped off the back, and ChatGPT “forgets” about those things.
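The "forgetting" behavior can be pictured as a fixed-size queue. The Python sketch below is a simplification under stated assumptions: the ContextWindow class is hypothetical, and splitting on whitespace stands in for a real tokenizer, which would produce different token counts.

```python
from collections import deque


class ContextWindow:
    """Keeps only the most recent `max_tokens` tokens of a conversation."""

    def __init__(self, max_tokens: int) -> None:
        # A deque with maxlen discards the oldest items once it is full.
        self.tokens = deque(maxlen=max_tokens)

    def add(self, text: str) -> None:
        # Assumption: whitespace splitting stands in for a real tokenizer.
        self.tokens.extend(text.split())

    def contents(self) -> str:
        return " ".join(self.tokens)


window = ContextWindow(max_tokens=6)
window.add("What is the capital of Italy?")
window.add("Rome")
print(window.contents())  # prints: is the capital of Italy? Rome
```

The question exactly fills the six-token window, so adding the answer pushes the oldest token, "What", out of the window.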

As we’ll see in a minute, context windows are the key to evolving ChatGPT’s capabilities.

Making GPT-4 understand your business

With that basic primer on how ChatGPT works, it’s easy to see why it can’t tell your customer if their flight was delayed or if they can upgrade to first class. It doesn’t know anything about that. What can we do?

The answer is to modify GPT and work with it directly, rather than go through ChatGPT’s higher-level interface. For the purposes of this blog post, I’ll target the GPT-4 model (and refer to it as GPT hereafter for concision).

There are generally two ways to modify how GPT behaves: fine-tuning and search. With fine-tuning, you retrain the base neural network with new data to adjust each of the weights. But this approach isn’t recommended by OpenAI and others because it’s hard to get the model to memorize data with the level of accuracy needed to serve an enterprise application. Not to mention any data it’s fine-tuned with may become immediately out of date.

That leaves us with search. The basic idea is that just before you submit a prompt to GPT, you go elsewhere and look up relevant information and prepend it to the prompt. You instruct GPT to use that information as a prefix to the prompt, essentially providing your own set of facts to the context window at runtime.

If you were to do it manually, your prompt would look something like this:

You are a friendly airline support agent. Use only the following facts to answer questions. If you don’t know the answer, you will say “Sorry, I don’t know. Let me contact a human to help.” and nothing else.

The customer talking to you is named Michael.

Michael has booked flight 105.

Michael is flying economy class for flight 105.

Flight 105 is scheduled for June 2nd.

Flight 105 flies from Seattle to Austin.

Michael has booked flight 210.

Michael is flying economy class for flight 210.

Flight 210 is scheduled for June 10th.

Flight 210 flies from Austin to Seattle.

Flight 105 has 2 first class seats left.

Flight 210 has 0 first class seats left.

A customer may upgrade from economy class to first class if there is at least 1 first class seat left on the flight and the customer is not already first class on that flight.

If the customer asks to upgrade to first class, then you will confirm which flight.

When you are ready to begin, say “How can I help you today?”

Compared to fine-tuning, the search approach is a lot easier to understand, less error-prone, and more suitable for situations that require factual answers. And while it might look like a hack, this is exactly the approach being taken by some of the best-known AI products like GitHub Copilot.

So, how exactly do you build all this?

Constructing a customer 360

Let’s zoom out for a minute and set GPT aside. Before we can make a support agent, we have to tackle one key challenge—we need to collect all of the information that could be relevant to each customer.

Going back to the example of whether a customer can upgrade to first class, remember that the answer depends on a lot of different factors for the particular flight. To have enough context to answer it, you need to consolidate the data for:

  • Customer identity
  • Upcoming booked flights for the customer
  • Seat layout of the plane assigned to the flight
  • Current capacity for the flight
  • Rewards points for free upgrades

For most companies, this data is spread across a bunch of different systems like databases, data warehouses, SaaS applications, queues, and file systems. Much of it is not built to be queried interactively at low latency, and none of it is arranged to be easily consolidated. Communication between these systems is point-to-point, making it incredibly difficult to get a unified view of the data.

Event streaming is a good solution to bring all of these systems together. By tapping into feeds of information as each of them changes, you can construct a unified view of each customer that’s easy to query with low latency.

Confluent’s connectors make it easy to read from these isolated systems. Turn on a source connector for each, and changes will flow in real time to Confluent.

Because these streams usually contain somewhat raw information, you’ll probably want to process that data into a more refined view. Stream processing is how you transform, filter, and aggregate individual streams into a view more suitable for different access patterns. You probably want to ultimately sink that view into a relational database, key/value store, or document store.
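As a rough illustration of what that stream processing step does, the Python sketch below folds a list of change events into a per-customer view. Everything in it is an assumption made for the example: the event types, the field names, and the in-memory dictionary that stands in for the sink database. A real pipeline would use a stream processor rather than a loop.

```python
from collections import defaultdict


def new_view() -> dict:
    """Empty view: one record per customer ID, created on first use."""
    return defaultdict(lambda: {"name": None, "flights": {}})


def apply_event(view: dict, event: dict) -> None:
    """Fold one change event into the per-customer view."""
    customer = view[event["customer_id"]]
    if event["type"] == "customer_updated":
        customer["name"] = event["name"]
    elif event["type"] == "flight_booked":
        customer["flights"][event["flight"]] = event["cabin"]
    elif event["type"] == "flight_cancelled":
        customer["flights"].pop(event["flight"], None)


# Made-up events, as they might arrive from separate source systems.
events = [
    {"type": "customer_updated", "customer_id": 1, "name": "Michael"},
    {"type": "flight_booked", "customer_id": 1, "flight": 105, "cabin": "economy"},
    {"type": "flight_booked", "customer_id": 1, "flight": 210, "cabin": "economy"},
    {"type": "flight_booked", "customer_id": 1, "flight": 330, "cabin": "economy"},
    {"type": "flight_cancelled", "customer_id": 1, "flight": 330},
]

view = new_view()
for event in events:
    apply_event(view, event)
print(dict(view))
```

Because the view is keyed by customer ID, reading everything known about one customer is a single lookup.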

Connecting the customer data to GPT

With the customer 360 data turned into a unified view, the next step is to programmatically connect that information with each prompt. At this point, the architecture looks like this:

[Diagram omitted; image credit: Confluent]

And the support agent interaction looks like this:

AI: How can I help you today?

Human: Hi. I’d like to upgrade my seat to first class. Can you do that?

AI: Absolutely, I’d be happy to assist with that. However, could you please specify which flight you’d like to upgrade to first class? Is it Flight 105 or Flight 210?

Human: 105 please

AI: Great news, Michael. We have 2 first class seats left on Flight 105. I can upgrade your economy class seat to first class for this flight. Shall I proceed with this change for you?

Human: Yep.

AI: Excellent, Michael. I’ve successfully upgraded your seat to first class on Flight 105. You will now enjoy all the perks and amenities of our first class service. Let me know if there’s anything else I can assist you with today. Safe travels!

Human: Can you upgrade my 210 seat to first class too?

AI: I’m sorry, Michael, but currently there are no first class seats left on Flight 210. Therefore, I’m unable to upgrade your seat for this flight. If there’s anything else you need help with, please let me know.

So when the application receives a chat message, all it needs to do is look up the ID of the customer who sent it, query the customer 360 view, and supply a block of useful information at the start of every prompt, similar to what I showed in the manual pre-prompt.
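That lookup-and-prepend step might be sketched as follows in Python. The CUSTOMER_360 dictionary, its field names, and the build_prompt function are all hypothetical stand-ins for your own view and application code, and the sketch stops short of calling any model API.

```python
# Hypothetical stand-in for the customer 360 view, keyed by customer ID.
CUSTOMER_360 = {
    "c-1": {
        "name": "Michael",
        "flights": [
            {"number": 105, "cabin": "economy", "first_class_left": 2},
            {"number": 210, "cabin": "economy", "first_class_left": 0},
        ],
    }
}

INSTRUCTIONS = (
    "You are a friendly airline support agent. "
    "Use only the following facts to answer questions."
)


def build_prompt(customer_id: str, message: str) -> str:
    """Prepend the customer's facts to the incoming chat message."""
    customer = CUSTOMER_360[customer_id]
    name = customer["name"]
    facts = [f"The customer talking to you is named {name}."]
    for flight in customer["flights"]:
        number = flight["number"]
        cabin = flight["cabin"]
        seats = flight["first_class_left"]
        facts.append(f"{name} has booked flight {number}.")
        facts.append(f"{name} is flying {cabin} class for flight {number}.")
        facts.append(f"Flight {number} has {seats} first class seats left.")
    return "\n".join([INSTRUCTIONS, *facts, f"Customer: {message}"])


print(build_prompt("c-1", "Can I upgrade to first class?"))
```

The facts are rebuilt on every message, so the prompt always reflects the current state of the view rather than whatever was true when the chat began.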

Connecting your knowledge base to GPT

This technique works great for questions about an individual customer, but what if you wanted the support agent to be broadly knowledgeable about your business? For example, if a customer asked, “Can I bring a lap infant with me?”, that isn’t something that can be answered through customer 360 data. Each airline has general requirements that you’d want to tell the customer, like that they must bring the child’s birth certificate.

Information like that usually lives across many web pages, internal knowledge base articles, and support tickets. In theory, you could retrieve all of that information and prepend it to each prompt as I described above, but that is a wasteful approach. In addition to taking up a lot of the context window, you’d be sending a lot of tokens back and forth that are mostly not needed, racking up a bigger usage bill.

How do you overcome that problem? The answer is through embeddings. When you ask GPT a question, you need to figure out what information is related to it so you can supply it along with the original prompt. Embeddings are a way to map things into a “concept space” as vectors of numbers. You can then use fast operations to determine the relatedness of any two concepts. 

OK, but where do those vectors of numbers come from? They’re derived from feeding the data through the neural network and grabbing the values of neurons in the hidden layers. This works because the neural network is already trained to recognize similarity.

To calculate the embeddings, you use OpenAI’s embedding API. You submit a piece of text, and the embedding comes back as a vector of numbers.

curl https://api.openai.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-ada-002"
  }'

The response has the following shape (the embedding values are elided here):

{
  "data": [
    {
      "embedding": [
        ...
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}

Since we’re going to use embeddings for all of our policy information, we’re going to have a lot of them. Where should they go? The answer is in a vector database. A vector database specializes in organizing and storing this kind of data. Pinecone, Weaviate, Milvus, and Chroma are popular choices, and more are popping up all the time.

[Image: gpt 4 streaming 07. Credit: Confluent]

As a quick aside, you might be wondering why you shouldn’t exclusively use a vector database. Wouldn’t it be simpler to also put your customer 360 data there, too? The problem is that queries against a vector database retrieve data based on the distance between embeddings, which is not the easiest thing to debug and tune. In other words, when a customer starts a chat with the support agent, you absolutely want the agent to know the set of flights the customer has booked. You don’t want to leave that up to chance. So in this case it’s better to just query your customer 360 view by customer ID and put the retrieved data at the start of the prompt.

With your policies in a vector database, harvesting the right information becomes a lot simpler. Before you send a prompt off to GPT, you make an embedding out of the prompt itself. You then take that embedding and query your vector database for related information. The result from that query becomes the set of facts that you prepend to your prompt, which helps keep the context window small since it only uses relevant information.
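The flow in the paragraph above can be sketched in a few lines. Everything here is a stand-in: `toy_embed` is a hypothetical word-count function used in place of a real embedding API, the in-memory `store` list plays the role of the vector database, and the policy sentences are invented for the example.

```python
import math

def toy_embed(text, vocabulary):
    """Hypothetical stand-in for an embedding API: counts vocabulary words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    words = cleaned.split()
    return [float(words.count(term)) for term in vocabulary]

def cosine_similarity(a, b):
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vector, store, top_k):
    """Return the text of the top_k stored items most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item["vector"]),
        reverse=True,
    )
    return [item["text"] for item in ranked[:top_k]]

def build_prompt(question, facts):
    """Prepend the retrieved facts to the customer's question."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Use these facts to answer.\n{context}\n\nQuestion: {question}"

# Invented vocabulary and policies, for illustration only.
VOCABULARY = ["infant", "birth", "certificate", "bag", "fee", "pet", "carrier"]
POLICIES = [
    "A lap infant must travel with a birth certificate.",
    "The first checked bag has a fee.",
    "A pet must stay in a carrier.",
]
store = [{"text": p, "vector": toy_embed(p, VOCABULARY)} for p in POLICIES]

question = "Can I bring a lap infant with me?"
facts = retrieve(toy_embed(question, VOCABULARY), store, top_k=1)
prompt = build_prompt(question, facts)
```

Only the single most related policy ends up in the prompt, which is how this approach keeps the context window small.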

[Image: gpt 4 streaming 08. Credit: Confluent]

That, at a very high level, is how you connect your policy data to GPT. But I skipped over a lot of important details to make this work. Time to fill those in.




Syncing your knowledge base to the vector database

The next step is to get your policy information into the vector database. The biggest decision to make here is how you’ll chunk the data.

Chunking refers to the amount of data that you put together in one embedding. If the chunk size is too large or too small, it’ll be harder for the database to query for related information. To give you an idea of how this works in other domains, you might choose to chunk a Wikipedia article by section, or perhaps by paragraph.
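As a rough illustration of chunking by paragraph, the sketch below groups consecutive paragraphs until a size limit is reached. The function name, the `max_chars` parameter, and the sample text are assumptions made for this example; a real pipeline might count tokens rather than characters.

```python
def chunk_by_paragraph(text, max_chars=500):
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for paragraph in paragraphs:
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Invented policy text, for illustration only.
document = (
    "Lap infants\n\nBring a birth certificate.\n\n"
    "Checked bags\n\nFees vary by route."
)
chunks = chunk_by_paragraph(document, max_chars=40)
```

Each resulting chunk would then be embedded separately and stored alongside its text.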

Now, if your policies change slowly or never change, you can scrape all of your policy documents and batch upload them to the vector database, but a better strategy would be to use stream processing. Here again, you can set up connectors to your file systems so that when any file is added or changed, that information can be made rapidly available to the support agent.

If you use stream processing, sink connectors help your data make the final jump, moving your embeddings into the vector database.

[Image: gpt 4 streaming 09. Credit: Confluent]

Tying it all together

We’re now ready to bring all of this together into a working example. Here’s what the architecture looks like:

[Image: gpt 4 streaming 10. Credit: Confluent]

This architecture is hugely powerful because GPT will always have your latest information each time you prompt it. If your flight gets delayed or your terminal changes, GPT will know about it during your chat session. This is completely distinct from current approaches where the chat session would need to be reloaded or wait a few hours (or days) for new data to arrive.

And there’s more. A GPT-enabled agent doesn’t have to stop at being a passive Q/A bot. It can take real action on your behalf. This is again something that ChatGPT, even with OpenAI’s plugins, can’t do out of the box because it can’t reason about the aftereffects of calling your internal APIs. Event streams work well here because they can propagate the chain of traceable events back to you. As an example, you can imagine combining command/response event pairs with chain-of-thought prompting to approach agent behavior that feels more autonomous.

The ChatGPT Retrieval Plugin

For the sake of giving a clear explanation about how all of this works, I described a few things a bit manually and omitted the topic of ChatGPT plugins. Let’s talk about that now.

Plugins are a way to extend ChatGPT and make it do things it can’t do out of the box. New plugins are being added all the time, but one in particular is important to us: the ChatGPT Retrieval Plugin. The ChatGPT Retrieval Plugin acts as a sort of proxy layer between ChatGPT and the vector database, providing the glue that allows the two to talk to each other.

In my example, I illustrated how you’d receive a prompt, make an embedding, search the vector database, send it to GPT, and so on. Instead of doing that by hand, the ChatGPT Retrieval Plugin makes the right API calls back and forth on your behalf. This would allow you to use ChatGPT directly, rather than going underneath to OpenAI’s APIs, if that makes sense for your use case. 

Keep in mind that plugins don’t yet work with the OpenAI APIs. They only work in ChatGPT. However, there is some work going on in the LangChain framework to sidestep that.

If you take this approach, one key change to the architecture above is that rather than connecting Apache Kafka directly to the vector database, you’d forward all of your customer 360 data to the Retrieval Plugin, probably using the HTTP sink connector.

[Image: gpt 4 streaming 11. Credit: Confluent]

Whether you connect these systems manually or use the plugin, the mechanics remain the same. Again, you can choose whichever method works best for your use case.

Capturing conversation and fine-tuning

There’s one last step to tidy up this example. As the support agent is running, we want to know what exactly it’s doing. What’s a good way to do that?

The prompts and responses are good candidates to be captured as event streams. If there’s any feedback (imagine an optional thumbs up/down to each response), we can capture that too. By again using stream processing, we can keep track of how helpful the agent is from moment to moment. We can feed that knowledge back into the application so that it can dynamically adjust how it constructs its prompt. Think of it as a bit like working with runtime feature flags.
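To make the feature-flag analogy concrete, here is a minimal sketch that tracks thumbs up/down feedback over a rolling window and uses the result to pick a prompt style. The class, the window size, the threshold, and the "detailed"/"concise" styles are all hypothetical; a production system would compute this in a stream processor rather than in application memory.

```python
from collections import deque

class HelpfulnessTracker:
    """Tracks the share of thumbs-up feedback over the most recent events."""

    def __init__(self, window_size=100):
        # deque with maxlen drops the oldest event automatically.
        self.recent = deque(maxlen=window_size)

    def record(self, thumbs_up):
        self.recent.append(bool(thumbs_up))

    def rate(self):
        """Return the thumbs-up rate, or None if no feedback has arrived."""
        if not self.recent:
            return None
        return sum(self.recent) / len(self.recent)

def choose_prompt_style(tracker, threshold=0.5):
    """Switch to a more detailed prompt when recent feedback is poor."""
    rate = tracker.rate()
    if rate is not None and rate < threshold:
        return "detailed"
    return "concise"

tracker = HelpfulnessTracker(window_size=4)
for thumbs_up in [True, True, False, False, False]:
    tracker.record(thumbs_up)
style = choose_prompt_style(tracker)
```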

[Image: gpt 4 streaming 12. Credit: Confluent]

Capturing this kind of observability data unlocks one more opportunity. Earlier I mentioned that there are two ways to modify how GPT behaves: search and fine-tuning. Until now, the approach I’ve described has centered on search, adding information to the start of each prompt. But there are reasons you might want to fine-tune, and now is a good time to look at them.

When you add information to the start of a prompt, you eat up space in the context window, eroding GPT’s ability to remember things you told it in the past. And with more information in each prompt, you pay more for tokens to communicate with the OpenAI APIs. The incentive is to send the least amount of tokens possible in each prompt.

Fine-tuning is a way of side-stepping those issues. When you fine-tune a machine learning model, you make small adjustments to its neural network weights so that it will get better at a particular task. It’s more complicated to fine-tune a model, but it allows you to supply vastly more information to the model once, rather than paying the cost every time a prompt is run.

Whether you can do this or not depends on what model you’re using. This post is centered around the GPT-4 model, which is closed and does not yet permit fine-tuning. But if you’re using an open-source model, you have no such restrictions, and this technique might make sense.

So in our example, imagine for a moment that we’re using a model capable of being fine-tuned. It would make sense to do further stream processing and join the prompt, response, and feedback streams, creating a stream of instances where the agent was being helpful. We could feed all of those examples back into the model for fine-tuning, using human feedback as the training signal. (ChatGPT was partly constructed using a related technique, reinforcement learning from human feedback.)
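The join described above might look like the following sketch. It works on small in-memory lists rather than real streams, and the event shape (an `id` shared by a prompt, its response, and its feedback) and the `prompt`/`completion` output keys are assumptions made for illustration.

```python
def join_helpful_examples(prompts, responses, feedback):
    """Join prompt, response, and feedback events by id.

    Keeps only the interactions that received a thumbs up.
    """
    prompt_by_id = {event["id"]: event["text"] for event in prompts}
    response_by_id = {event["id"]: event["text"] for event in responses}
    examples = []
    for event in feedback:
        event_id = event["id"]
        if not event["thumbs_up"]:
            continue
        # Skip feedback whose prompt or response has not arrived.
        if event_id in prompt_by_id and event_id in response_by_id:
            examples.append(
                {
                    "prompt": prompt_by_id[event_id],
                    "completion": response_by_id[event_id],
                }
            )
    return examples

# Invented events, for illustration only.
prompts = [
    {"id": 1, "text": "Can I bring a lap infant?"},
    {"id": 2, "text": "Where is my bag?"},
]
responses = [
    {"id": 1, "text": "Yes, with a birth certificate."},
    {"id": 2, "text": "I am not sure."},
]
feedback = [
    {"id": 1, "thumbs_up": True},
    {"id": 2, "thumbs_up": False},
]
examples = join_helpful_examples(prompts, responses, feedback)
```

In a streaming setting the same logic would typically be expressed as a windowed join, so that events arriving slightly out of order can still be matched.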

Keep in mind that any information that needs to be real-time still needs to be supplied through the prompt. Remember, fine-tuning only happens once offline. So it’s a technique that should be used in conjunction with prompt augmentation, rather than something you’d use exclusively.

Known limitations

As exciting as this is, I want to call out two limitations in the approach outlined in this article.

First, this architecture predominantly relies on the context window being large enough to service each prompt. The supported size of context windows is expanding fast, but in the short term, this is a real limiter.

The second is that prompt injection attacks are proving challenging to defend against. People are constantly finding new ways to get GPT to ignore its previous instructions, and sometimes act in a malicious way. Implementing controls against injection will be even more important if agents are empowered to update existing business data as I described above.

In fact, we’re already starting to see the practical choices people are making to work around these problems.

Next steps

What I’ve outlined is the basic framework for how streaming and GPT can work together for any company. And while the focus of this post was on using streaming to gather and connect your data, I expect that streaming will often show up elsewhere in these architectures.

I’m excited to watch this area continue to evolve. There’s clearly a lot of work to do, but I expect both streaming and large language models to mutually advance one another’s maturity.

Michael Drogalis is a principal technologist on the TSG team at Confluent, where he helps make Confluent’s developer experience great. Before joining Confluent, Michael served as the CEO of Distributed Masonry, a software startup that built a streaming-native data warehouse. He is also the author of several popular open source projects, most notably the Onyx Platform.

Generative AI Insights, an InfoWorld blog open to outside contributors, provides a venue for technology leaders to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content.


Posted Under: Database
How Databricks is adding generative AI capabilities to its Delta Lake lakehouse

Posted by on 29 June, 2023

This post was originally published on this site

It’s been a busy few weeks for Databricks. After releasing a new iteration of its data lakehouse with a universal table format and introducing Lakehouse Apps, the company on Wednesday announced new tools aimed at helping data professionals develop generative AI capabilities. 

The new capabilities, which include a proprietary enterprise knowledge engine dubbed LakehouseIQ, a new vector search capability, a low-code large language model (LLM) tuning tool named AutoML, and open source foundation models, are being added to the company’s Delta Lake lakehouse.

The new capabilities draw on technology from the company’s recent acquisitions: MosaicML this week, and Okera in May.

LakehouseIQ to open up enterprise search via NLP

The new LakehouseIQ engine is meant to help enterprise users search for data and insights from its Delta Lake, without the need to seek technical help from data professionals. To simplify data search for nontechnical users, the LakehouseIQ engine uses natural language processing (NLP).

In order to enable NLP-based enterprise searches, LakehouseIQ uses generative AI to understand jargon, data usage patterns, and concepts like organizational structure.  

It’s a different approach than the common practice of creating knowledge graphs, a method used by companies including Glean and Salesforce.  A knowledge graph is a representation of structured and unstructured data in the form of nodes and edges, where nodes represent entities (such as people, places, or concepts) and edges represent relationships between these entities.  
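For readers who have not worked with one, a knowledge graph can be sketched as a map from each node to its outgoing labeled edges. The class below and the entities in it (a person, a team, a dashboard) are invented for illustration and are not taken from any vendor’s product.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal directed graph with labeled edges between named entities."""

    def __init__(self):
        # Maps a source node to a list of (relationship, target) pairs.
        self.edges = defaultdict(list)

    def add_edge(self, source, relationship, target):
        self.edges[source].append((relationship, target))

    def neighbors(self, source, relationship=None):
        """Return targets linked from source, optionally filtered by label."""
        return [
            target
            for label, target in self.edges.get(source, [])
            if relationship is None or label == relationship
        ]

# Invented entities and relationships, for illustration only.
graph = KnowledgeGraph()
graph.add_edge("Ada", "member_of", "Sales team")
graph.add_edge("Ada", "owns", "Q2 revenue dashboard")
graph.add_edge("Sales team", "located_in", "Berlin")

owned = graph.neighbors("Ada", relationship="owns")
```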

In contrast, the LakehouseIQ engine, according to SanjMo principal analyst Sanjeev Mohan, consists of machine learning models that infer the context of the data sources and make them available for searching via natural language queries.

Enterprise users will be able to access the search capabilities of LakehouseIQ via Notebooks and the Assistant in its SQL editor, the company said. The Assistant will be able to carry out various tasks such as writing queries and answering data-related questions.

Databricks said that it is adding LakehouseIQ to many management features inside its lakehouse, in order to deliver automated suggestions. These could include informing the user about an incomplete data set, or suggestions for debugging jobs and SQL queries.

Additionally, the company is exposing LakehouseIQ’s API, to help enterprises use its abilities in any custom applications they develop, said Joel Minnick, vice president of Marketing at Databricks.

The LakehouseIQ-powered Assistant is currently in preview.

Delta Lake gets AI toolbox for developing generative AI use cases

The addition of the Lakehouse AI toolbox to its lakehouse is meant to support the development of enterprise generative AI applications such as the creation of  intelligent assistants, Databricks said. The toolbox consists of features including vector search, low-code AutoML, a collection of open source models, MLflow 2.5, and Lakehouse Monitoring.

“With embeddings of files automatically created and managed in Unity Catalog, plus the ability to add query filters for searches, vector search will help developers improve the accuracy of generative AI responses,” Minnick said, adding that the embeddings are kept updated using Databricks’ Model Serving.

Embeddings are vectors or arrays that are used to give context to AI models, a process known as grounding. This process allows enterprises to avoid having to fully train or fine-tune AI models using the enterprise information corpus.

Lakehouse AI also comes with a low-code interface to help enterprises tune foundational models.

“With AutoML, technically skilled developers and non-technical users have a low code way to fine-tune LLMs using their own enterprise data. The end result is a proprietary model with data input from within their organization, not third-party,” Minnick said, underlining the company’s open source foundation model policy.

As part of Lakehouse AI, Databricks is also providing several foundation models that can be accessed via the Databricks marketplace. Models from Stable Diffusion, Hugging Face and MosaicML, including MPT-7B and Falcon-7B, will be provided, the company said.

The addition of MLflow 2.5, including new features such as prompt tools and an AI Gateway, is meant to help enterprises manage operations around LLMs.

While AI Gateway will enable enterprises to centrally manage credentials for SaaS models or model APIs and provide access-controlled routes for querying, the prompt tools provide a new no-code interface designed to allow data scientists to compare various models’ output based on a set of prompts before deploying them in production via Model Serving.

“Using AI Gateway, developers can easily swap out the backend model at any time to improve cost and quality, and switch across LLM providers,” Minnick said.

Enterprises will be able to continuously monitor and manage all data and AI assets within the lakehouse with the new Lakehouse Monitoring feature, Databricks said, adding that the feature provides end-to-end visibility into data pipelines.

Databricks already offers an AI governance kit in the form of Unity Catalog.

Do Databricks’ updates leave Snowflake trailing?

The new updates from Databricks, specifically targeting development of generative AI applications in the enterprise, may leave Snowflake trailing, according to Constellation Research principal analyst Doug Henschen.

“Both Databricks and Snowflake want their customers to handle all their workloads on their respective platforms, but in my estimation, Databricks is already ready to help them with building custom ML [machine learning], AI and generative AI models and applications,” Henschen said, adding that Snowflake’s generative AI capabilities, such as the recently announced Snowpark Container Services, are currently in private preview.

Snowflake, according to Amalgam Insights principal analyst Hyoun Park, is just starting to build out language and generative AI capabilities through the Nvidia NeMo partnership and the Neeva acquisition.

In contrast, most of Databricks’ capabilities are either in general availability or in public preview, analysts said.

Databricks’ new updates may also lead to query performance gains across generative AI use cases, according to Gartner analyst Aaron Rosenbaum, and this may act as a differentiator against rival Snowflake.

“While Snowflake and Databricks have many common customers, running a wide variety of SQL queries cheaply, quickly and simply is a goal for every one of them,” Rosenbaum said.


Posted Under: Database
Databricks Delta Lake 3.0 to counter Apache Iceberg tables

Posted by on 28 June, 2023

This post was originally published on this site

Databricks on Wednesday introduced a new version of its data lakehouse offering, dubbed Delta Lake 3.0, in order to take on the rising popularity of Apache Iceberg tables used by rival Snowflake.

As part of Delta Lake 3.0, the company has introduced a new universal table format, dubbed UniForm, that will allow enterprises to use the data lakehouse with other table formats such as Apache Iceberg and Apache Hudi, the company said.

A data lakehouse is a data architecture that offers both storage and analytics capabilities, in contrast to the concepts for data lakes, which store data in native format, and data warehouses, which store structured data (often in SQL format).

UniForm eliminates the need for manually converting files from different data lakes and data warehouses while conducting analytics or building AI models, Databricks said.

The new table format, according to analysts, is Databricks’ strategy to connect its data lakehouse with the rest of the world and take on rival Snowflake, especially against the backdrop of Apache Iceberg garnering more multivendor support in the past few years.

“With UniForm, Databricks is essentially saying, if you can’t beat them, join them,” said Tony Baer, principal analyst at dbInsight, likening the battle between the table formats to the one between Apple’s iOS and Google’s Android operating system.

However, Baer believes that the adoption of lakehouses will depend on the ecosystem they provide and not just table formats.

“Adoption of data lakehouses is still very preliminary as the ecosystems have only recently crystallized, and most enterprises are still learning what lakehouses are,” Baer said, adding that lakehouses may see meaningful adoption a year from now.

In contrast to Baer’s assessment, Databricks said its Delta Lake has seen nearly one billion downloads in a year. The company open sourced its Delta Lake offering last year and, according to the company, the lakehouse has since received updates from contributing engineers at AWS, Adobe, Twilio, eBay, and Uber.

Delta Kernel and liquid clustering

As part of Delta Lake 3.0, the company has also introduced two other features — Delta Kernel and a liquid clustering feature.

According to Databricks, Delta Kernel addresses connector fragmentation by ensuring that all connectors are built using a core Delta library that implements Delta specifications.

This alleviates the need for enterprise users to update Delta connectors with each new version or protocol change, the company said.

Delta Kernel, according to SanjMo principal analyst Sanjeev Mohan, is like a connector development kit that abstracts many of the underlying details and instead provides a set of stable APIs.

“This reduces the complexity and time to build and deploy connectors. We expect that the system integrators will now be able to accelerate development and deployment of connectors, in turn further expanding Databricks’ partner ecosystem,” Mohan said.

Liquid clustering has been introduced to address performance issues around data read and write operations, Databricks said.

In contrast to traditional methods such as Hive-style partitioning, which uses a fixed data layout to improve read and write performance and thereby increases data management complexity, liquid clustering offers a flexible data layout format that Databricks claims will provide cost-efficient clustering as data increases in size.
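To show what a fixed layout means in practice, here is a small sketch of how Hive-style partitioning derives a directory path from column values using `column=value` segments. The table name, columns, and helper function are invented for the example; liquid clustering itself is not modeled here.

```python
def hive_partition_path(table_root, row, partition_columns):
    """Build a Hive-style directory path from a row's partition column values."""
    parts = [f"{column}={row[column]}" for column in partition_columns]
    return "/".join([table_root] + parts)

# Invented row and table, for illustration only.
row = {"year": 2023, "month": 6, "order_id": 42}
path = hive_partition_path("orders", row, ["year", "month"])
```

Because the partition columns are baked into the directory structure, choosing different columns later generally means rewriting the data into a new layout, which is the management overhead the paragraph above refers to.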


Posted Under: Database
Snowflake updates target generative AI demand from enterprises

Posted by on 27 June, 2023

This post was originally published on this site

Cloud-based data warehouse company Snowflake is shifting its attention toward large language models and generative AI. Launched in 2014 with a focus on disrupting the traditional data warehouse market and big-data analytics, the company has continued to add new features, such as its Native Application Framework, to target different  sets of enterprise users.

At its annual Snowflake Summit Tuesday, the company announced Snowpark Container Services, a partnership with Nvidia, and updates to its Streamlit Python library designed to help enterprise users manage large language models (LLMs) and build applications using them from within its Data Cloud Platform.

Snowpark Container Services, currently in private preview, will allow enterprises to bring more diverse workloads, including LLMs, to the Data Cloud Platform, said Christian Kleinerman, senior vice president of product at Snowflake, adding that it also allows developers to build applications in any programming language.

The new Container Services offering acts as a linchpin, connecting enterprise data stored in Snowflake with LLMs, model training interfaces, model governance frameworks, third-party data augmenting applications, machine learning models, APIs, and Snowflake’s Native Application Framework.

“Snowpark Containerized Services will help companies to move workloads, such as machine learning models or LLMs, between public and private cloud based on the client’s preferences,” said Hyoun Park, lead analyst at Amalgam Insights.

The process of moving workloads securely will become increasingly important as enterprises discover that the massive data entry and usage associated with training LLMs and other machine learning models are potential compliance risks, causing  them to move these models to governed and isolated systems, Park added.   

Container Services will also help reduce the burden on Snowflake’s data warehousing engine as it will run in an abstracted Kubernetes environment, according to Doug Henschen, principal analyst at Constellation Research.

“Simply put, it is a way to run an array of application services directly on Snowflake data but without burdening the data warehouses and performance sensitive analytical applications that run on them,” Henschen said.

Nvidia partnership provides technology for LLM training

In order to help enterprises train LLMs with data they have stored in Snowflake, the company has partnered with Nvidia to gain access to its AI Platform, which combines hardware and software capabilities. Snowflake will run Nvidia NeMo, a part of the AI Platform, from within the Data Cloud, the company said, adding that NeMo can be used for developing generative AI-based applications such as chatbots and intelligent search engines.

In addition, Snowpark Container Services will allow enterprises to gain access to third-party generative AI model providers such as Reka AI, said Sanjeev Mohan, principal analyst at SanjMo.

Other LLMs, such as those from OpenAI, Cohere and Anthropic, also can be accessed via APIs, Mohan said.

Snowflake’s updates reveal a strategy that is aimed at taking on Databricks, analysts said.

“Databricks is currently offering far more capabilities for building native AI, ML [machine learning] models than Snowflake, especially with the MosaicML acquisition that promises abilities to train models cheaper and faster,” said Andy Thurai, principal analyst at Constellation Research.

The difference in strategy between the two companies, according to dbInsight principal analyst Tony Baer, seems to be their approach to expanding their user bases.

“Snowflake is seeking to extend from its base of data and BI developers to data scientists and data engineers, while Databricks is approaching from the opposite side,” Baer said.

Document AI generates insights from unstructured data

The new Container Services will allow enterprises to access data-augmenting and machine learning tools, such as Hex’s notebooks for analytics and data science, AI tools from Alteryx, Dataiku, and SAS, along with a data workflow management tool from Astronomer that is based on Apache Airflow, the company said. Third-party software from Amplitude, CARTO, Kumo AI, Pinecone, RelationalAI, and Weights & Biases is also available.

Snowflake also said that it was releasing a self-developed LLM, dubbed Document AI, designed to generate insights from documents.

Document AI, which is built on technology from Snowflake’s acquisition of Applica last year, is targeted at helping enterprises make more use of unstructured data, the company said, adding that the new LLM can help enhance enterprise productivity.

dbInsight’s Baer believes that the addition of the new LLM is a step to keep pace with rival offerings from the stables of AWS, Oracle, and Microsoft.

MLOps tools and other updates

In order to help enterprises with machine learning model operations (MLOps), Snowflake has introduced the Snowpark Model Registry.

The registry, according to the company, is a unified repository for an enterprise’s machine learning models. It’s designed to enable users to centralize the publishing and discovery of models, thereby streamlining collaboration between data scientists and machine learning engineers.

Although rivals such as AWS, Databricks, Google Cloud and Microsoft offer MLOps tools already, analysts see the new Model Registry as an important update.

“Model registries and repositories are one of the new great battlefields in data as companies choose where to place their treasured proprietary or commercial models and ensure that the storage, metadata, and versioning are appropriately governed,” Park said.

In addition, Snowflake is also advancing the integration of Streamlit into its Data Cloud Platform, bringing it into public preview for a final fine-tuning before its general release.

Further, the company said that it was extending the use of Apache Iceberg tables to an enterprise’s own storage.

Other updates, mostly targeted at developers, include the integration of Git and a new command line interface (CLI) inside the Data Cloud Platform, both of which are in private preview.

While the native Git integration is expected to support CI/CD workflows, the new CLI will aid in application development and testing within Snowflake, the company said.

In order to help developers ingest streaming data and eliminate the boundaries between batch and streaming pipelines, Snowflake also unveiled new features in the form of Dynamic Tables and Snowpipe Streaming.

While Snowpipe Streaming is expected to be in general availability soon, Dynamic Tables is currently in public preview.

Snowflake also said that its Native Application Framework was now in public preview on AWS.


Posted Under: Database
6 key features of SingleStore Kai for MongoDB

Posted by on 27 June, 2023

This post was originally published on this site

Much of the data accumulated in today’s world is in JSON (JavaScript Object Notation) format. However, many of the databases designed with a JSON-first mindset have not been able to provide the sort of in-app analytics available in classic SQL systems, leaving a huge gap in the amount of the world’s data that is able to be analyzed in real time. In an era when even millisecond lags are too slow, this is a gap in the market that needs to be addressed.

SingleStore Kai API for MongoDB is intended to solve this problem, and to do so in a way that is simple and straightforward. Let’s take a closer look at the key features of SingleStore Kai.

100x faster analytics on JSON data

With SingleStore Kai, you can perform complex analytics on JSON data for MongoDB applications faster and more efficiently. On some benchmarks, SingleStoreDB was able to drive 100x faster analytical performance for most queries. How is this speed boost achieved? The SingleStore MongoDB API proxy translates MongoDB queries into SQL statements that are executed by SingleStoreDB to power real-time analytics for your applications.
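To give a feel for what translating a MongoDB query into SQL involves, here is a deliberately tiny sketch that turns a simple filter document into a parameterized SELECT statement. It is an illustration only and not SingleStore’s translation logic: the function name is hypothetical, it handles just equality and a few comparison operators, and it treats fields as plain columns rather than JSON paths.

```python
# Maps a handful of MongoDB comparison operators to SQL operators.
OPERATORS = {"$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<=", "$ne": "<>"}

def filter_to_sql(collection, mongo_filter):
    """Translate a flat MongoDB-style filter into SQL text plus parameters.

    Collection and field names are assumed to be trusted identifiers;
    values are passed separately as parameters rather than inlined.
    """
    clauses = []
    params = []
    for field, condition in mongo_filter.items():
        if isinstance(condition, dict):
            for operator, value in condition.items():
                clauses.append(f"{field} {OPERATORS[operator]} %s")
                params.append(value)
        else:
            clauses.append(f"{field} = %s")
            params.append(condition)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {collection} WHERE {where}", params

sql, params = filter_to_sql(
    "orders", {"status": "shipped", "total": {"$gt": 100}}
)
```

A real proxy would also need to cover nested documents, arrays, aggregation pipelines, and the rest of the query language, which is where most of the engineering effort lies.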

Vector functionality for JSON

The new era of generative AI requires real-time analytics on all data, including JSON collections. SingleStoreDB supports vectors and fast vector similarity search using the $dotProduct and $euclideanDistance functions. With SingleStore Kai, developers can harness the vector and AI capabilities on JSON collections within MongoDB, enabling use cases like semantic search, image recognition, and similarity matching.
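The two measures named above are simple to state. The Python functions below are plain illustrations of the underlying math, not SingleStore’s implementations, and the sample vectors are made up.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more aligned vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance; smaller means the vectors are closer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up vectors for illustration only.
score = dot_product([1, 2, 3], [4, 5, 6])
distance = euclidean_distance([0, 0], [3, 4])
```

Note that the two measures point in opposite directions: a similarity search ranks by highest dot product but by lowest Euclidean distance.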

No code changes or data transformations

Developers can continue to use existing MongoDB queries. They don’t have to normalize or flatten data, or do extensive schema migrations to power fast analytics for their applications. SingleStore Kai requires no code changes, data transformations, schema migrations, or changes to existing queries.

Same MongoDB tools and drivers

SingleStore Kai supports the MongoDB wire protocol and allows MongoDB clients to communicate with a SingleStoreDB cluster. Developers can take advantage of fast analytics on SingleStoreDB without having to learn a new set of tools or APIs. And they can continue to use the same MongoDB tools and drivers their customers are most familiar with.

Best of both worlds (NoSQL and SQL)

SingleStoreDB was already MySQL wire protocol compatible. With the addition of SingleStore Kai for MongoDB, the database gives developers essentially the best of both worlds—the schema flexibility and simplicity of a JSON document store and the speed, efficiency, and complex analytical capabilities of a relational SQL database.

Easy data replication

As part of this MongoDB API offering, SingleStoreDB includes a fast and efficient replication service (in private preview) that copies MongoDB collections into SingleStoreDB. This service is natively integrated into SingleStoreDB and leverages one of SingleStore’s most widely used features, SingleStore Pipelines, to drive speedy replication and real-time change data capture, enabling customers to get started quickly and easily.

Real-time data and real-time analytics play a crucial role in modern business. With SingleStore Kai, regardless of whether you traditionally work in SQL or NoSQL, you now have the ability to do real-time analytics on the majority of data in our fast-moving world.

Jason Thorsness is a principal software engineer at SingleStore.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content.


Posted Under: Database
Review: CodeWhisperer, Bard, and Copilot X

Posted by on 27 June, 2023

This post was originally published on this site

When I wrote about the GitHub Copilot preview in 2021, I noted that the AI pair programmer didn’t always generate good, correct, or even running code, but was still somewhat useful. At the time, I concluded that future versions could be real time-savers. Two years later, Copilot is improving. These days, it costs money even for individuals, and it has some competition. In addition, the scope of coding assistants has expanded beyond code generation to code explanations, pull request summaries, security scanning, and related tasks.

Three tools for AI pair programming

Let’s start with a quick overview of the tools under review, then we’ll dive in for a closer look at each one.

  • Amazon CodeWhisperer is the product that competes most directly with Copilot. A “coding companion” like Copilot, CodeWhisperer integrates with Visual Studio Code and JetBrains IDEs, generates code suggestions in response to comments and code completions based on existing code, and can scan code for security issues. CodeWhisperer supports five programming languages well, and another 10 with a lesser degree of support. It can optionally flag and log references to code it uses and optionally filter out code suggestions that resemble open source training data.
  • Google Bard is a web-based interface to LaMDA (Language Model for Dialogue Applications), a conversational AI model capable of fluid, multi-turn dialogue. Bard recently added the ability to help with coding and topics about coding. When Bard emits code that may be subject to an open source license, it cites its sources and provides the relevant information. Bard is also good at code explanations.
  • GitHub Copilot X is “leveling up” from the original Copilot with chat and terminal interfaces, support for pull requests, and early adoption of OpenAI’s GPT-4. Currently, to access Copilot X you need to have an active Copilot subscription and join the waiting list, with no guarantee about when you’ll get access to the new features. It took about a month for my invitation to arrive after I joined the waiting list.

Using one of these code generators is not the only way to generate code. To begin with, you can access general-purpose transformers like GPT-4 and its predecessors, including ChatGPT and BingGPT/Bing Chat (available in the Edge browser). There are also other code-specific AI tools, such as StarCoder, Tabnine, Cody, AlphaCode, Polycoder, and Replit Ghostwriter. In every case I’ve mentioned, it is vital to use discretion and carefully test and review the generated code before using it.

How the tools were tested

In my previous article about code generation, I evaluated the AI code generators based on the rather easy task of writing a program to determine the number of days between two dates. Most did okay, although some needed more guidance than others. For this review, I tried the code generators on the more difficult task of scraping the InfoWorld front page for a list of articles. I gave them an outline but no additional help. None generated correct code, although some came closer than others. As an additional task, I asked the tools that support code explanation to explain a Python code example from an MIT Open Courseware introductory programming course.

For reference, the outline I gave to the code generators is:

Scrape front page:
	Find all articles by looking for links with ‘article’ in the href; extract title, author, date from each
	List all articles alphabetically by title; eliminate duplicates 
	List all articles alphabetically by author last name
	List all articles latest first

In general, I tried to act like a more naive programmer than I am, just to see what the tools would do.

Now, let’s look more closely at each of our code generators.

Amazon CodeWhisperer

Within your IDE, Amazon CodeWhisperer analyzes your English language comments and surrounding code to infer what code it should generate to complete what you are typing. It then offers that code as a suggestion, which you can accept, reject, or simply ignore while you continue writing your own code; you can also ask CodeWhisperer for alternate code. CodeWhisperer’s large language model (LLM) was trained on billions of lines of code, including Amazon and open source code. Any given suggestion is based not only on your comments and immediate code context, but also on the contents of other files open in the IDE.

In addition to code generation, CodeWhisperer can scan your Python, Java, and JavaScript code for security vulnerabilities and suggest fixes for them. The vulnerability lists it consults include Open Web Application Security Project (OWASP), crypto library best practices, AWS API best practices, and other API best practices. Security scans occur on-demand, unlike code completion, which is offered continuously as you code unless you turn off suggestions.

Programming languages and IDEs

CodeWhisperer’s best programming languages for code generation (the most prevalent languages in the training corpus) are Java, Python, JavaScript, TypeScript, and C#. It has been trained to a lesser extent on Ruby, Go, PHP, C++, C, Shell, Scala, Rust, Kotlin, and SQL.

There are CodeWhisperer plugins for Visual Studio Code and JetBrains IDEs. You can also activate CodeWhisperer for use inside AWS Cloud9 and AWS Lambda; in both cases, you must edit your IAM permissions as well as check the Enable CodeWhisperer box.

I installed CodeWhisperer in Visual Studio Code. There are four steps:

  • Add the CodeWhisperer plugin to VS Code.
  • Add a connection to AWS.
  • Sign in on the AWS website.
  • Start CodeWhisperer from the AWS developer tools panel.

Figure 1. Once you have installed and authorized CodeWhisperer (at left), you can see the usage instructions, shown here on the right.

Code suggestions and completions

It’s worth watching some of the videos demonstrating CodeWhisperer’s capabilities, listed on the CodeWhisperer resource page. They’ll tell you more than I can in words and screenshots. While watching them, it became clear to me that CodeWhisperer shines on code that calls AWS APIs.

Nevertheless, here’s a screenshot of the generated code from CodeWhisperer:

Figure 2. CodeWhisperer generating code.

Note that the generated code did not work, and also that CodeWhisperer generated multiple functions for more or less the same purpose. It was fairly close, however, and CodeWhisperer kind of read my mind about using Beautiful Soup to scrape the web page.

Figure 3. Trying the CodeWhisperer-generated program. 

Security scans and open source references

I ran the CodeWhisperer security scan on my project, as well. Unsurprisingly, no issues were detected in this file, although the CodeWhisperer scan found a resource leak in another file generated by GitHub Copilot X. The CodeWhisperer security scan is a differentiating feature that hasn’t yet surfaced in the competing AI code generators.

This file did not trigger any open source references, but when I opened the CodeWhisperer Reference Log, I found a link to the setting that controls whether CodeWhisperer will suggest code that has such references. In the Amazon examples I viewed, code suggestions with references displayed the license type at the top.

The bottom line for CodeWhisperer

CodeWhisperer generates code in an IDE and does security reviews, but it lacks a chat window and code explanations. The tool works more or less as advertised, and the price (free) is right for individual developers. For corporate developers, the $19 per month price for CodeWhisperer is the same as for Copilot. I’d favor CodeWhisperer over Copilot for shops that use AWS heavily and shops that need to know when code suggestions refer to open source.

Google Bard

Bard support for programming was announced April 21, 2023. The blog post mentioned “more than 20 programming languages including C++, Go, Java, Javascript, Python and Typescript.”

When I asked Bard for the full list, I received the following response:

Bard supports over 20 programming languages, including:

  • C++
  • Go
  • Java
  • JavaScript
  • Python
  • TypeScript
  • C#
  • PHP
  • Ruby
  • Swift
  • Kotlin
  • Rust
  • Julia
  • R
  • Haskell
  • Erlang
  • Scala
  • Lua
  • OCaml
  • Scheme

Bard can generate code in these languages, debug code, and explain code. It can also translate from one programming language to another.

Please note that Bard is still under development, so its capabilities may not be perfect. It is always a good idea to review AI-generated code carefully before using it.

I tested Bard on my code explanation and code generation tasks, which are shown below. The code explanation was quite good—so good that I wondered whether Bard had been trained on that MIT course code sample, which dates from 2008 and has been available on the web since then.

The code generation for the site scraper looked fairly credible at first glance but had several obvious problems on further inspection. It was a matter of clicking twice to open the code in Colab, but it threw a type error when I tried to run it, and it didn’t even get as far as the code I had identified as wrong.

Rather than fixing the problems, I tried running the code generation again, and the result made a bit more sense. It threw an attribute error when I tried to run it in Colab, however. I fed that traceback back to Bard, it generated a fix, I ran the “fixed” code in Colab, and there was a type error in a different spot. Enough, already! If I were actually a naive programmer, I would have thrown up my hands.

What about citing web pages and open source code? The Bard FAQ had this to say:

Bard, like some other standalone LLM experiences, is intended to generate original content and not replicate existing content at length. We’ve designed our systems to limit the chances of this occurring, and we will continue to improve how these systems function. If Bard does directly quote at length from a webpage, it cites that page.

Sometimes the same content may be found on multiple webpages and Bard attempts to point to a popular source. In the case of citations to code repositories, the citation may also reference an applicable open source license.

Given that Bard is completely free at this time, and requires no setup, it’s certainly worth trying when you need to generate, debug, and explain code in any of these 20 languages. Don’t forget to review, test, and debug any generated code.

Here is the instruction I gave to Bard and the resulting dialog:

I'm about to give you some code to explain. Please wait until the code is complete.

Sure, I will wait until you are finished giving me the code. Please feel free to take your time. I am here to help you in any way that I can.

Here are some tips for explaining code:
•	Start by explaining the overall purpose of the code. What does it do? What problem is it trying to solve?
•	Then, break down the code into smaller chunks and explain each one in turn.
•	Use clear and concise language. Avoid jargon and technical terms.
•	Use examples to illustrate your points.
•	Be patient and don't be afraid to ask questions.

I am confident that I can help you explain any code that you give me. Just let me know what you need and I will do my best to help.



And here is the code I provided:

import pylab, random

class Stock(object):
    def __init__(self, price, distribution, vol):
        self.price = price
        self.history = [price]
        self.distribution = distribution
        self.vol = vol
        self.lastChangeInfluence = 0.0
    def setPrice(self, price):
        self.price = price
    def getPrice(self):
        return self.price
    def makeMove(self, bias, mo):
        oldPrice = self.price
        baseMove = self.distribution(self.vol) + bias
        self.price = self.price * (1.0 + baseMove)
        self.price += mo*random.choice([0.0, 1.0])*self.lastChangeInfluence
        self.history.append(self.price)
        change = self.price - oldPrice
        if change >= 0:
            self.lastChangeInfluence = min(change, oldPrice*0.01)
        else:
            self.lastChangeInfluence = max(change, -oldPrice*0.01)
    def showHistory(self, fig, test):
        pylab.figure(fig)
        pylab.plot(self.history)
        pylab.title('Closing Prices, Test ' + test)

class SimpleMarket(object):
    def __init__(self, numStks, volUB):
        self.stks = []
        self.bias = 0.0
        for n in range(numStks):
            volatility = random.uniform(0, volUB)
            distribution = lambda vol: random.gauss(0.0, vol)
            stk = Stock(100.0, distribution, volatility)
            self.addStock(stk)
    def addStock(self, stk):
        self.stks.append(stk)
    def setBias(self, bias):
        self.bias = bias
    def getBias(self):
        return self.bias
    def getStocks(self):
        return self.stks[:]
    def move(self, mo):
        prices = []
        for s in self.stks:
            s.makeMove(self.bias, mo)
            prices.append(s.getPrice())
        return prices

class Market(SimpleMarket):
    def __init__(self, numStks, volUB, dailyBiasRange):
        SimpleMarket.__init__(self, numStks, volUB)
        self.dailyBiasRange = dailyBiasRange
    def move(self, mo):
        prices = []
        dailyBias = random.gauss(self.dailyBiasRange[0], self.dailyBiasRange[1])
        for s in self.stks:
            s.makeMove(self.bias + dailyBias, mo)
            prices.append(s.getPrice())
        return prices

def simMkt(mkt, numDays, mo):
    endPrices = []
    for i in range(numDays):
        vals = mkt.move(mo)
        vals = pylab.array(vals)
        mean = vals.sum()/float(len(vals))
        endPrices.append(mean)
    return endPrices

def plotAverageOverTime(endPrices, title):
    pylab.plot(endPrices)
    pylab.title(title)

def plotDistributionAtEnd(mkt, title, color):
    prices = []
    sumSoFar = 0
    for s in mkt.getStocks():
        prices.append(s.getPrice())
        sumSoFar += s.getPrice()
    mean = sumSoFar/float(len(prices))
    pylab.plot(prices, color)
    pylab.axhline(mean, color = color)
    pylab.ylabel('Last Sale')

def runTrial(showHistory, test, p):
    colors = ['b','g','r','c','m','y','k']

    mkt = Market(p['numStocks'], p['volUB'], p['dailyBiasRange'])
    mkt.setBias(p['bias'])
    endPrices = simMkt(mkt, p['numDays'], p['mo'])
    pylab.figure(1)
    plotAverageOverTime(endPrices, 'Average Closing Prices')
    pylab.figure(2)
    plotDistributionAtEnd(mkt, 'Distribution of Prices', colors[test%len(colors)])
    if showHistory:
        for s in mkt.getStocks():
            s.showHistory(test+2, str(test))

def runTest(numTrials):
    #Constants used in testing
    numDaysPerYear = 200.0
    params = {}
    params['numDays'] = 200
    params['numStocks'] = 500
    params['bias'] = 0.1/numDaysPerYear #General market bias
    params['volUB'] = 12.0/numDaysPerYear #Upper bound on volatility for a stock
    params['mo'] = 1.1/numDaysPerYear #Momentum factor
    params['dailyBiasRange'] = (0.0, 4.0/200.0)

    for t in range(1, numTrials+1):
        runTrial(True, t, params)


Here is some of Bard’s response, which was quite in depth:

The code you have provided is a Python program that simulates a stock market. The program defines three classes:

Stock: This class represents a single stock. It has properties for the stock's price, history, distribution, volatility, and last change influence.

SimpleMarket: This class represents a simple stock market. It has a list of stocks, a bias, and a volatility upper bound.

Market: This class inherits from SimpleMarket and adds a daily bias range property.

The program also defines several functions:
•	simMkt: This function simulates the market for a given number of days and momentum factor.
•	plotAverageOverTime: This function plots the average closing price of the stocks over time.
•	plotDistributionAtEnd: This function plots the distribution of prices at the end of the simulation.
•	runTrial: This function runs a single simulation and plots the results.
•	runTest: This function runs a number of simulations and plots the results.

The program is run by calling the runTest function with the desired number of simulations. The program will then generate three plots:
•	A plot of the average closing price of the stocks over time.
•	A plot of the distribution of prices at the end of the simulation.
•	A plot of the history of prices for each stock.

The program can be used to simulate a variety of different stock markets. By changing the values of the parameters, you can change the volatility of the market, the bias of the market, and the momentum factor. You can also use the program to simulate different trading strategies. For example, you could simulate a strategy that buys stocks when they are undervalued and sells them when they are overvalued.

Bard also provided an explanation of how each part of the code worked:

•	The Stock class represents a single stock. It has properties for the stock's price, history, distribution, volatility, and last change influence. The price property is the current price of the stock. The history property is a list of the stock's prices over time. The distribution property is a function that generates random prices for the stock. The volatility property is a measure of how much the stock's price can fluctuate. The lastChangeInfluence property is a measure of how much the stock's price is affected by its previous change.

•	The SimpleMarket class represents a simple stock market. It has a list of stocks, a bias, and a volatility upper bound. The list of stocks property is a list of the stocks in the market. The bias property is a measure of how much the market is biased towards rising prices. The volatility upper bound property is a measure of how much the market can fluctuate.

•	The Market class inherits from SimpleMarket and adds a daily bias range property. The daily bias range property is a measure of how much the market's bias can change from day to day.

•	The simMkt function simulates the market for a given number of days and momentum factor. The number of days parameter is the number of days to simulate. The momentum factor parameter is a measure of how much the prices of stocks are affected by their momentum.

•	The plotAverageOverTime function plots the average closing price of the stocks over time. The plotDistributionAtEnd function plots the distribution of prices at the end of the simulation.

•	The runTrial function runs a single simulation and plots the results. The showHistory parameter is a boolean value that determines whether the history of prices for each stock is plotted. The test parameter is a unique identifier for the simulation.

•	The runTest function runs a number of simulations and plots the results. The numTrials parameter is the number of simulations to run.

I hope this explanation is helpful. Please let me know if you have any other questions.
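
To make the “last change influence” in that explanation concrete, here is a minimal sketch of the clamping rule at the end of makeMove, pulled out into a standalone function. The function name next_influence and the cap parameter are hypothetical and are not part of the MIT sample; the 1% cap is the 0.01 factor from the original code.

```python
def next_influence(old_price, new_price, cap=0.01):
    """Clamp the latest price change to +/- cap * old_price.

    Mirrors the if/else at the end of Stock.makeMove. The function name
    and the cap parameter are hypothetical, added for illustration.
    """
    change = new_price - old_price
    limit = old_price * cap
    if change >= 0:
        return min(change, limit)
    return max(change, -limit)


# A 5-point jump on a 100-point stock is capped at 1% of the old price.
big_gain = next_influence(100.0, 105.0)
# A small drop is within the cap, so it passes through unchanged.
small_loss = next_influence(100.0, 99.5)
# A large drop is capped at -1% of the old price.
big_loss = next_influence(100.0, 90.0)
print(big_gain, small_loss, big_loss)
```

On the next move, makeMove multiplies this clamped value by the momentum factor and a random 0.0 or 1.0, so the previous day's change nudges the price only some of the time.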

When I followed up, typing in, “where is the pylab module?” Bard responded with the location of the module, how to import it, a description of features, and a pointer to the documentation.

Additionally, here is the Bard-generated code for scraping for a list of articles:

import requests
from bs4 import BeautifulSoup

def scrape_infoworld():
  """Scrape the InfoWorld front page and return a list of articles."""

  # Get the HTML from the InfoWorld front page.
  response = requests.get("")
  soup = BeautifulSoup(response.content, "html.parser")

  # Find all articles by looking for links with 'article' in the href.
  articles = soup.find_all("a", href=lambda x: "article" in x["href"])

  # Extract the title, author, and date from each article.
  article_data = []
  for article in articles:
    title = article.find("h3").text
    author = article.find("span", class_="byline").text
    date = article.find("time").text
    article_data.append({
      "title": title,
      "author": author,
      "date": date,
    })

  # Sort the articles alphabetically by title.
  article_data.sort(key=lambda x: x["title"])

  # Remove any duplicate articles.
  article_data = list(set(article_data))

  # Sort the articles alphabetically by author last name.
  article_data.sort(key=lambda x: x["author"].split()[-1])

  # Sort the articles latest first.
  article_data.sort(key=lambda x: x["date"], reverse=True)

  return article_data

if __name__ == "__main__":
  articles = scrape_infoworld()

  # Print the articles.
  for article in articles:
    print(article)


This code will first get the HTML from the InfoWorld front page. Then, it will find all articles by looking for links with 'article' in the href. For each article, it will extract the title, author, and date. The articles will then be sorted alphabetically by title, removed of any duplicates, sorted alphabetically by author last name, and sorted latest first. Finally, the articles will be printed.
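
Two steps in this description do not hold up for a list of dictionaries. Calling set() on dictionaries raises a TypeError because dictionaries are unhashable, and three successive in-place sorts leave a single list whose primary order is only the last sort key. The following is a minimal sketch of one way to produce the three lists the outline asks for, assuming each article is a dictionary with title, author, and date keys and that the dates have already been parsed into date objects. The sample records are invented for illustration.

```python
from datetime import date

# Hypothetical records standing in for scraped articles; the titles,
# authors, and dates are made up for illustration.
articles = [
    {"title": "Beta release notes", "author": "Ann Lee", "date": date(2023, 6, 1)},
    {"title": "Alpha preview", "author": "Bob Adams", "date": date(2023, 6, 20)},
    {"title": "Beta release notes", "author": "Ann Lee", "date": date(2023, 6, 1)},
]

# Dicts are unhashable, so set(articles) raises TypeError. Keying a dict
# on a tuple of the fields removes duplicates and keeps first-seen order.
unique = list({(a["title"], a["author"], a["date"]): a for a in articles}.values())

# sorted() returns a new list each time, so all three orderings coexist
# instead of each in-place sort overwriting the previous one.
by_title = sorted(unique, key=lambda a: a["title"].lower())
by_author = sorted(unique, key=lambda a: a["author"].split()[-1].lower())
latest_first = sorted(unique, key=lambda a: a["date"], reverse=True)

for label, rows in [("By title", by_title), ("By author", by_author),
                    ("Latest first", latest_first)]:
    print(label + ":", [a["title"] for a in rows])
```

Sorting on real date objects rather than on the displayed text matters for the last list, because date strings such as “June 27, 2023” do not sort chronologically as plain text.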

And here is the runtime result:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-afbe7808577e> in <cell line: 40>()
     40 if __name__ == "__main__":
---> 41   articles = scrape_infoworld()
     43   # Print the articles.

6 frames

<ipython-input-1-afbe7808577e> in <lambda>(x)
     11   # Find all articles by looking for links with 'article' in the href.
---> 12   articles = soup.find_all("a", href=lambda x: "article" in x["href"])
     14   # Extract the title, author, and date from each article.

TypeError: string indices must be integers
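
The TypeError comes from the href filter. When a function is passed as an attribute filter to find_all, Beautiful Soup calls it with the attribute's value (a string, or None when the tag has no href) rather than with the tag, so x["href"] ends up indexing a string. Below is a minimal sketch of a filter that avoids this. It uses only the standard library so that the predicate can be tried without fetching a page; the function name is_article_href and the sample paths are hypothetical.

```python
def is_article_href(href):
    """Return True when an href attribute value contains 'article'.

    Beautiful Soup calls an attribute filter with the attribute's value
    (a string, or None when the tag has no href), not with the tag.
    """
    return href is not None and "article" in href


# Reproduce the failure from the traceback without any network access:
# the generated lambda indexes the href string as if it were a tag.
broken = lambda x: "article" in x["href"]
try:
    broken("/article/123/example.html")
except TypeError as err:
    print("broken filter:", err)

# The corrected predicate works on plain attribute values.
print(is_article_href("/article/123/example.html"))
print(is_article_href("/video/456"))
print(is_article_href(None))

# With Beautiful Soup it would be passed as:
#     soup.find_all("a", href=is_article_href)
```

This addresses only the first error in the traceback; the later lines of the generated function would still need the same kind of review.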

The bottom line for Bard

Bard has a chat interface and both generates and explains code, but it doesn’t have an interactive IDE integration. Given that Bard is completely free at this time, and requires no setup, it’s certainly worth trying when you need to generate, debug, and explain code in any of the 20 supported languages.

GitHub Copilot X

GitHub Copilot X is greatly improved over the original GitHub Copilot, and can sometimes generate a correct function and set of tests without much human help. It still makes mistakes and hallucinates (generates false information), but not nearly as much as it once did.

In addition to generating code within a programming editor, currently supporting only the most current versions of Visual Studio and the latest insider version of Visual Studio Code, Copilot X adds a GPT-4 chat panel to the editor. It also adds a terminal interface, support for generating unit tests and pull request descriptions, and the ability to extract explanations from documentation.

I asked the Copilot X chat what programming languages it supports, and it answered “I support a wide range of programming languages, including but not limited to: Python, JavaScript, TypeScript, Ruby, Java, C++, C#, PHP, Go, Swift, Kotlin, Rust, and many more.” I did my testing primarily in Python.

When I used the Copilot Chat facility to ask Copilot X to explain the MIT market simulation code, it gave a partially correct answer. I had to metaphorically pull its teeth to get it to explain the rest of the code.

Figure 4. Copilot X did a decent but incomplete job of explaining the market simulator.

Copilot X’s most notable failure was the web-scraping code generation task. The tool generated a bunch of superficially credible-looking code that didn’t use Beautiful Soup, but it was clear from reviewing the code that it would never work. I kept bringing the problems to Copilot Chat, but it just dug itself a deeper hole. I could probably have started over and given it better hints, including handing it an import from bs4 and adding some comments showing the HTML and directory structure of the InfoWorld home page. I didn’t do it because that would not be in character for the naive coder persona I had adopted for this round of tests.

Figure 5. Copilot X tried to generate the web scraping code without using Beautiful Soup (bs4). Later when I chatted about the solution it generated, it first claimed that it was using Beautiful Soup, but then admitted that it could not find an import.

As with all AI helpers, you have to take the code generated by Copilot X with a huge grain of salt, just as you would for a pull request from an unknown programmer.

The bottom line for Copilot X

In addition to generating code within an IDE, Copilot X adds a GPT-4 chat panel to the editor. It also adds a terminal interface, support for unit test generation, support for generating pull request descriptions, and the ability to extract explanations from technical documentation. Copilot X costs $10 per month for individuals and $19 per user per month for businesses. 


GitHub Copilot X works decently on simple problems, but not necessarily better than the combination of Amazon CodeWhisperer in a code editor and Google Bard in a browser. It’s too bad that CodeWhisperer doesn’t yet have a chat capability or the facility for explaining code, and it’s too bad that Bard doesn’t exactly integrate with an editor or IDE.

I’d be tempted to recommend Copilot X if it hadn’t gone off the rails on my advanced code generation task—mainly because it integrates chat and code generation in an editor. At this point, however, Copilot X isn’t quite ready. Overall, none of the code generation products are really up to snuff, although both Bard and Copilot X do a decent job of code explanation.

All of these products are in active development, so my recommendation is to keep watching them and experimenting, but don’t put your faith in any of them just yet.

Posted Under: Tech Reviews
Databricks’ $1.3 billion MosaicML acquisition to boost generative AI offerings

Posted by on 26 June, 2023

Data lakehouse provider Databricks on Monday said that it was acquiring large language model (LLM) and model-training software provider MosaicML for $1.3 billion in order to boost its generative AI offerings.

Databricks, which already offers an LLM named Dolly, is expected to add MosaicML’s models, training and inference capabilities to its lakehouse platform for enterprises to develop generative AI applications, the company said, underlining its open source LLM policy.

Dolly was developed on open data sets in order to cater to enterprises’ demand to control LLMs used to develop new applications, in contrast to closed-loop trained models, such as ChatGPT, that put constraints on commercial usage.

MosaicML’s models, namely MPT-7B and the recently released MPT-30B, are open source, putting them in line with Databricks’ existing policy.

Another advantage of these models, according to MosaicML, is the “zero human intervention” feature that allows the training systems to be automated.

“We trained MPT-7B with zero human intervention from start to finish: over 9.5 days on 440 GPUs, the MosaicML platform detected and addressed 4 hardware failures and resumed the training run automatically, and — due to architecture and optimization improvements we made — there were no catastrophic loss spikes,” MosaicML wrote in a blog post.

The deal calls for MosaicML’s entire team of over 60 employees, including co-founder and CEO Naveen Rao, to move to Databricks, where they will continue to work on developing more foundation models, the companies said.

MosaicML’s existing customers, according to a company post, will still be able to access their LLMs and inference offerings. Existing customers include Allen Institute for AI, Generally Intelligent, Hippocratic AI, Replit and Scatter Labs. The San Francisco-based startup, which was founded in 2021, has raised nearly $64 million to date from investors including Lux Capital, DCVC, Future Ventures, Maverick Ventures, and Playground.

The $1.3 billion deal includes retention packages for MosaicML employees, Databricks said.

In May, the company acquired AI-centric data governance platform provider Okera for an undisclosed sum.

Databricks’ acquisition of MosaicML also comes just weeks after a rival, Snowflake, acquired Mountain View-based AI startup Neeva in an effort to add generative AI-based search to its Data Cloud platform.

Posted Under: Database