Prisma.js: Code-first ORM in JavaScript
Posted by Richy George on 16 August, 2023
Prisma is a popular data-mapping layer (ORM) for server-side JavaScript and TypeScript. Its core purpose is to simplify and automate how data moves between storage and application code. Prisma supports a wide range of datastores and provides a powerful yet flexible abstraction layer for data persistence. Get a feel for Prisma and some of its core features with this code-first tour.
An ORM layer for JavaScript
Object-relational mapping (ORM) was popularized by the Hibernate framework in Java. The original goal of object-relational mapping was to overcome the so-called impedance mismatch between Java classes and RDBMS tables. From that idea grew the more broadly ambitious notion of a general-purpose persistence layer for applications. Prisma is a modern JavaScript-based evolution of the Java ORM layer.
Prisma supports a range of SQL databases and has expanded to include the NoSQL datastore, MongoDB. Regardless of the type of datastore, the overarching goal remains: to give applications a standardized framework for handling data persistence.
The domain model
We’ll use a simple domain model to look at several kinds of relationships in a data model: many-to-one, one-to-many, and many-to-many. (We’ll skip one-to-one, which is very similar to many-to-one.)
Prisma uses a model definition (a schema) that acts as the hinge between the application and the datastore. One approach when building an application, which we’ll take here, is to start with this definition and then build the code from it. Prisma automatically applies the schema to the datastore.
The Prisma model definition format is not hard to understand, and you can use a graphical tool, PrismaBuilder, to make one. Our model will support a collaborative idea-development application, so we’ll have User, Idea, and Tag models. A User can have many Ideas (one-to-many) and an Idea has one User, the owner (many-to-one). Ideas and Tags form a many-to-many relationship. Listing 1 shows the model definition.
Listing 1. Model definition in Prisma
datasource db {
provider = "sqlite"
url = "file:./dev.db"
}
generator client {
provider = "prisma-client-js"
}
model User {
id Int @id @default(autoincrement())
name String
email String @unique
ideas Idea[]
}
model Idea {
id Int @id @default(autoincrement())
name String
description String
owner User @relation(fields: [ownerId], references: [id])
ownerId Int
tags Tag[]
}
model Tag {
id Int @id @default(autoincrement())
name String @unique
ideas Idea[]
}
Listing 1 includes a datasource definition (a simple file-based SQLite database, which is handy for development) and a generator definition with the provider set to prisma-client-js. The latter means Prisma will produce a JavaScript client the application can use for interacting with the mapping created by the definition.
As for the model definition, notice that each model has an id field, and we are using the Prisma @default(autoincrement()) annotation to get an automatically incremented integer ID.
To create the relationship from User to Idea, we reference the Idea type with array brackets: Idea[]. This says: give me a collection of Ideas for the User. On the other side of the relationship, you give Idea a single User with: owner User @relation(fields: [ownerId], references: [id]).
Besides the relationships and the key ID fields, the field definitions are straightforward: String for strings, and so on.
Create the project
We’ll use a simple project to work with Prisma’s capabilities. The first step is to create a new Node.js project and add dependencies to it. After that, we can add the definition from Listing 1 and use it to handle data persistence with Prisma’s built-in SQLite database.
To start our application, we’ll create a new directory, init an npm project, and install the dependencies, as shown in Listing 2.
Listing 2. Create the application
mkdir iw-prisma
cd iw-prisma
npm init -y
npm install express @prisma/client body-parser
mkdir prisma
touch prisma/schema.prisma
Now, add the definition from Listing 1 to the prisma/schema.prisma file we just created. Next, tell Prisma to make SQLite ready with a schema, as shown in Listing 3.
Listing 3. Set up the database
npx prisma migrate dev --name init
npx prisma migrate deploy
Listing 3 tells Prisma to “migrate” the database, which means applying schema changes from the Prisma definition to the database itself. The dev subcommand tells Prisma to use the development environment, while --name gives an arbitrary name for the migration. The deploy command tells Prisma to apply the pending migrations.
Use the data
Now, let’s allow for creating users with a RESTful endpoint in Express.js. You can see the code for our server in Listing 4, which goes inside the iw-prisma/server.js file. Listing 4 is vanilla Express code, but we can do a lot of work against the database with minimal effort thanks to Prisma.
Listing 4. Express code
const express = require('express');
const bodyParser = require('body-parser');
const { PrismaClient } = require('@prisma/client');
const prisma = new PrismaClient();
const app = express();
app.use(bodyParser.json());
const port = 3000;
app.listen(port, () => {
console.log(`Server is listening on port ${port}`);
});
// Fetch all users
app.get('/users', async (req, res) => {
const users = await prisma.user.findMany();
res.json(users);
});
// Create a new user
app.post('/users', async (req, res) => {
const { name, email } = req.body;
const newUser = await prisma.user.create({ data: { name, email } });
res.status(201).json(newUser);
});
Currently, there are just two endpoints: GET /users for getting a list of all the users, and POST /users for adding them. You can see how easily we can use the Prisma client to handle these use cases, by calling prisma.user.findMany() and prisma.user.create(), respectively.
The findMany() method without any arguments will return all the rows in the table. The create() method accepts an object with a data field holding the values for the new row (in this case, the name and email; remember that Prisma will auto-create a unique ID for us).
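Both methods also accept optional arguments when the defaults aren’t enough. Here is a minimal sketch (the filter and limit values are made up for illustration and are not part of our app) showing how findMany() can filter, sort, and cap the result set:
// Sketch: filter, sort, and limit users rather than fetching every row.
// The email filter and the limit of 10 are illustrative values only.
const someUsers = await prisma.user.findMany({
  where: { email: { endsWith: '@example.com' } }, // only users in this email domain
  orderBy: { name: 'asc' },                       // sort alphabetically by name
  take: 10,                                       // return at most 10 rows
});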
Now we can run the server with: node server.js.
Testing with curl
Let’s test out our endpoints with curl, as shown in Listing 5.
Listing 5. Try out the endpoints with curl
$ curl http://localhost:3000/users
[]
$ curl -X POST -H "Content-Type: application/json" -d '{"name":"George Harrison","email":"george.harrison@example.com"}' http://localhost:3000/users
{"id":2,"name":"John Doe","email":"john.doe@example.com"}{"id":3,"name":"John Lennon","email":"john.lennon@example.com"}{"id":4,"name":"George Harrison","email":"george.harrison@example.com"}
$ curl http://localhost:3000/users
[{"id":2,"name":"John Doe","email":"john.doe@example.com"},{"id":3,"name":"John Lennon","email":"john.lennon@example.com"},{"id":4,"name":"George Harrison","email":"george.harrison@example.com"}]
Listing 5 shows us getting all users and finding an empty set, followed by adding users (the JSON responses shown reflect several separate POST requests), then getting the populated set.
Next, let’s add an endpoint that lets us create ideas and use them in relation to users, as in Listing 6.
Listing 6. User ideas POST endpoint
app.post('/users/:userId/ideas', async (req, res) => {
const { userId } = req.params;
const { name, description } = req.body;
try {
const user = await prisma.user.findUnique({ where: { id: parseInt(userId) } });
if (!user) {
return res.status(404).json({ error: 'User not found' });
}
const idea = await prisma.idea.create({
data: {
name,
description,
owner: { connect: { id: user.id } },
},
});
res.json(idea);
} catch (error) {
console.error('Error adding idea:', error);
res.status(500).json({ error: 'An error occurred while adding the idea' });
}
});
app.get('/userideas/:id', async (req, res) => {
const { id } = req.params;
const user = await prisma.user.findUnique({
where: { id: parseInt(id) },
include: {
ideas: true,
},
});
if (!user) {
return res.status(404).json({ message: 'User not found' });
}
res.json(user);
});
In Listing 6, we have two endpoints. The first allows for adding an idea using a POST at /users/:userId/ideas. The first thing it needs to do is recover the user by ID, using prisma.user.findUnique(). This method is used for finding a single entity in the database, based on the passed-in criteria. In our case, we want the user with the ID from the request, so we use: { where: { id: parseInt(userId) } }.
Once we have the user, we use prisma.idea.create to create a new idea. This works just like when we created the user, but we now have a relationship field. Prisma lets us create the association between the new idea and its owning user with: owner: { connect: { id: user.id } }.
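Because the Idea model also exposes the ownerId scalar field, there is an equivalent way to write this create call that skips connect and sets the foreign key directly. This is just a sketch of the alternative, reusing the user variable from Listing 6:
// Sketch: set the scalar foreign key instead of using the connect syntax.
const idea = await prisma.idea.create({
  data: {
    name,
    description,
    ownerId: user.id, // writes the ownerId column directly
  },
});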
The second endpoint is a GET at /userideas/:id. The purpose of this endpoint is to take the user ID and return the user along with their ideas. This gives us a look at the where clause in use with the findUnique call, as well as the include modifier. The modifier is used here to tell Prisma to include the associated ideas. Without it, the ideas would not be included, because Prisma does not fetch associated records unless you ask for them explicitly.
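The include option can also be nested. As a sketch of what that looks like (anticipating the tags we add in the next section), we could return each user’s ideas together with the tags on those ideas in a single query:
// Sketch: a nested include loads the user's ideas and, for each idea, its tags.
const userWithIdeasAndTags = await prisma.user.findUnique({
  where: { id: parseInt(id) },
  include: {
    ideas: {
      include: { tags: true }, // also load the tags attached to each idea
    },
  },
});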
To test the new endpoints, we can use the curl commands shown in Listing 7.
Listing 7. curl commands for testing the endpoints
$ curl -X POST -H "Content-Type: application/json" -d '{"name":"New Idea", "description":"Idea description"}' http://localhost:3000/users/3/ideas
$ curl http://localhost:3000/userideas/3
{"id":3,"name":"John Lennon","email":"john.lennon@example.com","ideas":[{"id":1,"name":"New Idea","description":"Idea description","ownerId":3},{"id":2,"name":"New Idea","description":"Idea description","ownerId":3}]}
With these endpoints in place, we are able to add ideas and recover users along with their ideas.
Many-to-many with tags
Now let’s add endpoints for handling tags within the many-to-many relationship. In Listing 8, we handle tag creation and associate a tag with an idea.
Listing 8. Adding and displaying tags
// create a tag
app.post('/tags', async (req, res) => {
const { name } = req.body;
try {
const tag = await prisma.tag.create({
data: {
name,
},
});
res.json(tag);
} catch (error) {
console.error('Error adding tag:', error);
res.status(500).json({ error: 'An error occurred while adding the tag' });
}
});
// Associate a tag with an idea
app.post('/ideas/:ideaId/tags/:tagId', async (req, res) => {
const { ideaId, tagId } = req.params;
try {
const idea = await prisma.idea.findUnique({ where: { id: parseInt(ideaId) } });
if (!idea) {
return res.status(404).json({ error: 'Idea not found' });
}
const tag = await prisma.tag.findUnique({ where: { id: parseInt(tagId) } });
if (!tag) {
return res.status(404).json({ error: 'Tag not found' });
}
const updatedIdea = await prisma.idea.update({
where: { id: parseInt(ideaId) },
data: {
tags: {
connect: { id: tag.id },
},
},
});
res.json(updatedIdea);
} catch (error) {
console.error('Error associating tag with idea:', error);
res.status(500).json({ error: 'An error occurred while associating the tag with the idea' });
}
});
We’ve added two endpoints. The POST endpoint for adding a tag is familiar from the previous examples. In Listing 8, we’ve also added the POST endpoint for associating an idea with a tag.
To associate an idea and a tag, we utilize the many-to-many mapping from the model definition. We grab the Idea and Tag by ID and use the connect field to set them on one another. Now, the Idea has the Tag in its set of tags and vice versa. A many-to-many association essentially amounts to two one-to-many relationships, with each entity pointing at the other. In the datastore, this requires a “lookup table” (or cross-reference table), but Prisma creates and manages it for us. We only need to interact with the entities themselves.
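Prisma can also wire up the association at creation time. As a sketch (the owner and tag IDs below are placeholders that must already exist in the database), an idea can connect to existing tags in the same create call:
// Sketch: create an idea and attach existing tags in a single call.
const taggedIdea = await prisma.idea.create({
  data: {
    name: 'Pre-tagged idea',
    description: 'Tags attached at creation time',
    owner: { connect: { id: 3 } },              // placeholder user ID
    tags: { connect: [{ id: 1 }, { id: 2 }] },  // placeholder tag IDs
  },
  include: { tags: true }, // return the new idea with its tags populated
});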
The last step for our many-to-many feature is to allow finding Ideas by Tag and finding the Tags on an Idea. You can see this part of the application in Listing 9. (Note that I have removed some error handling for brevity.)
Listing 9. Finding tags by idea and ideas by tags
// Display ideas with a given tag
app.get('/ideas/tag/:tagId', async (req, res) => {
const { tagId } = req.params;
try {
const tag = await prisma.tag.findUnique({
where: {
id: parseInt(tagId)
}
});
const ideas = await prisma.idea.findMany({
where: {
tags: {
some: {
id: tag.id
}
}
}
});
res.json(ideas);
} catch (error) {
console.error('Error retrieving ideas with tag:', error);
res.status(500).json({
error: 'An error occurred while retrieving the ideas with the tag'
});
}
});
// tags on an idea:
app.get('/ideatags/:ideaId', async (req, res) => {
const { ideaId } = req.params;
try {
const idea = await prisma.idea.findUnique({
where: {
id: parseInt(ideaId)
}
});
const tags = await prisma.tag.findMany({
where: {
ideas: {
some: {
id: idea.id
}
}
}
});
res.json(tags);
} catch (error) {
console.error('Error retrieving tags for idea:', error);
res.status(500).json({
error: 'An error occurred while retrieving the tags for the idea'
});
}
});
Here, we have two endpoints: /ideas/tag/:tagId and /ideatags/:ideaId. They work very similarly, finding the ideas for a given tag ID and the tags for a given idea ID. Essentially, the querying works just like it would in a one-to-many relationship, and Prisma deals with walking the lookup table. For example, to find the tags on an idea, we use the tag.findMany method with a where clause looking for ideas with the relevant ID, as shown in Listing 9. Listing 10 shows the new endpoints in action.
Listing 10. Testing the tag-idea many-to-many relationship
$ curl -X POST -H "Content-Type: application/json" -d '{"name":"Funny Stuff"}' http://localhost:3000/tags
$ curl -X POST http://localhost:3000/ideas/1/tags/2
{"idea":{"id":1,"name":"New Idea","description":"Idea description","ownerId":3},"tag":{"id":2,"name":"Funny Stuff"}}
$ curl localhost:3000/ideas/tag/2
[{"id":1,"name":"New Idea","description":"Idea description","ownerId":3}]
$ curl localhost:3000/ideatags/1
[{"id":1,"name":"New Tag"},{"id":2,"name":"Funny Stuff"}]
Conclusion
Although we have hit on some CRUD and relationship basics here, Prisma is capable of much more. It gives you cascading operations such as cascading deletes, fetch strategies that let you fine-tune how objects are returned from the database, transactions, a query and filter API, and more. Prisma also lets you migrate your database schema in step with changes to the model. Moreover, it keeps your application database-agnostic by abstracting all of the database client work inside the framework.
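For example, the transaction support mentioned above is exposed through prisma.$transaction(). Here is a minimal sketch (with illustrative values) that groups two writes so they succeed or fail together:
// Sketch: run two writes atomically; if either fails, both are rolled back.
const [newTag, newUser] = await prisma.$transaction([
  prisma.tag.create({ data: { name: 'Transactional Tag' } }),
  prisma.user.create({ data: { name: 'Ringo Starr', email: 'ringo.starr@example.com' } }),
]);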
Prisma puts a lot of convenience and power at your fingertips for the cost of defining and maintaining the model definition. It’s easy to see why this ORM tool for JavaScript is a popular choice for developers.
What SQL users should know about time series data
Posted by Richy George on 15 August, 2023
SQL often struggles when it comes to managing massive amounts of time series data, but it’s not because of the language itself. The main culprit is the architecture that SQL typically works in, namely relational databases, which quickly become inefficient because they’re not designed for analytical queries of large volumes of time series data.
Traditionally, SQL is used with relational database management systems (RDBMS) that are inherently transactional. They are structured around the concept of maintaining and updating records based on a rigid, predefined schema. For a long time, the most widespread type of database was relational, with SQL as its inseparable companion, so it’s understandable that many developers and data analysts are comfortable with it.
However, the arrival of time series data brings new challenges and complexities to the field of relational databases. Applications, sensors, and an array of devices produce a relentless stream of time series data that does not neatly fit into a fixed schema, as relational data does. This ceaseless data flow creates colossal data sets, leading to analytical workloads that demand a unique type of database. It is in these situations where developers tend to shift toward NoSQL and time series databases to handle the vast quantities of semi-structured or unstructured data generated by edge devices.
While the design of traditional SQL databases is ill-suited for handling time series, using a purpose-built time series database that accommodates SQL has offered developers a lifeline. SQL users can now utilize this familiar language to develop real-time applications, and effectively collect, store, manage, and analyze the burgeoning volumes of time series data.
However, despite this new capability, SQL users must consider certain characteristics of time series data to avoid potential issues or challenges down the road. Below I discuss four key considerations to keep in mind when diving head-first into SQL queries of time series data.
Time series data is inherently non-relational
That means it may be necessary to reorient the way we think about using time series data. For example, an individual time series data point on its own doesn’t have much use. It is the rest of the data in the series that provides the critical context for any single datum. Therefore, users look at time series observations in groups, but individual observations are all discrete. To quickly uncover insights from this data, users need to think in terms of time and be sure to define a window of time for their queries.
Since the value of each data point is directly influenced by other data points in the sequence, time series data is increasingly used to perform real-time analytics to identify trends and patterns, allowing developers and tech leaders to make informed decisions very quickly. This is much more challenging with relational data due to the time and resources it can take to query related data from multiple tables.
Scalability is of paramount importance
As we connect more and more equipment to the internet, the amount of generated data grows exponentially. Once these data workloads grow beyond trivial—in other words, when they enter a production environment—a transactional database will not be able to scale. At that point, data ingestion becomes a bottleneck and developers can’t query data efficiently. And none of this can happen in real time, because of the latency due to database reads and writes.
A time series database that supports SQL can provide sufficient scalability and speed for large data sets. Strong ingest performance allows a time series database to continuously ingest, transform, and analyze billions of time series data points per second without limitations or caps. As data volumes continue to grow at exponential rates, a database that can scale is critical to developers managing time series data. For apps, devices, and systems that create huge amounts of data, storing the data can be very expensive. Leveraging high compression reduces data storage costs and enables up to 10x more storage without sacrificing performance.
SQL can be used to query time series
A purpose-built time series database enables users to leverage SQL to query time series data. A database that uses Apache DataFusion, an extensible SQL query engine, will be even more effective. DataFusion is an open source project that allows users to efficiently query data within specific windows of time using SQL statements.
Apache DataFusion is part of the Apache Arrow ecosystem, which also includes the Flight SQL protocol built on top of Apache Arrow Flight, and Apache Parquet, a columnar storage file format. Flight SQL provides a high-performance SQL interface to work with databases using the Arrow Flight RPC framework, allowing for faster data access and lower latencies without the need to convert the data to Arrow format. Engaging the Flight SQL client is necessary before data is available for queries or analytics. To provide ease of access between Flight SQL and clients, the open source community created a FlightSQL driver, a lightweight wrapper around the Flight SQL client written in Go.
Additionally, the Apache Arrow ecosystem is based on columnar formats for both the in-memory representation (Apache Arrow) and the durable file format (Apache Parquet). Columnar storage is perfect for time series data because time series data typically contains multiple identical values over time. For example, if a user is gathering weather data every minute, temperature values won’t fluctuate every minute.
These same values provide an opportunity for cheap compression, which enables high cardinality use cases. This also enables faster scan rates using the SIMD instructions found in all modern CPUs. Depending on how data is sorted, users may only need to look at the first column of data to find the maximum value of a particular field.
Contrast this to row-oriented storage, which requires users to look at every field, tag set, and timestamp to find the maximum field value. In other words, users have to read the first row, parse the record into columns, include the field values in their result, and repeat. Apache Arrow provides a much faster and more efficient process for querying and writing time series data.
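To make the contrast concrete, here is a small JavaScript sketch (not tied to any particular database engine) of the same maximum-temperature scan over a row-oriented layout and a column-oriented layout:
// Row-oriented layout: every record must be visited and parsed to find the max temperature.
const rows = [
  { time: '2023-08-15T00:00Z', sensor: 'roof', temperature: 21.4 },
  { time: '2023-08-15T00:01Z', sensor: 'roof', temperature: 21.4 },
  { time: '2023-08-15T00:02Z', sensor: 'roof', temperature: 21.5 },
];
const maxFromRows = rows.reduce((max, r) => Math.max(max, r.temperature), -Infinity);

// Column-oriented layout: the temperature values sit together in one array, so the scan
// touches only the column it needs, and the repeated values compress cheaply.
const columns = {
  time: ['2023-08-15T00:00Z', '2023-08-15T00:01Z', '2023-08-15T00:02Z'],
  sensor: ['roof', 'roof', 'roof'],
  temperature: [21.4, 21.4, 21.5],
};
const maxFromColumn = Math.max(...columns.temperature);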
A language-agnostic software framework offers many benefits
The more work developers can do on data within their applications, the more efficient those applications can be. Adopting a language-agnostic framework, such as Apache Arrow, lets users work with data closer to the source. A language-agnostic framework not only eliminates or reduces the need for extract, transform, and load (ETL) processes, but also makes working on large data sets easier.
Specifically, Apache Arrow works with Apache Parquet, Apache Flight SQL, Apache Spark, NumPy, PySpark, Pandas, and other data processing libraries. It also includes native libraries in C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust. Working in this type of framework means that all systems use the same memory format, there is no overhead when it comes to cross-system communication, and interoperable data exchange is standard.
High time for time series
Time series data includes everything from events, clicks, and sensor data to logs, metrics, and traces. The sheer volume and diversity of insights that can be extracted from such data are staggering. Time series data allows for a nuanced understanding of patterns over time and opens new avenues for real-time analytics, predictive analysis, IoT monitoring, application monitoring, and devops monitoring, making time series an indispensable tool for data-driven decision making.
Having the ability to use SQL to query that data removes a significant barrier to entry and adoption for developers with RDBMS experience. A time series database that supports SQL helps to close the gap between transactional and analytical workloads by providing familiar tooling to get the most out of time series data.
In addition to providing a more comfortable transition, a SQL-supported time series database built on the Apache Arrow ecosystem expands the interoperability and capabilities of time series databases. It allows developers to effectively manage and store high volumes of time series data and take advantage of several other tools to visualize and analyze that data.
The integration of SQL into time series data processing not only brings together the best of both worlds but also sets the stage for the evolution of data analysis practices—bringing us one step closer to fully harnessing the value of all the data around us.
Rick Spencer is VP of products at InfluxData.
—
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.
10 ways generative AI upends the traditional database
Posted by Richy George on 8 August, 2023
For all the flash and charisma of generative AI, the biggest transformations of this new era may be buried deep in the software stack. Hidden from view, AI algorithms are changing the world one database at a time. They’re upending systems built to track the world’s data in endless regular tables, replacing them with newer AI capabilities that are complex, adaptive, and seemingly intuitive.
The updates are coming at every level of the data storage stack. Basic data structures are under review. Database makers are transforming how we store information to work better with AI models. The role of the database administrator, once staid and mechanistic, is evolving to be more expansive. Out with the bookish clerks and in with the mind-reading wizards.
Here are 10 ways the database is changing, adapting, and improving as AI becomes increasingly omnipresent.
Vectors and embeddings
AI developers like to store information as long vectors of numbers. In the past, databases stored these values as rows, with each number in a separate column. Now, some databases support pure vectors, so there’s no need to break the information into rows and columns. Instead, the databases store them together. Some vectors used for storage are hundreds or even thousands of numbers long.
Such vectors are usually paired with embeddings, a schema for converting complex data into a single list of numbers. Designing embeddings is still very much an art, and often relies on knowledge of the underlying domain. When embeddings are well-designed, databases can offer quick access and complex queries.
Some companies like Pinecone, Vespa, Milvus, Marqo, and Weaviate are building new databases that specialize in storing vectors. Others like PostgreSQL are adding vector support to their current tools.
Query models
Adding vectors to databases brings more than convenience. New query functions can do more than just search for exact matches. They can locate the “closest” values, which helps implement systems like recommendation engines or anomaly detection. Embedding data in the vector space simplifies tricky problems involving matching and association to mere geometric distance.
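As an illustration of what “closest” means here (independent of any particular vector database), the distance between two embedding vectors can be as simple as Euclidean distance or cosine similarity:
// Sketch: two common ways to measure how "close" two embedding vectors are.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return dot / (norm(a) * norm(b));
}

// Illustrative 4-dimensional embeddings; real embeddings run to hundreds or thousands of numbers.
const query = [0.12, 0.88, 0.33, 0.05];
const candidate = [0.10, 0.91, 0.30, 0.07];
console.log(euclideanDistance(query, candidate), cosineSimilarity(query, candidate));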
Vector databases like Pinecone, Vespa, Milvus, Marqo, and Weaviate offer vector queries. Some unexpected tools like Lucene or Solr also offer a similarity match that can deliver similar results with large blocks of unstructured text.
Recommendations
The new vector-based query systems feel more magical and mysterious than what we had in days of yore. The old queries would look for matches; these new AI-powered databases sometimes feel more like they’re reading the user’s mind. They use similarity searches to find data items that are “close,” and those are often a good match for what users want. The math underneath it all may be as simple as finding the distance in n-dimensional space, but somehow that’s enough to deliver the unexpected. These algorithms have long run separately as full applications, but they’re slowly being folded into the databases themselves, where they can support better, more complex queries.
Oracle is just one example of a database that’s targeting this marketplace. Oracle has long offered various functions for fuzzy matching and similarity search. Now it directly offers tools customized for industries like online retail.
Indexing paradigms
In the past, databases built simple indices that supported faster searching by particular columns. Database administrators were skilled at crafting elaborate queries with joins and filtering clauses that ran faster with just the right indices. Now, vector databases are designed to create indices that effectively span all the values in a vector. We’re just beginning to figure out all the applications for finding vectors that are near each other.
But that’s just the start. When the AI is trained on the database, it effectively absorbs all the information in it. Now, we can send queries to the AI in plain language and the AI will search in complex and adaptive ways.
Data classification
AI is not just about adding some new structure to the database. Sometimes it’s adding new structure inside the data itself. Some data arrives in a messy pile of bits. There may be images with no annotations or big blobs of text written by someone long ago. Artificial intelligence algorithms are starting to clean up the mess, filter out the noise, and impose order on messy datasets. They fill out the tables automatically. They can classify the emotional tone of a block of text, or guess the attitude of a face in a photograph. Small details can be extracted from images and the algorithms can also learn to detect patterns. They’re classifying the data, extracting important details, and creating a regular, cleanly delineated tabular view of the information.
Amazon Web Services offers various data classification services that connect AI tools like SageMaker with databases like Aurora.
Better performance
Good databases handle many of the details of data storage. In the past, programmers still had to spend time fussing over various parameters and schemas used by the database in order to make them function efficiently. The role of database administrator was established to handle these tasks.
Many of these higher-level meta-tasks are being automated now, often by using machine learning algorithms to understand query patterns and data structures. They’re able to watch the traffic on a server and develop a plan to adjust to demands. They can adapt in real-time and learn to predict what users will need.
Oracle offers one of the best examples. In the past, companies paid big salaries to database administrators who tended their databases. Now, Oracle calls its databases autonomous because they come with sophisticated AI algorithms that adjust performance on the fly.
Cleaner data
Running a good database requires not just keeping the software functioning but also ensuring that the data is as clean and free of glitches as possible. AIs simplify this workload by searching for anomalies, flagging them, and maybe even suggesting corrections. They might find places where a client’s name is misspelled, then find the correct spelling by searching the rest of the data. They can also learn incoming data formats and ingest the data to produce a single unified corpus, where all the names, dates, and other details are rendered as consistently as possible.
Microsoft’s SQL Server is an example of a database that’s tightly integrated with Data Quality Services to clean up any data with problems like missing fields or duplicate dates.
Fraud detection
Creating more secure data storage is a special application for machine learning. Some are using machine learning algorithms to look for anomalies in their data feed because these can be a good indication of fraud. Is someone going to the ATM late at night for the first time? Has the person ever used a credit card on this continent? AI algorithms can sniff out dangerous rows and turn a database into a fraud detection system.
Google Cloud, for instance, offers several options for integrating fraud detection into your data storage stack.
Tighter security
Some organizations are applying these algorithms internally. AIs aren’t just trying to optimize the database for usage patterns; they’re also looking for unusual cases that may indicate someone is breaking in. It’s not every day that a remote user requests complete copies of entire tables. A good AI can smell something fishy.
IBM’s Guardium Security is one example of a tool that’s integrated with the data storage layers to control access and watch for anomalies.
Merging the database and generative AI
In the past, AIs stood apart from the database. When it was time to train the model, the data would be extracted from the database, reformatted, then fed into the AI. New systems train the model directly from the data in place. This can save time and energy for the biggest jobs, where simply moving the data might take days or weeks. It also simplifies life for devops teams by making training an AI model as simple as issuing one command.
There’s even talk of replacing the database entirely. Instead of sending the query to a relational database, some want to send it directly to an AI, which will just magically answer queries in any format. Google offers Bard and Microsoft is pushing ChatGPT. Both are serious contenders for replacing the search engine. There’s no reason why they can’t replace the traditional database, too.
The approach has its downsides. In some cases, AIs hallucinate and come up with answers that are flat-out wrong. In other cases, they may change the format of their output on a whim.
But when the domain is limited enough and the training set is deep and complete, artificial intelligence can deliver satisfactory results. And it does it without the trouble of defining tabular structures and forcing the user to write queries that find data inside them. Storing and searching data with generative AI can be more flexible for both users and creators.