This SEG Seismic Soundoff podcast episode features Bluware’s Chief Software Architect, Morten Ofstad, who dives deep into leveraging AI for seismic interpretation. Hear how Bluware is leading the charge with a human-centric approach, leveraging AI to enhance — not replace — expert interpreters.
Listen to the full podcast episode here.
Andrew Geary: “Welcome to SEG Seismic Soundoff, conversations addressing the challenges of energy, water, and climate. I’m your host, Andrew Geary. In this episode, Morten Ofstad highlights how he uses artificial intelligence in seismic interpretation, focusing on the advantages of a data-centric approach over the traditional model-centric method. Morten emphasizes the limitations of pre-trained black box deep learning models and advocates for interactive deep learning to improve interpretation accuracy.
The discussion highlights VDS, a data format designed for random access and compression, and emphasizes the importance of empowering geoscientists to interact directly with AI-driven interpretation processes. Morten will challenge listeners to explore how data-centric AI tools can be integrated into their workflows, to move beyond simply asking questions and receiving answers, and instead to use AI to interrogate their data and gain deeper insights. So what are the primary distinctions between model-centric and data-centric approaches to artificial intelligence in seismic interpretation?”
Morten Ofstad: “The model-centric approach is probably what most research is using, which is focusing on the network architectures, like what performance you can get by changing the blocks of your machine learning model and the hyperparameters of the training. You assume that your question, or what you are trying to solve, has a perfect answer. You have your dataset, and you have the correct answer for your dataset. Then you run your training, and you see how close to the correct answer you get. You’re not really changing the data. The data preparation is kind of upfront. And what you’re trying to do is get closer to the correct answer by changing the model and changing the hyperparameters for the training.
The data-centric approach is more what you would take as a domain expert, where you can control what data you are training the model with, and the labels: ‘What is the correct answer for my data?’ This is an approach that works much better if you have a question that doesn’t have a perfect answer. I think all geological interpretation is open-ended. If you ask the same questions to different interpreters, you’ll get slightly different answers. So, there isn’t a perfect answer. Of course, you can develop better models that have the blocks that are good at picking up faults or identifying thin features in your data, and you can tweak that a lot, but at the end of the day, you want to get answers for data that you haven’t trained on. If that data isn’t exactly the same as what you have trained on, you might get bad answers.”
Andrew Geary: “I imagine a lot of bells kind of dinged off for geoscientists when they heard the words ‘not a perfect answer’ because that is kind of the world that they roll in. Why do you think the model-centric AI approach has leveled out on these benchmark tests over the last few years?”
Morten Ofstad: “I think it’s leveled out because the data sets aren’t good enough. You reach a point where trying to change the model architecture doesn’t do very much because your datasets and your labels are imperfect. And there is a limit to how far you can get by just trying to train on the same data again and again.”
Andrew Geary: “That makes a lot of sense. Can you describe the black box deep learning concept in seismic interpretation and maybe its potential limitations?”
Morten Ofstad: “I think the main thing is that you have a pre-trained model that someone, or a team of data scientists, has prepared. As a geoscientist, you are just applying that model and seeing what comes out of it. There is no interaction, and there is no insight into why the model did what it did, so it doesn’t really allow you to use your knowledge to help get to the right result. It means that any kind of geophysical or geological knowledge you are adding to the interpretation will have to be added in post-processing, where you’re taking the result of that black box model and trying to edit it and make it fit what you think should be the right answer.”
Andrew Geary: “Yeah, you said a word there: interacting. How does the interactive deep learning approach aim to address the limitations of this black box method?”
Morten Ofstad: “That’s a very good question. I think the main thing is that you start with a clean slate. You don’t start with a labeled dataset. You are training the model on your data. You start by labeling some of the features you are looking for in the data. Then you see what the model is predicting, you decide whether that is what you want, and you change the labels in the areas where the model isn’t predicting what you want. In that way, it becomes a back and forth between you and the machine learning system, and you can introduce your own interpretation and edit the labels until the model is producing the output that you want.”
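The label, predict, correct, retrain loop described in this answer can be sketched in a few lines. This is only a toy illustration, not Bluware's implementation: the "model" is a single threshold on a made-up 1-D fault attribute, and the names `train`, `predict`, `interactive_loop`, `interpreter`, `TRACE`, and `TRUTH` are all hypothetical. The interpreter is simulated by a function that returns the label a human would assign.

```python
def train(trace, labels):
    """Fit a toy classifier: a threshold midway between the two class means."""
    fault = [trace[i] for i, y in labels.items() if y == 1]
    background = [trace[i] for i, y in labels.items() if y == 0]
    return (sum(fault) / len(fault) + sum(background) / len(background)) / 2.0


def predict(trace, threshold):
    """Label every sample: 1 = fault, 0 = background."""
    return [1 if value > threshold else 0 for value in trace]


def interactive_loop(trace, interpreter, seed_labels, max_rounds=10, fixes_per_round=1):
    """Retrain after each batch of corrections until the interpreter agrees."""
    labels = dict(seed_labels)  # start from a handful of hand-picked labels
    history = []                # number of disagreements seen in each round
    for _ in range(max_rounds):
        threshold = train(trace, labels)
        prediction = predict(trace, threshold)
        # The interpreter reviews the prediction and flags what looks wrong.
        disagreements = [i for i, p in enumerate(prediction) if p != interpreter(i)]
        history.append(len(disagreements))
        if not disagreements:
            break
        # Corrections become new labels and feed straight back into training.
        for i in disagreements[:fixes_per_round]:
            labels[i] = interpreter(i)
    return threshold, labels, history


# Hypothetical attribute values; samples at or above 0.6 are "faults".
TRACE = [0.05, 0.10, 0.35, 0.90, 0.85, 0.15, 0.38, 0.30, 0.65, 0.70, 0.20, 0.00]
TRUTH = [1 if v >= 0.6 else 0 for v in TRACE]


def interpreter(i):
    """Stand-in for the human: returns the label the expert would give sample i."""
    return TRUTH[i]


threshold, labels, history = interactive_loop(TRACE, interpreter, {11: 0, 8: 1})
print(history)      # [2, 0]
print(len(labels))  # 3
```

With two seed labels the toy model makes two mistakes; one correction is enough for it to match the interpreter on every sample, so only three labels were ever drawn by hand.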
Andrew Geary: “So in being able to edit these labels on the go, is that the main difference in the labeling process between interactive deep learning and traditional seismic interpretations?”
Morten Ofstad: “Yes, and the fact that when you correct the inference that the model produces, it goes back into the training phase. Most of these pre-trained models are trained for a very long time on a very big dataset, and then they’re supposed to be able to handle anything you can throw at them after having been trained on this massive dataset.
But in the interactive loop, you’re trying to make a much smaller model which only produces the correct answer for the data set you are trying to interpret. And it doesn’t have to produce the correct answer for every dataset in the world. And that means you can train it much faster, and you can see what it’s producing. Then it might alert you to things you’ve missed. So, if you start labeling faults and you see there might be some faults that you missed, you can accept the inference you get from the model. You can add some labels and say, ‘okay, that fault should really be part of it’.
If you start with a pre-trained model, you’re at the mercy of the people who created the training data set and hope that they didn’t miss anything. It’s not really a realistic expectation that the people who train the model did a perfect job.”
Andrew Geary: “Yeah, we started this talking about model-centric, which is sort of the traditional method so far. But for someone who hasn’t implemented a data-centric AI approach, what are some of the considerations and challenges associated with implementing a data-centric versus a model-centric approach in seismic interpretation?”
Morten Ofstad: “I think it’s largely that if you are doing the model-centric approach, you probably have data scientists working in Jupyter notebooks to develop the models with very limited ability to visualize the results. If you are getting poor results, there isn’t really a way to highlight ‘why am I getting poor results? Is it because there is some piece of the training data where they missed the fault?’ You are now effectively training the model to tell you there is no fault where there really should be a fault. There is no way to pinpoint that problem in your data.
The tools aren’t really there, and there isn’t any way you can have every dataset in the world in your training dataset. So, it’s really an impossible task to do the data preparation in a way where you can create a pre-trained model that will work when you apply it to anything.”
Andrew Geary: “So you’ve talked a bit about the model-centric approach, especially that if the data is bad, then just everything is going to be bad, and you may not even know that it’s poor quality data on that end. With this data-centric approach, would you more quickly realize if you’re working with poor quality seismic data, or would you have to jump through some other hoops to understand that’s the issue?”
Morten Ofstad: “I think anyone who is a geoscientist will look at the data and see if it’s bad. But it’s more about coaxing out what little information you can get from that bad data. You might be looking at specific responses, specific waveforms, that you can recognize as hitting some specific interface. I’m not a geophysicist, but I know that they have their ways of looking at a very weak signal in all the noise, and that can be different for whatever they’re looking for. I think it’s much easier to get that detail out of the data when you are training on one specific dataset.”
Andrew Geary: “How could the data-centric approach improve the quality of interpretations?”
Morten Ofstad: “I think with this interactive method you have a choice. Because the interpretation is machine assisted, you can interpret more data faster, or you can spend more time and get a higher quality answer. So, it leaves it up to the geoscientist to make that trade-off. Do you want to have the highest possible quality interpretation, or do you want to get results very fast?”
Andrew Geary: “We’re talking a lot about data, and I think a lot of people intuitively understand that data is important. But you’ve also talked a lot about labels. What are the potential implications of inaccurate or inappropriate labels in training these datasets? And do you think that is something that people understand, the impact that the labeling can have on their seismic interpretations?”
Morten Ofstad: “We have been running benchmarks comparing our interactive data-centric approach with traditional modeling methods. In these comparisons, traditional models typically rely on a training dataset to make predictions, which are then validated against a separate dataset. However, we have observed that our method often identifies faults that are absent in the conventional ‘correct’ answers. This discrepancy can lead to our model being penalized for detecting features overlooked by the interpreter who generated the reference labels.
This also happens during training of traditional models. Your model is trying to evolve to predict those faults correctly and it gets punished for it, resulting in suboptimal performance. Consequently, the quality of labels is crucial for effective model training.
One advantage of our approach is that it provides insight during the training process. When we analyze the model’s output, we can identify instances where it correctly predicts features that are missing from the labels. This allows us to refine the labeling dynamically, enhancing the model’s accuracy in real-time rather than relying on predefined labels.”
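The scoring effect described in this answer is easy to reproduce with made-up numbers. The sketch below is illustrative only: the fault positions are hypothetical, and `precision_recall` is a plain helper written for this example rather than part of any benchmark mentioned in the interview. It scores the same model picks twice, once against reference labels that miss one fault and once against the complete set.

```python
def precision_recall(predicted, reference):
    """Score a set of predicted fault positions against a set of reference labels."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical trace positions of faults along a line.
true_faults = {10, 25, 40, 55}     # what is really in the data
reference_labels = {10, 25, 40}    # the reference interpreter missed the fault at 55
model_picks = {10, 25, 40, 55}     # the model finds all four

print(precision_recall(model_picks, reference_labels))  # (0.75, 1.0)
print(precision_recall(model_picks, true_faults))       # (1.0, 1.0)
```

Against the incomplete reference, the correct pick at 55 counts as a false positive and precision drops to 0.75, even though the model made no mistake.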
Andrew Geary: “You mentioned you’re not a geophysicist. Before we hit record, you said you’re a computer scientist. Has coming from a different background added value in tackling the geoscientific problems you’re trying to address? How has it helped you approach this differently than your geoscientist colleagues?”
Morten Ofstad: “I think it’s always good to have sort of new eyes on a problem. Personally, I come from a background in computer graphics, and there are a lot of things you do in image processing that are related to geophysics in the sense that you’re doing signal processing. You’re using Fourier transforms, frequency analysis, and different filters, creating different attributes, which are used for computer vision and things like that. So that background has helped me to understand the processing that is done in geophysics.
But at the same time, it gives you sort of a different perspective. You see other methods, typically used in computer vision, for example, that are not typically used in geophysics but could be applied to do the same kind of thing.
I was part of developing this compression algorithm for seismic data, where we are using techniques for image compression to compress seismic data. That is an enabler for the machine learning workflows because you can train faster if you can load data faster. You can load data faster if it’s compressed, because there is less data to load. You can train on more data at the same time and get better results faster.”
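The argument that compressed data loads faster can be made concrete with a back-of-the-envelope estimate. Every number below is an assumption chosen for illustration (volume size, read bandwidth, compression ratio, decode throughput), not a measurement of VDS or of any Bluware product, and `load_seconds` is a hypothetical helper. The sketch also adds one consideration the interview does not mention: decoding takes time too, so the gain depends on the decoder being fast relative to storage.

```python
def load_seconds(size_gb, read_gb_per_s, ratio=1.0, decode_gb_per_s=None):
    """Estimate time to get size_gb of usable samples into memory.

    ratio is uncompressed size divided by stored size; decode_gb_per_s is the
    rate at which the decoder produces uncompressed data (None = no decoding).
    """
    read_time = (size_gb / ratio) / read_gb_per_s
    decode_time = 0.0 if decode_gb_per_s is None else size_gb / decode_gb_per_s
    return read_time + decode_time


# Assumed figures: 100 GB volume, 1 GB/s storage, 10:1 compression, 5 GB/s decoder.
uncompressed = load_seconds(100, 1.0)
compressed = load_seconds(100, 1.0, ratio=10.0, decode_gb_per_s=5.0)
print(uncompressed, compressed)  # 100.0 30.0
```

Under these assumed figures, reading one tenth of the bytes more than pays for the decoding step.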
Andrew Geary: “Yeah, that makes a lot of sense. There was an American show called ‘Silicon Valley’ where their company made data smaller by compressing it down, and that was changing the world. So, I think people understand that.”
Morten Ofstad: “I’ve seen that show! I really like it, but it’s too close to home. I’ve been through all that stuff with tech startups and investors, and all of it is very, very close to reality!”
Andrew Geary: “I’m sure a lot of people feel that when you say that!
We’re changing here a little bit to talk about something a little different. What is VDS?”
Morten Ofstad: “VDS is a data format that our [Bluware] tools are using for storing seismic data. It’s a format which is designed for random access. You can take subsets of the data and access any subset quickly. It also has the option of compressing data using this wavelet compression technique. That makes it a very good fit for visualization or machine learning. We designed it first for visualizing data quickly. Our first product was a computer graphics visualization toolkit. But it turns out that you need the same random-access ability, to create subsets on the fly and so on, to implement machine learning efficiently.”
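One common way to get this kind of random access is to store a volume as small, independently readable bricks, so that a request for a subset only touches the bricks it overlaps. The sketch below illustrates that general idea only. It does not describe the actual VDS layout or the OpenVDS API, and the brick size of 64 samples and the names `bricks_for_subset` and `total_bricks` are assumptions made for the example.

```python
from itertools import product


def bricks_for_subset(lo, hi, brick=64):
    """Return indices of the bricks overlapping the half-open box [lo, hi).

    lo and hi are sample coordinates, one entry per axis.
    """
    per_axis = [range(l // brick, (h - 1) // brick + 1) for l, h in zip(lo, hi)]
    return list(product(*per_axis))


def total_bricks(shape, brick=64):
    """Count the bricks needed to cover a volume of the given shape."""
    count = 1
    for n in shape:
        count *= -(-n // brick)  # ceiling division
    return count


# One inline (axis 0 = 100), all crosslines, samples 200..327 of a 512^3 volume.
needed = bricks_for_subset((100, 0, 200), (101, 512, 328))
print(len(needed), "of", total_bricks((512, 512, 512)))  # 24 of 512
```

In this example the requested slab overlaps 24 of 512 bricks, so under 5% of the stored volume has to be read and decoded.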
Andrew Geary: “So how do you find that this format, VDS, enables the data-centric approach we’ve been talking about?”
Morten Ofstad: “It has the features of random access and so on. That allows us to skip the data preparation step that you would normally do with these machine learning methods, where you must create a TensorFlow record file with all the random cuts of your data and the labels. You try to create some kind of balanced selection of your data and so on. Preparing that, especially if you started with SEG-Y, which doesn’t have random access, takes a long time. Then, of course, there is no way to change the labels afterwards. You’ll have to prepare the whole dataset again, which can be a very big operation that takes a long time.
To enable this interactive approach where you are editing labels on-the-fly and re-training immediately with the new labels, you can’t have this data preparation step where you must go through all your data and make random cuts and so on. Having this format enables us to do that job efficiently and leverage compression to make the training faster.”
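The contrast with a pre-exported training file can be sketched as a sampler that cuts each training patch at the moment it is requested, reading the labels as they currently stand. This is a minimal sketch under stated assumptions, not Bluware's pipeline: the section is a small in-memory 2-D list standing in for a randomly accessible volume, and `sample_patch` is a hypothetical helper.

```python
import random


def sample_patch(section, label_mask, size, rng):
    """Cut one random size-by-size patch of data and of the current labels."""
    rows, cols = len(section), len(section[0])
    r = rng.randrange(rows - size + 1)
    c = rng.randrange(cols - size + 1)
    data = [row[c:c + size] for row in section[r:r + size]]
    labels = [row[c:c + size] for row in label_mask[r:r + size]]
    return (r, c), data, labels


# Toy 8x8 "seismic section" and an initially empty label mask.
section = [[r * 8 + c for c in range(8)] for r in range(8)]
mask = [[0] * 8 for _ in range(8)]

origin_before, _, labels_before = sample_patch(section, mask, 4, random.Random(0))

# The interpreter edits a label; nothing is re-exported.
mask[origin_before[0]][origin_before[1]] = 1

# Re-using the seed returns the same patch location, now with the edited label.
origin_after, _, labels_after = sample_patch(section, mask, 4, random.Random(0))
print(labels_before[0][0], labels_after[0][0])  # 0 1
```

Because patches are cut on demand, the label edit shows up in the very next training batch instead of waiting for the whole dataset to be prepared again.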
Andrew Geary: “This VDS format that you have talked about, can people access that outside of Bluware?”
Morten Ofstad: “Yes. As part of the OSDU industry collaboration, we have created an open-source implementation of the libraries required to read and write the VDS format, which is called OpenVDS. It’s available from the community website that OSDU has, and it’s available to everyone.”
Andrew Geary: “Well, that’s a very nice thing that people can use this format and experiment. I’m sure people that have used SEG-Y for many years might be excited to try a different format to use their data. What are some future directions and emerging trends in applying data-centric AI to seismic interpretation?”
Morten Ofstad: “I think that we are at the point where it’s important to enable geoscientists to do their work faster and the data-centric approach allows the experts to contribute. It gives them tools that are more like auto-complete or something like that. You are doing the interpretation, but the machine is helping you to do it faster and more accurately. I think that is going to change how this is done.
This has already happened in programming, which is what I do for a living, with Copilot, for example. It’s changed how you code because you can just get the model to do all the boring boilerplate code. Then you, as an expert, are making the real decisions instead of spending your time doing things that are obvious or not needed. I think for geoscience we are going in the same direction. We must create systems that enable the users to be in charge.”
Andrew Geary: “I like that, ‘enable the users to be in charge’. That could be a good tagline there!
What challenge would you like to leave the listener from this conversation?”
Morten Ofstad: “I would like the listeners to think about how they can start leveraging deep learning tools in their workflows, and why they would need tools other than something that just comes up with a result that you can sort of take or leave. Try to go beyond thinking about just these language models, where you ask a question and you get an answer, and see how that kind of approach will let you interrogate your data instead. So, you’re not asking questions in language, but you are trying to get the data to speak to you and tell a story.”
Andrew Geary: “So, Morten, the last question here is, if you had to describe your journey in one word, what would it be and why?”
Morten Ofstad: “I think it would be ‘enablement’. We are trying to enable the geoscientists to do their work better and faster. That is what we are trying to accomplish with our InteractivAI product.”
Andrew Geary: “Well, Morten, I appreciate your time on this. Thanks for joining us from Oslo, and I appreciate your insights on this new AI approach that hopefully will get some geoscientists excited.”
Morten Ofstad: “Thank you very much, Andrew. It’s been a pleasure talking to you.”