Chris Mungall: Collaborative Knowledge Graphs in the Life Sciences – Episode 37
Content provided by Larry Swanson. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Larry Swanson or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://hi.player.fm/legal.
Chris Mungall

Capturing knowledge in the life sciences is a huge undertaking. The scope of the field extends from the atomic level up to planetary-scale ecosystems, and a wide variety of disciplines collaborate on the research.

Chris Mungall and his colleagues at the Berkeley Lab tackle this knowledge-management challenge with well-honed collaborative methods and AI-augmented computational tooling that streamlines the organization of these precious scientific discoveries.

We talked about:

- his biosciences and genetics work at the Berkeley Lab
- how the complexity and the volume of biological data he works with led to his use of knowledge graphs
- his early background in AI
- his contributions to the Gene Ontology
- the unique role of bio-curators, non-semantic-tech biologists, in the biological ontology community
- the diverse range of collaborators involved in building knowledge graphs in the life sciences
- the variety of collaborative working styles that groups of bio-curators and ontologists have created
- some key lessons learned in his long history of working on large-scale, collaborative ontologies, key among them, meeting people where they are
- some of the facilitation methods used in his work, tools like GitHub, for example
- his group's decision early on to commit to version tracking, making change-tracking an entity in their technical infrastructure
- how he surfaces and manages the tacit assumptions that diverse collaborators bring to ontology projects
- how he's using AI and agentic technology in his ontology practice
- how their decision to adopt versioning early on has enabled them to more easily develop benchmarks and evaluations
- some of the successes he's had using AI in his knowledge graph work, for example, code refactoring, provenance tracking, and repairing broken links

Chris's bio

Chris Mungall is Department Head of Biosystems Data Science at Lawrence Berkeley National Laboratory.
His research interests center around the capture, computational integration, and dissemination of biological research data, and the development of methods for using this data to elucidate biological mechanisms underpinning the health of humans and of the planet. He is particularly interested in developing and applying knowledge-based AI methods, particularly Knowledge Graphs (KGs) as an approach for integrating and reasoning over multiple types of data. Dr. Mungall and his team have led the creation of key biological ontologies for the integration of resources covering gene function, anatomy, phenotypes and the environment. He is a principal investigator on major projects such as the Gene Ontology (GO) Consortium, the Monarch Initiative, the NCATS Biomedical Data Translator, and the National Microbiome Data Collaborative project.

Connect with Chris online

LinkedIn
Berkeley Lab

Video

Here’s the video version of our conversation: https://youtu.be/HMXKFQgjo5E

Podcast intro transcript

This is the Knowledge Graph Insights podcast, episode number 37. The span of the life sciences extends from the atomic level up to planetary ecosystems. Combine this scale and complexity with the variety of collaborators who manage information about the field, and you end up with a huge knowledge-management challenge. Chris Mungall and his colleagues have developed collaborative methods and computational tooling that enable the construction of ontologies and knowledge graphs that capture this crucial scientific knowledge.

Interview transcript

Larry: Hi everyone. Welcome to episode number 37 of the Knowledge Graph Insights podcast. I am really delighted today to welcome to the show Chris Mungall. Chris is a computational scientist working in the biosciences at the Lawrence Berkeley National Laboratory. Many people just call it the Berkeley Lab. He's the principal investigator in a group there, has his own lab working on a bunch of interesting stuff, which we're going to talk about today.
So welcome, Chris, tell the folks a little bit more about what you're up to these days.

Chris: Hi, Larry. It's great to be here. Yeah, so as you said, I'm here at Berkeley Lab. We're located in the Bay Area. We're just above UC Berkeley campus. We have a nice view of the San Francisco Bay looking into San Francisco, and so we're a national lab, so we're part of the Department of Energy National Lab system, and we have multiple different areas here in the lab looking at different aspects of science from physics, energy technologies, material science. I'm in the biosciences area, so we are really interested in how we can advance biological science in areas relevant to national scale challenges really in different areas like energy, the environment, health and bio-manufacturing.

Chris: My own particular research is really focused on the role of genes and in particular the role of genes in complex systems. So this could be the genes that we have in our own cells, the genes in human beings, how they all work together to hopefully create a healthy human being. One part of my research also looks at the role of genes in the environment, and in particular the role of genes inside tiny old microbes that you'll find in the ocean water and in the soil. And how these genes all work together, both to help drive these microbial systems, help them work together and how they all work together really to drive ecosystems and biogeochemical cycles.

Chris: So I think the overall aim is really just to get a picture of these genes and how they interact in these kind of complex systems and build up models of complex systems from scales right the way from atoms through the way through to organisms and indeed all the way to earth-scale systems. So my work is all computational. I don't have a wet lab.
So one thing that we realized early on is just when you are sequencing these genomes and trying to interpret the genes, you're generating a lot of information and you need to be able to organize that somehow. And so that's how we arrived at working on knowledge graphs, basically to assemble all of this information together and to be able to use it in algorithms to help us interpret biological data and help us figure out the role of genes in these organisms.

Larry: Yeah, many of the people I've talked to on this podcast, they come out of the semantic technology world and apply it in some place or another. It sounds like you came to this world because of the need to work with all the data you've got. What was your learning curve? Was it just another thing in your computational toolkit?

Chris: Yeah, in some ways. In fact, my background is, if you go back far enough, my original background is more on the computational side and my undergrad was in AI, but this is back when AI meant good old-fashioned AI and symbolic reasoning and developing Prolog rules to reason about the world and so on. And at that time, I wasn't so interested in that side of AI. I really wanted to push forward with some of the more nascent neural network type approaches. But in those days, we didn't really have the computational power and I thought, "Well, maybe I really need to actually learn something about biological systems before trying to simulate them." So that's how I got involved in genomics. This was around about the time of just before the sequencing of the human genome, and I just got really interested in this area, a position came up here at Lawrence Berkeley National Laboratory, and I just got really involved in analyzing some of these genomes.

Chris: And in doing this, I came across this project called the Gene Ontology that was developed by some of my colleagues originally in Cambridge and at Lawrence Berkeley National Laboratory.
And the goal here was really as we were sequencing these genomes and we were figuring out there's 20,000 genes in the human genome, we discovered we had no way to really categorize what the functions of these different genes were. And if you think about it, there's multiple different ways that you can describe the function of any kind of machine, whether it's a molecular machine inside one of your cells or your car or your iPhone or whatever. You can describe it in terms of what the intent of that machine is. You can describe it in terms of where that machine is localized and what it does, and how that machine works as part of a larger ensemble of machines to achieve some larger objective.

Chris: So my colleagues came up with this thing called the Gene Ontology, and I looked at that and I said, "Hey, I've got this background in symbolic reasoning and good old-fashioned AI. Maybe I could play a role in helping organize all of this information and figuring out ways to connect it together as part of a larger graph." We didn't call them knowledge graphs at this time, but we were essentially building knowledge graphs at the time and making use of, in those days, quite early semantic web technologies. This is even before the development of OWL, the Web Ontology Language, but there was still this notion that we could use rules in combination with graphs to make inferences about things. And I thought, "Well, this seems like an ideal opportunity to apply some of this technology."

Larry: That's interesting. It's funny we didn't plan this, but the episode right before you in the queue was of my friend Emeka Okoye. He's a guy who was building knowledge graphs in the late '90s, early 2000s, mostly the early 2000s before the term had been coined, and I think maybe even before a lot of the RDF and OWL and all that stuff was there. So you mentioned Prolog earlier, and what was your toolkit then, and how has it evolved up to the present? That's a huge question. Yeah.
Chris: I didn't mean to get into my whole early days with Prolog. Yeah, I've definitely had some interest in applying a lot of these logic programming technologies. As you're aware,
…