Good morning, everybody. Thank you for coming out at 9 o'clock. It's early and I appreciate you coming to a talk on linked data. That's a compliment, both to me and to all of you. So I wanted to thank you all for inviting me to present today. This is my first time coming to CNI and it's been an amazing opportunity to meet people
and to learn more about a different side of this field than I'm used to talking in. So I'm happy to share with you what I've been learning. My background: I'm David Newbury. I'm the Assistant Director for Software and User Experience at the Getty out in Los Angeles. And I am not a librarian. I paid my way through college sitting behind a reference desk at the library at Penn
State, which was lovely, but I don't think that qualifies me to speak on behalf of librarians. So I'm here talking as a technologist. My background really is someone who understands computers and what you can do with them, and who's trying to figure out how that fits in the cultural realm. And what I do at the Getty is lead the team that does the public digital infrastructure, the parts of our organization that you could see if you were on the web that support
the mission of the Getty as an organization. And the Getty is a library, archive, museum, and research center in Los Angeles. It does lots and lots of things. Many of those things I don't do. I'm also not a conservator, nor am I a museum curator. But one of the areas I can support is this: the Getty has always been a place that provides digital leadership in the space of linked data for cultural heritage.
It's one of those things where we've decided to put energy and time into thinking about what can be done. And I'm here today to share with you some of the things that we've learned trying to do this over the past couple of years. If you're here and you don't know what linked open data is, this is probably not the best introduction to it. But in a very short summary, it's a bunch of technologies that came out of the desire
to say, the web did a bunch of things really well. What would it mean to do those same sorts of things really well for structured data? What would it mean to use URLs or URIs as identifiers for things? What would it mean if we had networks of information, lots and lots of connected documents together, rather than just a table?
And what would it mean if we could formalize the way in which we describe things together? I don't know if you've noticed, but over the past 15 years, linked data has not revolutionized the world, changed everything, and made computers all operate the same way. The Internet works. The Internet continues to be successful. Linked data, not quite so much. We still talk about it a whole lot. And part of what I've been trying to figure out, as I ask why we are doing linked data at the Getty, is that in my mind, the reason linked data appeals to us in cultural heritage is that it solves some of our social problems, not just our technology problems. We live in a world of cheap storage, where things are ubiquitously connected, and where search algorithms dominate, and together those recontextualize the labor behind the cultural data work that we do as organizations.
We live in a world where this is a hard drive. This holds what a library used to hold on something about the size of a postage stamp. And we live in a world of mass digitization, where we can use computational techniques like AI to generate metadata. And we've been cataloging records for decades and decades. And this means we have more data in our institutions than we have the ability to provide context
for. And so as we think about what we're doing, that data overload, and that limited user attention that we heard about in our keynote, are the collections access problems we're looking at in the next decade or so. We live in a world where everyone is connected to everything all the time. I have a phone, I spend too much time on it, I'm sure you all do too. And one of the things that does is make us feel like the entire sea of knowledge in the human experience is part of a single digital ecosystem. And the questions that we're asking increasingly are not things that are contained within my organization or your institution or any institution at all. They're really about information that extends beyond the boundaries of individual institutions. And we also live in a world where search is the way in which people find information.
And the commercial tools that are out there have given people the expectation that answers are there for the asking. And they do it in a way that hides the amount of work it takes to generate the knowledge that you get as answers behind a search box. And so when we think about why linked data is seen as a solution: it provides the sort of structures that would help us manage that scale of data.
And it provides identifiers that would maintain our authority in that globally distributed environment, providing links back to ourselves for the information that we're providing. And it provides ontologies, a tool that could maybe enable that kind of complex data retrieval across data sets. It's a technology that comes across as something that might help solve the problems we're seeing, given that we live in an ecosystem where the internet is one of the dominant things that exists today. And so what I want to do now is lay out, from the perspective of the Getty, how we got to the place where we're trying to solve some of these problems and understand what we can do with these technologies. And I'm going to start back with the Getty Vocabularies, which are a service provided by the Getty: the thesauri for discovery of cultural information.
AAT, ULAN, TGN. And in 2014, the Getty was one of the first organizations to put them out as a large linked data ecosystem. We weren't the only people who did this: the Yale Center for British Art, the Rijksmuseum, the British Museum. But there was this wave of, what would happen if museums adopted this technology? What could be done with it?
And personally, at that time I was over at the Carnegie Museum of Art in Pittsburgh, working on their film archive. And I was part of this early project, working with very, very clever archivists who said, what would happen if, rather than trying to display a finding aid, because their finding aid was not that interesting, what if we really said the relationships between people and events and material and artworks were the framing way in which you would describe a collection? And between those two pieces of work, another project came along, not because of those, but related to them, that I got involved with: the American Art Collaborative. And this was a 2017 project to take 14 art museum collections together and use the sort of linked data principles that these things were built on to say, could you bridge
across 14 institutions? Could you find connections? Could you provide a unified discovery environment for that many institutions at the same time? And what came out of that was a project that is called Linked Art. Rob Sanderson, who I'm sure many of you know, he's a regular here and a good colleague of mine. He and I worked together to create this data model called Linked Art, based on the data
model of the American Art Collaborative, which was the underlying sort of connective data tissue, building on years and years of work in the academic community under CIDOC, to say, what if we had a tool that could bridge these things together? And that brings me to where we are at the Getty right now. We've been doing linked data since 2014 with the vocabularies, but over the past five
years, what we've done is say, what if we take those sorts of technologies, a single standard in Linked Art to link things together, the vocabularies as connective glue, and used it to power our entire discovery ecosystem, including our archival collections, but also our museum collections on top of that? What if we used it also to power the audio guide that we provide to our visitors, or to
provide interesting, novel, experiential interfaces on top of the collections that we have? What if we worked with third parties to use it, both large parties like Google Arts & Culture, and small projects like Spanish Art in the US, which is a project of the cultural office of the Embassy of Spain to find Spanish artworks in American museums and bring them together? And so we've worked across projects at all of these scales to pull records together and
say, what would happen if you really tried to build this system out? So we end up with a unified API model, a way to access that data that spans across all of these collections and brings them together into a single data model. So what I want to do is talk a little bit about what it means to do that kind of work. And this is how we all wish it would work.
You have a staff interface, people put data into a system like ArchivesSpace, there's a little arrow, and suddenly you have a website on the other side that shows all of that. This is how people imagine data ecosystems should work, and all the work is in that little arrow. And I want to show you, this is what that little arrow looks like in the Getty ecosystem. Every one of those yellow and red boxes is a system of record in our ecosystem.
Every arrow in the system is a piece of code or something that we've written at the Getty, to try to bridge into this middle line, which is all the different ways that we represent that information outward. I also love just showing these architecture diagrams, because you spend a lot of time drawing them and nobody ever looks at them. But what I want to say here is, we don't build these systems because I like to draw architecture
diagrams. We build them because what we really want to do is support people. Digital infrastructure isn't designed to make computers happy because computers aren't happy. We do it to empower people to be more effective in meeting their mission. And so when we think about who those people are, the reason we have the ecosystem we have
is because we have many different constituents with many different kinds of needs. So we have catalogers who are one of those core audiences that we support. And they have workflows. They need to be productive. They need to actually do their work. And different disciplines have different workflows. Having spent time with museum registrars and librarians and archivists and content editors,
they work really differently. They have different disciplines. They do different kinds of work. They need different systems to do that. And so if we're going to build this, we can't say, hey, all of you should work exactly the same way, because that doesn't work. And then there are engineers. They need to know how to get data in and out of these systems. And engineers are not willing to spend five years learning how to be an archivist in order to be an engineer.
They have a different set of disciplinary skills. But they need to use the things that they were trained to use to do this effectively. And you also have your systems people, your IT teams, your application owners. And what they really want is for you not to break stuff. And it turns out engineers pulling hundreds of thousands of records out of systems break stuff. And so you have to build your system around that set of expectations,
around those sorts of people's needs: you know, the catalogers really hate it when the engineers break the systems, so stop doing that. And then we have our actual audiences. Because we're not doing this because catalogers just like cataloging. We're doing that work because we want to get it in front of people. And they want to learn what they want to learn, on the topic that they're interested in learning about. And there are lots of different kinds of users that we have to support in our institutions.
We have professionals asking very, very complicated questions about scholarship. We also have lots of people who really want pictures. And they want to get pictures so they can make collages for their kid's art project at school. And both of those are users who have needs that we need to support, because they are aligned with the mission of our organization. And both of them are looking for information out of this. And there's also that sort of separate question beyond information seeking.
There's this research use case, where there are people who are not looking for information, but are looking for questions that haven't been asked, patterns that they haven't seen before. Things in the data that we don't already know and couldn't share with them, but that they could discover with their expertise. And so that's our ecosystem of users. And we have a bunch of systems in place to handle those different user needs.
I will say I don't have to meet the needs of catalogers, because we live in a world where there are professional tools. They may or may not be high quality, but they're the tools that we know how to use to build the systems that we do. And there's not a lot of benefit for me in saying, let's reinvent the way a library cataloging system works, or let me rebuild ArchivesSpace from scratch. Those systems are out there, and people know how to use them.
My job is to let people use the systems they know how to use. But when we want to provide access to that data, what we end up doing is recontextualizing that data. We have to change the lens from one built around someone who knows the discipline, who knows the form, into one that matches the way users expect to see that data, which is often a really different shape and a different flavor, because their needs are different than the
needs of catalogers. And so when we do that, we end up displaying records that are aggregations of data coming from multiple different systems because the tool that you would use to catalog the source is different from the tool you'd use to capture digital media, which is different from the tool you'd use to capture collections records, which is different from where you put the audio for the audio guide. But all of those are part of a user's understanding of what an artwork might be.
And so we end up having to change that lens of what we do into a different form. And it has trade-offs when you do that kind of thing. And so when we start thinking about how we would discover that, imagine if you had a record for our painting Irises. And you had another record in a different system for Van Gogh as an artist. You could think of these as two different documents; you could print these out on a piece of paper, you could hand them to people, you could say, look, I've got these two different things. And that's really useful. But you could also look at it as a single piece of information. Since Van Gogh painted Irises, you could glue these things together and say, look, I've got one viewpoint on this. From the point of view of the facts in here, the data itself, it's the same thing. These two documents, this one graph, they have the same amount of information.
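To make that concrete, here is a minimal sketch in Python rather than real JSON-LD, with made-up URIs and a drastically simplified record shape instead of our actual Linked Art data, showing how the same facts can live as two separate documents or be glued together into one graph of statements:

```python
# Two simplified "documents", loosely in the spirit of JSON-LD records.
# The URIs and fields are illustrative, not the Getty's actual data.
irises = {
    "id": "https://example.org/object/irises",
    "type": "HumanMadeObject",
    "label": "Irises",
    "produced_by": "https://example.org/person/van-gogh",
}

van_gogh = {
    "id": "https://example.org/person/van-gogh",
    "type": "Person",
    "label": "Vincent van Gogh",
}

def to_triples(doc):
    """Flatten a document into (subject, predicate, object) statements."""
    subject = doc["id"]
    return {(subject, key, value) for key, value in doc.items() if key != "id"}

# Glued together, the two documents become one graph: the same facts,
# just no longer packaged around a single record.
graph = to_triples(irises) | to_triples(van_gogh)

for triple in sorted(graph):
    print(triple)
```

Same facts either way; the difference is purely in how they are packaged.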
But they make different things differently easy or hard in our ecosystem. If you think about that idea of a document, of a piece of paper, they're really designed for that access, for that information-seeking behavior. We take a bunch of the data that we have in our ecosystem and package it up by the person who's providing the information, by us, the Getty, in a context that people would understand. We say, this is the record about an artwork, because you have a shared idea of what an artwork's information might be. It's probably not deep in the weeds, like the fact that Van Gogh had a brother named Theo van Gogh, and Theo van Gogh was married, and who that wife was. It turns out that's not part of how we describe the artwork; that's part of the ecosystem of knowledge around it. But graphs, on the other hand, are really optimized for asking those sorts of questions,
for saying, I have a context, maybe I'm interested in siblings of artists, and I want to be able to ask that kind of question. It's not a museum question, it's a research question. How do I provide access to that kind of data as well? Because you could imagine two questions that we have. You could say, oh, hey Getty, do you have images bigger than 2,500 pixels that were in both New York and Paris, created by artists before 1850?
You could also ask a question like, what's the label information for the painting Irises? I will say no one has ever asked that first question in my entire time here. But I get asked, at least through the API, what's the label information, thousands and thousands of times a day. And so when you have that, you end up needing to build the possibility for both of these sorts of systems. You need something that will help you show documents, that will help you map to those
understandings, those contexts that people bring to the information, very quickly. And documents are also the way the internet works, right? You want to be able to use the sorts of affordances that the engineers understand: REST APIs, JSON documents, cache control. Because those are things that engineers know that will let this stuff work fast. It will also let you hire people who know how the internet works and don't have to understand
all the complex, crazy stuff that we cultural heritage people do; it makes it possible. But when you're doing research, scholars come with their own questions, their own context, their own things that they want to know about this kind of information. And meeting those needs means they don't want me to draw a boundary around what questions they might ask. They want to come with their own boundaries, their own context.
They've spent decades building enough knowledge to be able to ask a question that I haven't thought of, and they need to be empowered to do that kind of work. It's complicated because it moves that burden from the person who creates the data to the person who uses the data. But it makes those new questions possible, even if it might be inefficient or complicated, to ask those sorts of questions.
And so when we build out this sort of infrastructure that we've been building at the Getty, we're trying to build to support both of those use cases at the same time. An easy tool for engineers to create documents and to keep it behind the scenes in sync with a large graph of all the data that we have together so that a sufficiently qualified researcher can use that to ask questions.
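To sketch what the researcher's side of that looks like, here is a hypothetical query against a SPARQL endpoint for the earlier question about image sizes, places, and artist dates. The endpoint URL, prefixes, and property names are invented stand-ins, and a real query against a Linked Art or CIDOC CRM shaped graph would be more involved, but the point is the shape of the interaction: the researcher brings the question, and the graph does not need to have anticipated it.

```python
import requests

# Hypothetical endpoint and vocabulary; real property paths depend on the
# data model and are more involved than this sketch.
ENDPOINT = "https://data.example.org/sparql"

QUERY = """
PREFIX ex: <https://example.org/terms/>
SELECT ?object ?image WHERE {
  ?object ex:hasImage ?image ;
          ex:locatedIn ex:NewYork , ex:Paris ;   # was in both places
          ex:createdBy ?artist .
  ?image  ex:width ?w .
  ?artist ex:birthYear ?born .
  FILTER (?w > 2500)
  FILTER (?born < 1850)
}
LIMIT 100
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

# Standard SPARQL JSON results: one binding per matching row.
for row in response.json()["results"]["bindings"]:
    print(row["object"]["value"], row["image"]["value"])
```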
It also allows us to keep records in sync, because as much as we love that one arrow from ArchivesSpace to the website, what we really have is ArchivesSpace to the API, to this website, to our search interface, and also to that other website, oh, and there's that aggregator on another system, and every time an archivist changes a record in that system, all those other systems have to be updated.
And so we need to build out the ability to keep those systems in sync. And the way we did this: we looked through a bunch of things, and there's a standard pattern, one of the W3C standards, ActivityStreams, that I will tell you we cribbed entirely from the IIIF community, which spent a long time solving this. And we said, oh, yeah, that's a really good idea, let's do that one. And we do that because when we can find a standard that supports our use case, it makes
it really easy not only for us, but for other people, to build those sorts of integrations. And for this sort of discovery of records changing, the main use case beyond ours is that aggregator use case. And it's really easy to go say, hey, aggregator, there's really good documentation on the IIIF website, that's a pattern you may have seen before, you can get our data the same way. It also means I don't have to write that documentation.
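To give a sense of what that integration looks like from the aggregator's side, here is a rough Python sketch of walking an ActivityStreams collection in the style of the IIIF Change Discovery API. The stream URL is a placeholder, and production code would page backwards, de-duplicate, and handle deletes more carefully, but it really is just ordinary JSON over HTTP:

```python
import requests

# Placeholder URL; a real stream advertises an OrderedCollection whose pages
# list Create/Update/Delete activities (the IIIF Change Discovery pattern).
STREAM = "https://data.example.org/activity-stream"

def changed_records(stream_url):
    """Yield (activity type, record URI) pairs mentioned in the stream."""
    collection = requests.get(stream_url, timeout=30).json()
    page_url = collection.get("first", {}).get("id")
    while page_url:
        page = requests.get(page_url, timeout=30).json()
        for activity in page.get("orderedItems", []):
            yield activity["type"], activity["object"]["id"]
        page_url = page.get("next", {}).get("id")

for activity_type, record_uri in changed_records(STREAM):
    print(activity_type, record_uri)
```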
The other cool thing that we've added: knowing which records have changed is really valuable, but the other question, really a more scholarly question, is how did it change? How has information changed over time? And so we built our infrastructure to support Memento, which is one of the protocols behind the Internet Archive, to give our data the equivalent of the Wayback Machine, to be able to say, what did that record look like a year ago?
What did that record look like two years ago? It gives you that audit log, so you can go and look at how that record changed over time, and it gives us the ability to audit what changed when. I will say this works for some of our systems, like the vocabularies, where that's public knowledge.
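In practice, a Memento-style lookup is just an ordinary HTTP GET with an Accept-Datetime header sent to a TimeGate, which points you at the version of the record closest to that date. Here is a minimal sketch; the TimeGate URL is a placeholder, and the headers are the standard ones from the Memento specification (RFC 7089):

```python
import requests

# Placeholder TimeGate URL for a single record; real paths will differ.
TIMEGATE = "https://data.example.org/timegate/object/irises"

response = requests.get(
    TIMEGATE,
    headers={"Accept-Datetime": "Tue, 01 Apr 2024 00:00:00 GMT"},
    timeout=30,
)

# After redirects, the response is the "memento": the record as it looked then.
print(response.url)                               # URI of that archived version
print(response.headers.get("Memento-Datetime"))   # when that version was captured
```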
We have other systems where that is restricted, because we aren't at a point yet where we want to say, hey, do you want to see all the mistakes we made in our cataloging over the past decade? I don't mind, because they're not my mistakes, but there's a certain amount of change management to get that out, and questions about liability in there. But the capability is there, even if we need to build the business case for it. And so my question is, that's a lot of really interesting technology, but why are we doing it the way we're doing it?
And what I wanted to say is, having built out a couple dozen applications over the past five years, we've learned that nothing I have built requires linked data to have been built this way. Linked data is not a magic bullet. It doesn't empower anything that wasn't otherwise possible. Which kind of makes sense if you think about it from the perspective of what you're doing.
Because every time I build an application, I know the context of that application. I know the story I want to tell through that application. And I could build something that doesn't require any of that linked data nonsense to tell that story effectively. But what we've learned is that we have many, many different communities of users, and they want different stories.
And the effort of taking that sea of knowledge created by our researchers, our curators, our archivists, our catalogers, and trying to understand how to transform it into a different shape for every other use case: after the second application, you realize you're just doing the same thing over and over again. And so by taking all of the data that we have and trying to create that sort of everything
API that holds all of that data, it's a lot more work to do. But it lets us represent that data in a way that matches users' varying models of the world, in many different interfaces. I can say we have a visitor audience. They want a very slimmed-down version of that object. They want to see a picture, and they want to hear the audio about it. And we use the same record to say, here's our collections page, designed for our professional
audience, that's going to give you the full bibliography and provenance and records of that. Or here's the full finding aid view of our archives. It has the full listing, the context, the scope notes, everything you would need to understand a collection fully. But we can also take that same data and say, here's a way to flip through the pictures of Los Angeles in the context of the spatial area of Los Angeles.
You lose all of the context that you have in the archives, but you bring a different set of context. You tell a different story on top of that work. And that's why we do this: because being able to present our work in multiple interfaces lets us meet different audience needs at the same time, with the same data, with the same infrastructure. One of the things we've learned in doing this is it needs to be as simple as possible.
Because that shared data model will make our developers more effective. Because I can teach my engineering team, these are the patterns you use to access this complex data. And because we're using that same pattern over and over again, once you've taught someone, oh, here's how you find a person from an object, you can do that in a library context, or an archive context, or a museum context.
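As a toy example of what "find a person from an object" looks like once someone has learned the pattern, here is a sketch against a heavily trimmed record in the general shape of a Linked Art document, where an object points to a production activity carried out by someone. The record itself is made up:

```python
# A made-up, heavily trimmed record in the general shape of Linked Art JSON-LD.
record = {
    "id": "https://example.org/object/irises",
    "type": "HumanMadeObject",
    "produced_by": {
        "type": "Production",
        "carried_out_by": [
            {"id": "https://example.org/person/van-gogh",
             "type": "Person",
             "_label": "Vincent van Gogh"}
        ],
    },
}

def people_behind(obj):
    """Return the labels of the people who carried out the object's production."""
    production = obj.get("produced_by", {})
    return [actor["_label"] for actor in production.get("carried_out_by", [])]

# The same traversal works whether the record came from the museum, the
# library, or the archive, because they all share one model.
print(people_behind(record))  # ['Vincent van Gogh']
```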
That lets our team support more information, because we don't have to retrain them for different contexts; we just have to train them in that universal model. Building on top of web technologies also helps with staff acquisition. I can hire somebody who knows how to build a basic website and say, hey, for the first six months, this is just a website. You've got an API. You get the data the way you've always been taught to get data from APIs.
And when they start going, oh, but I have these questions that are more complicated than the API allows, I can pull back the curtain and say, oh, look, there's another query interface that's even more complicated. Guess what? You get to learn a new thing this week. And being able to stagger the complexity of our ecosystem for the engineering staff helps me manage the kind of complex things we do with the kind of people we can hire in cultural heritage.
I'll also say we use standards a lot, and we use them for interoperability because we want to be able to work with you, but also because I really can't get my developers to write documentation. Nobody wants to write documentation. You can't interoperate with other people without documentation. And so how do I arrange it so that the only things we have to document are the special things we do on top of the standards other people use?
The IIIF community has amazing documentation. We want to use that. The Linked Art community has amazing documentation. We want to use that. The CRM community has documentation, and we want to be able to use that when we can. But being able to build off of what the W3C has done gives us the ability to work with other people and point at really high-quality documentation, and then use our time, our energy,
and resources to document the cultural knowledge that we put on top of those existing standards. I can teach any engineer to work with any API, but this lets us find ways to work together more effectively. The other thing that we've learned here when we're thinking about how to keep this simple is we need to keep that everything model as simple as we can get away with.
One of the things I always talk to people about is, if you're going to create structured data, you're doing it because you want the computer to do computer stuff on top of that data. Humans don't work with structured data. Humans work with text. And if you aren't looking to do something computer-y with your data, showing people text is a really good way to share knowledge with them, because it turns out we're way better at communicating with each other than we are with computers, or computers with us.
And the other thing that we've learned over time is that you can always add complexity to what you've done when you discover, oh, I need the computer to do something more than I thought I did. But you can't take that complexity away, because someone somewhere in the world is depending on the thing you've built. And there's a temptation, when you dive into RDF just deep enough, to say, I can model the world. I can represent all of human knowledge in a structured data model.
I can do everything. We have a couple of projects where we've decided maybe to go down that route, and it turns out you can easily build beyond your ability to understand. And so figuring out how to start small and build up really helps you do this kind of work productively. And the other problem that we've run into is something that I call disciplinary misdeeds.
Because the hardest part of doing this has not been the technology, it's not been the APIs, it's not been training engineers. It's been working with the practitioners on the change management that doing it this way requires. Because when we do this recontextualization, when we cross the streams between various disciplines, where we take the information that someone has spent their career cataloging and throw 90% of it out to present it to an audience who doesn't care, that really, really irks people who have spent their entire career doing it the way they were trained was the best way you could do it. And some of the things that you can do with technology run into conflict with longstanding practices that were put in place before that technology existed.
And so if you're doing this kind of work, that change management work is critical and important, because it turns out it doesn't matter how cool your graph is or how cool your technology is if you lose the trust of the people in your institution who you are building it for. And so having conversations with them, having those dialogues, building those connections, explaining that yes, I'm blowing up your discipline, but I'm not doing it because I don't respect or understand why it is the way it is, that I'm trying to help it reach more audiences who can be brought into the fold: that is critical to the work. And it takes the most time of anything we do in this organization. And I also think it's really important to do that collaboration work, right? Because the questions that I'm increasingly being asked about data aren't questions about the Getty. I love the Getty.
I don't think it's a fascinating topic of research for the entire world. And the questions you can ask just based on Getty data turn out to really be about the Getty, about our collecting practices, about the things we hold, about the lens that we put on our cataloging practices. But the really important questions are about the world, the community, the way that these materials that we have interact in a larger environment.
And so how do we help people tell those stories? How do we work together? The models that we're doing, the code that we're writing, the tools that we're building, we're doing it not just for us, but so that we can all work together to share that information and to make it easier for someone else. And so that's what we have been building out over the past, you know, six years at the Getty.
And we're continuing to do it. Over the next several years we're going to add more systems into this ecosystem. This fall we're working on the Getty Provenance Index, adding another 22 million records. These are transactions between art dealers. It's a research data set. It's one where we really have leaned into the complexity of the data that we have, because its goal is mostly to enable the professionals in the field to find new insights
into their collections and into the art market. There are not a lot of, you know, 12th graders who are interested in transcriptions of dealers' stock books from the 1800s. But there are a lot of people for whom it's critical to their work, and we want to build a tool to support that. We're also beginning to plan what the Getty Vocabularies will look like for the next decade going forward. And there it really is not a question of the standards or the technologies; it's a really well-used tool. It's about understanding how the audiences have changed. The Getty Vocabularies were originally designed, and the system was designed, to create an index for a book that you could put on your shelf, to help you copy the right terminology into your spreadsheet. That's not how the world works anymore. And so how can we take the incredible scholarship that went into that resource and make it available in the new ways that people want to use data in our ecosystem?
And we're using this platform and these standards to put in place better collaborations across the field, to really test whether or not that sort of cross-ecosystem work is possible. And we're working with colleagues at the Smithsonian, some of whom are in the room, to say, how could we do a joint access and discovery system? How could we take the archives of the Johnson Publishing Company, which is the photo morgue of magazines like Ebony and Jet, and use this sort of infrastructure to provide discovery and access to five million items jointly held across both of our organizations, to really test whether those standards make this kind of thing possible in the world? And so to summarize what we've been talking about: why do we do linked data? It's not the technologies, it's not the ontologies. Those are things I'm happy to sit down and talk to you about for 12 hours, any time
anyone wants, but that's not where the value of what we're doing comes from. It comes from the ecosystem, from putting that information in different contexts for different applications. The value is really that audience: how do we support different user needs, different conceptual models, different lenses on the world of what we're looking for? And it's really in that community: how do we work together to take the collections in all of our institutions and make them available
to people who want to ask questions bigger than any one of our organizations can answer? So why do we do linked data? We do it for the humans. Thank you so much for listening. I appreciate it. And I'm happy to talk to you. And so I think I'm supposed to invite you all to ask questions at this point.
I also invite harassment. Christine. I was working with the vocabularies years ago, helping to develop some of the name matching algorithms. And what was fascinating then was that in library land, it was, as you said before, you build the authority and then point everything to it. And the art history scholars said, no, no, you can't do that. That's a scholarly question. We do not trust librarians to do that. So that cultural merging was the most challenging part of it. How are you able to reconcile that? The librarian, the archivist, the curator bring very different conceptual models. How do you bring this
together? I think this is where our focus is increasingly on interfaces rather than data models. Because it's very possible to write a data model that preserves the provenance of who asserted what label for what person. I have learned that sometimes that's really, really critical, if someone's doing a research project and wants that art historical answer, to know why.
But part of it is that 80% of the people want to say, oh, I have a number, you give me the name of that thing, which is a much more library-like use case. And so part of why we built out the infrastructure is to be able to say, we can do both of those things. Let's have a lens on that data that's library focused. Let's have a lens on it that's archival focused. For the staff, the vocabularies team, the people who provide our data, we can have all that richness.
But how do we hide the parts that are irrelevant, not because they aren't valuable, but because they don't help solve the need that person has right now? So a lot of the information work that we're doing now is around user experience and interface design, and the ability to say, hey, you're this kind of person, you want this kind of interface. There are other ones if you have other questions. But that's how we're going to help you solve your problem today.
Does that answer your question? Lovely. I would love to know more. Great. Danielle. Hi. So, I mean, there's linked data and then there's linked open data. There's what you can do within one organization, like you have at the Getty, versus the loftier goal of creating linkages across various organizations, some of which is done through the data itself, but some of which requires work done by a variety of organizations. You started to kind of tease at that, what it would mean for you to work with one other organization, but when you think about the ecosystem as a whole, the vast and different organizations, maybe this is next year's CNI presentation, I don't know,
but can you give us a little bit more about where you see the field going in a broader sense, and what resources that takes in an organization? I think that there is that vision that we have of everyone's systems working together seamlessly, and I can build an application and put it on top of your data, and it'll be just as cool.
That is a long way in the future, not because of technology, but because it requires organizations to agree on a model of the world. I've spent six years trying to get the Getty's different departments to agree on a model of the world, and it hasn't been entirely successful yet. That's the hard part. But I think the other thing we've learned, and it goes back to that: use computers for what computers are good at,
and people for what people are good at. The usual problem I have, or that I hear, is not, I wish I could build an application that could access the world's universal knowledge in a single seamless way. That's an engineer's pipe dream. It works in very, very small cases like IIIF, but not at all when it's not as trivial as, I want a couple of pixels from a JPEG.
But links are incredibly valuable, because they help someone follow a knowledge trail out into the world beyond your institution. And so, when we did our museum collection page, one of the KPIs, one of the things I said was going to be the measure of success for this product, was how many people would follow a link from our collections website to some other organization's website. Because the goal of that thing is to help people answer knowledge-based questions, and we don't have all the knowledge.
And so, I think if you're looking for ways to build that linked environment, adding links to knowledge is the critical thing to do there. Making it easier for robots to find those links, making it easier to build applications on top of those links, that can come later. But if we can build the cultural norm that me linking to you is not a diminishment of me, but rather increases all of our capacity: that, for me, is where the future of linked data needs to be, about linking for people.
We can build the APIs down the road when it comes to that. How has this process of looking at what you do on the computer side helped you to think about the intake of data, and about making choices about where you push a resource as it's coming in? How have your pipelines changed, as you build these out and expose them in the future?
I think the biggest change it's made in us, and I will say that is a work in progress, because what we've learned is you have to actually show people that the cataloging work has value and that these changes can be seen by people. And we're only about a year into being able to say, oh, that data field you put in, you can see it on the internet. But the biggest benefit that we've seen in the short term is making our staff comfortable with not copying data and not doing repetitive tasks, because the computer can do that.
And so we say, you don't have to put in all the information about Van Gogh every time you write that record. Trust that we have a record for Van Gogh, and if you point to that, it will look as good as if you'd done it yourself. We have to handle some of those issues of different people having different perspectives on the data, but that's an engineering problem we can manage. Decreasing the labor someone has to do to meet their professional standard frees up time to say, now that you don't have to do that, could you do this other thing, which will have benefit in this ecosystem?
And so it has to be this push and pull of, we'll take some work off your plate so that we can add work back in a different way that provides more value. And then they have to see that value quickly to feel like it's worth their time. So it often puts the burden on me to build out a pipeline to take data we don't have yet outward, and I have to work with the leaders in those cataloging fields to build the trust that I'll build it, and that they will then do their part and populate it.
But it usually has to be in that order, because I think catalogers have been asked too many times to create data for future theoretical uses that they never see. It's time for our engineers to say, we're going to put our skin in the game and build it, with the hope that you will then come along and fill it out. That would be really nice to do.
At this point, it's mostly focus groups on the interface. There's a little bit of automation; there's only one place we're currently really trying to do that, because we recognize the value of it. I wish I had a slide of it. But if you ever go to our museum collection page, about a third of the way down the page we've got what we call the Netflix bar, the "maybe you'd like this" kind of interface, which we found is really useful for the audience that wants that browse experience, who's come to the website because they want to find something beautiful for inspiration and creative reuse.
And so what we've built out is to say, we have five boxes there. We've guessed five different ways you could find a connection, to say, if you like this, this might also be interesting to you. And we're testing which of those people actually like better. Did you want something from the same time period? Did you want something with the same colors? Did you want something by the same artist? So we can help, because the use case there isn't that scholarly, like we're providing a contextual layer here.
It really is, we want to help you accomplish this sort of subjective task better. And so we track that to help tune it, because it's a place we can do that. We don't track users; it's not in our ethos. And so we don't have the ability to build recommenders based on things you actually do. We have to use the expertise of our staff to say, this might be a connection, and then see if that works.
But those sorts of analytic pipelines are hard to build, partly from an engineering perspective, but even more because you need people who are willing to take the time to interpret the data and build hypotheses. And that's a skill I would love to have, but we haven't gotten to it yet. There might be time for one more comment or question.
Well, thank you so much for coming. I really appreciate it.
End of transcript