Hello, good morning everybody. Thanks so much for coming. Thank you to CNI and Clifford for giving us an opportunity today to share some of what we're learning with GenAI. How many of you here were at the 9 AM session on GenAI? Okay, great. You know, obviously AI is everywhere.
What we're going to try to do today is share with you what we're learning. I mean, we're not doing a kind of product demo. The world has changed since November of 2022. And some of you will remember the two-hour transcript of an interview in the New York Times: "I want to be alive." We've been working on machine learning and AI for a while now.
Prior to ChatGPT, we were working on things like building a semantic index to improve research, and building a citation graph of all the citations in JSTOR. What was different when this came on board was seeing a conversation with the computer. When that happened, ChatGPT went from zero to one million users in five days,
reached 50 million people in five weeks, and they have 180 million users today. So this idea of AI, which I think all of us had been thinking about and knew was coming and were working on, got a whole new life when you saw the computer have a conversation with a person. That was something dramatically different that really captured people's consciousness.
And in our experience at JSTOR, one of the things that I felt at that time was: if you're not offering that conversation for users (I didn't know if it was going to be six months or a year or two years), if users came to your site and you were not having that conversation, you weren't going to be relevant. They were going to just go somewhere else. That convenience was going to be more important, maybe even than substance sometimes.
And we've seen that in all of our experiences. And so we set about internally at ITHAKA to really understand the impact of this. And so, you know, we actually use a lot of the tools: our general counsel, Nancy Kopans, and our legal group use a product called CoCounsel, which is a GPT-enabled legal assistant copilot. We recently started using something called Glean, which takes all of the internal documents from your intranet and allows your teams to do conversational queries and understand all of your policies and things like that. We're just really trying to become familiar with this and its impact, because it's going to change everything, and we wanted to be able to understand that, learn from it, and share it with the community, which is what we're going to try to do here today. Our guiding principles for how we deploy these are, first and foremost, as follows.
We want to make sure we do this in ways that uphold not only our values, but the values of the scholarly community, which includes equitable access to these tools, and some of the challenges that exist between haves and have-nots in this world. We want to learn, and we know we need to move fast because everything else is moving fast, but we want to proceed cautiously and thoughtfully and implement AI in ways that support the
user experience and educational experience and not displace it. And again, we consider ourselves a learning organization. We talk about that a lot internally and we want all of our teams to be learning all the time and we want to be an organization that contributes to learning. And so that's what we're trying to do here is to share some of those lessons about what we're doing. We are in a kind of a special place. We feel very fortunate.
We have Ithaka S+R on one side of the ITHAKA house and we have JSTOR on the other. What I mean by that is that the Ithaka S+R team does research independently about things that are impacting the community, and then we have JSTOR, where we have an operational entity that's actually engaged in using and doing these things. So those two things can inform each other and can help us inform the community, and we try to take advantage of that whenever we can. So we announced last May a project, led by Dylan Ruediger, which is a multi-year collaborative
research project that is focused on helping colleges and universities understand the impact of these technologies on them. Danielle is also here; she was there to help start this project and is now at the Mellon Foundation. In doing this, one of the things people said immediately (and we all thought the same thing) was: a two-year project like that doesn't make any sense.
Everything's moving way too fast for that. So there are pieces and parts that continue to come out iteratively as we're doing this project, to assess the impact of these things and explore how the institutions can use them differently and create strategies around them. One of the first things that's come out of this project relates to what was said earlier, I think by Elias, about AI literacy and the urgent priority that there is on campuses
and maybe it was Leo in the previous session, getting sort of the precise understanding of how these things are actually used, what they do and what faculty and students need to learn, how generative AI works and what are the barriers to engagement in understanding how these things work and obviously university policies and procedures about how to handle equitable access and how these things are impacting the learning and education process
and the research process, obviously. One of the things that has come out of this early project, something mentioned earlier (I think it was Elias who said this), is the AI product tracker that we have. Ithaka S+R put out this product tracker, which basically has descriptions of different products. I mean, those 20 institutions, or 19 institutions, you know, see a lot of different products,
and we're really focusing here on products that are specifically for education or higher education. And it's a simple tool, it's not perfectly comprehensive, but we're putting in all the products that we're hearing about, and it includes basic data about what they are, what their price is, whether they have a pricing model around them, how they work, that kind of thing. And, you know, we're just continuing to build it and adapt it and develop it
over time. And so I think that's a helpful tool. I also want to commend the issue brief that goes along with the product tracker, which was written by Dylan and also Claire Baytas, another one of our Ithaka S+R colleagues. It's a really great overview of the things that we're learning about these products in
the early stage; it classifies those products and some of the themes that come out. And I give full credit to them for this, I just want to highlight: it breaks down the products into three different types, and sometimes products go over more than one of these. First, discovery: how some are using GenAI to aid in discovery. Then understanding, which is what we're going to talk about today: how users can understand the content better by using these tools. And then also creation: things like advanced AI versions of Grammarly, where things can anticipate what you're trying to write, maybe improve your academic language, those types of things. There are products being created in the area of actually creating content. Some of the themes that come out that they talk about in this issue brief: obviously
those tools that reach many, many users have an advantage in the sense that they get more data and can use that data to inform their products. Another thing is one of the things that OpenAI has done, in addition to having those 180 million users, what they did is they created an API over their language model that allows companies to deploy these.
The consequence of that is that many, many companies have built on it. I mentioned Glean earlier, and those other companies in the commercial sphere; but in our sphere, most of these products in the product tracker are overlaid on OpenAI and their API. Right now that's kind of dominating the landscape. That will have implications of a variety of kinds, but it's just something worth noting.
People should know (you all probably know in this audience) that each time you send a query, there's a charge. There's a cost associated with that. What is that going to look like? Is that going to go up? Is that going to go down? How is that going to play out? Typically so far, with each release they kind of raise the price on the better release and lower the price on the previous release. But nevertheless, there's a lot to be thought about in terms of business models there.
How will these technologies become embedded in the teaching process? When you think about scale, yesterday's talk discussed how commercial scale is so large compared to universities. Now how do you do these partnerships? ASU has a partnership with OpenAI. I don't really know what that means. What are they going to build? How is that going to change? Are some of the institutions that have the resources to do these kinds of collaborations
going to be way ahead of institutions that don't have that kind of scale? Those are things that we're all watching out for. With those highlights of what we're doing, I just want to introduce Beth now. Beth LaPensee is our product leader for our GenAI research tool. When we originally got excited about what was happening with ChatGPT, we kind of stood up this project and said, okay, we have to do this really fast. We have to get a beta of a research assistant on the platform before the fall semester. This was in the early spring last year. Beth stepped up and led that group to do that. She's the best person to be here to talk about what we're doing with our research tool. She'll go ahead and walk you through it.
I just want to say that the goal here is to show you what we're learning. Not to promote the product. We don't have a product. We're not charging for this product. It's a beta. We're sharing what we're seeing and what the changes are going to potentially mean for all of us who are serving students and faculty in this domain. All right.
As Kevin described, we got started in the spring with this very intensive effort. We worked with a team of about 20 engineers, data science engineers, designers, researchers, et cetera, and really wanted to get something put together as quickly as possible. It took us about six to eight weeks to do this.
The idea behind that speed: the quicker we got this beta out and into the hands of users, the sooner we would start learning. Even from before the beta was available, we were working with users and building out the capabilities with that feedback in mind.
One of the main things that we had to do during this early stage of the build out was really figure out what the scope needed to be, what capabilities did we want to build. Early proof of concept covered a whole range. As Kevin described, there was discovery and understanding and creation. We were working in both the discovery and the understanding space and really ended up
focusing on the content itself, understanding and enabling users to deeply engage with the material and we'll get to discovery as another stage of the work. The rollout strategy, we chose this beta approach because we wanted to bring this into the community in a very controlled way where we can have a small number of users react to what we're
learning, build it out, and keep rolling that out, and these numbers across the bottom represent where we are today. All right, so now I'm going to run through a few screenshots that really illustrate what our tool does currently, so that you can have that context as we go through the findings. This is a JSTOR article page.
This is the article that we're looking at, and you can see up at the top I've run a search. The search is: what are the characteristics of Gothic literature? And on the side, you see we have this new chat box where the user can engage with the content. For this very first action, the user doesn't have to do anything. They land on the page, and as long as they've run a search, we immediately process a prompt that asks, in your voice, how the query you put in relates to this text. So: how is "what are the characteristics of Gothic literature" related to this text? And the response comes back: the characteristics of Gothic literature include evoking fear, et cetera. So it gives you a custom response, a custom summary of the document, that tells you basically why you got this article and what it actually has to do with your research task. So: custom summary. Then you can see across the bottom, we have a set of preset prompts. These are highly optimized to generate responses that we've crafted.
So the first one is: what is this text about? This is a summary of the entire document. For JSTOR, in many journals in the humanities and social sciences, as in this article here, there's not an abstract built in to the document. And so this summary is really important for getting that very quick understanding of
what the document is about. We have recommend topics. So the user, if you are seeking additional paths of research, you can use this feature. Some of the topics that appear here are Gothic novel as formulaic genre, Gothic novel as literary experience and so on.
So you get up to 10 possible searches. These are search terms. So you can click on those and execute a semantic search and get more content that's similar to this one in these specific topics. Then you can engage with the Show Me Related Content option. And this generates up to 10 conceptually similar documents, like recommenders.
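The related-content feature described above is, at heart, nearest-neighbor search over document embeddings. A minimal sketch in Python, assuming a toy corpus; the document ids and the tiny 3-dimensional "embeddings" here are invented for illustration, not JSTOR's actual data or code:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_related(doc_id, embeddings, k=10):
    """Return ids of the k documents most conceptually similar to doc_id."""
    query = embeddings[doc_id]
    scored = sorted(
        ((cosine(query, vec), other)
         for other, vec in embeddings.items() if other != doc_id),
        reverse=True,
    )
    return [other for _, other in scored[:k]]

# Toy corpus: three documents with hypothetical embeddings.
embeddings = {
    "gothic-lit-survey": [0.9, 0.1, 0.0],
    "gothic-novel-form": [0.8, 0.2, 0.1],
    "trade-economics":   [0.0, 0.1, 0.9],
}
print(top_related("gothic-lit-survey", embeddings, k=2))
# → ['gothic-novel-form', 'trade-economics']
```

In production the vectors would come from an embedding model and the search would run against a vector index rather than a Python loop, but the ranking idea is the same.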
And then finally, you are able to talk to the document. So you can click on "ask a question about this text" and execute a question. In this case, the question was: what does this say about The Castle of Otranto, a Gothic novel? So it describes: the document mentions that The Castle of Otranto, et cetera, and goes on for a paragraph about how that topic is covered in the document. And one thing I'll point out, you can see there are a few footnotes in here, two that are called out. In most cases, the users will be able to trace back from the response to the point in the article where that information was pulled from.
So you can see the arrow pointing to the highlighted text that is the start of the segment that was used to generate that answer. So, we agreed we're going to have a conversation here. Anyway, I'll just speak loudly. Maybe just say a word about why we're focused on being in the article. A lot of people are doing searches and thinking about working across the whole corpus.
What were the reasons for staying grounded in the article as we did this work? Let me just advance. Oh, yeah. There are a few reasons why we started with this article scope. One is that we really did want to bring the greatest possible value to our users, and one of the main challenges that we see with our users is really being able to work quickly through this kind of workflow. And then we wanted to really make sure that the experience the user had was very trustworthy and the traceability was there. So in this specific example, it's very clear where this came from. There's very little chance that we'll be pulling in information that's not in this document. So it's very clear: we put these guardrails up that keep the user within the scope of this document. So yeah, it encourages trust in the reliability of the responses. All right. So how does this work? I wanted to give this picture of what's actually happening behind the scenes, especially
with this question and answer. So first, I will say that we're using a combination of OpenAI's GPT-3.5 to do this, as well as some smaller open-source models to generate the vectors for the semantic search. So it's all working in combination. So what happens? The user types in the question. What we just saw was: what does it say about The Castle of Otranto? Type that in. When they hit enter, the system generates a vector, a conceptual representation of that question. So the question is vectorized. And then we use that vector to identify all of the relevant portions within that text.
So the document, the article that we were looking at has already been processed, broken into chunks of text and vectors created. So we basically use the question to run a search against the document to find which sections are relevant and are likely going to contain the answer to the question. So then we take the question the user put in, the text from the document, the sections
of text, and a prompt that we have carefully created to give precise instructions on how to process that. So all of that is sent to OpenAI. It follows the prompt instructions; it's instructed very clearly to only use content that is in the segments of text that we selected, and it generates a response. And we get that response back and we show it to the user. All right, so how do we... Oh, yeah, go ahead. The one thing on that is that in that process, that query is not training the LLM; it's not actually retained by the LLM, by virtue of the license agreement.
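The question-and-answer flow just described (vectorize the question, retrieve the best-matching chunks of the pre-processed article, then send them with a tightly constrained prompt to the model) can be sketched roughly as follows. This is an illustration of the general retrieval-augmented pattern, not JSTOR's actual implementation; the function names, prompt wording, and the stand-in `embed`/`generate` hooks are assumptions:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def answer_from_article(question, chunks, embed, generate, top_k=3):
    """Answer a question using only text drawn from one article.

    chunks   -- list of (chunk_text, chunk_vector) pairs, prepared when
                the article was processed
    embed    -- maps text to a vector (a smaller embedding model in the talk)
    generate -- calls the chat model (GPT-3.5 in the talk) with a prompt
    """
    # 1. Vectorize the user's question.
    q_vec = embed(question)
    # 2. Semantic search: rank the article's chunks against the question.
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, c[1]), reverse=True)
    context = "\n\n".join(text for text, _ in ranked[:top_k])
    # 3. Build a constrained prompt: use ONLY the selected segments.
    prompt = (
        "Using ONLY the text segments below, answer the question. "
        "If the segments do not contain the answer, say so.\n\n"
        f"Segments:\n{context}\n\nQuestion: {question}"
    )
    # 4. Send to the language model and return its response.
    return generate(prompt)

# Toy demonstration with a fake embedder and an echoing "model".
chunks = [
    ("The Castle of Otranto is discussed in this section.", [1.0, 0.0]),
    ("This section covers trade economics.", [0.0, 1.0]),
]
def toy_embed(text):
    return [1.0, 0.0] if "castle" in text.lower() else [0.0, 1.0]

answer = answer_from_article(
    "What does this say about The Castle of Otranto?",
    chunks, toy_embed, generate=lambda prompt: prompt, top_k=1,
)
```

With the echoing `generate`, the returned prompt contains only the Otranto chunk, which is exactly the guardrail described: the model never sees text outside the retrieved segments.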
And we don't actually train an LLM in doing this work either, there or on our own. Yep. So we go through quite a process to make sure that the responses that come back are actually high quality and relevant. There are two main paths that we take. There's some human evaluation. You will have seen, I didn't point it out, let me scroll back.
You can see, after each one of the responses that are submitted, we have a user rating ability as well as a submit-feedback option below. So we're constantly collecting feedback from users on their perception of the quality of the response. They can give a positive or negative rating. If they give a negative one, they get an additional set of options where they can flag why it was negative: inaccurate, incomplete, confusing, or harmful, and then give a description. So this information comes in and we constantly review it. We have people like myself and my lead engineer review these things every morning and get a sense of what's going on. But we also do programmatic analysis of this data.
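The programmatic side of that feedback analysis can start as simply as tallying ratings and negative-reason flags. A toy sketch; the record shape and the numbers here are invented for illustration:

```python
from collections import Counter

# Hypothetical feedback records shaped like the UI described above:
# a positive/negative rating, plus a reason flag on negative ratings.
feedback = [
    {"rating": "positive"},
    {"rating": "negative", "reason": "incomplete"},
    {"rating": "positive"},
    {"rating": "positive"},
    {"rating": "negative", "reason": "inaccurate"},
]

ratings = Counter(item["rating"] for item in feedback)
reasons = Counter(
    item["reason"] for item in feedback if item["rating"] == "negative"
)

pct_positive = 100 * ratings["positive"] / len(feedback)
print(f"{pct_positive:.0f}% positive")  # prints "60% positive"
print(reasons.most_common())
```

A real pipeline would aggregate over time and by feature, but even this level of tallying is enough to surface trends like the roughly-80%-positive figure mentioned next.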
Generally, I will say that about 80% of the feedback that we get is positive. And negative feedback we get through this mechanism is generally because the user is trying to do something that the tool doesn't do or it's either designed not to do or it hasn't been built out to do yet.
And then we have, and this is a work in progress, automated mechanisms for assessing several different aspects of the responses. So toxicity: this is measured using a hate speech model. We are able to generate a score across the body of content that tells us how harmful it is. And we've done this work on a test set, and we're working on doing it across the whole body. Similarly, there's a faithfulness score, which allows us to measure how close the response is to the content in the document. Relevancy is just what it sounds like: how close a match the response is to the question that was asked. And we would measure this relevancy in a similar way to how we do it with search. And then similarity is similar to faithfulness, but it is really looking at the scope of what is in the response and making sure that it's complete. One thing to highlight about this is that it raises the question of whether you
mediate or don't. And I think, in general, when we're talking today, we're going to be talking about things that relate to us, but as much as possible trying to extrapolate out to what the role of the library is, because you all are going to be seeing all these tools, not just from us, but from everyone else. So this question of mediation is a big one. So we get a question that looks like it's obviously the exam question.
Do we answer it? Do we know if it's the exam question? What level of mediation do you do, or do you step out when you see these questions? So right now we're not mediating, but it's an open question as to whether one should or shouldn't, except in areas like toxicity. All right. So we're not going to go into detail on this slide, but what I wanted to show you was that each of these lines we're looking at represents the features that I walked through in the screenshots. You can see a big difference here between the blue and red and then the orange and green. So the blue one is the question and answer feature. Most popular; most users are engaging with that, followed by summaries. The two that aren't highly used are the topics and related items, the two discovery features, which is kind of surprising to me, but it tells us something
really interesting. The users are very drawn to the conversational aspect and this interaction around unpacking and interrogating the content of the document to further their understanding. Along with these data, we also see that session duration and number of pages visited during a session are substantially higher than for non-beta users: over double. So that's really great to see that it's really driving that kind of engagement. All right. So here, let's just talk about summarize for a second. This, if you recall, is when the user clicks "what is this item about" and it produces a
paragraph or so that describes the content. This user is actually referring to both versions: the custom summary and the general summary. And this is a quote that came in through the feedback mechanism. I just want to call out how interesting this is. So they say: I really love this tool. It is great how it attempts to relate the search query back to the content in the article.
I find that I'm spending more time capturing and downloading the AI summaries than downloading the PDFs. They go on to say: my current research process is to do a search, open a bunch of tabs, copy these summaries and the citations into a text file, and then review the contents later, and then later go back and get the PDFs to download. This was really a request for all of that to happen in one click. So it's just a really great example of how the presence of these capabilities is already starting to change how the user engages with the content. One other thing to add about the summaries: the potential for the summaries is incredibly powerful because of translation.
Obviously, I mean, JSTOR is available all over the world, and it's mostly all English-language. But if these summaries can be translated into all these different languages, the level of access for people who speak those languages is hugely higher. And we have a lot of secondary schools, about 3,000 or more, that have access. So think about being able to do translations that are at reading level or grade level.
So there's all kinds of potential there in addition to these new ways of using the summaries that really change the way that the user's engaged and can engage with the resource and has applications for accessibility as well. All right. So question and answer, the most popular feature, what are they actually asking, what do these conversations look like?
So looking at the data (and we have thousands of questions coming in every day), we do, like I mentioned before with the feedback, a combination of human review, which is, again, myself and my lead engineer doing a lot of scanning of this. We've also been using an LLM, GPT-4, internally to process large amounts of this data so that we can really start to understand the nature of the questions: what's coming in, how many fall into which category, things like that, both for our own understanding of what users are doing, but also because, in some cases, we can see there's a whole body of things that aren't working well, categorize those, and then follow up on them. An example of that, which I didn't call out here, is users typing in "give me a summary," or "give me a longer summary," or "translate this summary." Right now the tool doesn't do that, but we can really see that that is something our users are actually wanting and asking for. So this 70%, the big category, really is a variety of categories of users attempting to understand the content that they're working with. No surprise; that's what we designed it for, but it's great to see that that's what's happening. The kinds of questions that come in fall into these categories (there's miscellaneous as well), but the most common is: does this article mention the topic I'm interested in?
Does it mention this? Does it mention that? And even a whole series of these. So what's happening here, I believe, is that the first thing users are trying to do is really evaluate. They've had a few other mechanisms, but then they get into it and really want to see how it covers this topic. Is it on this version of this topic or another version? And so it's like Ctrl+F, but for the concepts rather than for the words. So we see that the most. Then the next one: we see questions about the nature of the discussion on a certain topic. These are much more refined and detailed questions. I interpret these as coming from a more advanced researcher who has something very specific in mind
and they're trying to see how that's playing out in the literature. The next one is very definitional: just, what is this concept? So in the example we were looking at earlier: what is the Gothic? What is this even talking about? So this would likely be an undergraduate student or someone who's new to the field that
they're doing this research in. And so it's helping them to level set and understand what they're working with. What methods are used? So basically logistics around the creation of the document. And then in kind of a lesser amount, how are two concepts related? And these again, I think are the more advanced scholars trying to really unpack and identify
connections and new concepts. The 30% of the things that don't fall into those categories, like I mentioned, things the tool can't do and won't do. We have made choices to not do certain things. Help with creating. We'll see some examples of this. Help me write a paper.
Help me identify how to do this thing. And then the humorous ones are people testing the tool, trying to push boundaries and just seeing what they can get away with. All right. So now we've learned all these things. We've done a range of user interviews, data analysis, info coming in from all directions.
So what we want to do now is look at these main categories of users and how they're experiencing the tool. So the first group here is novice researchers. These can be our high school students, undergrads, even master's students. And what they're trying to do, in general, is find and evaluate materials for relevance.
This is a main job that these people have. And then learning (not yet doing solidly, but really learning) how to develop hypotheses and write academically. So they're very early in this process. And why do they like the AI tool that we've built? Why does it matter to them? It helps them get to the right material faster.
The "right" material, you'll see again in the next slide, is very dependent on the user and what they're trying to accomplish. It helps them to develop a framework to probe the material and understand why it's relevant to them, and also makes this scholarly content more accessible to them. So we have a quote here. I won't read the whole thing, but you can read it.
So this student says, I get more out of the article. I want to read the article anyway, but the tool would prepare me to get more out of it like it's Shakespeare. You know the plot beforehand so that you can really get into it. I understand this. I'm prepped. And then when I go to read it, I actually can absorb the information.
Okay. So now we have a few examples of some actual conversations that happened. In this case, the student (and I will say, in some cases we can tell what role the users are in; in some cases we don't know, so we were expecting this to be a student) asked: "How can this paper be useful for a person who wants to create a civic education curriculum for middle school students in the Ukraine?" That's actually a teacher, I see now. But what happens? So this is the question from the user, and then the response, an abbreviated response; it was longer than this. But what you can see is, broken down by section, ways to accomplish this task.
So one thing I just want to highlight right now is that before this, the level of engagement that, like, 99% of JSTOR users would have with the interface would be to hit print, or, more likely, download PDF. And so the activity that would have normally happened outside that space, somewhere else, with a professor, teacher, colleague, or a reference librarian, or whatever, that's happening in here. And that's going to happen in all the tools very soon. The difference is literally between almost nothing happening in the resource and all this activity that you're seeing in these examples. Yeah.
And here are a handful of other examples. I'd love to hear your reactions to these different things. "Write an annotated bibliography for this document. Make sure to summarize the author's arguments, sources, methods, and conclusions." "Can you find quotes to put into an essay to support my opinion that Galileo thinks religion and science are compatible?" "What does the second paragraph on page five mean?" I'm not going to read this whole one, but: "Write an essay that describes the author's insights about abiding" and so on. "Write an essay" may be testing the tool. In some cases, like Kevin mentioned, the tool does currently answer these things. Another reason why we're squarely in a beta state is so that we can identify these and understand them. Some of these may be an indication of a student who is not familiar enough with how to do these tasks, and they're asking for help.
Or it might be somebody who just wants some stuff done for them. So yeah, we've got some work to do to really understand how best to handle these different kinds of requests. The next user group is experienced researchers. So these may be PhD students, faculty, independent researchers.
And the tasks that they are trying to accomplish are finding the right material and developing novel conceptual relationships across subject areas and content types. So they're very exploratory, looking to find evidence for the topics that they're covering. And the JSTOR AI tool matters to them for two main reasons. It helps to democratize research by providing RA support: we've seen examples and talked to faculty who don't have that at their institution, where they don't have the budget to have research assistants working with them. This can help them be more efficient with their work, as an example. And it makes it easier to probe concepts and get related items to expand on the target in scope. So it makes it easier and faster to get through the range of work that they're trying to do. And in this example, I didn't mention: these are users whom we actually talked to, so these are quotes from interviews.
I quite often would like to ask my research assistant to go and explore for me. Part of what they would be looking for is who is mentioned, what reference, what arguments do they mention, what school of thoughts. So this would be a very simple way and it would certainly make their lives easier. All right. So here are a handful of examples that are the type that I describe which are much more detailed
in nature, more specific. So I'm not going to read all of these, but: using this case, explain thoroughly why it is so difficult to strike the right balance between restorative and retributive justice. Does the text draw any principles or concepts from the field of positive psychology? And so
on? So these are much more nuanced in their nature. And then the third group is instructors. So this might be instructional librarians or faculty teaching courses. And they are in search of support for execution of student research and support in the development
of research skills of their students. And the tool matters to them because it helps them to teach students how to find and evaluate materials in a new way from how they've done it in the past. It helps with assignment creation, and it helps to facilitate group conversations on how to develop research perspectives.
We have had very interesting scenarios. We're in a limited beta, so you have to have a personal account and be granted access. And even though we haven't invited instructors and classes to join, we have many examples of them sending these requests in: can you add my entire class into the beta so that we can work with it? And so this is an example of these group conversations really working with the whole class.
And in this one, it says it makes research interactive. I think it would change how I teach, that it would be a little less introductory. I'm not going to read this whole thing, but I can leave it up here for a second so you can scan through. But essentially this is how this is changing the way that this user, Muriel, thinks about the way that she would teach a course.
So I think that's a great example. And just a few examples here. An instructor might ask, and we saw a few examples like this: what would be questions to ask a college history class about this article? There are other similar ones, like what would possible in-class essays be for
this article, things like that. And then another one here says: how would one make a PowerPoint following this concept in the document? And the response was quite lengthy. I couldn't fit it all on the screen, but it goes through and says, here's basically a slide roadmap for you to build that PowerPoint off of this document, which is very exciting.
All right. So in summary, we have this one kind of comprehensive quote from this faculty member, Andy. He says, I can do in a day what used to take me four or five days. AI isn't doing any of the intellectual work for me. I'm still having to come to grips with what the concepts mean, what the particular author
is trying to say and how I can integrate what they're saying into something of public value. It's not doing the thinking for me. It is showing me where to look. And this is a great summary quote; this is really exactly what we're looking for, and we hope to hear more users expressing the same opinion. And that's it.
So we'll open it up for questions. The two links at the bottom are to the S&R product tracker and the issue brief. So we'll open up for questions. Thank you so much for this presentation. I'm interested: everyone's familiar with Control-F.
It's very clear when something is or isn't there. It's very clear to the user what the count is. I feel like to some extent this is the most direct application of literary theory I can think of, in terms of how much you see a text independent of other texts. I'm thinking, for example, if you're working with a reference librarian and they're about to give you a certain article, they might say: FYI, this has been disproved. Or: FYI, this is a fundamentally racist article, or whatever.
To what extent is the advice and guidance dependent only on that article, not able to say, FYI, this has actually been disproven? How do you set that boundary? With Control-F, there's a very clear boundary. How do you set that expectation for the user, that it doesn't know about the constellation of publications that may have later rejected
this premise. How do you think about setting the expectation and the boundaries for what the AI knows and doesn't know about the context for the text? That's a great question. We do have a little bit of helper text around the tool, and some help information, ways to give a user a path to learn that information.
We don't really believe that most users will seek that out, though, right? So we do have some work to do. Right now, the tool can only answer questions that can be derived from the material. So it can't even do questions which we see a lot, like: is this peer reviewed? We will figure out how to do that; it's a very common question.
And so that context setting is really an important thing that we have been thinking about; we need to figure that out, essentially. And I also did not mention: right now we're working on just the item that you're looking at. We have work in progress where we start to broaden that scope. So for example, if you're looking at a book chapter and you're asking a question, we want
that tool to be able to answer that question from the entire book, for example. And so even in that case, having that context of, here's what's going on and how what you're doing here sits within the larger context. These are the exact questions, though, right? Because for one, sometimes that student might be at an institution that doesn't have that reference librarian, or that reference librarian has a particular perspective on whether that context is one way or the other. And so there's the potential for this to have that context in every case, but who judges which side it goes on? Which is where it gets really, really powerful and challenging, right? And you can see that. Whereas in the current environment, that happens a million times all over in every different library, but nobody really knows which advice was given in which case. Whereas here, you do know. So what does that mean? How do you deploy that? How do you not deploy that? The issues of mediation or non-mediation. So, huge questions around this that we surely don't have the answers for, but we're trying to learn. Following on that, is there potential for integration with other tools that the library and the school might
have, in the interest of delivering chat, or sending up questions once it gets to a point where there may be too many? Oh, interesting idea. We haven't thought about that. That's something that would be appealing. Like, when we can tell that they've hit a wall, maybe bump them into their library environment
somehow. Is that right? Yeah. Interesting. In the back. This is really great, thank you so much. Kevin, you said something that got me thinking, and it builds on the last question about user privacy. You just said it used to be that the most you saw was a download or a print, and you didn't know what happened with the article.
I can totally understand why it'd be valuable to know what's happening with the article, and that immediately raised all the alarms in my head that we know about from other platforms or readers, where the data in some cases has not been kept secure. So I guess, and I'm thinking of JSTOR in particular because you have this platform where you're
not the publisher. So I can imagine publishers who are putting content on your platform are going to perhaps want to know more and more. Like, don't just tell me how many times it was printed and downloaded; how are people interacting with it? I am worried. So have you given any thought to your privacy and user-data policy apparatus around
yes, you're going to have amazing data, but where should that data go and who should be able to use it? Yeah, I mean, I think these are the questions for all providers in all contexts here, right? I think that, for one, right now, obviously you have to get user permission, and we don't
share the private data, and we de-identify the user data. But in the beta right now, we're obviously using these data, but only where, you know, the individuals actually write us and give us these comments that we're sharing. So I think the question of user privacy and protecting user privacy, which is obviously critically important in libraries and to the publishers as well, is one that we're going to continue to learn about and we're going to have to protect.
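As a concrete illustration of the de-identification being described, here is a minimal sketch of a query log that supports product analysis while keeping individual identity out of stored records. The salt value and log format are hypothetical, not JSTOR's actual implementation:

```python
import hashlib
import json

# Hypothetical secret; a real deployment would manage and rotate this value.
SALT = "rotate-me-regularly"

def log_query_event(user_id: str, query: str, log: list) -> None:
    """Record a query for analysis without storing the raw user identity.

    A one-way salted hash lets analysts count distinct users and spot
    trends, but the stored record is separated from the named account.
    """
    pseudonym = hashlib.sha256((SALT + user_id).encode()).hexdigest()[:16]
    log.append({"user": pseudonym, "query": query})

events: list = []
log_query_event("alice@example.edu", "is this peer reviewed?", events)
log_query_event("alice@example.edu", "summarize the second section", events)
log_query_event("bob@example.edu", "summarize the second section", events)

print(len({e["user"] for e in events}))           # 2 distinct pseudonymous users
print("alice@example.edu" in json.dumps(events))  # False: no raw identity stored
```

This is the same trade-off mentioned for search queries: trends remain analyzable, but no record ties back to a particular user.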
So, you know, I think when you're working on a session like this, we're trying to really show how the users are actually using this, and what this means. That shows the value of having the data; at the same time, you know, it highlights the challenges and the restrictions that need to be placed on it to protect user privacy. But, you know, for the most part, when we actually deploy this in full, we'll have
to make decisions around whether it's as we have here, where people create accounts and give us permission to do certain things, or whether it's just in the IP-address environment, and how we would operate in that context without that data. And just to emphasize a point that Kevin made about de-identifying the user data: we keep that in our logs so that we can use it for product development and analysis, but it is separated
from the individual who did it. So in the same way that we do with our search queries, for example, we're able to analyze and identify trends and so forth, but not in a way that tells us anything about a particular user. I think there's someone behind you, yeah. So I have a comment and a question.
My comment is to return for a second to what you were talking about. I love JSTOR, I always have, but when I was a frontline librarian, it was always a little frustrating to me how all of the users knew JSTOR, and what they would say was: search JSTOR.
And they didn't really understand what that meant, what they were searching, right? And, full disclosure, I was a science librarian, and that meant something. And so that's one of the things that I think would be really important: for users to understand what this tool allows them to explore and what it doesn't. And it would be amazing if
they were able to create those links to other resources that the library might have, and that sort of thing. So yeah. Plus one. Heavy plus one for that. But my question is sort of going back to the idea of making the content accessible as in understandable, but also accessible in the disability sense.
Your product is used by, and your site very much registers this, a lot of different user levels, right? So you've got eight or twelve of them. Yeah. And I wonder if you've thought at all about the summaries that you're generating, and being
able to sort of tune them, expert to non-expert, or various reading levels, things like that. And then the other thing I was wondering: you were talking about accessibility and being able to do the translations, and I was wondering if you were thinking also about making the summaries available via audio. Oh, interesting.
So you asked about basically allowing the user to edit the characteristics of the summary. And yes, right now we have one standard summary, but we have been talking about different ways, like potentially a slider, and I don't know what characteristics we would put on it, but easy to
hard, to allow a user to adjust that. We also see a lot of requests for: make this summary smaller, give it to me in bullet points, make it richer, give me more bullets. So I think there's another kind of slider, which is the level of detail. So yes, we're looking into those options.
And your other summary question, remind me? Audio. Yes, we have not thought about audio, but now we will. We have an accessibility expert on staff, and we've been working with him to do some analysis of the tool to see, within our community of differing abilities, even in its current state, in which scenarios we might
already be providing greater value. So there is definitely a stream of thought around accessibility. And from the general search of the entire database, does that take natural language? Yeah, we're starting. So if you're in the beta, we have the typical keyword-based JSTOR search, with an option to view the natural
language search, and we're working to figure out how to blend that all together. It's a work in progress, so we didn't show it. My question is, what are the guardrails around what types of questions it will answer, what topics a summary brings to us? Could somebody say: what is the relationship of a topic in this literature to other types of literature, science fiction and the like?
Yeah, so with just the natural language searching, it's still returning individual documents. So in that example, it would look for documents that covered that comparison. However, we're also working on bringing the kind of chat capabilities that we just went through up to the search level,
which would allow a user to, potentially, so you've got search results, we've got 25 on the first page, maybe you pick 5 or 10, tick-box them, and then use that tool to interrogate all of those documents at the same time, and generate, say: how is this topic covered in this range of articles, how can we synthesize that? And so getting to that kind of
looking at and understanding a range of documents at the same time is the next step. I'm interested in the 30% of queries that are not about understanding the item. I'm really interested in one of those sub-categories, and that is the queries about things that the tool won't do, and things it can't do because it can't get any further knowledge. Yeah. So I have two questions about that. Number one, what does the tool say when you ask it to do something it won't do? Yeah. And number two, how do you internally decide what it will and won't do? Well, at this moment, the thing is that we have a list of forbidden tasks, which I don't have in my brain, but I can check my notes after,
but, for one, we don't want it talking like it's a person. We don't want the tool to be offering opinions. For example, we don't want it to take content from OpenAI's general knowledge. It could; I mean, we could open those guardrails and let it use its knowledge from anywhere to answer the question, so that's forbidden.
So we have a range of those things. The big one is really around opinions. We want the tool to really focus on exposing the content of the document rather than interpreting it. So that is the big no-no currently; we can do more of that as we understand more. What it does do: if you typed in a question where the content just couldn't answer it,
or a question we won't do, it just says, I can't do that. Yeah. If we were demoing, I would show you; I could show you later if you want. Yeah, I don't think it uses those exact words. The prompt says something like: politely state that you're an AI tool and you can't do that thing. Yeah.
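The guardrail approach described here, a prompt listing forbidden tasks plus a polite refusal instruction, can be sketched roughly as follows. The prompt wording, the task list, and the pre-filter markers are hypothetical illustrations, not JSTOR's actual prompt:

```python
# Hypothetical guardrail configuration for illustration only.
FORBIDDEN_TASKS = [
    "offering opinions or interpretations beyond the document",
    "answering from general model knowledge instead of the document",
    "writing essays or assignments on the user's behalf",
]

SYSTEM_PROMPT = (
    "Answer only from the supplied document segments. Do not do any of the "
    "following: " + "; ".join(FORBIDDEN_TASKS) + ". If a request falls "
    "outside these bounds, politely state that you are an AI research tool "
    "and cannot do that."
)

def guard(user_request: str) -> str:
    """Toy pre-filter run before any model call: refuse obvious opinion requests."""
    opinion_markers = ("your opinion", "do you think", "do you agree")
    if any(marker in user_request.lower() for marker in opinion_markers):
        return ("I'm an AI research tool, so I can't offer opinions, "
                "but I can show you what the document says.")
    return "PASS_TO_MODEL"

print(guard("What do you think of this argument?"))  # polite refusal
print(guard("Summarize the second section."))        # PASS_TO_MODEL
```

In practice the refusal behavior lives mostly in the system prompt itself, with any pre-filter serving as a cheap first line of defense.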
I may have missed something. [Rest of question largely inaudible: asking whether, at any point in the conversation, the tool can draw on results beyond the item, or whether it is restricted to the document.] At the moment, it's restricted to an individual article.
But we will get into the search experience soon. Okay. [Question largely inaudible: asking how users would know when a response misses relevant material, as happens with tools on the open internet.] Before I answer that, though: the scenario you're talking about does actually happen in this single-item case, because I described how the process of answering a question is essentially using the question to run a search across the document. And then the search results are segments of the document.
And so, having our system find the right segments is something we need to pay close attention to, because sometimes we say: this is the passage that answers that question. But there were three other passages that should also form the answer. Users flag that as incomplete; that is one of the things that we do see.
This answer you gave me is incomplete. So, yeah. And one of the automated evaluation mechanisms that I described has, as its job, to do exactly that: yes, it might be accurate, but it's not complete enough. And so we have to look at both accuracy and completeness.
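The single-item flow just described, using the question as a search over the document and treating the results as segments, can be sketched with a toy scorer. Returning several segments rather than one best passage is one way to reduce the incomplete answers mentioned above; the naive word-overlap scoring here is a stand-in for a real semantic search, and the example document is invented:

```python
def top_segments(question: str, document: str, k: int = 3) -> list:
    """Score each paragraph by word overlap with the question and return the
    top k, since a single best passage often yields an incomplete answer."""
    q_words = set(question.lower().split())
    segments = [s.strip() for s in document.split("\n\n") if s.strip()]
    return sorted(segments,
                  key=lambda s: len(q_words & set(s.lower().split())),
                  reverse=True)[:k]

doc = """Restorative justice centers repairing harm between victim and offender.

Retributive justice centers proportional punishment for wrongdoing.

An unrelated section discusses rainfall patterns."""

for segment in top_segments("how do restorative and retributive justice differ", doc, k=2):
    print(segment)
```

With k greater than 1, both relevant paragraphs feed the answer instead of only the single highest-scoring one.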
Yes, that's my final question: what exactly happens when you do the evaluation, the sort of application of evaluating the performance of the tool? So, each one of the different evaluations has a different tool. In some cases, we're using other models to compare.
And in a lot of cases, what we're doing is this concept of using an LLM as judge. There's a practice in this developing field around automated evaluations, and very few of them are very effective. And so this use of an LLM as a judge is one of the newer methods,
and we're using that. So, in the case of harmful content, we're using a separate model for comparison. In other cases, we're using GPT-4 to validate GPT-3, because it produces more accurate results. And then, for the other two that were on the list,
which were relevance and completeness: what we're doing there is building out a data set, which we call a ground-truth data set, of validated question-and-answer pairs for specific documents. And so, that is being first created using GPT-4.
And then this data set will be human-curated, where we'll identify, like: here's a preliminary answer to that question, and you, smart person, read this article and assess the quality of this response. And so, when we have this data set, starting with like a thousand,
and we've done it with like a hundred so far, all human-curated: doing this at scale will allow us to have examples where these are 100% correct, we are highly confident in them, and then to use that to extrapolate from there and test iterations on our model. So, in all cases, it's using some other LLM
besides the one that we're using to produce the response to the user. Does that make sense? Yeah. Okay. So, we're hearing the sounds of completion and lunch. Yeah, we're all done. Thank you so much for coming. Thank you.
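The evaluation loop described above, an LLM-as-judge scoring candidate answers against a human-curated ground-truth set, can be sketched as follows. The `judge` function here is a token-overlap heuristic standing in for the stronger model; in the setup described in the talk, that role is played by GPT-4, and the ground-truth pair below is invented for illustration:

```python
def judge(reference: str, candidate: str) -> dict:
    """Stand-in for an LLM judge. In the real pipeline this would be a
    prompt to a stronger model asking whether the candidate answer covers
    the validated reference answer; here token overlap plays that role."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    coverage = len(ref & cand) / len(ref)
    return {"complete": coverage >= 0.6, "coverage": round(coverage, 2)}

# Hypothetical ground-truth entry of the kind described: a validated
# question-and-answer pair for a specific document.
ground_truth = {
    "question": "What balance does the essay examine?",
    "answer": "the balance between restorative and retributive justice",
}

candidate = "the balance between restorative justice and punishment"
verdict = judge(ground_truth["answer"], candidate)
print(verdict["complete"])  # True: the candidate covers most of the reference
```

The key design point from the talk survives even in this toy form: the judging model is always different from the model that produced the response being evaluated.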
End of transcript
