The Three Principles of Responsible AI Development, and Other Takeaways from the Everlaw Summit | LawSites
At last week’s Everlaw Summit in San Francisco, the annual customer conference of the e-discovery company Everlaw, founder and CEO AJ Shankar delivered a keynote address in which he announced the general availability of three generative AI features the company first introduced last year and had been developing in beta ever since.
In the course of delivering that address (see featured image above), Shankar, a computer scientist by training, detailed the core principles that guide the company’s AI development – principles that he said are “table stakes” for ensuring responsible AI development and the best long-term outcomes for customers.
The three features announced, all under the umbrella name Everlaw AI Assistant, are now live on the Everlaw platform, although customers must purchase credits beyond their standard subscriptions to use them. They are:
Three Core Principles
At a time when many legal professionals still question the safety and accuracy of generative AI, it was notable that Shankar devoted a substantial portion of his keynote to talking not about the products, per se, but about the three core principles that guided their development and that will guide Everlaw’s development of other AI products still to come. Those principles are privacy and security, control, and confidence.
With regard to privacy and security, Shankar said that Everlaw ensures that providers of the large language models it uses adhere to strict data retention policies. Everlaw prevents LLM providers from storing any user data beyond the immediate query and from using that data for model training.
Keynote speaker Shankar Vedantam, creator and host of the Hidden Brain podcast, is interviewed by journalist Thuy Vu.
“We ensure that they apply zero data retention to your data, which means that when you send data to them, they’re not allowed to store it for any reason past when they’ve answered your query, as well as no training, so they can’t use the data to train their models in any way.”
With regard to control, Shankar said Everlaw is committed to enabling users to maintain control over their data and tool usage through features that allow them to manage visibility, access, and project-specific settings. Everlaw’s approach to transparency includes notifying users when they are using AI-powered features and making it clear which models are in use.
Administrative-level control allows admins to control access to AI features as well as consumption of AI credits at various organizational and project levels.
“Your users should always know when they’re using gen AI,” Shankar said. “We’ll tell you what models we use. We want you to have that kind of transparency and control in your interactions here, so you can best devise how to use a tool.”
The third principle – that of enabling customers to have confidence in using these tools – is the hardest, Shankar said. “We know gen AI can provide immense value, but it can also make mistakes, right? We all know about the potential for so-called hallucinations.”
A panel of judges share their perspectives on AI, technology and the law. From left: moderator Gloria Lee, Everlaw’s chief legal officer; U.S. Magistrate Judge Allison Goddard of the Southern District of California; Superior Court Judge Evette Pennypacker of Santa Clara County, Calif.; and U.S. District Judge Rebecca Pallmeyer of the Northern District of Illinois.
Shankar outlined two ways Everlaw’s development of AI seeks to establish confidence in the AI’s results: playing to AI’s strengths and embedding AI into existing workflows.
‘A Smart Intern’
But he said there is a third aspect of building confidence in the AI, and it is something customers have to do for themselves, which is to change their mental model.
“What you basically have to do is think about using a computer a little bit differently from how we’ve all been trained to do for many years. You have to move from an interaction model where you have very repeatable interactions that are also largely inflexible, like a calculator, to a variable-interactions model, where things might be a little different, but it’s highly flexible. It’s much more like a human.”
In fact, he urged the audience to think of gen AI as a “smart intern” – very capable and very hard working, but still able to make mistakes. Over time, you need to learn what the intern is capable of and determine your personal comfort level with its capabilities, but in the meantime, you need to continue to check its work.
“In this new world, it’s neither good to just blindly trust the output of a gen AI tool, nor is it good to just say, hey, one mistake and it’s out. It’s like a person, and that’s a fundamental shift in how we want you to think about these tools.”
Just as you would with an intern, in order to build confidence in the AI, you need to check its work, to learn what it is good at and what it is not. For that reason, he said, Everlaw builds its AI products with features that make it easy for users to check the outputs.
A virtual Kevin Roose, tech columnist for The New York Times, is interviewed by Alex Su, chief revenue officer at Latitude, and Rachel Gonzalez, director of customer marketing at Everlaw.
“Our answers will cite specific passages in a document or specific documents when you’re looking at many documents at once, and so you can check that work.”
A specific example of this ability to check the AI’s work can be found in the new Coding Suggestions feature, which will evaluate and code each document in a set based on instructions you provide, much like human reviewers would do.
Unlike predictive coding, it will actually provide an explanation for why it coded a document a certain way, and cite back to specific snippets of text within the source document that support its coding decisions. This allows the user to quickly verify the results and understand why the document was coded as it was.
“It has a richer semantic understanding of the context of each document, which allows for a unique insight like a human, potentially beyond what predictive coding could provide by itself,” Shankar said.
A Skeptic Converted
During his keynote, Shankar invited onto the stage two customers who had participated in the beta testing of these AI products.
Of particular interest was customer Cal Yeaman, project attorney at Orrick, Herrington & Sutcliffe, who admitted he had been highly skeptical of using gen AI for review before testing the Review Assistant and the related Coding Suggestions features for himself.
In his testing, he compared the results of the gen AI review tool against the results of both human review and predictive coding for finding responsive and privileged documents.
“I was surprised to find that the generative AI coding suggestions were more accurate than human review by a statistically significant margin,” he reported.
He speculated that others might get different results when using the gen AI review tool, depending on their criteria for the case, the nature of the case, and the underlying subject matter.
“But the more subject matter expertise is required, the more it’s going to favor something like the generative AI model,” he said.
Another way in which the gen AI review impressed him was its consistency in coding documents. “If it was right, it was consistently right the whole way through. If it was wrong, it was consistently wrong the whole way through.” That consistency meant less QC on the back end, he said.
He also commented on the speed of the gen AI tool compared to other review options. In just a few hours, he was able to complete two tranches of review of some 4,000-5,000 documents, including privilege review.
Even for someone who is inefficient in their use of gen AI, the review would have cost less than half that of a managed review, and for someone who is proficient in these tools, the cost would be only 5-20% of the cost of managed review. “So it was a massive savings to the client,” he said.
Of course, cost doesn’t matter if the product can’t do the job, he said. On that point, he said, of all the documents that the model suggested were not relevant, the partner who reviewed the results as the subject matter expert found only one that he considered relevant, and that was a lesser-inclusive email that was already represented in the production population.
He said it was also highly impressive in its identification of privileged documents, catching several communications among lawyers whom the review team had not been aware of or who had moved on to other positions. In one instance, it flagged an email based only on a snippet of text that a client had copied from one email chain and pasted into another email with only the lawyer’s first name to identify him and no reference to him as an attorney.
I moderated a panel on uncovering key evidence in high-profile litigation with panelists Mark Agombar, director of XBundle Ltd., who worked on the U.K.’s Post Office Horizon litigation, and Greg McCullough of Fire Litigation Consulting, who is currently working on litigation relating to the Maui wildfire.
“There’s no indication that it was an email to an attorney. There’s no indication that it’s necessarily privileged. Nothing in the metadata. No nothing.”
Overall, he said, there was close alignment between the gen AI coding suggestions and the predictive coding, with their suggestions generally varying by no more than 5-10%.
However, in those cases where there was a sharp contrast between the generative AI suggestions and the machine learning models, he said, the subject matter expert found in every instance that the gen AI had gotten it right.
“Those documents tended to be something that needed some sort of heuristic reasoning, where you need some sort of nuance to the reasoning,” he said.
Other New Products
For all the focus on generative AI at the Everlaw Summit, Shankar noted that only 20% of the company’s development budget is devoted to gen AI, with the rest going to enhancing and developing other features and products.
In a separate presentation, two of the company’s product leads gave an overview of some of the other top features rolled out this year. They included:
A Note on the Conference
This was my first time attending the Everlaw Summit. As is generally the case with customer conferences, there would be little reason to attend for those who are not either customers or considering becoming customers.
Panelists who tackled the issue of deepfakes in the courtroom were Judge Evette Pennypacker from the Superior Court of Santa Clara County, Calif.; Justin Herring, partner at Mayer Brown; Rebecca Delfino, associate dean at Loyola Law School; Chuck Kellner, strategic discovery advisor at Everlaw; and Maura Grossman, research professor at the University of Waterloo.
That said, the more than 350 attendees (plus Everlaw staff and others) got their money’s worth. The programs that I attended were substantive and interesting, and many covered issues that were not product focused, but of broad interest to legal professionals. (I moderated one such panel, looking at the discovery issues and strategies in two high-profile litigations that have been in the news.)
The conference also featured two fascinating “big name” speakers – Shankar Vedantam, creator and host of the Hidden Brain podcast, and Kevin Roose, technology columnist for The New York Times.
An unfortunate sidebar to the conference was the strike by workers at The Palace Hotel, the Marriott-owned hotel where the conference was held. Just a couple of days before the conference started, they began picketing outside the hotel, joining a strike and picket lines that are ongoing at Marriott hotels throughout the United States.
Workers are seeking new collective bargaining agreements providing higher wages and fair staffing levels and workloads.
You can read more about the hotel workers’ campaign at UNITE HERE and find hotels endorsed by UNITE HERE at FairHotel.org.
Bob is a lawyer, veteran legal journalist, and award-winning blogger and podcaster. In 2011, he was named to the inaugural Fastcase 50, honoring “the law’s smartest, most courageous innovators, techies, visionaries and leaders.” Earlier in his career, he was editor-in-chief of several legal publications, including The National Law Journal, and editorial director of ALM’s Litigation Services Division.