Hello! I am a Research Fellow at New York University's Information Law Institute and a JD Candidate at Yale Law School. Prior to arriving at NYU, I was a postdoctoral scholar at UC Berkeley's Social Science Data Lab (D-Lab), and a visiting postdoc at ETH Zurich's Center for Law and Economics. I completed my PhD at Berkeley Law, where I specialized in Law & Economics, and I also hold a BA in Political Science and History from Rutgers University - New Brunswick. At Berkeley, I was a recipient of a D-Lab Data Science Fellowship, and a research fellowship from the Law, Economics, and Politics Center. I was also a Google Policy Fellow at Engine, a Data Science for Social Good Fellow at the University of Chicago, and a Technology Policy Intern at GitHub.

My primary research interests are in privacy and cybercrime law, data science, and public policy. I am particularly interested in using methods drawn from machine learning, natural language processing, and causal inference to explore open empirical questions in U.S. data protection law. I am also interested in integrating data science into empirical legal studies more broadly, and have co-taught a Data, Prediction, and Law course for the undergraduate Legal Studies and Data Science programs, and a graduate course in Computational Social Science at Berkeley.


Predicting Cybersecurity Incidents Through Mandatory Disclosure Regulation

Forthcoming, University of Illinois Journal of Law, Technology, and Policy
Cybercrime is an increasingly common risk for organizations that collect and maintain vast troves of data. There is extensive literature that explores the causes of cybercrime, but relatively little work that aims to predict future incidents. In 2011, the United States Securities and Exchange Commission (SEC) provided guidelines for how publicly traded companies should convey these risks to potential investors. The SEC and other regulatory agencies are exploring how to leverage artificial intelligence, machine learning, and data science tools to improve their regulatory efforts. This paper explores the potential to use machine learning and natural language processing techniques to analyze firms' mandatory risk disclosure statements, and predict which firms are at the greatest risk of suffering cybersecurity incidents. More broadly, this study highlights the potential for using legally mandated disclosures to bolster regulatory efforts, particularly in the context of prediction policy problems.

The Effect of State Data Breach Notification Laws on Medical Identity Theft

Working Paper
As the number of data breaches in the United States grows each year, cybersecurity has become an increasingly important policy area. The primary mechanism for regulating and deterring data breaches is the "data breach notification law." Every U.S. state now has such a law that mandates that certain organizations disclose data breaches to their data subjects. Despite the popularity of these laws, there is relatively little evidence about their effectiveness at deterring breaches, and therefore reducing identity theft. Using medical identity theft panel data collected from the Consumer Financial Protection Bureau (CFPB), this study implements an augmented synthetic control approach to analyze the effect of certain data breach notification standards on medical identity theft.

Misinformation and Hate Speech: The Case of Anti-Asian Hate Speech During the COVID-19 Pandemic

Journal of Online Trust and Safety (2021)
Donald Trump linked COVID-19 to Chinese people on March 16, 2020, by calling it the Chinese virus. Using 59,337 US tweets related to COVID-19 and anti-Asian hate, we analyzed how Trump’s anti-Asian speech altered online hate speech content. Trump increased the prevalence of both anti-Asian hate speech and counterhate speech. In addition, there is a linkage between hate speech and misinformation. Both before and after Trump’s tweet, hate speech speakers shared misinformation regarding the role of the Chinese government in the origin and spread of COVID-19. However, this tendency was amplified in the post-Trump tweet period. The literature on misinformation and hate speech has been developed in parallel, yet misinformation and hate speech are often interwoven in practice. This association may exist because biased people justify and defend their hate speech using misinformation.
With Jae Yeon Kim

Journal of Online Trust and Safety Inaugural Issue Webinar

Trademark Search, Artificial Intelligence and the Role of the Private Sector

Berkeley Technology Law Journal (2021)
In this Article, we aim to study how well these search engines identify potential conflicts under Section 2(d) of the Trademark Act, 15 U.S.C. §1052(d), which forbids the registration of a trademark that is “confusingly similar” to an existing registered trademark. While a traditional trademark applicant might rely on government-supported techniques (the TESS system) for searching for confusingly similar marks, it turns out that they are often incomplete. Today, because of these various gaps in TESS, several private trademark search engines have emerged to supplement TESS and provide more thorough results. These search engines generally aim to provide a user with a more comprehensive list of potential mark conflicts and recommend whether the user should proceed with their trademark application. Each search engine uses its own methods, algorithms, and techniques to return results. Our study aims to answer the question of which search engines do the best job of returning the most relevant results to a user, and why. We then use our findings to demonstrate how our results potentially affect trademark law by demonstrating the emergence of search costs that are borne by the trademark registrant, rather than the consumer.
With Sonia Katyal

Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia

Neural Information Processing Systems (NeurIPS) AI For Social Good Workshop
This project presents the results of a partnership between the Data Science for Social Good fellowship, Jakarta Smart City and Pulse Lab Jakarta to create a video analysis pipeline for the purpose of improving traffic safety in Jakarta. The pipeline transforms raw traffic video footage into databases that are ready to be used for traffic analysis. By analyzing these patterns, the city of Jakarta will better understand how human behavior and built infrastructure contribute to traffic challenges and safety risks. The results of this work should also be broadly applicable to smart city initiatives around the globe as they improve urban planning and sustainability through data science approaches.
With Joao Caldeira, Alex Fout, Raesetje Sefala, et al.

NeurIPS 2018 AI For Social Good Workshop

The Tethered Economy

George Washington Law Review (2019)
Imagine a future in which every purchase decision is as complex as choosing a mobile phone. What will ongoing service cost? Is it compatible with other devices you use? Can you move data and applications across devices? Can you switch providers? These are just some of the questions one must consider when a product is “tethered” or persistently linked to the seller. The Internet of Things, but more broadly, consumer products with embedded software, are already tethered.

While tethered products bring the benefits of connection, they also carry its pathologies. As sellers blend hardware and software—as well as product and service—tethers yoke the consumer to a continuous post-transaction relationship with the seller. The consequences of that dynamic will be felt both at the level of individual consumer harms and on the scale of broader, economywide effects. These consumer and market-level harms, while distinct, reinforce and amplify one another in troubling ways.

Seller contracts have long sought to shape consumers’ legal rights. But in a tethered environment, these rights may become nonexistent as legal processes are replaced with automated technological enforcement. In such an environment, the consumer-seller relationship becomes extractive, more akin to consumers captive in an amusement park than to a competitive marketplace in which many sellers strive to offer the best product for the lowest price.

At the highest level, consumer protection law is concerned with promoting functioning free markets and insulating consumers from harms stemming from information asymmetries. We conclude by exploring legal options to reduce the pathologies of the tethered economy.
With Chris Hoofnagle and Aaron Perzanowski

Deterring Cybercrime: Focus on Intermediaries

Berkeley Technology Law Journal (2018)
This Article discusses how governments, intellectual property owners, and technology companies use the law to disrupt access to intermediaries used by financially-motivated cybercriminals. Just like licit businesses, illicit firms rely on intermediaries to advertise, sell and deliver products, collect payments, and maintain a reputation. Recognizing these needs, law enforcers use the courts, administrative procedures, and self-regulatory frameworks to execute a deterrence by denial strategy. Enforcers of the law seize the financial rewards and infrastructures necessary for the operation of illicit firms to deter their presence.

Policing illicit actors through their intermediaries raises due process and fairness concerns because service-providing companies may not be aware of the criminal activity, and because enforcement actions have consequences for consumers and other, licit firms. Yet, achieving direct deterrence by punishment suffers from jurisdictional and resource constraints, leaving enforcers with few other options for remedy. This Article integrates literature from the computer science and legal fields to explain enforcers’ interventions, explore their efficacy, and evaluate the merits and demerits of enforcement efforts focused on the intermediaries used by financially-motivated cybercriminals.
With Chris Hoofnagle and Damon McCoy


Computational Social Science

Graduate, 2020 - 2021
This is a two-semester course that provides a rigorous introduction to methods and tools in advanced data analytics for social science doctoral students. It was developed as the required course for Berkeley's Computational Social Science Training Program. The goal of the course is to provide students with a strong foundation of knowledge of core methods, thereby preparing them to contribute to research teams, to conduct their own research, and to enroll in more advanced courses. The course will cover research reproducibility, machine learning, natural language processing, and causal inference. The course is divided into modules, each lasting 3-5 weeks. Each module will include lectures, discussion of example research articles, lab exercises, and a group project involving Python or R programming. Projects, typically done in groups of 3 students, will also provide the opportunity to practice reproducibility techniques, data manipulation and transformation, and data science workflows.

Computer Programming for Lawyers*

Law, 2021
Clients increasingly want their lawyers to understand their products and services on a technical level. Regulators need to understand how their rules will be implemented in code. Lawyers increasingly need tools to automate the process of collecting, organizing, and making sense of impossibly large troves of information. Computer Programming for Lawyers introduces law students to the Python programming language with an emphasis on text analysis. For instance, we will use the same tools data scientists employ to "scrape" (collect) data, organize it, clean it, and use it to explore legally-relevant questions. This course will lay the foundation for understanding the basics of how companies leverage software engineering and “big data.” These skills have applications from legal discovery, to deposition preparation, to research into administrative or judicial action.
* Non-instructional role, primarily working on content development and logistics.

Data, Prediction, and Law

Undergraduate, 2018 & 2019
Data, Prediction, and Law allows students to explore different data sources that scholars and government officials use to make generalizations and predictions in the realm of law. Students will apply the statistical and Python programming skills from Foundations of Data Science to examine a traditional social science dataset, “big data” related to law, and legal text data. See here for my GitHub repository that contains the in-class lab assignments and here for a blog post describing how we created an upper-level domain-emphasis course within data science.

Human Contexts & The Ethics of Data

Undergraduate, 2018
This course teaches you to use the tools of applied historical thinking and Science, Technology, and Society (STS) to recognize, analyze, and shape the human contexts and ethics of data. It addresses key topics such as doing ethical data science amid shifting definitions of human subjects, consent, and privacy; the changing relationship between data, democracy, and law; the role of data analytics in how corporations and governments provide public goods such as health and security to citizens; sensors, machine learning and artificial intelligence and changing landscapes of labor, industry, and city life. It prepares you to engage as a knowledgeable and responsible citizen and professional in the varied arenas of our datafied world. See here for more information on the Berkeley data science undergraduate program, and the human contexts and ethics program specifically.

Law & Economics

Undergraduate, 2016
The course applies microeconomic theory analysis to legal rules and procedures. Emphasis will be given to the economic consequences of various sorts of liability rules, remedies for breach of contract and the allocation of property rights. The jurisprudential significance of the analysis will be discussed.