Anand Siththaranjan

Doctoral Candidate in Electrical Engineering and Computer Science at UC Berkeley

Google Scholar | Email | CV


bio

Hello! I’m a final-year EECS PhD student at UC Berkeley advised by Stuart Russell and Claire Tomlin, and a Fellow at the Kavli Center for Science, Ethics, and the Public.

My research interests are in pure and applied game theory, with a focus on communication and market design. My work often intersects with artificial intelligence, control theory, healthcare, and philosophy.

I have published in top control theory and robotics venues such as ICRA, CDC, and LCSS, and in top AI venues such as ICML, ICLR, and TMLR.

In the Fall of 2025 I’ll be joining the Economics PhD program at Stanford.


work in progress

Participation in Kidney Exchange \\ Siththaranjan, Valenzuela-Stookey, Ergin

Stay tuned!

Pairwise Kidney Exchange with Chains \\ Siththaranjan and Ergin

Stay tuned!

Efficiency and Alienation \\ Siththaranjan and Mosse

Stay tuned!

Incentive-Compatible Feedback Control \\ Siththaranjan* and Tomlin

Stay tuned!

working papers

Multi-Modal Paired Exchange \\ Anand Siththaranjan \\ [pdf]

This paper develops a model of paired exchange that integrates multiple donation technologies with observable risks. Through integration, we aim to enrich the set of potential patient-donor matches over benchmark models while reducing the aggregate risk placed on donors. We construct pairwise exchange mechanisms that satisfy efficiency, stability, and strategy-proofness, and study applications including exchanges with multiple organs, kidney exchange with ABO-desensitization, and exchanges with multiple donors. In a case study of kidney-liver exchanges, we analyze welfare improvements over non-integrated exchanges. Simulations find a 10 to 20% relative increase in transplants as the proportion of risk-tolerant donors is varied.

Kidney Exchange with Multiple Donors \\ Anand Siththaranjan \\ [pdf]

This paper studies the problem of finding efficient, strategy-proof, and weak-core stable mechanisms for kidney paired exchange under cycle size restrictions. We allow patients to have multiple donors, where their private information is their strict preference over the donor used. We show that no desirable pairwise mechanism exists, but under mild conditions we constructively show that such a mechanism exists when cycles of length three are allowed. Our results leverage the structure of blood type compatibility to overcome classic impossibility results due to cycle size restrictions, and we provide intuition to this end. When considering the number of transplants made, imposing strategy-proofness while allowing multiple donors introduces a tradeoff when compared to a complete-information single-donor baseline. Nevertheless, we show through simulation using US population data that in most cases there is a relative increase in the number of transplants. With sufficiently many patients who have more than one donor, this can range from approximately $5\%$ to $20\%$.

When Can Communication Be Informative? \\ Anand Siththaranjan and Yuichiro Kamada \\ [pdf]

When do cheap talk games admit informative equilibria? In a binary-action model, we show that the alignment between Sender and Receiver preferences determines whether informative equilibria exist. We provide a characterization of the equilibrium payoff set. Moving beyond the binary-action setting, we find that preference alignment no longer captures the existence of an informative equilibrium in general. We demonstrate that even if preferences are perfectly misaligned, an informative equilibrium may exist. With a restriction to binary states, we find that such preferences do not allow the Receiver to improve their payoffs over a babbling equilibrium, yet this intuitive prediction does not hold beyond the binary-state assumption. We identify a different set of alignment conditions between preferences under which Bayesian persuasion realizes payoffs that are the same as, or different from, those of some cheap talk equilibrium.

Social Planning with the Replicator Dynamics \\ Conditional Accept @ LCSS \\ Anand Siththaranjan* and Claire Tomlin \\ [pdf]

Approaches to social planning tend to assume that the behavior of agents is at an equilibrium, yet in practice people’s behavior gradually adapts to their experiences. In this work, a model of social planning under the replicator dynamics is studied. This model allows a social planner to control the learning process of agents by influencing the relative fitness of different strategies. The desiderata that such a social planner would ideally achieve (exponential stability and budget-balance) are described. Existence of a solution for any full-support distribution is shown constructively by leveraging classical tools from geometric control theory, along with an analysis of its properties. Though the solution is optimal in an environment without transfer costs, this may not generally hold otherwise. We formulate a relevant optimal control problem to model this setting, and determine performance guarantees based on our original solution.
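
As a rough sketch of the setting (notation mine; the paper's precise formulation may differ), the planner can be viewed as perturbing strategy fitness in the standard replicator dynamics,

$$\dot{x}_i = x_i \Big[ \big(f_i(x) + u_i\big) - \sum_j x_j \big(f_j(x) + u_j\big) \Big],$$

where $x$ is the population distribution over strategies, $f_i$ is the baseline fitness of strategy $i$, and $u_i$ is the planner's transfer to players of that strategy. Budget-balance would then correspond to a constraint of the form $\sum_i x_i u_i = 0$, and exponential stability asks that $x$ converge exponentially to the target distribution under the chosen transfers.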

Intent Demonstration in General-Sum Dynamic Games via Iterative Linear-Quadratic Approximations \\ Jingqi Li et al. \\ [arxiv]

Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time. While prior work has studied how agents can gather information about the intent of others, in this work we study the inverse problem: how agents can demonstrate their intent to others, within the framework of general-sum dynamic games. We first present a model of this intent demonstration problem and then propose an algorithm that enables an agent to trade off their task performance and intent demonstration to improve the overall system's performance. To scale to continuous state and action spaces as well as to nonlinear dynamics and costs, our algorithm leverages linear-quadratic approximations with an efficient intent teaching guarantee. Our empirical results show that intent demonstration accelerates other agents' learning and enables the demonstrating agent to balance task performance with intent expression.

published

Distributional Preference Learning \\ ICLR 2024 \\ Anand Siththaranjan*, Cassidy Laidlaw*, and Dylan Hadfield-Menell \\ [arxiv] [code]

In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This captures common issues of data collection, such as having human annotators with varied preferences, cognitive processes that result in seemingly irrational behavior, and combining data labeled according to different criteria. We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count. We show this can produce counter-intuitive results that are very different from those of methods that implicitly aggregate via expected utility. Furthermore, our analysis formalizes the way that preference learning from users with diverse values tacitly implements a social choice function. A key implication of this result is that annotators have an incentive to misreport their preferences in order to influence the learned model, leading to vulnerabilities in the deployment of RLHF. As a step towards mitigating these problems, we introduce a class of methods called distributional preference learning (DPL). DPL methods estimate a distribution of possible score values for each alternative in order to better account for hidden context. Experimental results indicate that applying DPL to RLHF for LLM chatbots identifies hidden context in the data and significantly reduces subsequent jailbreak vulnerability.
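
As a toy illustration of the Borda-count phenomenon (the numbers and code below are my own, not from the paper or its released code), consider two annotator groups with different utilities over three alternatives; aggregating their pairwise comparisons Borda-style can pick a different winner than aggregating by expected utility:

```python
# Toy illustration (hypothetical numbers): preference data pooled across
# annotators with hidden context, aggregated two different ways.

# Two annotator groups with different utilities over alternatives a, b, c.
groups = [
    (0.6, {"a": 10, "b": 4, "c": 0}),   # 60% of annotators
    (0.4, {"a": 0,  "b": 4, "c": 3}),   # 40% of annotators
]
alternatives = ["a", "b", "c"]

# Expected-utility aggregation: population-average utility of each alternative.
expected_utility = {
    x: sum(w * u[x] for w, u in groups) for x in alternatives
}

# Borda-style aggregation: for each alternative, sum over opponents of the
# probability that a randomly drawn annotator prefers it to the opponent.
def pref_prob(x, y):
    return sum(w for w, u in groups if u[x] > u[y])

borda = {
    x: sum(pref_prob(x, y) for y in alternatives if y != x) for x in alternatives
}

print("expected utility:", expected_utility)  # ranks a > b > c
print("borda score:     ", borda)             # ranks b > a > c
```

Here the population-average utility favors a, while the Borda-style score favors b; per the paper's result, the latter is what a single preference model fit to the pooled comparisons would implicitly optimize.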

AI Alignment with Changing and Influenceable Reward Functions \\ ICML 2024 \\ Micah Carroll et al. \\ [arxiv]

Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI's influence on them. We show that despite its convenience, the static-preference assumption may undermine the soundness of existing alignment techniques, leading them to implicitly reward AI systems for influencing user preferences in ways users may not truly want. We then explore potential solutions. First, we offer a unifying perspective on how an agent's optimization horizon may partially help reduce undesirable AI influence. Then, we formalize different notions of AI alignment that account for preference change from the outset. Comparing the strengths and limitations of 8 such notions of alignment, we find that they all either err towards causing undesirable AI influence, or are overly risk-averse, suggesting that a straightforward solution to the problems of changing preferences may not exist. As there is no avoiding grappling with changing preferences in real-world settings, this makes it all the more important to handle these issues with care, balancing risks and capabilities. We hope our work can provide conceptual clarity and constitute a first step towards AI alignment practices which explicitly account for (and contend with) the changing and influenceable nature of human preferences.
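
Loosely rendered (my notation, based only on the abstract rather than the paper's formal definition), a DR-MDP augments a standard MDP with a reward parameter that has its own, action-dependent dynamics:

$$\theta_{t+1} \sim D(\cdot \mid \theta_t, s_t, a_t), \qquad r_t = R_{\theta_t}(s_t, a_t),$$

so the agent's actions can shift the very preferences $\theta$ against which it is later evaluated, which is precisely the influence channel that static-preference alignment methods ignore.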

Open Problems and Fundamental Limitations of RLHF \\ TMLR 2023 & ICLR 2025 \\ Stephen Casper et al. \\ [arxiv]

Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.

On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control \\ CDC 2022 \\ Tyler Westenbroek et al. \\ [ieee]

Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impact of methods such as receding horizon control, dynamic programming, and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this paper we seek to gain insights into how the choice of cost function interacts with the underlying structure of the control system and impacts the amount of computation required to obtain a stabilizing controller. We treat the cost design problem as a two-step process where the designer specifies outputs for the system that are to be penalized and then modulates the relative weighting of the inputs and the outputs in the cost. To characterize the computational burden associated with obtaining a stabilizing controller under a particular cost, we bound the prediction horizon required by receding horizon methods and the number of iterations required by dynamic programming methods to meet this requirement. Our theoretical results highlight a qualitative separation between what is possible, from a design perspective, when the chosen outputs induce either minimum-phase or non-minimum-phase behavior. Simulation studies indicate that this separation also holds for modern reinforcement learning methods.

Analyzing Human Models that Adapt Online \\ ICRA 2021 \\ Andrea Bajcsy et al. \\ [ieee]

Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models, such as what the model could learn online and how quickly it could learn it. For instance, when will the robot have a confident estimate of a nearby human’s goal? Or, what parameter initializations guarantee that the robot can learn the human’s preferences in a finite number of observations? To answer such analysis questions, our key idea is to model the robot’s learning algorithm as a dynamical system where the state is the current model parameter estimate and the control is the human data the robot observes. This enables us to leverage tools from reachability analysis and optimal control to compute the set of hypotheses the robot could learn in finite time, as well as the worst- and best-case time it takes to learn them. We demonstrate the utility of our analysis tool in four human-robot domains, including autonomous driving and indoor navigation.
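
A minimal sketch of the core construction (notation mine): treat the robot's online learner as a discrete-time control system

$$\theta_{t+1} = f(\theta_t, u_t),$$

where the state $\theta_t$ is the current model parameter estimate (e.g. a belief over the human's goal) and the control input $u_t$ is the human data observed at time $t$. Reachability analysis over this system then yields the set of parameter values the robot could arrive at within a fixed number of observations, together with the best- and worst-case times to reach them.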

Designed by me, inspired by this