It's good to see people working on this. They're focusing on the right part of the problem, too - predicting a few seconds ahead in a physical environment. Basic survival is about not really screwing up in the next 10 seconds. (People who do robotics are very aware of this.)
Don't get tangled up in the philosophy of causality. That's not the immediate problem.
The epilogue to his book Causality is a reasonably short read and does a good job of introducing the concepts imo. It's basically the same content he used in his Turing award speech and similar presentations over the last decade or so: http://bayes.cs.ucla.edu/BOOK-2K/causality2-epilogue.pdf
For some shorter reads, I saw this recommendation for biology undergrads yesterday:
> I would really love all biology students to read Elliott Sober's "Apportioning Causal Responsibility", Susan Oyama's "Causal Democracy and Causal Contributions in Developmental Systems Theory", and Richard Lewontin's "The Analysis of Variance and the Analysis of Causes".
(First-order) causality actually isn't hard to determine in environments where first-order effects dominate. The way to determine causality here is through a combination of physical laws and controlled experimentation [1]. In fact, we have plenty of causal models (e.g. 1st principles physics-based models, or design-of-experiments models). Without these models, machines/control systems/etc would not work.
The trouble is, outside of these 1st-order effect dominant, deterministic environments, causality becomes much harder. In complex systems, stochasticity, nonlinearity, feedback loops and higher-order effects dominate. There's also emergent behavior -- properties that are true in the small are not true in the large.
Consider a complex system like human society -- can we truly determine causality of broad interventions? Likely not in a first-order way like in the physical sciences. We can do it imperfectly through tools like causal inference (Rubin) which makes much more modest claims about the "strength" of effects (average causal effect). Randomized Controlled Tests (RCT) is another tool for making causal claims.
But in a complex world, 2nd, 3rd and higher order effects dominate, and so the notion of root causes itself becomes ambiguous. Richard I. Cook once said "post-accident attribution to a 'root cause' is fundamentally wrong". Though humans are attracted to the idea of a chain of simple causes (which is why we have the myth of Mrs O'Leary's cow kicking over a lantern and starting the Great Chicago Fire of 1871), there's typically no easily-identified root cause. First-order causal thinking assumes a Directed-Acyclic-Graph (DAG) picture of causal chains converging into a set of effects, but in a complex environment such a DAG, if it can be represented at all, is likely to be infinitely complex.
First-order causal thinking is an insufficient mental model in complex environments.
Instead of aiming for a deep understanding of epistemic causality (where we try to know and represent causality), I think it's probably more useful to focus on instrumental causality (where we aim to know the main points of leverage that are effective in changing the system). We'll likely get very far just by finding the knobs that have the most effect on the variables we would like to change (and that don't simultaneously change variables we wouldn't want to change).
[1] to determine causality, we typically have to perturb the system -- determining causality through observational data is possible, e.g. via natural experiments, but there are many epistemic restrictions which limit the claims that can be made.
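The footnote's point can be made concrete with a toy simulation (all numbers here are made up for illustration): a hidden confounder makes the observational contrast biased, while randomly perturbing the knob recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical system: hidden confounder z drives both the knob a and outcome y.
z = rng.normal(size=n)
a_obs = (z + rng.normal(size=n) > 0).astype(float)  # observational: a follows z
y_obs = 2.0 * a_obs + 3.0 * z + rng.normal(size=n)  # true causal effect of a is 2.0

# Naive observational contrast is biased upward by z.
naive = y_obs[a_obs == 1].mean() - y_obs[a_obs == 0].mean()

# Perturb the system: assign a at random, breaking the a-z link.
a_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = 2.0 * a_rct + 3.0 * z + rng.normal(size=n)
ate = y_rct[a_rct == 1].mean() - y_rct[a_rct == 0].mean()
```

The randomized contrast lands near the true effect of 2.0; the observational one doesn't, no matter how much data you collect.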
Plasma physics & fusion energy (my field) is challenging for exactly these reasons (despite being 95% classical physics). It's very rare that we can do a nice controlled experiment where only a single variable is changed at a time. I joke that it's really a subset of biology, not physics.
> probably more useful to focus on instrumental causality
I partly agree -- we humans do seem to get by on rudimentary reasoning. On the other hand, the issue of back-propagation is quite similar to identifying 'root causes.' There's also the combinatorial explosion of the number of possible sets of variables that interact with each other, coupled with the fact that data becomes exponentially sparse as the dimension of the space grows. The human ability to detect causal relations is really stunning when you realize how tough it is -- I wouldn't want to bet that we can reproduce it by trial and error. Evolution had plenty of time to get it right, but we don't.
Very much so. If we could quantitatively characterize every little effect, we might be able to model complex systems accurately, but it isn't always possible (exceptions exist, like meteorological modeling). For the longest time in biology, the modeling approach has tended to be more qualitative than quantitative, though this is changing.
Yes, identifying root causes from the outcomes is a model inversion problem (here's the data, find the generating model), and model inversion problems tend to be ill-conditioned. In complex systems, there are so many possible combinations of causes that could have led to a single outcome that characterizing the entire set is difficult.
Where is a conditional average treatment effect insufficient? We want to say that, seeing X right now, if we do a1, we'll see a change of d (+/- D), compared to doing a0. Or being able to predict y1 and y0 corresponding to actions a1 and a0. That would be a huge step, and something some of ML's most useful methods can't do.
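The "predict y1 and y0 for actions a1 and a0" idea corresponds to CATE estimation; here's a minimal sketch using the two-model ("T-learner") approach on synthetic data, where the true effect of a1 over a0 is 1 + 2x by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=(n, 1))

# Synthetic data: the effect of a1 vs a0 depends on x (true CATE = 1 + 2x).
a = rng.integers(0, 2, size=n)
y = 0.5 * x[:, 0] + a * (1.0 + 2.0 * x[:, 0]) + rng.normal(size=n)

def fit_linear(X, y):
    """Least-squares fit with an intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

# T-learner: fit one outcome model per arm, then contrast the predictions.
c1 = fit_linear(x[a == 1], y[a == 1])  # model for y under a1
c0 = fit_linear(x[a == 0], y[a == 0])  # model for y under a0

x_new = np.array([[0.0], [1.0]])
Xb = np.column_stack([np.ones(len(x_new)), x_new])
cate = Xb @ c1 - Xb @ c0  # predicted d = y1 - y0 at each x
```

Because the action is randomized here, the per-arm fits are unbiased; with confounded data you'd need propensity adjustment on top of this.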
DAGs are not the only valid approach to causality, nor do they subsume other approaches (I have previously commented on this during the Gelman&Pearl debates).
As you suggest, the treatment effect and econometrics literature currently is on a semi-parametric trend: Given that they don't believe that one can actually produce a believable complete causal model (or DAG), one tries to estimate a treatment effect that does not depend on parametric or functional assumptions.
> DAGs are not the only valid approach to causality
It has been a truism to some that "most efforts in policy are responses to previous efforts in policy." In SPC you can measure noise and overcontrol. With a long-lived policy-making institution [unusual?] honesty might admit to facing effects of previous well-intentioned policies.
> to determine causality, we typically have to perturb the system
Couldn't agree more! For computer models to understand causality, they must be able to interact with the environment and probe it. I think understanding causality is one and the same as reinforcement learning, where a computer model learns by interacting with its environment.
The majority of the books listed in the first link in these comments are about whether and when we can determine causality given observational data - data where the system is perturbed, just not by us (and not as we would ideally like).
I have not looked at the links, but aren't there people who use Bode plots to estimate direction-of-causality, given time-series data? IIRC there are basic relationships between wide-frequency phase behavior and, e.g., impulse-response functions. There is apparent phase information beyond correlation and ANOVA.
Einstein was inspired to come up with special relativity from that experiment because he had the relevant concepts to draw upon to think through various thought experiments. That and the math to back up his intuition.
Yes, but Einstein was able to learn cause and effect by interacting with the physical world as a child. Without being able to interact with an environment, I believe the ability to learn causality is limited if not impossible.
Alternatively, we need to extend our typical notion of a dataset to include where it came from and how it was perturbed. If you give your best statistician a dataset without column names or a description of the experiment, they aren't going to be able to do causal inference. We need to make those things more available to the machines too, in a structured way.
One should mention the anthropic principle in this context. This would then lead one to speculate about a connection between observing causal order and self-awareness.
I think it's clear that humans at least try to pick out the causal factors, and reason causally about the outcomes of specific actions. Non human animals do too; they can learn to take specific actions for food rewards in a Skinner Box for example. Now I agree humans get it wrong a lot of the time, and other animals might get it wrong even more than we do.
I don't think our fallibility in causal reasoning makes it useless to pursue as a goal in artificially intelligent systems. It doesn't need to be perfect, just useful and better than not having it. Afterall, our perception systems are pretty fallible too, otherwise things like optical illusions wouldn't exist.
It always bugs me when optical illusions are brought up as instances of failure of our perception system. These contrived conditions are really the 'exceptions that prove the rule:' our visual system has powerful inference capability running below the level of conscious perception, and most of the time it works so well we can't fathom the distinction between the model we infer and the sense data that the system works from. For instance, we perceive one scene with depth, when what we start with is two scenes with slight discrepancies.
Somewhat tangentially, I recall a psych paper where the researchers found that people perceive images in mirrors to be located behind the surface of the mirror. The researchers apparently thought that the image was at the surface of the mirror (ie, they misunderstood basic optics), and concluded that they'd discovered an optical illusion.
I suspect that you made that comment because you think in the terms of rigorous detection of causality, not everyday effective detection of causality heuristically.
Nowadays yes. For most of history though (and still commonly today), causal explanations for plenty of things involved some flavor of the supernatural.
Yes - humans do it abductively, which isn't rigorous but serves our purpose most of the time. You can't trust a self driving system with abduction though - it's the style of reasoning that gave us rain dances and homeopathy.
I'm sure there was a philosopher who said that his deepest wish was to know just one cause. Been trying to find out who said it for ages with no luck, so maybe I'll just attribute it to myself.
David Hume perhaps? He reasoned that causality was not empirical and therefore was a habit of thinking humans acquired from constant conjunction of events. Kant was troubled by that so he elevated causality to a category of thought, like space and time, which the mind used to structure sensations.
That's DeepMind's "Algorithms for Causal Reasoning in Probability Trees", released on October 23rd.
It got posted on HN a few days ago but I am surprised that it did not get more traction.
The area is under-explored because, AFAIK, there is no introduction aimed at someone already adept in machine learning that teaches the basics through coding projects (in PyTorch, for example) and shows how causal inference adds value. If you want to advance causality research, lower the barrier to entry, just as was done with deep learning.
I have a fun story about causality inference. I pressed a light switch and immediately thunder crashed. I pressed it again, no sound was heard. Followed by nervous laughter.
I take this to mean that we have a notion of "effectiveness", and that consequences are attributed to preceding effective actions.
It wasn't too long ago that interpretable models were seen as unimportant in the field. Prediction was valued far above any other output of applied machine learning; explanatory models were dismissed as fuddy-duddy econometric pseudo-science.
The article is missing an important point: you cannot learn causality from observational data alone. It's not about shortcomings of this or that model, it's a theoretical impossibility.
Reinforcement learning is uniquely positioned to build machines that understand cause-and-effect on their own because the algorithm is allowed to interact with the world, observe the results, gather more data, rule out hypotheses, and so on.
1. First let's get through the easy part: reinforcement learning (RL) is not unique in its ability to identify cause-and-effect - this was achieved long ago through the use of randomized controlled trials. RL merely streamlines the task of reacting to such information (as well as optimizing experiments w.r.t. a desired goal).
2. Now the trickier part: you can learn causality from observational data alone if you combine this with understanding of a mechanism. Indeed, the whole field of causal inference is an attempt to formalize and extend such methods.
There is a massive space of problems where experimentation (whether old-fashioned A/B testing or more sophisticated online learning) is simply not possible, whether because of ethical reasons, cost, or other reasons behind non-destructive study. These problems are common in medicine, economics, physics. In such problems the only data is observational. Causal inference is very valuable here.
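Point 1 above ("RL merely streamlines the task of reacting to such information") can be illustrated with a toy bandit: an epsilon-greedy learner runs its own small randomized trials while exploiting what it has learned so far. The reward numbers are invented for the sketch.

```python
import random

random.seed(0)

# Two hypothetical treatments with unknown mean rewards. The learner
# interleaves randomized probes (mini-RCTs) with exploitation.
true_means = [0.3, 0.7]
counts = [0, 0]
sums = [0.0, 0.0]
eps = 0.1

for t in range(20_000):
    if 0 in counts or random.random() < eps:
        arm = random.randrange(2)  # explore: a randomized probe
    else:
        arm = 0 if sums[0] / counts[0] > sums[1] / counts[1] else 1  # exploit
    reward = random.gauss(true_means[arm], 1.0)
    counts[arm] += 1
    sums[arm] += reward

estimates = [sums[i] / counts[i] for i in range(2)]
```

The learner both identifies the better arm (its causal question) and collects most of its samples from it (its optimization goal), which is exactly the streamlining a fixed-allocation RCT doesn't give you.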
You are completely right, I jumped through too many hoops to reach that conclusion, and I was very imprecise in stating it.
Let me try again: the most interesting and frequent setting is where you cannot control the experimental conditions and you have no idea about the causal mechanism at play. And that is where traditional ML cannot help, but RL can.
What you are describing seems to be essentially the process of science: Develop, step by step, the causal mechanisms at play until you solve the problem.
It should not be underestimated that our collective knowledge of causal inference with statistical methods is good, but still improving.
I mean, in that sense, ML is helpful. Heck, it is already used in causal inference in two-step estimators and the like.
While RCTs are the gold standard for establishing causality, that doesn't mean all is lost with observational data alone. Causal inference is a rich field, with a lot of work in recent decades. There's so much more to causal inference than the old dismissive chestnut about correlation.
That's not true. Look up doubly robust estimators for a neat counterexample. Or IPTW for an extension of a dataset that can help with this (X, t, y, and propensities, instead of just X, t, y). Causal inference has a super rich literature that's quickly growing.
Or better yet (stretching and sidestepping how you meant it), give your friendly neighborhood statistician/econometrician just a dataset and they can't do causal inference. Give them propensities and column names/descriptions and a writeup of the experiment/where the data came from, and suddenly they might be able to do causal inference. It points to a need to augment our observations with more structured metadata, if we want to do causal inference with data that's just lying around.
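To make the IPTW point concrete: given the propensities alongside (X, t, y), you can reweight confounded observational data to recover the causal effect. A minimal sketch on synthetic data (the true propensity model and effect size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Confounded observational data: x raises both treatment probability and outcome.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                      # true propensity P(t=1 | x)
t = (rng.random(n) < p).astype(float)
y = 1.5 * t + 2.0 * x + rng.normal(size=n)    # true effect of t is 1.5

# Naive contrast is biased by the confounding through x.
naive = y[t == 1].mean() - y[t == 0].mean()

# IPTW: weight each unit by the inverse probability of the treatment it got.
iptw = np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))
```

With the propensities in hand, the weighted estimate lands near 1.5 while the naive contrast doesn't; without them (or a model to estimate them), the dataset alone won't get you there.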
Can you elaborate on this? If agent A interacts with a system and can learn causal relationships, how could agent B who observes all of agent A's experiments not be capable of drawing the same conclusions?
It seems that any theorem that rules out learning causality from observational data alone would also rule out learning causality from any kind of interactions.
Unless you're assuming that agent A "knows" it has free will so its own actions have no cause, while agent B can't tell whether the environment caused agent A's actions or vice-versa. But if that's what the proof hinges on, it's pretty shallow, because agent A has no such guarantee that its own choices have no root cause.
Sorry, I was quite sloppy in my previous comment. Agent B certainly can learn everything that A learns from the same observations.
What I meant is that you cannot learn from "general" observational data unless it is structured in a certain way (the randomized controlled trial mentioned in a sibling comment). RL is able to gather data on its own, while other ML methods must make do with what they are given. This means that RL could eventually discover the causal relationships, while non-RL cannot (except when the data comes from an RCT).
>you cannot learn causality from observational data alone.
As others have pointed out, this is kind of the point of much of Pearl's work on causality. Specifically, do-calculus provides a set of primitive operations that can be used to convert queries in interventional/counterfactual (causal) distributions to estimands in a purely observational distribution.
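The conversion to a purely observational estimand can be shown on the simplest case, the backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z). A sketch on a tiny made-up binary model (all probabilities invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Tiny model: confounder z -> x, z -> y, and x -> y (all binary).
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)
y = rng.random(n) < (0.1 + 0.5 * x + 0.3 * z)

# Backdoor adjustment: P(y=1 | do(x=1)) = sum_z P(y=1 | x=1, z) P(z).
# Every term on the right is estimable from observational data alone.
p_do = sum(
    y[(x == 1) & (z == v)].mean() * (z == v).mean()
    for v in (True, False)
)

# Plain conditioning mixes in the non-causal path through z.
p_cond = y[x == 1].mean()
```

By construction P(y=1 | do(x=1)) = 0.75, and the adjusted estimate recovers it, while plain conditioning overshoots because treated units disproportionately have z = 1.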
Agreed. But even RL is often difficult to transfer to other domains of knowledge.
Though I agree that this is an important next challenge (and causality has been the "next challenge" for at least the last 5 years), these days it's often more easily solved by mixing human expert knowledge into the equation (that is, using ML alongside human expertise).