It's good to see people working on this. They're focusing on the right part of the problem, too - predicting a few seconds ahead in a physical environment. Basic survival is about not really screwing up in the next 10 seconds. (People who do robotics are very aware of this.)
Don't get tangled up in the philosophy of causality. That's not the immediate problem.
The epilogue to his book Causality is a reasonably short read and does a good job of introducing the concepts imo. It's basically the same content he used in his Turing award speech and similar presentations over the last decade or so: http://bayes.cs.ucla.edu/BOOK-2K/causality2-epilogue.pdf
For some shorter reads, I saw this recommendation for biology undergrads yesterday:
> I would really love all biology students to read Elliott Sober's "Apportioning Causal Responsibility", Susan Oyama's "Causal Democracy and Causal Contributions in Developmental Systems Theory", and Richard Lewontin's "The Analysis of Variance and the Analysis of Causes".
(First-order) causality actually isn't hard to determine in environments where first-order effects dominate. The way to determine causality here is through a combination of physical laws and controlled experimentation [1]. In fact, we have plenty of causal models (e.g. 1st principles physics-based models, or design-of-experiments models). Without these models, machines/control systems/etc would not work.
The trouble is, outside of these 1st-order effect dominant, deterministic environments, causality becomes much harder. In complex systems, stochasticity, nonlinearity, feedback loops and higher-order effects dominate. There's also emergent behavior -- properties that are true in the small are not true in the large.
Consider a complex system like human society -- can we truly determine causality of broad interventions? Likely not in a first-order way like in the physical sciences. We can do it imperfectly through tools like causal inference (Rubin) which makes much more modest claims about the "strength" of effects (average causal effect). Randomized Controlled Tests (RCT) is another tool for making causal claims.
But in a complex world, 2nd, 3rd and higher order effects dominate, and so the notion of root causes itself becomes ambiguous. Richard I. Cook once said "post-accident attribution to a 'root cause' is fundamentally wrong". Though humans are attracted to the idea of a chain of simple causes (which is why we have the myth of Mrs O'Leary's cow kicking over a lantern and starting the Great Chicago Fire of 1871), there's typically no easily-identified root cause. First-order causal thinking assumes a Directed-Acyclic-Graph (DAG) picture of causal chains converging into a set of effects, but in a complex environment such a DAG, if it can be represented at all, is likely to be infinitely complex.
First-order causal thinking is an insufficient mental model in complex environments.
Instead of aiming for a deep understanding of epistemic causality (where we try to know and represent causality), I think it's probably more useful to focus on instrumental causality (where we aim to know the main points of leverage that are effective in changing the system). We'll likely get very far just by finding the knobs that have the most effect on the variables we would like to change (and that don't simultaneously change variables we wouldn't want to change).
[1] to determine causality, we typically have to perturb the system -- determining causality through observational data is possible, e.g. via natural experiments, but there are many epistemic restrictions which limit the claims that can be made.
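The footnote's point can be made concrete with a toy simulation (all numbers here are made up for illustration): a hidden confounder makes the observational contrast biased, while randomly perturbing the knob recovers the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical system: hidden confounder z drives both the knob a and outcome y.
z = rng.normal(size=n)
a_obs = (z + rng.normal(size=n) > 0).astype(float)  # observational: a follows z
y_obs = 2.0 * a_obs + 3.0 * z + rng.normal(size=n)  # true causal effect of a is 2.0

# Naive observational contrast is biased upward by z.
naive = y_obs[a_obs == 1].mean() - y_obs[a_obs == 0].mean()

# Perturb the system: assign a at random, breaking the a-z link.
a_rct = rng.integers(0, 2, size=n).astype(float)
y_rct = 2.0 * a_rct + 3.0 * z + rng.normal(size=n)
ate = y_rct[a_rct == 1].mean() - y_rct[a_rct == 0].mean()
```

The randomized contrast lands near the true effect of 2.0; the observational one doesn't, no matter how much data you collect.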
Plasma physics & fusion energy (my field) is challenging for exactly these reasons (despite being 95% classical physics). It's very rare that we can do a nice controlled experiment where only a single variable is changed at a time. I joke that it's really a subset of biology, not physics.
> probably more useful to focus on instrumental causality
I partly agree -- we humans do seem to get by on rudimentary reasoning. On the other hand, the issue of back-propagation is quite similar to identifying 'root causes.' There's also the combinatorial explosion of the number of possible sets of variables that interact with each other, coupled with the fact that data becomes exponentially sparse as the dimension of the space grows. The human ability to detect causal relations is really stunning when you realize how tough it is -- I wouldn't want to bet that we can reproduce it by trial and error. Evolution had plenty of time to get it right, but we don't.
Very much so. If we could quantitatively characterize every little effect, we might be able to model complex systems accurately, but it isn't always possible (exceptions exist, like meteorological modeling). For the longest time in biology, the modeling approach has tended to be more qualitative than quantitative, though this is changing.
Yes, identifying root causes from the outcomes is a model inversion problem (here's the data, find the generating model), and model inversion problems tend to be ill-conditioned. In complex systems, there are so many possible combinations of causes that could have led to a single outcome that characterizing the entire set is difficult.
Where is a conditional average treatment effect insufficient? We want to say that, seeing X right now, if we do a1, we'll see a change of d (+/- D), compared to doing a0. Or being able to predict y1 and y0 corresponding to actions a1 and a0. That would be a huge step, and something some of ML's most useful methods can't do.
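The "predict y1 and y0 for actions a1 and a0" idea corresponds to CATE estimation; here's a minimal sketch using the two-model ("T-learner") approach on synthetic data, where the true effect of a1 over a0 is 1 + 2x by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=(n, 1))

# Synthetic data: the effect of a1 vs a0 depends on x (true CATE = 1 + 2x).
a = rng.integers(0, 2, size=n)
y = 0.5 * x[:, 0] + a * (1.0 + 2.0 * x[:, 0]) + rng.normal(size=n)

def fit_linear(X, y):
    """Least-squares fit with an intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

# T-learner: fit one outcome model per arm, then contrast the predictions.
c1 = fit_linear(x[a == 1], y[a == 1])  # model for y under a1
c0 = fit_linear(x[a == 0], y[a == 0])  # model for y under a0

x_new = np.array([[0.0], [1.0]])
Xb = np.column_stack([np.ones(len(x_new)), x_new])
cate = Xb @ c1 - Xb @ c0  # predicted d = y1 - y0 at each x
```

Because the action is randomized here, the per-arm fits are unbiased; with confounded data you'd need propensity adjustment on top of this.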
DAGs are not the only valid approach to causality, nor do they subsume other approaches (I have previously commented on this during the Gelman&Pearl debates).
As you suggest, the treatment effect and econometrics literature currently is on a semi-parametric trend: Given that they don't believe that one can actually produce a believable complete causal model (or DAG), one tries to estimate a treatment effect that does not depend on parametric or functional assumptions.
> DAGs are not the only valid approach to causality
It has been a truism to some that "most efforts in policy are responses to previous efforts in policy." In SPC you can measure noise and overcontrol. With a long-lived policy-making institution [unusual?] honesty might admit to facing effects of previous well-intentioned policies.
> to determine causality, we typically have to perturb the system
Couldn't agree more! For computer models to understand causality, they must be able to interact with the environment and probe it. I think understanding causality is one and the same as reinforcement learning, where a computer model learns by interacting with its environment.
The majority of the books listed in the first link in these comments are about whether and when we can determine causality given observational data - data where the system is perturbed, just not by us (and not as we would ideally like).
I have not looked at the links, but aren't there people who use Bode plots to estimate direction-of-causality, given time-series data? IIRC there are basic relationships between wide-frequency phase behavior and, e.g., impulse-response functions. There is apparent phase information beyond correlation and ANOVA.
Einstein was inspired to come up with special relativity from that experiment because he had the relevant concepts to draw upon to think through various thought experiments. That and the math to back up his intuition.
Yes, but Einstein was able to learn cause and effect by interacting with the physical world as a child. Without being able to interact with an environment, I believe the ability to learn causality is limited if not impossible.
Alternatively, we need to extend our typical notion of a dataset to include where it came from and how it was perturbed. If you give your best statistician a dataset without column names or a description of the experiment, they aren't going to be able to do causal inference. We need to make those things more available to the machines too, in a structured way.
One should mention the anthropic principle in this context. This would then lead one to speculate about a connection between observing causal order and self-awareness.
I think it's clear that humans at least try to pick out the causal factors, and reason causally about the outcomes of specific actions. Non human animals do too; they can learn to take specific actions for food rewards in a Skinner Box for example. Now I agree humans get it wrong a lot of the time, and other animals might get it wrong even more than we do.
I don't think our fallibility in causal reasoning makes it useless to pursue as a goal in artificially intelligent systems. It doesn't need to be perfect, just useful and better than not having it. Afterall, our perception systems are pretty fallible too, otherwise things like optical illusions wouldn't exist.
It always bugs me when optical illusions are brought up as instances of failure of our perception system. These contrived conditions are really the 'exceptions that prove the rule:' our visual system has powerful inference capability running below the level of conscious perception, and most of the time it works so well we can't fathom the distinction between the model we infer and the sense data that the system works from. For instance, we perceive one scene with depth, when what we start with is two scenes with slight discrepancies.
Somewhat tangentially, I recall a psych paper where the researchers found that people perceive images in mirrors to be located behind the surface of the mirror. The researchers apparently thought that the image was at the surface of the mirror (ie, they misunderstood basic optics), and concluded that they'd discovered an optical illusion.
I suspect that you made that comment because you think in the terms of rigorous detection of causality, not everyday effective detection of causality heuristically.
Nowadays yes. For most of history though (and still commonly today), causal explanations for plenty of things involved some flavor of the supernatural.
Yes - humans do it abductively, which isn't rigorous but serves our purpose most of the time. You can't trust a self driving system with abduction though - it's the style of reasoning that gave us rain dances and homeopathy.
I'm sure there was a philosopher who said that his deepest wish was to know just one cause. Been trying to find out who said it for ages with no luck, so maybe I'll just attribute it to myself.
David Hume perhaps? He reasoned that causality was not empirical and therefore was a habit of thinking humans acquired from constant conjunction of events. Kant was troubled by that so he elevated causality to a category of thought, like space and time, which the mind used to structure sensations.
That's DeepMind's "Algorithms for Causal Reasoning in Probability Trees", released on October 23rd.
It got posted on HN a few days ago but I am surprised that it did not get more traction.
The area is under-explored because, AFAIK, there is no introduction aimed at someone already adept in machine learning that teaches the basics through coding projects (in PyTorch, for example) and shows how causal inference adds value. If you want to advance causality research, lower the barrier to entry, just as was done with deep learning.
I have a fun story about causality inference. I pressed a light switch and immediately thunder crashed. I pressed it again, no sound was heard. Followed by nervous laughter.
I take this to mean that we have a notion of "effectiveness", and that consequences are attributed to preceding effective actions.
It wasn't too long ago that interpretable models were seen as unimportant in the field. Prediction was valued far above any other output of applied machine learning; explanatory models were dismissed as fuddy-duddy econometric pseudo-science.
The article is missing an important point: you cannot learn causality from observational data alone. It's not about shortcomings of this or that model, it's a theoretical impossibility.
Reinforcement learning is uniquely positioned to build machines that understand cause-and-effect on their own because the algorithm is allowed to interact with the world, observe the results, gather more data, rule out hypotheses, and so on.
1. First let's get through the easy part: reinforcement learning (RL) is not unique in its ability to identify cause-and-effect - this was achieved long ago through the use of randomized controlled trials. RL merely streamlines the task of reacting to such information (as well as optimizing experiments w.r.t. a desired goal).
2. Now the trickier part: you can learn causality from observational data alone if you combine this with understanding of a mechanism. Indeed, the whole field of causal inference is an attempt to formalize and extend such methods.
There is a massive space of problems where experimentation (whether old-fashioned A/B testing or more sophisticated online learning) is simply not possible, whether because of ethical reasons, cost, or other reasons behind non-destructive study. These problems are common in medicine, economics, physics. In such problems the only data is observational. Causal inference is very valuable here.
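Point 1 above ("RL merely streamlines the task of reacting to such information") can be illustrated with a toy bandit: an epsilon-greedy learner runs its own small randomized trials while exploiting what it has learned so far. The reward numbers are invented for the sketch.

```python
import random

random.seed(0)

# Two hypothetical treatments with unknown mean rewards. The learner
# interleaves randomized probes (mini-RCTs) with exploitation.
true_means = [0.3, 0.7]
counts = [0, 0]
sums = [0.0, 0.0]
eps = 0.1

for t in range(20_000):
    if 0 in counts or random.random() < eps:
        arm = random.randrange(2)  # explore: a randomized probe
    else:
        arm = 0 if sums[0] / counts[0] > sums[1] / counts[1] else 1  # exploit
    reward = random.gauss(true_means[arm], 1.0)
    counts[arm] += 1
    sums[arm] += reward

estimates = [sums[i] / counts[i] for i in range(2)]
```

The learner both identifies the better arm (its causal question) and collects most of its samples from it (its optimization goal), which is exactly the streamlining a fixed-allocation RCT doesn't give you.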
You are completely right, I jumped through too many hoops to reach that conclusion, and I was very imprecise in stating it.
Let me try again: the most interesting and frequent setting is where you cannot control the experimental conditions and you have no idea about the causal mechanism at play. And that is where traditional ML cannot help, but RL can.
What you are describing seems to be essentially the process of science: Develop, step by step, the causal mechanisms at play until you solve the problem.
It should not be underestimated that our collective knowledge of causal inference with statistical methods is good, but still improving.
I mean, in that sense, ML is helpful. Heck, it is already used in causal inference in two-step estimators and the like.
While RCTs are the gold standard for establishing causality, that doesn't mean all is lost with observational data alone. Causal inference is a rich field, with a lot of work in recent decades. There's so much more to causal inference than the old dismissive chestnut about correlation.
That's not true. Look up doubly robust estimators for a neat counterexample. Or IPTW for an extension of a dataset that can help with this (X, t, y, and propensities, instead of just X, t, y). Causal inference has a super rich literature that's quickly growing.
Or better yet (stretching and sidestepping how you meant it), give your friendly neighborhood statistician/econometrician just a dataset and they can't do causal inference. Give them propensities and column names/descriptions and a writeup of the experiment/where the data came from, and suddenly they might be able to do causal inference. It points to a need to augment our observations with more structured metadata, if we want to do causal inference with data that's just lying around.
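To make the IPTW point concrete: given the propensities alongside (X, t, y), you can reweight confounded observational data to recover the causal effect. A minimal sketch on synthetic data (the true propensity model and effect size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Confounded observational data: x raises both treatment probability and outcome.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))                      # true propensity P(t=1 | x)
t = (rng.random(n) < p).astype(float)
y = 1.5 * t + 2.0 * x + rng.normal(size=n)    # true effect of t is 1.5

# Naive contrast is biased by the confounding through x.
naive = y[t == 1].mean() - y[t == 0].mean()

# IPTW: weight each unit by the inverse probability of the treatment it got.
iptw = np.mean(t * y / p) - np.mean((1 - t) * y / (1 - p))
```

With the propensities in hand, the weighted estimate lands near 1.5 while the naive contrast doesn't; without them (or a model to estimate them), the dataset alone won't get you there.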
Can you elaborate on this? If agent A interacts with a system and can learn causal relationships, how could agent B who observes all of agent A's experiments not be capable of drawing the same conclusions?
It seems that any theorem that rules out learning causality from observational data alone would also rule out learning causality from any kind of interactions.
Unless you're assuming that agent A "knows" it has free will so its own actions have no cause, while agent B can't tell whether the environment caused agent A's actions or vice-versa. But if that's what the proof hinges on, it's pretty shallow, because agent A has no such guarantee that its own choices have no root cause.
Sorry, I was quite sloppy in my previous comment. Agent B certainly can learn everything that A learns from the same observations.
What I meant is that you cannot learn from "general" observational data unless it is structured in a certain way (the randomized controlled trial mentioned in a sibling comment). RL is able to gather data on its own, while other ML methods must make do with what they are given. This means that RL could eventually discover the causal relationships, while non-RL cannot (except when the data comes from an RCT).
>you cannot learn causality from observational data alone.
As others have pointed out, this is kind of the point of much of Pearl's work on causality. Specifically, do-calculus provides a set of primitive operations that can be used to convert queries in interventional/counterfactual (causal) distributions to estimands in a purely observational distribution.
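The conversion to a purely observational estimand can be shown on the simplest case, the backdoor adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z). A sketch on a tiny made-up binary model (all probabilities invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Tiny model: confounder z -> x, z -> y, and x -> y (all binary).
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)
y = rng.random(n) < (0.1 + 0.5 * x + 0.3 * z)

# Backdoor adjustment: P(y=1 | do(x=1)) = sum_z P(y=1 | x=1, z) P(z).
# Every term on the right is estimable from observational data alone.
p_do = sum(
    y[(x == 1) & (z == v)].mean() * (z == v).mean()
    for v in (True, False)
)

# Plain conditioning mixes in the non-causal path through z.
p_cond = y[x == 1].mean()
```

By construction P(y=1 | do(x=1)) = 0.75, and the adjusted estimate recovers it, while plain conditioning overshoots because treated units disproportionately have z = 1.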
Agreed. But even RL is often difficult to transfer to other domains of knowledge.
Though I agree that this is an important next challenge (and causality has been the "next challenge" for at least the last 5 years), these days it's often more easily solved by mixing human expert knowledge into the equation (that is, using ML alongside human expertise).