Why is my internet out? Insights on logic and confounders from a dear nerd
I wanted to know more about DAGs (directed acyclic graphs) and how they help people who design experiments account for confounders and other nuisances. This has a lot to do with counterintuitive logic and the many mistakes that observational clinical trials still suffer from.
It seemed obvious to me that someone who taught themselves computing and code, and who could explain this well and clearly to someone like me, who hates computing, could do so for everybody. Hence, my first guest post... ladies and gents, by Francisco Amaral.
Why is my internet out?
Hypothesis: it's my ISP's (internet service provider's) fault.
Let's call the ISP; they agree, it's their fault. There was an equipment failure on their end: a certain machine was not working, which caused connections to and from my router to fail.
Here, we have a simple causal relation: it is physically impossible for data to be transferred through machines that are not working. This is verifiable and empirical. It is a fact.
If every other variable allows connections to get through, blocking this one is still enough to stop them. Therefore, if that machine is broken, connection to the internet is not possible.
Theory: BROKEN MACHINE → NO INTERNET
For the purposes of this, let’s assume no other variables which can cause an internet outage are at play here. Only this broken machine.
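One way to picture the theory above is as a chain where every link has to work. This is only a minimal sketch: the link names are made up for illustration, and a real connection has many more pieces.

```python
def internet_works(links):
    """A connection needs every link in the chain; one failure blocks it."""
    return all(links.values())


# Hypothetical links, named only for this example.
links = {"my_router": True, "isp_machine": True, "upstream_network": True}
print(internet_works(links))

# Break the ISP's machine and leave everything else alone.
links["isp_machine"] = False
print(internet_works(links))
```

Everything else can be fine, and the single broken machine is still enough to take the connection down.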
⇢ So two weeks later, my internet is out again. 😖 And again I call my ISP. They inform me that this time, it's a different machine that's broken, and it will take a few hours to fix. Without this machine, no connection to the internet is possible.
We can use our theory from before for this case too. Looks like we have another simple causal relation. So why does this particular machine seem to break most often when it rains (true story by the way)?
⟹ We need to start considering the variables in our problem. And here lies the first challenge. In the first case, we only considered “machine” and “internet”. This was enough for us: it's something our ISP is in charge of, something we know only they can fix, and something they will fix 😕 as soon as the problem is found.
We don’t need to investigate further. Machines break sometimes. Establishing this causal relationship accomplishes our goals. However, because the weather is involved, let's take a deeper dive.
⇢ As it turns out, this particular machine is in an underground box outside. When it rains, it gets flooded, and water needs to be pumped out. Some parts may need replacing too, but not necessarily. If the machine that handles my internet connection becomes submerged, it breaks (huge oversimplification, but let’s assume this to be true).
⟹ It's also not enough that it “just rains”. It needs to rain a lot, and over an extended period of time. It's starting to be a lot to consider here. So how do we keep tabs on all the things that stop me from lurking on Reddit on a wide screen?
Well, what if instead of just “machine” and “internet”, we introduce the “rain” variable? We then have three variables, “rain”, “machine”, and “internet”. How do they relate to each other?
We know that NO MACHINE = NO INTERNET
But rain sometimes equals no internet too?
It looks like the two are related, but the former doesn’t necessarily cause the latter. So we have to climb up the hierarchy, which we first had to research to find out that it exists at all. In my case, the research came down to speaking with the technician who has had to pump the water out of the “box hole”.
⟹ From the bottom to the top, (no) internet ← (no) machine ← rain
So it looks like there’s a correlation between rain and a broken machine. Incidentally, there is also a correlation between rain and no internet, but through our research we know that that correlation is more than one hierarchical step away, so we ignore it. What matters to us is what causes the internet outage, which we know to be the machine.
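The hierarchy above can be written down as a tiny graph and then queried for how many steps separate a cause from an effect. This is just a sketch: the node names and the helper function `causal_distance` are invented for this example, not taken from any library.

```python
from collections import deque

# Each key is an effect; its list holds the direct causes (one arrow away).
# Node names are illustrative labels for the three variables in the story.
PARENTS = {
    "rain": [],
    "broken_machine": ["rain"],
    "no_internet": ["broken_machine"],
}


def causal_distance(cause, effect):
    """Count the arrows from `cause` down to `effect`; None if there is no path."""
    queue = deque([(effect, 0)])
    seen = {effect}
    while queue:
        node, steps = queue.popleft()
        if node == cause:
            return steps
        # Walk upstream, from an effect to its direct causes.
        for parent in PARENTS[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, steps + 1))
    return None


print(causal_distance("broken_machine", "no_internet"))  # direct cause
print(causal_distance("rain", "no_internet"))            # cause further upstream
print(causal_distance("no_internet", "rain"))            # arrows don't run this way
```

The broken machine sits one arrow away from the outage, rain sits two arrows away, and nothing leads from the outage back to the rain, which is what "directed" and "acyclic" are about.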
⇢ But what if we want to take it one step further? Correlation is vague. I want to know why my internet only goes out when it rains sometimes. Well, we need to try to deconstruct our variables further.
What if, instead of just “rain”, we had “millimeters of precipitation” and “period of time”? In this case, we have other avenues of research.
We could perform measurements, or consult the experts that exist exclusively as a figment of my imagination, who will tell us that, considering all the variables and relations that they’ve already researched (like slope of the terrain, ground porosity, volume of the “box hole”, height at which the machine sits within that hole, drainage system capacity, etc.) and that we will not introduce into our little example here (although we could), it is certain that the machine will be submerged if it rains X millimeters within a period of Y days.
⟹ Now we have a net of relationships that starts to get hard to visualize without the help of graphics. We have an AND relationship between “X millimeters of precipitation” and “Y period of time”, which has a causal effect on breaking the machine, which in turn determines whether I get internet or not.
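The AND relationship can be sketched in a few lines. The thresholds below are placeholders: the story never says what X and Y actually are, so `x_mm=100.0` and `y_days=3` are invented purely so the example runs.

```python
def machine_submerged(rain_mm, days, x_mm=100.0, y_days=3):
    """Both conditions must hold: enough rain AND within a short enough period.

    x_mm and y_days stand in for the unknown X and Y; the defaults are made up.
    """
    return rain_mm >= x_mm and days <= y_days


def internet_up(rain_mm, days):
    """In this simplified story, a submerged machine is the only cause of an outage."""
    return not machine_submerged(rain_mm, days)


print(internet_up(rain_mm=120, days=2))   # heavy rain, short period
print(internet_up(rain_mm=120, days=10))  # same rain, spread over many days
print(internet_up(rain_mm=20, days=2))    # light rain
```

Only the first case meets both conditions, which matches why the internet goes out only some of the times it rains.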
↪ Establishing relationships can go pretty much as deep as we want. In this particular case, we could go so far as to investigate the prevalence of planetary weather patterns, like El Niño, to ascertain the probability of heavy precipitation in my region, so that I can remember to keep my phone charged or download a show I can watch while my internet is out.
Hi! It's me again. Did you get it? I actually did! So pleased with myself that I understood this! It's incredibly relevant for biomedical research, since traditional methods of identifying confounding and adjusting for confounding may be inadequate. For how to apply this in your own research, see the summary below, taken from here:
Michoel T, Zhang JD. Causal inference in drug discovery and development. Drug Discov Today. 2023;28(10):103737. doi:10.1016/j.drudis.2023.103737