Ch@t: Bynum resident uses data analysis skills to track COVID-19 spread, create in-depth database

Posted 6/5/20

Bynum resident Craig Greiner has been very busy lately. Like many, his life was changed by the COVID-19 pandemic. It led him to create a data dashboard that tracks how the virus is spreading in Chatham County, North Carolina and around the country.

“I have seen a great deal of information and misinformation shared about the pandemic, the spread, and the impact,” he said. “I see this at a local level, and even at a national level with maps shown on national news networks. Most recently, we learned that in Georgia the data was heavily misrepresented. I often see various sources citing a new ‘daily total,’ but that number means little without context, particularly the context of how that number has changed recently.”

Greiner used to work in the medical field, but now he uses data to help “determine trends or correlations,” he said, with the John Deere Company. You can find his dashboard at public.tableau.com/profile/craig.greiner#!/vizhome/USCovid-19Analytics/COVID-19Analysis. A former bio-engineer and researcher working in regenerative medicine, Greiner spoke to the News + Record this week about mining data, reliable sources and the role of social media in how people interpret data.

You mentioned your concern about the spread of misinformation around COVID-19. How can the public become better informed about the data that they are seeing? What sources do you recommend using? What types of questions should people ask themselves when interpreting data?

When I was in high school, I recall a period where we studied current events as part of our courses. At the beginning of the week, we would find multiple newspapers on our desks for us to digest, both local and national. Our assignment was to find one topic that was covered by more than one news outlet and report on what we learned by consulting multiple sources of information. The point is that every news outlet has some inherent bias, because every outlet is a collection of individuals and each news article is written by a person and edited by a person. Even if only in tone or in the choice of a certain adjective or adverb, there is rarely total neutrality. This same sentiment holds true today and is especially true when looking at data.

One may think that facts are facts, numbers do not lie, etc., but there are many ways to skew a person’s interpretation of the data. These can be something visual such as color choice, font size or placement on a page, or more statistical such as choosing to illustrate a median versus a mean, removing outliers in the data set, etc. It could also be a simple lack of context such as we often see with COVID-19 data. For example, knowing our number of total or new cases today is valuable, but knowing our trend is much more informative.

To become better informed as a member of the public, I suggest doing just what I had to do in that high school classroom: consult multiple sources before drawing any conclusion and be critical of the differences. In a world of what appears to be ever-widening media bias, this is even more critical. However, equally important in the time of social media news feeds is to consider the source and verify. Ask yourself, where are the numbers coming from, who is sharing the information, and is it consistent with what I am seeing elsewhere? In short, just because someone shares something on social media does not make it true.

Another item you should consider is what story the person sharing the data is trying to tell. With COVID-19, are they trying to illustrate that it is safe to re-open, that it is not safe to re-open, or evaluating whether it is safe to re-open? Those are three very different ‘stories’. Just as a writer is often not fully neutral, neither is a data scientist.

When in doubt, go as close to the source as you can. When considering COVID-19 data, the data I share is based on the Johns Hopkins University data set (coronavirus.jhu.edu), which is used widely as a trusted repository. You can also consult state government websites (covid19.ncdhhs.gov/dashboard) and the CDC website (cdc.gov/coronavirus/2019-ncov/index.html). But remember, even when consulting those sources, ask yourself about the ‘story’, the context, and the consistency with other data sources.

Based on the data you have found in your analysis, do you think North Carolina reopened too soon?

Hindsight is 20-20, right? Also, I am basing my comments only on the CDC recommendation that states should see a 14-day downward trend of new cases. To calculate this, I examine the new cases reported on any given day and the 13 preceding days. Plotting a linear trend line of this provides a slope. If that slope is positive, we are trending up; negative, trending down. Based on what I see in the data, if I take a snapshot in time as of May 5, North Carolina cases were trending downward over a 14-day period. This was true for both May 4 and May 5. Purely by the numbers, we met that metric for reopening. However, the data shows we did so just barely. It was nearly flat and as of May 3, the two-week average was still trending upwards significantly, at a rate four times greater than that at which we were trending downward on May 5.
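
The 14-day trend check described above can be sketched in a few lines of Python. This is a minimal illustration, not Greiner's actual dashboard logic (his analysis is hosted on Tableau Public): it assumes an ordinary least-squares line fitted against the day index, and the function names and sample counts are hypothetical.

```python
from typing import List, Sequence


def trend_slope(new_cases: Sequence[float]) -> float:
    """Least-squares slope of daily new cases against the day index."""
    n = len(new_cases)
    if n < 2:
        raise ValueError("need at least two days of data")
    mean_x = (n - 1) / 2
    mean_y = sum(new_cases) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(new_cases))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def rolling_slopes(daily_new_cases: Sequence[float], window: int = 14) -> List[float]:
    """One slope per day that has a full window: that day plus the preceding window - 1 days."""
    return [
        trend_slope(daily_new_cases[end - window:end])
        for end in range(window, len(daily_new_cases) + 1)
    ]


# Made-up counts for illustration only; these are NOT North Carolina data.
example = [400, 420, 410, 450, 430, 460, 440, 455, 445, 430, 425, 410, 400, 390, 380]

for slope in rolling_slopes(example):
    label = "trending up" if slope > 0 else "trending down" if slope < 0 else "flat"
    print(f"{slope:+.2f} new cases per day per day: {label}")
```

A slope only slightly below zero corresponds to the "just barely" situation described above, which is why the size of the slope matters as much as its sign.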

As a data steward, I pride myself on presenting and interpreting data, but allowing individuals to draw their own insights (i.e. opinions). The N.C. state government based their decision on multiple metrics, including trends in the amount of testing, ability to perform contact tracing and a downward slope in multiple trends, including daily case counts and hospitalizations. It is likely the other metrics influenced their decision, and rightfully so.

If I were to perform the same analysis for Phase 2 (May 22), it tells a different story. On May 22 we had a significant upward 14-day trend, 33 times that at which we were trending downward on May 5. We had also experienced a rolling 14-day upward trend on all but two days from May 6 to May 22. While there is a lag from disease transmission to a new case being recorded, which prevents us from concluding that re-opening on the 5th caused the increase, the data shows a clear and persistent increasing trend. By this metric, and this metric alone, the state was not meeting the guideline for re-opening further.

How do you think social media has played a role in the misinformation shared about the pandemic and its spread?

There are several reasons people need to be cautious and critical when considering social media ‘news’ - especially regarding this pandemic. We must have an even higher bar of being properly informed and realize that anyone can post anything and there is very little if any ‘fact checking’. In this manner misinformation can spread just as easily as actual facts. However, in the case of a pandemic, that misinformation can have dire, real world consequences. Also, the manner in which social media sites function is a positive feedback loop that continues to deliver a user more of what they react to, reinforcing a belief whether that belief be rooted in fact or fiction.

How would you advise people to look at, and interpret, data they’re reading about and hearing about?

Be critical. Do not blindly believe. Do not blindly promote. View it as a personal responsibility to only share accurate information and take pride in being fully informed and promoting factual information. And as I mentioned before, remember to always consider the ‘story’, the context, and the consistency with other data sources.

What other key metrics should people look out for when looking at data, in addition to the virus’s spread over the last 14 days?

As a means of collating and presenting data, we always place data into buckets. That may be by state or by county, by new cases or by deaths. However, we must remember that the virus does not respect man-made borders and those buckets are ultimately arbitrary. When considering the spread of the virus, also consider the trends of surrounding counties and states and the number of transients in your area (such as by tourism) that may introduce the virus.

At a foundational level, I feel it is also critical that we consider the rate of testing when looking at any COVID-19 data. Without testing, we cannot report positive cases. Therefore, inadequate testing can greatly skew the data. We must also remember that testing is not a one-and-done phenomenon where we are “done” when a certain percent of the population has been tested once. It is a journey and will likely involve repeat testing.

While not included in the data set I have presented, I think it is also important that we begin to look at the virus and its impact on various populations. Be that by race, age, gender or overall health, we will likely find differences in the impact of the virus. And when considering this, individuals must be conscious that while they may not be part of a “high-risk” population, they may be able to spread the virus to that population. It is not only about our own response to the virus, but the larger social impact of our individual choices.

What are some of the common pitfalls in interpreting COVID-19 stats?

Two of the common pitfalls that I often see are a lack of context and an abuse of context.

Trends are the most informative piece of data we have at this time. Viewing the data as a snapshot in time, with no context and without also presenting the current trend, is a misrepresentation of the data.

Regarding abuse of context: writing off a spike in cases as due to a spike in an isolated population, such as a factory or nursing home, misses the broader potential social impact of that spike. Take a spike in cases at a meat-processing plant as an example. Those workers may drive a one-day spike in the data, but those workers have families. They carry the virus home to their families, and those family members can not only contract the virus but also spread it to a wider population. Assuming that a spike in that isolated population is not representative of an elevated level of risk for the population at large is an abuse of context.

What would you say to those who claim that COVID-19 is no more deadly than the seasonal flu?

I would say that the numbers simply do not bear this out to be true at this time. The CDC tracks the flu each year and estimated that in the last flu season 35.5 million people got sick with influenza and 34,200 died from it. This indicates a mortality rate of about 0.096 percent. For perspective, 0.1% is a trusted estimate of average flu mortality rates.

What we know as of today is that COVID-19 in the United States has claimed 102,806 lives out of the 1.7 million who have been confirmed to have the virus. By these numbers, COVID-19 therefore has shown a mortality rate of 5.8% in confirmed cases. Now many experts agree that the count of positive cases is a gross underestimate and therefore the mortality rate is lower. Recent studies have estimated that only 1 in 10, or 1 in 12, COVID-19 cases are reported and therefore the death rate is closer to 0.6-0.4%. The CDC is now promoting a revised value of 0.4%. However, even at this reduced rate COVID-19 would be 4 times as lethal as the flu. COVID-19 also has a transmission rate that is higher than the flu. Current estimates show that an individual with COVID-19 will transmit the disease to 2 - 2.5 people on average, while with the flu that number is 0.9 - 2.1. Therefore, not only is the virus more deadly, but it is easier to spread.
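
The arithmetic behind these comparisons can be laid out as a short Python sketch. It uses only the figures quoted in the interview. The function names are hypothetical, and the underreporting adjustment assumes that every death is counted while only 1 in k infections is confirmed, which is a simplification. Note that the rounded figure of 1.7 million confirmed cases yields a rate slightly above the quoted 5.8%, which was presumably computed from an unrounded case count.

```python
def fatality_rate_pct(deaths: float, cases: float) -> float:
    """Deaths as a percentage of cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


def adjust_for_underreporting(confirmed_rate_pct: float, infections_per_confirmed: float) -> float:
    """Rate if only 1 in `infections_per_confirmed` infections is confirmed.

    Assumption: all deaths are counted, so only the denominator grows.
    """
    return confirmed_rate_pct / infections_per_confirmed


# Figures as quoted in the interview.
flu = fatality_rate_pct(34_200, 35_500_000)
covid_confirmed = fatality_rate_pct(102_806, 1_700_000)  # uses the rounded case count

print(f"Flu: {flu:.3f}%")
print(f"COVID-19, confirmed cases only: {covid_confirmed:.1f}%")

for k in (10, 12):
    adjusted = adjust_for_underreporting(5.8, k)  # start from the quoted 5.8%
    print(f"1 in {k} cases reported: {adjusted:.2f}% ({adjusted / flu:.1f}x the flu rate)")
```

At the CDC's revised value of 0.4%, the ratio to the flu rate is roughly 4, which matches the comparison made in the text.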

There is one last difference between the flu and COVID-19 and that is the availability of a vaccine. The flu vaccine may indeed blunt the impact of the flu, and a COVID-19 vaccine, if one is developed, will likely do the same for COVID-19, though it remains to be seen if it would reduce the mortality rate. But even more importantly, we have to consider the world we live in today and that is a world with no COVID-19 vaccine.

Do you regret leaving the medical field during a time like this?

Personally, I have no regrets about no longer working in the medical sciences. I have the utmost respect and admiration for those on the front lines today. We all owe them a debt of gratitude. I hope that some day I may give back to my community in a larger way, but there are many ways to live a life of service here.
