Guess what happened when I asked ChatGPT to analyze the Czech Republic data?
My data extraction and summary were spot on. But ChatGPT analyzed the data incorrectly. Here are the issues I ran into, to save you some headaches if you want to use it for data analysis.
Executive summary
I posted the complete chat here.
Summary:
ChatGPT found no errors in my code, which summarized the data using a pandas groupby. Not surprising!
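For readers unfamiliar with the approach, here is a minimal sketch of what summarizing record-level data with a pandas groupby can look like. It is not the code from the chat: the rows are made-up toy records, and the column names (vaccine, age_band, died, person_years) are hypothetical placeholders, not the Czech dataset's actual fields.

```python
import pandas as pd

# Made-up toy records, one row per person. Column names are hypothetical
# and are NOT the actual Czech dataset's field names.
records = pd.DataFrame({
    "vaccine": ["Moderna", "Moderna", "Pfizer", "Pfizer", "Pfizer"],
    "age_band": ["60-69", "70-79", "60-69", "70-79", "70-79"],
    "died": [0, 1, 0, 0, 1],                    # 1 if the person died in the window
    "person_years": [1.0, 0.5, 1.0, 1.0, 0.8],  # time at risk contributed
})

# Collapse the records to one row per (vaccine, age band) holding total deaths
# and total person-time, which is all a rate calculation needs downstream.
summary = (
    records.groupby(["vaccine", "age_band"], as_index=False)
    .agg(deaths=("died", "sum"), person_years=("person_years", "sum"))
)
print(summary)
```

Summarizing first also shrinks the file considerably, since each output row replaces many person-level rows.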
CSV data files above a certain size cannot be uploaded.
There is some sort of upload byte limit PER CHAT. Once you reach it, file uploads fail completely, and ChatGPT was clueless about both the problem and the limit.
ChatGPT cannot report technical problems to the OpenAI ops team. This is insane.
Technical support from OpenAI on this issue was non-existent.
I was the one who pointed out that the upload problems were specific to that chat.
ChatGPT made several errors analyzing the data from the two summary .csv files I provided. I pointed out that the mortality rates calculated from the two different groupings should be the same; ChatGPT investigated and found that I was right and it was wrong.
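That consistency check can be sketched like this, assuming both summary files aggregate the same underlying records and both keep the vaccine column. The two tables below are made-up stand-ins for the .csv files, and their column names are hypothetical. Collapsing each table to a crude death rate per vaccine must give identical numbers; if it doesn't, something went wrong in the aggregation or the analysis.

```python
import pandas as pd

# Two made-up summary tables standing in for the two .csv files: the same
# underlying records grouped two different ways. Column names are hypothetical.
by_age = pd.DataFrame({
    "vaccine": ["Moderna", "Moderna", "Pfizer", "Pfizer"],
    "age_band": ["60-69", "70-79", "60-69", "70-79"],
    "deaths": [10, 30, 20, 50],
    "person_years": [40_000.0, 20_000.0, 90_000.0, 60_000.0],
})
by_sex = pd.DataFrame({
    "vaccine": ["Moderna", "Moderna", "Pfizer", "Pfizer"],
    "sex": ["F", "M", "F", "M"],
    "deaths": [18, 22, 33, 37],
    "person_years": [31_000.0, 29_000.0, 80_000.0, 70_000.0],
})

def crude_rate_per_vaccine(summary: pd.DataFrame) -> pd.Series:
    """Collapse a summary table to one crude rate per 100,000 person-years per vaccine."""
    totals = summary.groupby("vaccine")[["deaths", "person_years"]].sum()
    return totals["deaths"] / totals["person_years"] * 100_000

rates_age = crude_rate_per_vaccine(by_age)
rates_sex = crude_rate_per_vaccine(by_sex)
print(pd.DataFrame({"from_age_file": rates_age, "from_sex_file": rates_sex}))

# Both files summarize the same records, so these must agree.
# assert_series_equal raises an AssertionError on any mismatch.
pd.testing.assert_series_equal(rates_age, rates_sex)
```

The same idea works for any quantity both files can produce, such as the overall crude rate across all vaccines.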
The final numbers, the ASMR (age-standardized mortality rate) calculations, had nonsensical units. But that wasn’t my priority in this chat, since it is easily fixed later.
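For reference, here is the textbook direct-standardization formula, which shows where the units should come from. It is the standard definition, not necessarily the exact computation used in the chat. Here D_i is the number of deaths in age band i, PY_i is the person-years at risk in that band, and S_i is the standard population count for that band.

```latex
\mathrm{ASMR} = \sum_{i} \frac{S_i}{\sum_{j} S_j} \cdot \frac{D_i}{PY_i} \times 100{,}000
```

The weights S_i / \sum_j S_j are dimensionless and D_i / PY_i is deaths per person-year, so the result is in deaths per 100,000 person-years. If the denominator is a head count followed for a fixed period rather than person-years, the unit becomes deaths per 100,000 people over that period.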
ChatGPT found a 2X higher ASMR for Moderna. We’ll soon see whether that is right.
The bottom line is that you can’t just upload the record-level data and ask it to analyze the data for you. It will mess up. But if you hand-hold it through the process and point out its mistakes, you end up with a chat that anyone can replicate.
Next steps
So I’m going to have another go at this, and I think the results will be epic.
The Czech Republic data is the highest quality data on vaccine safety and efficacy that is publicly available. That is precisely why nobody in mainstream science will touch the Czech Republic data with a 10-foot pole.
Fortunately, since I’m a misinformation superspreader, I have no qualms at all in analyzing data like this and sharing what I find :).
Update!
My second attempt succeeded now that I had learned the tricks. But once again, ChatGPT made silly errors, such as using a standard population that didn’t add up to 100,000.
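A cheap guard against that kind of error is to check (or rescale) the standard population before using it. The sketch below assumes direct age standardization with per-band death and person-year counts; the function names, age bands, and all numbers are hypothetical, made up for illustration.

```python
import math

def normalize_standard(std_pop, target=100_000):
    """Rescale standard-population counts so they sum to `target`."""
    total = sum(std_pop.values())
    if total <= 0:
        raise ValueError("standard population must have a positive total")
    if not math.isclose(total, target):
        print(f"warning: standard population sums to {total}, rescaling to {target}")
    return {band: count * target / total for band, count in std_pop.items()}

def asmr_per_100k(deaths, person_years, std_pop):
    """Directly age-standardized death rate per 100,000 person-years.

    All three arguments are dicts keyed by the same (hypothetical) age bands.
    """
    if not (set(deaths) == set(person_years) == set(std_pop)):
        raise ValueError("age bands must match across all inputs")
    weights = normalize_standard(std_pop)  # now sums to 100,000
    # Each band's rate (deaths per person-year) times its share of 100,000.
    return sum(weights[b] * deaths[b] / person_years[b] for b in deaths)

# Made-up example: this standard population sums to 90,000, not 100,000.
deaths = {"60-69": 30, "70-79": 90}
person_years = {"60-69": 20_000.0, "70-79": 15_000.0}
std_pop = {"60-69": 60_000, "70-79": 30_000}
print(asmr_per_100k(deaths, person_years, std_pop))
```

With these made-up numbers, the rescaled standard gives 300 deaths per 100,000 person-years. Plugging in the raw counts, which sum to 90,000, as if they summed to 100,000 would give 270, understating the rate by 10%.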
Here’s the result it got on the first go-around, but we’ll soon see whether it’s actually accurate.
Summary
AI chatbots can be very useful tools for validating the analysis of big data, but they haven’t yet reached the stage where it’s push-button simple. You have to hand-hold them at each step and check their work. They are, however, very useful as a double check on your own calculations!
OK, I was able to complete the analysis with ChatGPT just now. Going to bed. You'll see the full transcript tomorrow and YOU ARE GOING TO LOVE IT.