How I found outlier values using ChatGPT

One of the first courses in my MS Data Analytics program was about learning how to clean data. There are several steps involved in cleaning data to get it ready for analysis. One of those steps is finding outlier values.

In statistics, the z-score of a value tells you how many standard deviations it lies from the mean. By a common rule of thumb, a value is treated as an outlier if its z-score is greater than 3 or less than -3.
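
To make the definition concrete, here is a minimal sketch that computes z-scores using only Python's standard library. The function name `z_scores` and the sample readings are made up for illustration, and the sketch assumes the population standard deviation; using the sample standard deviation instead would shift the scores slightly.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the z-score of each value: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    # Assumes the values are not all identical, otherwise sigma is 0.
    return [(v - mu) / sigma for v in values]

# Made-up sample: 19 ordinary readings and one suspicious one.
readings = [10] * 19 + [100]
scores = z_scores(readings)

# Keep the values whose z-score is above 3 or below -3.
outliers = [v for v, z in zip(readings, scores) if abs(z) > 3]
print(outliers)  # [100]
```

In this sample the mean is 14.5 and the standard deviation is about 19.6, so the reading of 100 has a z-score of roughly 4.36, while each reading of 10 sits at about -0.23.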

All the examples in the course material showed how to create a new dataframe based on the column to be evaluated, and then find the outlier values from there. But I wanted a simpler way that didn’t require creating additional objects. So I asked ChatGPT to create a new function. Here’s how I did it:

And here’s the initial response:

Then I told it to make a slight modification:

And here’s the final result:

You can view my final code, dataset for testing, and the ChatGPT conversation at my GitHub repo for this blogpost.
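
If you just want a feel for the idea, here is a minimal sketch of what a function like this might look like in pandas. It is not the code from the ChatGPT conversation or the repo: the name `find_outliers`, the `threshold` parameter, and the `income` column are all hypothetical, and the choice of `ddof=0` (population standard deviation) is an assumption. pandas defaults to `ddof=1`, which would give slightly smaller z-scores.

```python
import pandas as pd

def find_outliers(df, column, threshold=3.0):
    """Return the rows of df whose z-score in `column` is beyond +/- threshold."""
    col = df[column]
    # z-score: distance from the mean, measured in standard deviations.
    # ddof=0 is an assumption here; pandas uses ddof=1 unless told otherwise.
    z = (col - col.mean()) / col.std(ddof=0)
    # Filter the original dataframe directly with a boolean mask.
    return df[z.abs() > threshold]

# Made-up data: 19 typical incomes and one extreme value.
df = pd.DataFrame({"income": [50_000] * 19 + [500_000]})
result = find_outliers(df, "income")
print(result)
```

The function filters the original dataframe with a boolean mask, so there is no need to build a separate dataframe for the column first. Missing values are never flagged, because a comparison against NaN evaluates to False.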

Now it’s your turn! Try using ChatGPT to create a Python function to automate a data cleaning task. Share your results in the comments below!
