
Data Cleaning & Pre-processing - Medicare Claims
The dataset is provided by the National Cardiovascular Disease Surveillance System. It integrates multiple indicators from many data sources to provide a comprehensive picture of the public health burden of CVDs and associated risk factors in the United States. I explored the data for anomalies to prepare it for analysis. Using the seaborn library, I visualized the data to locate missing values. I identified and removed duplicate records to improve accuracy and efficiency. I analyzed the data to identify numerator drop and explored the states and the claims associated with them. I handled missing values using pandas and NumPy. The data was also grouped by category bucket (split, apply, combine) for analysis and exported to an Excel file. (Python libraries used: Pandas, NumPy, Seaborn)
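
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the project's actual code: the column names (`state`, `category`, `value`) and the sample rows are hypothetical, and dropping rows with missing values is only one possible choice, since the write-up does not say whether missing values were dropped or imputed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the claims dataset;
# the real schema is not given in the write-up.
claims = pd.DataFrame({
    "state": ["AL", "AL", "AK", "AK", "AZ", "AZ"],
    "category": [
        "Cardiovascular Diseases", "Cardiovascular Diseases",
        "Risk Factors", "Risk Factors",
        "Risk Factors", "Cardiovascular Diseases",
    ],
    "value": [12.5, 12.5, np.nan, 8.0, 9.5, np.nan],
})

# Count missing values per column to see where the gaps are.
missing_per_column = claims.isna().sum()

# Remove exact duplicate rows.
deduped = claims.drop_duplicates().reset_index(drop=True)

# Assumption: rows without a value are dropped rather than imputed.
cleaned = deduped.dropna(subset=["value"])

# Split-apply-combine: mean value per category bucket.
summary = cleaned.groupby("category", as_index=False)["value"].mean()

# Export step (needs an Excel engine such as openpyxl installed):
# summary.to_excel("summary.xlsx", index=False)
```

For the visual check, one common approach is to pass the boolean mask `claims.isna()` to seaborn's `heatmap`, which shows missing cells as a contrasting color.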