Author: Freeman Chen, Amy, Ahbi, …

Date: 2024-03-02


Abstract

This paper demonstrates the process of building an LLM processing pipeline. The pipeline extracts data from an API, cleans it, and uploads it to Google Cloud Storage (GCS); it then reads the data back from GCS, aggregates it with Apache Spark, and loads it into MongoDB Atlas. News data is retrieved from Atlas, converted to a Spark dataset, and analyzed with an LLM. We compare the number of clickbait political articles published daily by different news sources and generate a weekly plot. Airflow automates every step and updates the results every day. In this paper, we discuss our methods and strategy, including how we handle the text data and how we aggregate the data…

  • Data Processing and Cleaning
  • Random Forest Model and XGBoost
  • Fine-tuning
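
The daily comparison described in the abstract can be illustrated with a minimal sketch. It assumes each article has already been labeled by the LLM step and is available as a plain record; the field names (source, published, is_clickbait), the source names, and the sample records are all hypothetical and are not taken from the actual pipeline, which performs this aggregation at scale with Spark.

```python
from collections import Counter
from datetime import date

# Illustrative, made-up records. In the pipeline these would be read from
# MongoDB Atlas after the LLM step has labeled each article.
# All field names and values here are assumptions.
articles = [
    {"source": "source_a", "published": date(2024, 2, 26), "is_clickbait": True},
    {"source": "source_a", "published": date(2024, 2, 26), "is_clickbait": False},
    {"source": "source_b", "published": date(2024, 2, 26), "is_clickbait": True},
    {"source": "source_b", "published": date(2024, 2, 27), "is_clickbait": True},
    {"source": "source_b", "published": date(2024, 2, 27), "is_clickbait": True},
]


def daily_clickbait_counts(records):
    """Count clickbait articles per (publication date, source) pair."""
    counts = Counter()
    for rec in records:
        if rec["is_clickbait"]:
            counts[(rec["published"], rec["source"])] += 1
    return counts


counts = daily_clickbait_counts(articles)

# Print one row per day and source, ordered by date and then source name.
for (day, source), n in sorted(counts.items()):
    print(day.isoformat(), source, n)
```

Grouping seven consecutive days of these counts by source would give the series needed for the weekly plot.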