Adding Google News Articles to a Doc and Translating Automatically
As of April 2021, I have taken on a freelance project for a consultancy that helps both Japanese corporations switch to using cage-free eggs, and cage-free egg producers in Japan increase their production capacity/acquire new customers.
As part of this project, I have been asked to gather any relevant industry news in Japanese, add it to a doc somewhere and roughly translate the content.
This job is regular, manual, and will be required in regions outside Japan - a perfect candidate for automation.
Program Features
Makes use of SerpAPI ($50 a month, up to 5,000 requests).
- Grabs 50 Google News results for a list of Japanese keywords you input - keyword list and number of results can be changed.
- Adds them to a Google Sheet in order of latest to oldest.
- Provides an automatic English translation of the results.
View program here in Colab (API key removed). Example output here in Google Sheets.
Flow
- Pull Google News results for a list of Japanese keywords using SerpAPI.
- Create a separate Pandas dataframe using the JSON from each request (while ensuring there is no mojibake from Japanese results).
- Add columns to each dataframe for automatic translation in Google Sheets.
- Create new Google Sheets doc in Google Drive, add worksheets for each keyword using the gspread library.
- Populate worksheets with dataframes using gspread_dataframe library.
Challenges
Google News provides a date value for articles less than 1 month old in the format (3 weeks ago, 2 days ago etc.)
So the results could be sorted by latest to oldest, I needed to parse these values using regex, substract the time period from the time of executing the program, and finally convert to datetime.
The function below was applied to each date value if it contained the string “ago”.
Adding translations automatically
Translations were provided using the Google Translate functionality, accessible in Google Sheets using the =GOOGLETRANSLATE() formula. Since the automation program creates a new file, whoever uses this sheet should not have to add translation cells to each worksheet manually.
This was resolved by adding the formula directly to a Pandas dataframe, including the cells that should be translated.
Adding a new worksheet for each keyword
The gspread Python library is a wrapper for the Google Sheets API. After creating a new Google Sheets file in Google Drive, I added a new worksheet to the file for each keyword to be searched in Google News.
The content of each new worksheet is then populated with the dataframe pertaining to each keyword.