This is a guide to programatically load a python pandas DataFrame (python 3) into a Google Sheets spreadsheet.
Google Sheets is a useful tool to use in the daily tasks of a data analyst. If you have small to medium sized data, Google sheets can be the perfect store, particularly if you need to collaborate and the data needs to be updated regularly.
This particular application is based on the gspread package which allows you to use python to manipulate Google Sheets in certain ways.
To use gspread to edit Google Sheets programatically we will need to set up the appropriate permissions in the Google account you wish to use. Along with this we will need to obtain the service account key. Luckily the gspread package has quite detailed documentation which walks through setting up and obtaining what is required.
Once things have been setup the loadToGoogleSheets function can be used to load data with a small change and understanding the parameters:
Small Change
credentials_file = 'AccessKey.json'
swap ‘AccessKey.json’ with the file path to the service account key you have.
Understanding Parameters
loadToGoogleSheets(some_dataframe, spreadsheet_id, worksheetNumber = 0)
some_dataframe is a pandas dataframe
spreadsheet_id is the id of the spreadsheet you want to load the data into. For example if the Google sheet you want to load data into has the url https://docs.google.com/spreadsheets/d/1uk-e76oq-C6QdFXIqET340C7kzXk6t0xPzWSt1JMRsM/edit#gid=0 then the id of that Google sheet is 1uk-e76oq-C6QdFXIqET340C7kzXk6t0xPzWSt1JMRsM.
worksheetNumber is the inex sheet number to load into. The first worksheet is 0. If the index is higher than the number of worksheets in the Google sheet there will be an error.
Assuming set up was correct and the function has been loaded the below will load a pandas dataframe called ‘data’ into the google sheet with id 1uk-e76oq-C6QdFXIqET340C7kzXk6t0xPzWSt1JMRsM in the 2nd worksheet.
loadToGoogleSheets(data, '1uk-e76oq-C6QdFXIqET340C7kzXk6t0xPzWSt1JMRsM', 1)
Depending on the size of the data this should load the data into the Google Sheet within a minute or so. I’m not sure on the limits of Google sheets exactly but I have used this to load 50k rows with 15 columns of data.
The full function in all its glory:
def loadToGoogleSheets(some_dataframe, spreadsheet_id, worksheetNumber = 0):
"""loads a csv file into a specified Tab in a specified Google Spreadsheet.
Can load data up to 21 columns"""
import gspread
from oauth2client.service_account import ServiceAccountCredentials
scope = ['https://spreadsheets.google.com/feeds']
credentials_file = 'AccessKey.json'
credentials = ServiceAccountCredentials.from_json_keyfile_name(credentials_file, scope)
gc = gspread.authorize(credentials)
sht = gc.open_by_key(spreadsheet_id)
worksheet = sht.get_worksheet(worksheetNumber)
stringList = ["A2:A", "B2:B", "C2:C", "D2:D", "E2:E", "F2:F", "G2:G", "H2:H", "I2:I", "J2:J", "K2:K", "L2:L", "M2:M", "N2:N", "O2:O", "P2:P", "Q2:Q", "R2:R", "S2:S", "T2:T", "U2:U", "V2:V", "W2:W", "X2:X", "Y2:Y", "Z2:Z"]
colNum = len(some_dataframe.columns)
stringList = stringList[0:colNum]
# add column names
try:
for i in range(colNum):
worksheet.update_cell(1, i+1, some_dataframe.columns[i])
except:
print("Failed to write in column names")
# add data
try:
maxRow = len(some_dataframe) + 1
worksheet.resize(rows = maxRow + 1000, cols = 30)
for i in range(colNum):
myString = stringList[i] + str(maxRow)
cell_list = worksheet.range(myString)
for x in range(len(cell_list)):
cell_list[x].value = some_dataframe.iloc[x, i]
worksheet.update_cells(cell_list)
except:
print("Failed to write data")
return None
This could be improved in the following ways:
- Allow Google sheets document name to be passed instead of spreasheet id.
- Create a new tab if index passed is above the number of worksheets in Google sheet.
- Check types of parameters passed.