Using SQL in pandas with duckdb

A code snippet showing how to use SQL to query a pandas dataframe using duckdb. See this post by duckdb.

SQL logic, functions and syntax fit my brain better than the pandas API. This means I reach for SQL to analyse and manipulate data first.

Pairing pandas ability to get a variety of data into a dataframe, with the ability to use SQL against that data is a powerful combination.

# to install: pip install pandas duckdb
import pandas as pd
import duckdb

# read the iris.csv file into a pandas dataframe
data = pd.read_csv('iris.csv')

# create a query string
query_01 = '''
SELECT 
	sepal_length,
    sepal_width,
    petal_length,
FROM data
WHERE
	petal_width > 1
'''

# run query against dataframe, and return results as a dataframe
query_result_dataframe = duckdb.query(query_01).to_df()

# show first 10 rows of results
query_result_dataframe.head(10)

Send a Comment

Your email address will not be published.