A code snippet showing how to use SQL to query a pandas dataframe using duckdb. See this post by duckdb.
SQL logic, functions and syntax fit my brain better than the pandas API. This means I reach for SQL to analyse and manipulate data first.
Pairing pandas ability to get a variety of data into a dataframe, with the ability to use SQL against that data is a powerful combination.
# to install: pip install pandas duckdb
import pandas as pd
import duckdb
# read the iris.csv file into a pandas dataframe
data = pd.read_csv('iris.csv')
# create a query string
query_01 = '''
SELECT
sepal_length,
sepal_width,
petal_length,
FROM data
WHERE
petal_width > 1
'''
# run query against dataframe, and return results as a dataframe
query_result_dataframe = duckdb.query(query_01).to_df()
# show first 10 rows of results
query_result_dataframe.head(10)