docs: modernize multimodal tutorials and migrate legacy blob APIs#16918

Open
shuoweil wants to merge 2 commits into main from shuowei-deprecate-blob-notebook-updates

Conversation

Contributor

@shuoweil shuoweil commented May 1, 2026

Overview
This PR updates the tutorial notebooks across the BigFrames ecosystem to match the current public API and completes the migration away from the deprecated internal .blob accessors and private loading mechanisms (e.g. _from_glob_path).

The goal is for user-facing documentation to match current package behavior without raising deprecation warnings or attribute errors from internal APIs.

Key Changes

  • Explicit URL Mapping: Replaced implicit complex type conversions inside low-level AI function calls with explicit, user-facing calls to bbq.obj.get_access_url.
  • Tutorial Stabilizations: Fixed a chain of regressions in ai_movie_poster.ipynb in which dataframe columns were left as raw structs, potentially breaking downstream filters. All extracted values are now cast to strings at generation time.
  • Infrastructure Cleanup: Replaced legacy references to _from_glob_path and other private ingestion hooks with bbq.obj.make_ref and standard read_gbq() patterns.
  • Kaggle Integrity: Wrapped authentication segments in Kaggle notebooks to prevent local import collisions, and added missing output formatting.

Fixes #< 478952827 > 🦕

@shuoweil shuoweil requested a review from GarrettWu May 1, 2026 23:23
@shuoweil shuoweil self-assigned this May 1, 2026
@shuoweil shuoweil requested review from a team as code owners May 1, 2026 23:23
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates several Jupyter notebooks to demonstrate BigQuery DataFrames features, including the anywidget display mode, generative AI functions for image analysis, and vector search capabilities. Feedback focuses on fixing a broken dependency in the vector search notebook caused by a removed cell, correcting a variable name typo, avoiding bare exception blocks, and replacing hardcoded internal project IDs with placeholders to ensure the tutorials are functional and portable for public users.

"cell_type": "code",
"execution_count": null,
"source": [
"# Code calling .blob.audio_transcribe() was removed to satisfy the goal of removing public Blob APIs."

Contributor

high

Removing the code that populates flattened['Transcription'] while keeping subsequent cells that depend on this column (e.g., lines 574, 608, 639) will cause the notebook to fail with a KeyError. The legacy .blob call should be replaced with a modern equivalent (e.g., using bbq.ai.generate with multimodal support) rather than simply being removed to ensure the tutorial remains functional.

Contributor Author

Reimplemented the broken audio transcription workflow using the modern bbq.obj APIs and bbq.ai.generate, so that flattened["Transcription"] is populated with the output struct (content and status) that the downstream evaluation cells expect.

"source": [
"import bigframes.pandas as bpd\n",
"\n",
"MY_RPOJECT_ID = \"bigframes-dev\" # @param {type:\"string\"}\n",

Contributor

medium

There is a typo in the variable name MY_RPOJECT_ID. It should be MY_PROJECT_ID. This variable is also used in subsequent cells (e.g., lines 115 and 233), which should also be updated to maintain consistency.

" if str(uri).lower().endswith((\".png\", \".jpg\", \".jpeg\", \".webp\")):\n",
" return f'<img src=\"{url}\" width=\"{width}\">'\n",
" return f'<a href=\"{url}\" target=\"_blank\">{uri}</a>'\n",
" except: return \"Format Error\"\n",

Contributor

medium

Avoid using a bare except: block. It is better to catch specific exceptions (such as json.JSONDecodeError or KeyError) or at least Exception to avoid catching system-level signals like KeyboardInterrupt.

        except Exception: return \"Format Error\"
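
To illustrate the difference, here is a minimal standalone sketch. The helper names are hypothetical and do not come from the notebook. It shows that a bare except: also intercepts KeyboardInterrupt, while except Exception: lets it propagate.

```python
def format_bare(action):
    """Formatter with a bare except: swallows every exception type."""
    try:
        return action()
    except:  # noqa: E722 - intentionally bare for the comparison
        return "Format Error"


def format_narrow(action):
    """Formatter with except Exception: lets system-level signals through."""
    try:
        return action()
    except Exception:
        return "Format Error"


def raise_key_error():
    raise KeyError("uri")


def raise_interrupt():
    raise KeyboardInterrupt


# Ordinary errors are handled the same way by both versions.
bare_on_error = format_bare(raise_key_error)
narrow_on_error = format_narrow(raise_key_error)

# A simulated Ctrl+C is swallowed by the bare except...
bare_on_interrupt = format_bare(raise_interrupt)

# ...but propagates out of the except Exception version.
try:
    format_narrow(raise_interrupt)
    interrupt_propagated = False
except KeyboardInterrupt:
    interrupt_propagated = True
```

In a notebook formatter, the second behavior means that interrupting a long-running cell still stops execution instead of being rendered as "Format Error".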

Contributor Author

Changed the cell formatter's bare except: to except Exception: to follow best practice.

" SELECT\n",
" AI.GENERATE(\n",
" prompt=>(\"Extract the values.\", OBJ.GET_ACCESS_URL(OBJ.FETCH_METADATA(OBJ.MAKE_REF(gcs_path, \"us.conn\")), \"r\")),\n",
" connection_id=>\"bigframes-dev.us.bigframes-default-connection\",\n",

Contributor

medium

The connection_id is hardcoded to an internal project (bigframes-dev). For a public tutorial, this should be a placeholder or use a variable so that users can configure it for their own environment.
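
One possible way to do this is sketched below. The variable names, placeholder values, and the build_connection_id helper are assumptions for illustration and are not taken from the notebook. The sketch assembles the connection ID from user-configurable parts, following the project.location.name shape of the hardcoded value.

```python
# Hypothetical placeholders; replace with values from your own environment.
PROJECT_ID = "your-project-id"
LOCATION = "us"
CONNECTION_NAME = "bigframes-default-connection"


def build_connection_id(project: str, location: str, name: str) -> str:
    """Join the three parts into a fully qualified connection ID."""
    parts = (project, location, name)
    if not all(parts):
        raise ValueError("project, location, and name must all be non-empty")
    return ".".join(parts)


connection_id = build_connection_id(PROJECT_ID, LOCATION, CONNECTION_NAME)

# The value can then be interpolated into the SQL text instead of a literal.
sql_argument = f'connection_id=>"{connection_id}"'
```

The same approach would apply to other hardcoded identifiers in a tutorial, such as fully qualified table names.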

"bpd.options.bigquery.location = \"US\"\n",
"\n",
"# Set to your GCP project ID.\n",
"bpd.options.bigquery.project = \"swast-scratch\""

Contributor

medium

The project ID is hardcoded as "swast-scratch". In a tutorial notebook, it is best practice to use a placeholder or a variable that prompts the user for their own project ID.

"import bigframes.bigquery as bbq\n",
"\n",
"vector_search_results = bbq.vector_search(\n",
" base_table=f\"swast-scratch.scipy2025.national_jukebox\",\n",

Contributor

medium

The base_table parameter uses a hardcoded project and dataset (swast-scratch.scipy2025). This should be updated to use the project ID variable configured earlier in the notebook.
