@dmsnell
Created March 27, 2025 18:38
Extract the first table from an HTML document into a CSV using WordPress’ HTML API
<?php
// Load WordPress' HTML API before running this script, for example via:
// - require_once __DIR__ . '/wp-load.php';
// - manual requires of the HTML API class files
// - a PHP extension
// - any other approach
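// Extracts the first table in the HTML, if one is found, and prints it as CSV.
// Usage: cat water-report.html | php table-to-csv.php
$html = file_get_contents( 'php://stdin' );
$p    = WP_HTML_Processor::create_fragment( $html );

// Find the first table. Exit quietly if the parser could not be created
// or the document contains no table.
if ( null === $p || ! $p->next_tag( 'TABLE' ) ) {
	exit( 0 );
}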
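// Walk the table token by token until its closing tag, printing one CSV line
// per row. Jumping ahead with next_tag( 'TR' ) would skip past </table> and
// pull in rows from any later tables in the document.
$col = 0;
while ( $p->next_token() ) {
	$token_name = $p->get_token_name();

	// The closing TABLE tag ends the first table.
	if ( 'TABLE' === $token_name ) {
		exit( 0 );
	}

	// A TR opener starts a new row; its closer ends the CSV line.
	if ( 'TR' === $token_name ) {
		if ( $p->is_tag_closer() ) {
			echo "\n";
		} else {
			$col = 0;
		}
		continue;
	}

	// Only TD and TH openers start a cell; skip everything else between cells.
	if ( ( 'TD' !== $token_name && 'TH' !== $token_name ) || $p->is_tag_closer() ) {
		continue;
	}

	if ( $col++ > 0 ) {
		echo ',';
	}

	// Collect the cell's text up to its closing TD or TH.
	// Note: tables nested inside a cell are not handled.
	$cell = '';
	while ( $p->next_token() && $token_name !== $p->get_token_name() ) {
		switch ( $p->get_token_name() ) {
			case '#text':
				$cell .= $p->get_modifiable_text();
				break;

			case 'BR':
				// Treat line breaks as whitespace so words don't run together.
				$cell .= ' ';
				break;
		}
	}

	// Collapse HTML whitespace (including newlines) into single spaces
	// instead of deleting it, which would glue adjacent words together.
	$cell = trim( preg_replace( '/[ \t\r\n\f]+/', ' ', $cell ) );

	// Quote the field when it contains a comma, a double quote, or a line
	// break, doubling any embedded double quotes (RFC 4180).
	if ( false !== strpbrk( $cell, ",\"\r\n" ) ) {
		echo '"' . str_replace( '"', '""', $cell ) . '"';
	} else {
		echo $cell;
	}
}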
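CSV quoting is easy to get subtly wrong. RFC 4180 says a field must be wrapped in double quotes when it contains a comma, a double quote, or a line break, and every double quote inside it is doubled. The sketch below isolates that rule in plain PHP with no WordPress dependency, so it can be tested on its own. `csv_field()` and `csv_row()` are hypothetical helper names introduced here for illustration; they are not part of WordPress or the HTML API, and the example rows are made-up values.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: escape one value as an RFC 4180 CSV field.
 *
 * Values containing a comma, a double quote, CR, or LF are wrapped in
 * double quotes, and each embedded double quote is doubled.
 */
function csv_field( string $value ): string {
	if ( false === strpbrk( $value, ",\"\r\n" ) ) {
		return $value;
	}
	return '"' . str_replace( '"', '""', $value ) . '"';
}

/**
 * Hypothetical helper: join a row of cell values into one CSV line.
 *
 * @param string[] $cells Plain-text cell values, in column order.
 */
function csv_row( array $cells ): string {
	return implode( ',', array_map( 'csv_field', $cells ) );
}

// Made-up example rows.
echo csv_row( array( 'Item', 'Count', 'Note' ) ), "\n";
echo csv_row( array( 'Widget', '3', 'Says "hi", twice' ) ), "\n";
// Prints:
// Item,Count,Note
// Widget,3,"Says ""hi"", twice"
```

PHP's built-in `fputcsv()` is an alternative. It also encloses fields that contain spaces or tabs, which is valid CSV but noisier, and it applies its own escape-character handling, so a small explicit helper like this keeps the output predictable.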