Skip to content

Instantly share code, notes, and snippets.

@rodrigopedra
Created April 17, 2022 04:22
Show Gist options
  • Select an option

  • Save rodrigopedra/3fc18f8af4bb0ff49f760b2e2b278061 to your computer and use it in GitHub Desktop.

Select an option

Save rodrigopedra/3fc18f8af4bb0ff49f760b2e2b278061 to your computer and use it in GitHub Desktop.
PHP import CSV
<?php
$file = fopen('php://output', 'w');
fputcsv($file, ['ID', 'NAME', 'AGE']);
for ($index = 0; $index < 21_000_000; $index++) {
fputcsv($file, [$index + 1, bin2hex(random_bytes(20)), random_int(18, 65)]);
}
fclose($file);
<?php
define('BUFFER_SIZE', 500);
$pdo = new PDO('mysql:host=127.0.0.1;dbname=dummy;charset=utf8mb4', 'root', 'password', [
PDO::ATTR_EMULATE_PREPARES => false,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('DROP TABLE IF EXISTS `import`');
$pdo->exec('CREATE TABLE `import` (
`id` BIGINT UNSIGNED NOT NULL,
`name` VARCHAR(100) NOT NULL,
`age` INT UNSIGNED NOT NULL
)');
$statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES ' . implode(',', array_fill(0, BUFFER_SIZE, '(?, ?, ?)')));
echo date('Y-m-d H:i:s'), PHP_EOL;
$file = fopen($argv[1], 'r');
fgetcsv($file, 100); // skip first line
$buffer = [];
while (!feof($file)) {
$fields = fgetcsv($file, 100);
if ($fields === false) {
continue;
}
$buffer[] = $fields;
if (count($buffer) === BUFFER_SIZE) {
$statement->execute(array_merge(...$buffer));
$buffer = [];
}
}
fclose($file);
$remaining = count($buffer);
if ($remaining > 0) {
$statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES ' . implode(',', array_fill(0, $remaining, '(?, ?, ?)')));
$statement->execute(array_merge(...$buffer));
}
echo date('Y-m-d H:i:s'), PHP_EOL;
$value = $pdo->query('SELECT COUNT(*) FROM `import`')->fetchColumn();
echo number_format($value), PHP_EOL;
<?php
$pdo = new PDO('mysql:host=127.0.0.1;dbname=dummy;charset=utf8mb4', 'root', 'password', [
PDO::ATTR_EMULATE_PREPARES => false,
PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('DROP TABLE IF EXISTS `import`');
$pdo->exec('CREATE TABLE `import` (
`id` BIGINT UNSIGNED NOT NULL,
`name` VARCHAR(100) NOT NULL,
`age` INT UNSIGNED NOT NULL
)');
$statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES (:id, :name, :age)');
echo date('Y-m-d H:i:s'), PHP_EOL;
$file = fopen($argv[1], 'r');
fgetcsv($file, 100); // skip first line
while (!feof($file)) {
$fields = fgetcsv($file, 100);
if ($fields === false) {
continue;
}
$statement->execute([
':id' => $fields[0],
':name' => $fields[1],
':age' => $fields[2],
]);
}
fclose($file);
echo date('Y-m-d H:i:s'), PHP_EOL;
$value = $pdo->query('SELECT COUNT(*) FROM `import`')->fetchColumn();
echo number_format($value), PHP_EOL;
$ php generate.php > data.csv
$ ls -lh data.csv
-rw-r--r-- 1 rodrigo rodrigo 1,1G abr 17 01:02 data.csv
$ php import-linear.php data.csv
2022-04-17 01:02:50
2022-04-17 01:17:00
21,000,000
$ php import-buffered.php data.csv
2022-04-17 01:17:16
2022-04-17 01:19:12
21,000,000
@rodrigopedra
Copy link
Author

@jamesmills a final disclaimer:

Although I spent some years (about 7 years) dealing with large datasets, I worked more on the ETL and analysis side of the job, and not the database management.

Of course along the years you learn a trick or two when dealing over and over with the same problem set. But the options I gave above, specially on the second comment on your additional questions, might no be the best solution.

Those are strategies I would do, from my experience, but if my team had an expert on database management, I would talk to them first to assess if this is the best strategy.

So if anyone else reading this knows better, please share in the comments how would you handle these. Thanks =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment