Created
April 17, 2022 04:22
-
-
Save rodrigopedra/3fc18f8af4bb0ff49f760b2e2b278061 to your computer and use it in GitHub Desktop.
PHP import CSV
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <?php | |
| $file = fopen('php://output', 'w'); | |
| fputcsv($file, ['ID', 'NAME', 'AGE']); | |
| for ($index = 0; $index < 21_000_000; $index++) { | |
| fputcsv($file, [$index + 1, bin2hex(random_bytes(20)), random_int(18, 65)]); | |
| } | |
| fclose($file); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <?php | |
| define('BUFFER_SIZE', 500); | |
| $pdo = new PDO('mysql:host=127.0.0.1;dbname=dummy;charset=utf8mb4', 'root', 'password', [ | |
| PDO::ATTR_EMULATE_PREPARES => false, | |
| PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, | |
| ]); | |
| $pdo->exec('DROP TABLE IF EXISTS `import`'); | |
| $pdo->exec('CREATE TABLE `import` ( | |
| `id` BIGINT UNSIGNED NOT NULL, | |
| `name` VARCHAR(100) NOT NULL, | |
| `age` INT UNSIGNED NOT NULL | |
| )'); | |
| $statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES ' . implode(',', array_fill(0, BUFFER_SIZE, '(?, ?, ?)'))); | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $file = fopen($argv[1], 'r'); | |
| fgetcsv($file, 100); // skip first line | |
| $buffer = []; | |
| while (!feof($file)) { | |
| $fields = fgetcsv($file, 100); | |
| if ($fields === false) { | |
| continue; | |
| } | |
| $buffer[] = $fields; | |
| if (count($buffer) === BUFFER_SIZE) { | |
| $statement->execute(array_merge(...$buffer)); | |
| $buffer = []; | |
| } | |
| } | |
| fclose($file); | |
| $remaining = count($buffer); | |
| if ($remaining > 0) { | |
| $statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES ' . implode(',', array_fill(0, $remaining, '(?, ?, ?)'))); | |
| $statement->execute(array_merge(...$buffer)); | |
| } | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $value = $pdo->query('SELECT COUNT(*) FROM `import`')->fetchColumn(); | |
| echo number_format($value), PHP_EOL; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| <?php | |
| $pdo = new PDO('mysql:host=127.0.0.1;dbname=dummy;charset=utf8mb4', 'root', 'password', [ | |
| PDO::ATTR_EMULATE_PREPARES => false, | |
| PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, | |
| ]); | |
| $pdo->exec('DROP TABLE IF EXISTS `import`'); | |
| $pdo->exec('CREATE TABLE `import` ( | |
| `id` BIGINT UNSIGNED NOT NULL, | |
| `name` VARCHAR(100) NOT NULL, | |
| `age` INT UNSIGNED NOT NULL | |
| )'); | |
| $statement = $pdo->prepare('INSERT INTO `import`(`id`, `name`, `age`) VALUES (:id, :name, :age)'); | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $file = fopen($argv[1], 'r'); | |
| fgetcsv($file, 100); // skip first line | |
| while (!feof($file)) { | |
| $fields = fgetcsv($file, 100); | |
| if ($fields === false) { | |
| continue; | |
| } | |
| $statement->execute([ | |
| ':id' => $fields[0], | |
| ':name' => $fields[1], | |
| ':age' => $fields[2], | |
| ]); | |
| } | |
| fclose($file); | |
| echo date('Y-m-d H:i:s'), PHP_EOL; | |
| $value = $pdo->query('SELECT COUNT(*) FROM `import`')->fetchColumn(); | |
| echo number_format($value), PHP_EOL; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| $ php generate.php > data.csv | |
| $ ls -lh data.csv | |
| -rw-r--r-- 1 rodrigo rodrigo 1,1G abr 17 01:02 data.csv | |
| $ php import-linear.php data.csv | |
| 2022-04-17 01:02:50 | |
| 2022-04-17 01:17:00 | |
| 21,000,000 | |
| $ php import-buffered.php data.csv | |
| 2022-04-17 01:17:16 | |
| 2022-04-17 01:19:12 | |
| 21,000,000 |
Author
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@jamesmills a final disclaimer:
Although I spent some years (about 7 years) dealing with large datasets, I worked more on the ETL and analysis side of the job, and not the database management.
Of course along the years you learn a trick or two when dealing over and over with the same problem set. But the options I gave above, specially on the second comment on your additional questions, might no be the best solution.
Those are strategies I would do, from my experience, but if my team had an expert on database management, I would talk to them first to assess if this is the best strategy.
So if anyone else reading this knows better, please share in the comments how would you handle these. Thanks =)