Created November 14, 2024 06:04
partitionBy
| 1. Partitioned Writes: | |
| # Write the DataFrame partitioned by Product and Date directly to S3 | |
| df.write.mode("overwrite").partitionBy("Product", "Date").csv(output_s3_base_path, header=True) | |
| <<< | |
| # Select relevant columns, keeping Product as a column but partitioning by Date only | |
| df = df.select("Product", "Date", "Amount") | |
| # Write the DataFrame partitioned only by Date | |
| df.write.mode("overwrite").partitionBy("Date").csv(output_s3_base_path, header=True) | |
| >>> | |
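To make the difference between the two writes above concrete: Spark lays partitioned output out in Hive-style `column=value` directories, and the partition columns are encoded in the path rather than stored inside the data files. The sketch below only mimics that naming scheme in plain Python. `partition_prefix`, the sample rows, and the bucket path are hypothetical and are not part of Spark or of the original snippet; real output also contains `part-*` files and escapes special characters in values.

```python
def partition_prefix(base_path, row, partition_cols):
    """Build a Hive-style directory prefix such as base/Product=A/Date=2024-11-01/."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([base_path.rstrip("/")] + parts) + "/"

# Hypothetical sample data and output location, for illustration only.
rows = [
    {"Product": "A", "Date": "2024-11-01", "Amount": 10},
    {"Product": "A", "Date": "2024-11-02", "Amount": 20},
    {"Product": "B", "Date": "2024-11-01", "Amount": 5},
]
base = "s3://example-bucket/output"

# partitionBy("Product", "Date"): one directory per (Product, Date) pair.
two_level = sorted({partition_prefix(base, r, ["Product", "Date"]) for r in rows})

# partitionBy("Date"): one directory per Date; Product stays inside the files.
date_only = sorted({partition_prefix(base, r, ["Date"]) for r in rows})

for prefix in two_level + date_only:
    print(prefix)
```

Partitioning by `Date` alone yields fewer directories here, which is usually the reason to drop a partition column when it has many distinct values.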
| 2. Repartitioning for Parallelism: | |
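| # Repartition by Product and Date so that all rows for a given (Product, Date) pair are handled by the same task, which gives each partition directory fewer, larger files | |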
| repartitioned_df = df.repartition("Product", "Date") | |
| # Write the repartitioned DataFrame | |
| repartitioned_df.write.mode("overwrite").partitionBy("Product", "Date").csv(output_s3_base_path, header=True) | |
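A rough way to see why repartitioning on the partition columns before the write can help: each write task emits its own file for every partition value it holds, so rows for one (Product, Date) pair that are spread over many tasks produce many small files. The simulation below is a plain-Python sketch, not Spark code. The round-robin placement stands in for an arbitrary input distribution, Python's `hash` stands in for Spark's hash partitioner, and `count_output_files`, the sample keys, and `num_tasks` are all hypothetical.

```python
from itertools import product

def count_output_files(task_of_row, rows):
    """Each task writes one file per distinct (Product, Date) key it holds."""
    return len({(task_of_row(i, key), key) for i, key in enumerate(rows)})

# Hypothetical data: 4 products x 3 dates, 8 rows per (Product, Date) pair.
products = ["A", "B", "C", "D"]
dates = ["2024-11-01", "2024-11-02", "2024-11-03"]
rows = [(p, d) for p, d in product(products, dates) for _ in range(8)]
num_tasks = 4

# Without repartitioning: rows for a key are scattered across the tasks.
files_without = count_output_files(lambda i, key: i % num_tasks, rows)

# With repartitioning by key: every row of a key lands in exactly one task.
files_with = count_output_files(lambda i, key: hash(key) % num_tasks, rows)

print(files_without, files_with)
```

In this toy setup the key-based placement writes exactly one file per (Product, Date) pair, while the scattered placement writes one per pair per task. The trade-off is that a very large partition value is then written by a single task.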
| 3. Dynamic Frame Conversion (AWS Glue): | |
| from awsglue.dynamicframe import DynamicFrame | |
| # Convert the DataFrame to a DynamicFrame | |
| dynamic_df = DynamicFrame.fromDF(df, glueContext, "dynamic_df") | |
| # Write the DynamicFrame partitioned by Product and Date | |
| glueContext.write_dynamic_frame.from_options( | |
| frame=dynamic_df, | |
| connection_type="s3", | |
| connection_options={ | |
| "path": output_s3_base_path, | |
| "partitionKeys": ["Product", "Date"] | |
| }, | |
| format="csv" | |
| ) | |
| x | |
| 4: |