The material below is the result of testing 황장군's lecture materials on GCP.
- fluentd (streaming)
- embulk (batch) http://www.embulk.org/docs/
Let's install embulk on Linux. Copying the jar is all it takes.
curl --create-dirs -o ~/.embulk/bin/embulk -L "https://dl.embulk.org/embulk-latest.jar"

import pandas as pd
# Load the vertex and edge tables (tab-separated, as the .tsv extension implies)
vtx = pd.read_csv("vtx.tsv", sep="\t")
edge = pd.read_csv("edge.tsv", sep="\t")
# Keep only edges whose start and end IDs both exist in the vertex table
edge = edge[edge[':START_ID'].isin(vtx["id:ID"])]
edge = edge[edge[':END_ID'].isin(vtx["id:ID"])]
edge.to_csv("edge2.tsv", sep="\t", index=False, header=True)
# https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
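
The edge filter above can be tried without the real files. The following is a minimal sketch that assumes tab-separated input with the same column names (`id:ID`, `:START_ID`, `:END_ID`); the sample rows and the `filter_edges` helper are made up for illustration and are not part of the original notes.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for vtx.tsv and edge.tsv
VTX_TSV = "id:ID\tname\n1\ta\n2\tb\n3\tc\n"
EDGE_TSV = ":START_ID\t:END_ID\n1\t2\n2\t9\n8\t3\n3\t1\n"


def filter_edges(vtx, edge):
    """Keep only edges whose start and end IDs both appear in the vertex table."""
    known_ids = vtx["id:ID"]
    mask = edge[":START_ID"].isin(known_ids) & edge[":END_ID"].isin(known_ids)
    return edge[mask]


vtx = pd.read_csv(io.StringIO(VTX_TSV), sep="\t")
edge = pd.read_csv(io.StringIO(EDGE_TSV), sep="\t")
kept = filter_edges(vtx, edge)

# Edges that point at the unknown vertices 8 and 9 are dropped
print(kept.to_csv(sep="\t", index=False))
```

Combining both conditions into one boolean mask gives the same rows as the two successive filters in the notes.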

import scala.util.parsing.json._
val result = JSON.parseFull("""
  {"name": "Naoki", "lang": ["Java", "Scala"]}
""")
result match {
  case Some(e) => println(e) // => Map(name -> Naoki, lang -> List(Java, Scala))
  case None => println("Failed.")
}
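
For comparison with the pandas snippet earlier in these notes, the same parse can be sketched in Python with the standard `json` module. The `parse_or_none` helper is a hypothetical name introduced here to mirror the `Some`/`None` result of `JSON.parseFull`; it is not part of any library.

```python
import json

TEXT = '{"name": "Naoki", "lang": ["Java", "Scala"]}'


def parse_or_none(text):
    """Return the parsed value, or None when the input is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


result = parse_or_none(TEXT)
if result is None:
    print("Failed.")
else:
    print(result)
```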

<h3> <p [ngStyle]="{backgroundColor:getColor()}" [ngClass]="{online: 'online'==='online'}">Oozie Workflow Maker </p></h3>

#!/bin/bash
sudo apt-get update
sudo apt-get --fix-missing install python-mpltoolkits.basemap python-numpy python-matplotlib

# gcloud beta pubsub topics create sandiego
# gcloud beta pubsub topics publish sandiego "hello"
# Uses the Client API of older google-cloud-pubsub releases
from google.cloud import pubsub
client = pubsub.Client()
topic = client.topic("sandiego")
topic.create()

datalab create mydatalabvm --zone us-central1-b

Compute Engine: https://cloud.google.com/compute/
Storage: https://cloud.google.com/storage/
Pricing: https://cloud.google.com/pricing/
Cloud Launcher: https://cloud.google.com/launcher/
Pricing Philosophy: https://cloud.google.com/pricing/philosophy/

gcloud dataproc clusters create <NAME-OF-YOUR-CLUSTER> --subnet default --zone us-central1-b --master-machine-type n1-standard-2 --master-boot-disk-size 500 --num-workers 2 --worker-machine-type n1-standard-2 --worker-boot-disk-size 500 --project <YOUR-PROJECT-ID>

# RULE
# BigQuery is a kind of column store
# Avoid using * (star) to return all columns; use the table preview instead
# Check how much data a query will process as you change it (500 MB, 1 TB, ...)
# Always use LIMIT
# FORMAT converts a number into a string (the result cannot be added)
# An aliased column (e.g. income) cannot be used in the WHERE clause
# Standard SQL or legacy SQL: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
#standardSQL