@thanandorn
Created April 29, 2021 11:10
ReadGzippedCSVonDF
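import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

COLUMNS = ['a', 'b', 'c']  # column names, in the order they appear in the file
with beam.Pipeline(options=PipelineOptions()) as p:
    csv_pc = (p
              # compression_type='gzip' makes Beam decompress the file while reading it line by line
              | 'ReadCSVFromGCS' >> beam.io.ReadFromText(file_pattern='./data/filename.csv.gz', compression_type='gzip')
              # Naive split: assumes no quoted fields that contain commas
              | 'SplitData' >> beam.Map(lambda x: x.split(','))
              | 'ConvertToDict' >> beam.Map(lambda x: dict(zip(COLUMNS, x)))
              | 'Print' >> beam.Map(print))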
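
The 'SplitData' step uses `x.split(',')`, which breaks a quoted field such as `"hello, world"` into two values. If the data may contain quoted fields, one possible alternative is to parse each line with Python's standard `csv` module. The sketch below is an assumption, not part of the original snippet: `parse_line` is a hypothetical helper name, and it presumes that every record sits on a single line (no newlines embedded inside quoted fields), since `ReadFromText` emits one element per line.

```python
import csv

COLUMNS = ['a', 'b', 'c']  # same column names as in the pipeline above


def parse_line(line, columns=COLUMNS):
    """Parse one CSV line into a dict keyed by column name.

    Hypothetical helper, not part of the original snippet. csv.reader
    honours quoting, so a quoted field containing a comma stays one value.
    """
    values = next(csv.reader([line]))
    return dict(zip(columns, values))


# A quoted field with an embedded comma is kept intact.
row = parse_line('1,"hello, world",3')
print(row)  # {'a': '1', 'b': 'hello, world', 'c': '3'}
```

Under these assumptions, the 'SplitData' and 'ConvertToDict' steps could be collapsed into a single `beam.Map(parse_line)` step. All values remain strings, as in the original pipeline.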