Useful scripts - csvfmt, xmlfmt, jsonfmt
This is the third post of a series describing simple scripts that I wrote to ease my life as a programmer.
In this post, I'll describe 3 scripts to "pretty print" some common file types, to improve readability: csvfmt
, xmlfmt
and jsonfmt
.
csvfmt
takes a CSV
("Comma Separated Values") file from stdin
, parses it and pretty print each record as a Python dictionary.
#!/usr/bin/env python import csv import sys import pprint for row in csv.DictReader(sys.stdin): pprint.pprint(row)
Output looks like this:
% echo 'a,b,c 1,2,3 4,5,6 ' | csvfmt {'a': '1', 'b': '2', 'c': '3'} {'a': '4', 'b': '5', 'c': '6'}
xmlfmt
takes an XML
file from either stdin
or a file (specified on the cmd line) and extracts all the text from it. This script is thought to be used to read the text embedded in XML
tags, and it's analogous to [htmlfmt=][[http://man.cat-v.org/plan_9/1/fmt][5]]. If you want to format an =XML
file, maintaining the XML
tags, use [xmllint -format=][[http://xmlsoft.org/xmllint.html][6]], or my =[xmlind][7]
(described another blog post of this series.)
#!/usr/bin/env python import xml.dom.minidom from pylib.xmlutil import getText, getInput dom = xml.dom.minidom.parse(getInput()) print(getText(dom))
For example:
% echo '<a>a text<b>b text</b>more a text</a>' | xmlfmt
a textb textmore a text
jsonfmt
takes a JSON
file from stdin
and pretty prints it as a Python object.
#!/usr/bin/env python import json import sys import pprint pprint.pprint(json.load(sys.stdin))
Try it out:
$> curl 'http://search.twitter.com/search.json?q=lorenzo' | jsonfmt {u'completed_in': 0.035, u'max_id': 267982040698351617L, u'max_id_str': u'267982040698351617', u'next_page': u'?page=2&max_id=267982040698351617&q=lorenzo', u'page': 1, u'query': u'lorenzo', u'refresh_url': u'?since_id=267982040698351617&q=lorenzo', u'results': [{u'created_at': u'Mon, 12 Nov 2012 13:27:52 +0000', u'from_user': u'michael_174', u'from_user_id': 234373960, u'from_user_id_str': u'234373960', u'from_user_name': u'Michael Adhiyatama', u'geo': None, u'id': 267982040698351617L, u'id_str': u'267982040698351617', u'iso_language_code': u'in', etc. etc.
All three scripts are written in Python and available here.