# pup
> Command-line HTML parsing tool.
> More information: <https://github.com/ericchiang/pup>.
- Transform a raw HTML file into a cleaned, indented, and colored format:
`cat {{index.html}} | pup --color`
- Filter HTML by element tag name:
`cat {{index.html}} | pup '{{tag}}'`
- Filter HTML by ID:
`cat {{index.html}} | pup '{{div#id}}'`
- Filter HTML by attribute value:
`cat {{index.html}} | pup '{{input[type="text"]}}'`
- Print all text from the filtered HTML elements and their children:
`cat {{index.html}} | pup '{{div}} text{}'`
- Print the filtered HTML elements as JSON:
`cat {{index.html}} | pup '{{div}} json{}'`