# pup
> Command-line HTML parsing tool.
> More information: <https://github.com/ericchiang/pup>.
- Transform a raw HTML file into a cleaned, indented, and colored format:
`cat {{index.html}} | pup --color`
- Filter HTML by element tag name:
`cat {{index.html}} | pup '{{tag}}'`
- Filter HTML by ID:
`cat {{index.html}} | pup '{{div#id}}'`
- Filter HTML by attribute value:
`cat {{index.html}} | pup '{{input[type="text"]}}'`
- Print all text from the filtered HTML elements and their children:
`cat {{index.html}} | pup '{{div}} text{}'`
- Print HTML as JSON:
`cat {{index.html}} | pup '{{div}} json{}'`
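
The selectors and display functions above can be chained in a single invocation. The following is a minimal sketch, assuming `pup` is installed and on `PATH`; the sample markup and the `sample.html` file name are made up for illustration and are not part of `pup` itself:

```shell
# Hypothetical sample page, written to a temporary directory.
tmpdir="$(mktemp -d)"
sample="$tmpdir/sample.html"
cat > "$sample" <<'EOF'
<html><body>
<div id="main"><p>Hello</p><input type="text" name="q"></div>
</body></html>
EOF

if command -v pup >/dev/null 2>&1; then
    # Select by ID, then print the text of the matched element and its children.
    pup 'div#main text{}' < "$sample"
    # Select by attribute value, then print the matched element as JSON.
    pup 'input[type="text"] json{}' < "$sample"
else
    # Assumption: pup may be missing on this machine, so skip instead of failing.
    echo "pup is not installed; skipping the examples" >&2
fi
```

Redirecting the file with `<` feeds `pup` the same standard input as `cat {{index.html}} | pup`, without the extra process.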