How can I get the URLs from a JSON page that shows all websites containing a specific word the user searched for?


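I'm trying to write a bash script that returns the current URLs on a specific web page. What I have now is a script that returns all URLs, but I have to hard-code the link I want. I want the user to enter a word and have the script return every URL that contains that word, so that running ./reddit.sh Linux shows the URLs containing "Linux". This is my code so far: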

wget -qO- http://reddit.com/ | grep -Eo "(http|https)://[a-zA-Z0-9./?=_-]*" | sort | uniq

Answer 1

Complete solution:

Tools used: bash, wget, xmllint, sed, sort

The reddit.sh script:

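#!/bin/bash

# Search term from the first command-line argument, e.g. linux
search_word="$1"

# Fetch Reddit's search page for the term, keep only <a> elements whose href
# contains the term (XPath contains() is case-sensitive), put each </a> on its
# own line (\n in the sed replacement requires GNU sed), and de-duplicate.
wget -qO - --follow-tags=a "http://reddit.com/search?q=${search_word}" \
|  xmllint --html --xpath '//a[contains(@href, "'"${search_word}"'")]' - 2>/dev/null \
| sed 's/<\/a>/&\n/g' | sort -u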
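The script above inserts $1 into the query string unencoded, so a term with spaces or symbols (arch linux, c++) yields a malformed or misread URL. A minimal sketch of a pure-bash percent-encoding helper, assuming bash as the shell; the name urlencode is hypothetical and not part of the original answer. Only the URL should use the encoded value; the XPath contains() filter still needs the raw word.

```shell
# Hypothetical helper (not from the original answer): percent-encode a string
# so it can be placed safely in a URL query string.
urlencode() {
  local LC_ALL=C            # iterate over bytes, not multibyte characters
  local string="$1" encoded="" char i
  for (( i = 0; i < ${#string}; i++ )); do
    char="${string:i:1}"
    case "$char" in
      # RFC 3986 unreserved characters are kept as-is
      [a-zA-Z0-9.~_-]) encoded+="$char" ;;
      # everything else becomes %XX; "'c" makes printf use the byte value of c
      *) printf -v char '%%%02X' "'$char"; encoded+="$char" ;;
    esac
  done
  printf '%s\n' "$encoded"
}

# Possible use inside reddit.sh (assumption):
# search_url="http://reddit.com/search?q=$(urlencode "$search_word")"
```

For example, urlencode 'arch linux' prints arch%20linux.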

Usage:

$ bash reddit.sh linux

Output (abridged):

<a href="https://fossbytes.com/firefox-quantum-57-is-here-to-kill-google-chrome-download-for-windows-mac-linux/" class="search-link may-blank">https://fossbytes.com/firefox-quantum-57-is-here-to-kill-google-chrome-download-for-windows-mac-linux/</a>
<a href="https://www.change.org/p/lenovo-demand-that-lenovo-provide-bios-update-to-enable-linux-installation">https://www.change.org/p/lenovo-demand-that-lenovo-provide-bios-update-to-enable-linux-installation</a>
<a href="https://www.gamingonlinux.com/articles/atari-are-launching-a-new-gaming-system-the-ataribox-and-it-runs-linux.10418" class="search-link may-blank">https://www.gamingonlinux.com/articles/atari-are-launching-a-new-gaming-system-the-ataribox-and-it-runs-linux.10418</a>
<a href="https://www.reddit.com/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" data-inbound-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=1" data-href-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" class="search-comments may-blank">2,315 comments</a>
<a href="https://www.reddit.com/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" data-inbound-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=1" data-href-url="/r/funny/comments/5xyw3c/every_time_i_try_out_linux/" class="search-title may-blank">Every time I try out linux</a>
<a href="https://www.reddit.com/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" data-inbound-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=14" data-href-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" class="search-comments may-blank">269 comments</a>
<a href="https://www.reddit.com/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" data-inbound-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/?utm_term=055776b0-02a3-4fd4-81fb-7693fb1f7a86&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=14" data-href-url="/r/funny/comments/6wdq13/20170825_happy_birthday_linux/" class="search-title may-blank">20170825: Happy Birthday Linux</a>
...

For another test case, search for python:

$ bash reddit.sh python

Output (abridged):

<a href="https://developers.slashdot.org/story/17/12/15/1133217/microsoft-considers-adding-python-as-an-official-scripting-language-in-excel" class="search-link may-blank">https://developers.slashdot.org/story/17/12/15/1133217/microsoft-considers-adding-python-as-an-official-scripting-language-in-excel</a>
<a href="https://www.reddit.com/r/ATBGE/comments/7bjnxs/check_out_this_python/" data-inbound-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=7" data-href-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/" class="search-comments may-blank">302 comments</a>
<a href="https://www.reddit.com/r/ATBGE/comments/7bjnxs/check_out_this_python/" data-inbound-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=7" data-href-url="/r/ATBGE/comments/7bjnxs/check_out_this_python/" class="search-title may-blank">Check out this python!</a>
<a href="https://www.reddit.com/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" data-inbound-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=8" data-href-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" class="search-comments may-blank">1,364 comments</a>
<a href="https://www.reddit.com/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" data-inbound-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/?utm_term=02b9b18c-b9c1-42d4-8718-7f5c74d03b90&amp;utm_medium=search&amp;utm_source=reddit&amp;utm_name=frontpage&amp;utm_content=8" data-href-url="/r/funny/comments/5haxy5/monty_python_life_of_brian_is_still_relevant_today/" class="search-title may-blank">Monty Python Life Of Brian is still relevant today</a>
...
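Answer 1 prints whole <a> elements, while the question asks for URLs only, and each Reddit post appears twice (once for the title link, once for the comments link). A hypothetical post-processing function, hrefs_only (not part of the original answer), that reduces each line to its href value and de-duplicates, assuming one element per line as produced by the sed step above:

```shell
# Hypothetical post-processing step (not from the original answer):
# keep only the href="..." value of each <a> element read on stdin.
hrefs_only() {
  grep -o 'href="[^"]*"' |        # the real href; data-href-url="..." does not match
    sed 's/^href="//; s/"$//' |   # strip the attribute name and the quotes
    sort -u                       # title and comments links share a URL
}
```

Usage, after defining the function in the current shell: bash reddit.sh linux | hrefs_only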

Answer 2

Have you tried something like this?

target="reddit"; wget -qO- http://reddit.com/ | grep -Po "http.*?(?=\")" | grep -i "$target" | sort | uniq

Edit: extended along the same lines as @RomanPerekhrest's answer

target="linux"; wget -qO- "http://reddit.com/search?q=${target}" | grep -Po "http.*?(?=\")" | grep "$target" | sort -u

Edit 2: multiple words, for @nxnev

target="arch linux"; url="http://reddit.com/search?q=${target// /+}"; search=$(echo "$target" | sed 's/ /|/g'); wget -qO- "$url" | grep -Po "http.*?(?=\")" | grep -E "$search" | sort -u
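The multi-word one-liner builds an alternation regex, so a word containing regex metacharacters (such as c++) would be misinterpreted. A minimal sketch of a hypothetical filter_urls function (not from the original answer) that instead passes each word to grep as a separate fixed string with -F -e, matches case-insensitively like the first one-liner's grep -i, and reads HTML on stdin. It replaces grep -Po "http.*?(?=\")" with grep -oE 'https?://[^"]*', which is roughly equivalent assuming URLs sit inside double-quoted attributes, and also works with grep builds lacking -P.

```shell
# Hypothetical rewrite (not the original answer's code): print every URL in the
# HTML on stdin that contains ANY of the given words, ignoring case.
filter_urls() {
  if (( $# == 0 )); then
    echo "usage: filter_urls WORD..." >&2
    return 1
  fi
  local -a patterns=()
  local word
  for word in "$@"; do
    patterns+=(-e "$word")          # one fixed-string pattern per word
  done
  grep -oE 'https?://[^"]*' |       # URLs inside double-quoted attributes
    grep -iF "${patterns[@]}" |     # -F: words are literal strings, not regexes
    sort -u
}
```

Usage (assumption): wget -qO- "http://reddit.com/search?q=arch+linux" | filter_urls arch linux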

Answer 3

If you want to show Reddit's search results (URLs only) without using the API, you can do something like this:

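reddit() {
  local 'search_term' 'user_agent'
  # Placeholder: replace with your own User-Agent string
  user_agent='your_user_agent'
  # Run one search per argument, e.g. reddit 'arch linux' 'freebsd'
  for search_term; do
    # --get with --data-urlencode appends ?q=<encoded term> to the URL;
    # the first grep keeps <a> elements with class "search-title may-blank",
    # the second extracts their href values, and tail -n '+4' discards the
    # first three matching lines.
    curl \
      --data-urlencode "q=${search_term}" \
      --get \
      --header "User-Agent: ${user_agent}" \
      --silent \
      "https://www.reddit.com/search" \
    | grep -P -o -e '<a [^>]*? class="search-title may-blank" >.*?<\/a>' \
    | grep -P -o -e '(?<=href=")(.*?)(?=")' \
    | tail -n '+4'
  done
}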
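Both extraction steps in reddit() rely on grep -P, which some grep builds (for example BSD/macOS grep) do not provide. A hypothetical PCRE-free sketch of those two steps, extract_search_titles (not from the original answer), which reads HTML on stdin and assumes the same page markup the answer's regex expects (a space before the closing > of the title link). The leading space in ' href=' keeps it from matching data-href-url.

```shell
# Hypothetical PCRE-free version of the two grep -P steps in reddit():
# print the href of each <a> tag with class "search-title may-blank".
extract_search_titles() {
  grep -oE '<a [^>]* class="search-title may-blank" >' |  # opening tags of result titles
    sed -E 's/.* href="([^"]*)".*/\1/'                      # keep only the href value
}
```

It can replace the two grep -P lines in the pipeline, keeping tail -n '+4' after it if the same lines should be skipped.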

Example:

$ reddit 'arch linux'
https://www.reddit.com/r/linux/comments/6pepav/someone_got_offended_by_a_hostname_of_an/
https://www.reddit.com/r/linux/comments/6g6xsu/the_arch_linux_wiki_is_awesome_and_i_would_like/
https://www.reddit.com/r/linuxmasterrace/comments/7ikqxs/my_new_macbook_pro_has_been_made_glorious_by_the/
https://www.reddit.com/r/linux/comments/5sx15b/arch_linux_pulls_the_plug_on_32bit/
https://www.reddit.com/r/archlinux/comments/7a4sgv/almost_no_one_on_campus_got_it_but_i_dressed_up/
https://www.reddit.com/r/archlinux/comments/7blg7w/arch_linux_news_the_end_of_i686_support/
https://www.reddit.com/r/archlinux/comments/7g53jg/here_is_a_screenshot_of_a_music_player_ive_been/
https://www.reddit.com/r/thinkpad/comments/7k704w/my_beloved_x1_carbon_5th_gen_running_arch_linux/
https://www.reddit.com/r/pcmasterrace/comments/39hl6h/im_thoroughly_enjoying_arch_linux_60fps/
https://www.reddit.com/r/linux/comments/3qsmk4/twitch_installs_arch_linux_similar_to_twitch/
https://www.reddit.com/r/linuxmasterrace/comments/7aai76/i_am_using_archlinux/
https://www.reddit.com/r/archlinux/comments/7j2zhl/fully_encrypted_archlinux_with_secure_boot_on/
https://www.reddit.com/r/linuxmasterrace/comments/5dbgku/my_experience_with_arch_linux_so_far/
https://www.reddit.com/r/linux/comments/4m0r93/why_did_archlinux_embrace_systemd/
https://www.reddit.com/r/unixporn/comments/7iss7b/xfce_arch_linux_satisfaction/
https://www.reddit.com/r/archlinux/comments/5ndu7r/my_manual_to_install_arch_linux_the_minimal_way_i/
https://www.reddit.com/r/archlinux/comments/73g3vz/librem_5_will_support_arch_linux/
https://www.reddit.com/r/haskell/comments/7jyie0/the_arch_linux_community_does_not_look_very_about/
https://www.reddit.com/r/linux_gaming/comments/4xep1o/no_mans_sky_running_on_wine_in_64_bit_arch_linux/
https://www.reddit.com/r/archlinux/comments/7bjp8j/hexadecimal_arch_linux_calendar_for_2018/
https://www.reddit.com/r/linux/comments/3r1mdv/twitch_installs_arch_linux_lasts_only_a_few_hours/
https://www.reddit.com/r/archlinux/comments/7hfb9m/farch_functional_arch_linux_system_management/
