changed contact to bad_on

emdee 2022-11-19 10:30:22 +00:00
parent aac3793b35
commit 08626942d3
4 changed files with 489 additions and 255 deletions

121
README.md

@ -1,15 +1,18 @@
This extends nusenu's basic idea of using the stem library to This extends nusenu's basic idea of using the stem library to
dynamically exclude nodes that are likely to be bad by putting them dynamically exclude nodes that are likely to be bad by putting them
on the ExcludeNodes or ExcludeExitNodes setting of a running Tor. on the ExcludeNodes or ExcludeExitNodes setting of a running Tor.
* https://github.com/nusenu/noContactInfo_Exit_Excluder * https://github.com/nusenu/noContactInfo_Exit_Excluder
* https://github.com/TheSmashy/TorExitRelayExclude * https://github.com/TheSmashy/TorExitRelayExclude
The basic cut is to exclude Exit nodes that do not have a contact. The basic idea is to exclude Exit nodes that do not have ContactInfo:
That can be extended to nodes that do not have an email in the contact etc. * https://github.com/nusenu/ContactInfo-Information-Sharing-Specification
That can be extended to relays that do not have an email in the contact,
or to relays that do not have ContactInfo that is verified to include them.
But there's a problem, and your Tor notice.log will tell you about it: But there's a problem, and your Tor notice.log will tell you about it:
you could exclude the nodes needed to access hidden services or you could exclude the relays needed to access hidden services or mirror
directorues. So we need to add to the process the concept of a whitelist. directories. So we need to add to the process the concept of a whitelist.
In addition, we may have our own blacklist of nodes we want to exclude, In addition, we may have our own blacklist of nodes we want to exclude,
or use these lists for other applications like selektor. or use these lists for other applications like selektor.
@ -30,96 +33,96 @@ BadNodes:
# $0000000000000000000000000000000000000007 # $0000000000000000000000000000000000000007
``` ```
That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML) That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML)
https://github.com/yaml/pyyaml/ https://github.com/yaml/pyyaml/ or ```ruamel```: do
```pip3 install ruamel``` or ```pip3 install PyYAML```;
the advantage of the former is that it preserves comments.
Right now only the ExcludeExitNodes section is used by we may add ExcludeNodes (You may have to run this as the Tor user to get RW access to
later, and by default all sub-sections of the badnodes.yaml are used as a /run/tor/control, in which case the directory for the YAML files must
ExcludeExitNodes but it can be customized with the lWanted commandline arg. be group Tor writeable, and its parents group Tor RX.)
The original idea has also been extended to add different conditions for
exclusion: the ```--contact``` commandline arg is a comma sep list of conditions:
* Empty - no contact info
* NoEmail - no @ sign in the contact',
More may be added later.
Because you don't want to exclude the introduction points to any onion Because you don't want to exclude the introduction points to any onion
you want to connect to, ```--white_onions``` should whitelist the you want to connect to, ```--white_onions``` should whitelist the
introduction points to a comma sep list of onions, but is introduction points to a comma sep list of onions; we fixed stem to do this:
currently broken in stem 1.8.0: see:
* https://github.com/torproject/stem/issues/96 * https://github.com/torproject/stem/issues/96
* https://gitlab.torproject.org/legacy/trac/-/issues/25417 * https://gitlab.torproject.org/legacy/trac/-/issues/25417
```--torrc_output``` will write the torrc ExcludeNodes configuration to a file. ```--torrc_output``` will write the torrc ExcludeNodes configuration to a file.
Now for the final part: we lookup the Contact info of every server ```--good_contacts``` will write the contact info as a ciiss dictionary
that is currently in our Tor, and check it for its existence.
If it fails to provide the well-know url, we assume its a bogus
relay and add it to a list of nodes that goes on ExcludeNodes -
not just exclude Exit.
If the Contact info is good we add the list of fingerprints to add
to ExitNodes, a whitelist of relays to use as exits.
```--proof_output``` will write the contact info as a ciiss dictionary
to a YAML file. If the proof is uri-rsa, the well-known file of fingerprints to a YAML file. If the proof is uri-rsa, the well-known file of fingerprints
is downloaded and the fingerprints are added on a 'fps' field we create is downloaded and the fingerprints are added on a 'fps' field we create
of that fingerprint's entry of the YAML dictionary. This file is read at the of that fingerprint's entry of the YAML dictionary. This file is read at the
beginning of the program to start with a trust database, and only new beginning of the program to start with a trust database, and only new
contact info from new relays are added to the dictionary. contact info from new relays are added to the dictionary.
You can expect it to take an hour or two the first time this is run: Now for the final part: we lookup the Contact info of every relay
>700 domains. that is currently in our Tor, and check it the existence of the
well-known file that lists the fingerprints of the relays it runs.
If it fails to provide the well-known url, we assume it's a bad
relay and add it to a list of nodes that goes on ```ExcludeNodes```
(not just ```ExcludeExitNodes```). If the Contact info is good, we add the
list of fingerprints to ```ExitNodes```, a whitelist of relays to use as exits.
```--bad_on``` We offer the users 3 levels of cleaning:
1. clean relays that have no contact ```=Empty```
2. clean relays that don't have an email in the contact (implies 1)
```=Empty,NoEmail```
3. clean relays that don't have "good" contactinfo. (implies 1)
```=Empty,NoEmail,NotGood```
The default is ```=Empty,NotGood``` ; ```NoEmail``` is inherently imperfect
in that many of the contact-as-an-email are obfuscated, but we try anyway.
To be "good" the ContactInfo must:
1. have a url for the well-known file to be fetched
2. have a file that can be fetched at the URL
3. support fetching the file with a valid SSL cert from a recognized authority
4. (not in the spec but added by Python) use a TLS/SSL version > v1
5. have a fingerprint list in the file
6. have the FP that got us the contactinfo in the fingerprint list in the file.
For usage, do ```python3 exclude_badExits.py --help` For usage, do ```python3 exclude_badExits.py --help`
## Usage
## Usage
``` ```
usage: exclude_badExits.py [-h] [--https_cafile HTTPS_CAFILE] usage: exclude_badExits.py [-h] [--https_cafile HTTPS_CAFILE]
[--proxy_host PROXY_HOST] [--proxy_port PROXY_PORT] [--proxy_host PROXY_HOST] [--proxy_port PROXY_PORT]
[--proxy_ctl PROXY_CTL] [--torrc TORRC] [--proxy_ctl PROXY_CTL] [--torrc TORRC]
[--timeout TIMEOUT] [--good_nodes GOOD_NODES] [--timeout TIMEOUT] [--good_nodes GOOD_NODES]
[--bad_nodes BAD_NODES] [--contact CONTACT] [--bad_nodes BAD_NODES] [--bad_on BAD_ON]
[--bad_contacts BAD_CONTACTS] [--bad_contacts BAD_CONTACTS]
[--strict_nodes {0,1}] [--wait_boot WAIT_BOOT] [--strict_nodes {0,1}] [--wait_boot WAIT_BOOT]
[--points_timeout POINTS_TIMEOUT] [--points_timeout POINTS_TIMEOUT]
[--log_level LOG_LEVEL] [--log_level LOG_LEVEL]
[--bad_sections BAD_SECTIONS] [--bad_sections BAD_SECTIONS]
[--white_services WHITE_SERVICES] [--white_onions WHITE_ONIONS]
[--torrc_output TORRC_OUTPUT] [--torrc_output TORRC_OUTPUT]
[--proof_output PROOF_OUTPUT] [--relays_output RELAYS_OUTPUT]
``` [--good_contacts GOOD_CONTACTS]
### Optional arguments: optional arguments:
```
-h, --help show this help message and exit -h, --help show this help message and exit
--https_cafile HTTPS_CAFILE --https_cafile HTTPS_CAFILE
Certificate Authority file (in PEM) Certificate Authority file (in PEM)
```
```
--proxy_host PROXY_HOST, --proxy-host PROXY_HOST --proxy_host PROXY_HOST, --proxy-host PROXY_HOST
proxy host proxy host
--proxy_port PROXY_PORT, --proxy-port PROXY_PORT --proxy_port PROXY_PORT, --proxy-port PROXY_PORT
proxy control port proxy control port
--proxy_ctl PROXY_CTL, --proxy-ctl PROXY_CTL --proxy_ctl PROXY_CTL, --proxy-ctl PROXY_CTL
control socket - or port control socket - or port
```
```
--torrc TORRC torrc to check for suggestions --torrc TORRC torrc to check for suggestions
--timeout TIMEOUT proxy download connect timeout --timeout TIMEOUT proxy download connect timeout
```
```
--good_nodes GOOD_NODES --good_nodes GOOD_NODES
Yaml file of good info that should not be excluded Yaml file of good info that should not be excluded
--bad_nodes BAD_NODES --bad_nodes BAD_NODES
Yaml file of bad nodes that should also be excluded Yaml file of bad nodes that should also be excluded
``` --bad_on BAD_ON comma sep list of conditions - Empty,NoEmail,NotGood
```
--contact CONTACT comma sep list of conditions - Empty,NoEmail
--bad_contacts BAD_CONTACTS --bad_contacts BAD_CONTACTS
Yaml file of bad contacts that bad FPs are using Yaml file of bad contacts that bad FPs are using
```
```
--strict_nodes {0,1} Set StrictNodes: 1 is less anonymous but more secure, --strict_nodes {0,1} Set StrictNodes: 1 is less anonymous but more secure,
although some sites may be unreachable although some sites may be unreachable
--wait_boot WAIT_BOOT --wait_boot WAIT_BOOT
@ -127,23 +130,31 @@ usage: exclude_badExits.py [-h] [--https_cafile HTTPS_CAFILE]
--points_timeout POINTS_TIMEOUT --points_timeout POINTS_TIMEOUT
Timeout for getting introduction points - must be long Timeout for getting introduction points - must be long
>120sec. 0 means disabled looking for IPs >120sec. 0 means disabled looking for IPs
```
```
--log_level LOG_LEVEL --log_level LOG_LEVEL
10=debug 20=info 30=warn 40=error 10=debug 20=info 30=warn 40=error
--bad_sections BAD_SECTIONS --bad_sections BAD_SECTIONS
sections of the badnodes.yaml to use, comma separated, sections of the badnodes.yaml to use, comma separated,
'' BROKEN '' BROKEN
``` --white_onions WHITE_ONIONS
```
--white_services WHITE_SERVICES
comma sep. list of onions to whitelist their comma sep. list of onions to whitelist their
introduction points - BROKEN introduction points - BROKEN
```
```
--torrc_output TORRC_OUTPUT --torrc_output TORRC_OUTPUT
Write the torrc configuration to a file Write the torrc configuration to a file
--proof_output PROOF_OUTPUT --relays_output RELAYS_OUTPUT
Write the download relays in json to a file
--good_contacts GOOD_CONTACTS
Write the proof data of the included nodes to a YAML Write the proof data of the included nodes to a YAML
file file
This extends nusenu's basic idea of using the stem library to dynamically
exclude nodes that are likely to be bad by putting them on the ExcludeNodes or
ExcludeExitNodes setting of a running Tor. *
https://github.com/nusenu/noContactInfo_Exit_Excluder *
https://github.com/TheSmashy/TorExitRelayExclude The basic idea is to exclude
Exit nodes that do not have ContactInfo: *
https://github.com/nusenu/ContactInfo-Information-Sharing-Specification That
can be extended to relays that do not have an email in the contact, or to
relays that do not have ContactInfo that is verified to include them.
``` ```
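Checks 5 and 6 of the "good" ContactInfo list in the README can be sketched as below. This is a minimal illustration, not the code from this commit: the function names `parse_fingerprint_list` and `is_good_contact` are hypothetical, and the sketch assumes the body of the well-known `rsa-fingerprint.txt` file has already been downloaded over a verified TLS connection (checks 1 to 4), with one fingerprint per line and `#` starting a comment.

```python
# Hypothetical helpers, for illustration only (not part of exclude_badExits.py).
HEX_DIGITS = set('0123456789abcdefABCDEF')


def parse_fingerprint_list(text):
    """Return the 40-hex-digit fingerprints found in a well-known file body."""
    fps = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith('#'):
            continue
        # Keep only well-formed RSA fingerprints (40 hex digits).
        if len(line) == 40 and set(line) <= HEX_DIGITS:
            fps.append(line.upper())
    return fps


def is_good_contact(fp, text):
    """True if the relay fingerprint fp is listed in the well-known file body."""
    return fp.upper() in parse_fingerprint_list(text)
```

A relay whose own fingerprint is missing from the list its ContactInfo points to would fail this check and fall under `NotGood`.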


@ -9,12 +9,17 @@ on the ExcludeNodes or ExcludeExitNodes setting of a running Tor.
* https://github.com/nusenu/noContactInfo_Exit_Excluder * https://github.com/nusenu/noContactInfo_Exit_Excluder
* https://github.com/TheSmashy/TorExitRelayExclude * https://github.com/TheSmashy/TorExitRelayExclude
The basic cut is to exclude Exit nodes that do not have a contact. The basic idea is to exclude Exit nodes that do not have ContactInfo:
That can be extended to nodes that do not have an email in the contact etc. * https://github.com/nusenu/ContactInfo-Information-Sharing-Specification
That can be extended to relays that do not have an email in the contact,
or to relays that do not have ContactInfo that is verified to include them.
""" """
"""But there's a problem, and your Tor notice.log will tell you about it: __prolog__ = __doc__
you could exclude the nodes needed to access hidden services or
directorues. So we need to add to the process the concept of a whitelist. __doc__ +="""But there's a problem, and your Tor notice.log will tell you about it:
you could exclude the relays needed to access hidden services or mirror
directories. So we need to add to the process the concept of a whitelist.
In addition, we may have our own blacklist of nodes we want to exclude, In addition, we may have our own blacklist of nodes we want to exclude,
or use these lists for other applications like selektor. or use these lists for other applications like selektor.
@ -35,36 +40,22 @@ BadNodes:
# $0000000000000000000000000000000000000007 # $0000000000000000000000000000000000000007
``` ```
That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML) That part requires [PyYAML](https://pyyaml.org/wiki/PyYAML)
https://github.com/yaml/pyyaml/ https://github.com/yaml/pyyaml/ or ```ruamel```: do
```pip3 install ruamel``` or ```pip3 install PyYAML```;
the advantage of the former is that it preserves comments.
Right now only the ExcludeExitNodes section is used by we may add ExcludeNodes (You may have to run this as the Tor user to get RW access to
later, and by default all sub-sections of the badnodes.yaml are used as a /run/tor/control, in which case the directory for the YAML files must
ExcludeExitNodes but it can be customized with the lWanted commandline arg. be group Tor writeable, and its parents group Tor RX.)
The original idea has also been extended to add different conditions for
exclusion: the ```--contact``` commandline arg is a comma sep list of conditions:
* Empty - no contact info
* NoEmail - no @ sign in the contact',
More may be added later.
Because you don't want to exclude the introduction points to any onion Because you don't want to exclude the introduction points to any onion
you want to connect to, ```--white_onions``` should whitelist the you want to connect to, ```--white_onions``` should whitelist the
introduction points to a comma sep list of onions, but is introduction points to a comma sep list of onions; we fixed stem to do this:
currently broken in stem 1.8.0: see:
* https://github.com/torproject/stem/issues/96 * https://github.com/torproject/stem/issues/96
* https://gitlab.torproject.org/legacy/trac/-/issues/25417 * https://gitlab.torproject.org/legacy/trac/-/issues/25417
```--torrc_output``` will write the torrc ExcludeNodes configuration to a file. ```--torrc_output``` will write the torrc ExcludeNodes configuration to a file.
Now for the final part: we lookup the Contact info of every server
that is currently in our Tor, and check it for its existence.
If it fails to provide the well-know url, we assume its a bogus
relay and add it to a list of nodes that goes on ExcludeNodes -
not just exclude Exit.
If the Contact info is good we add the list of fingerprints to add
to ExitNodes, a whitelist of relays to use as exits.
```--good_contacts``` will write the contact info as a ciiss dictionary ```--good_contacts``` will write the contact info as a ciiss dictionary
to a YAML file. If the proof is uri-rsa, the well-known file of fingerprints to a YAML file. If the proof is uri-rsa, the well-known file of fingerprints
is downloaded and the fingerprints are added on a 'fps' field we create is downloaded and the fingerprints are added on a 'fps' field we create
@ -72,24 +63,51 @@ of that fingerprint's entry of the YAML dictionary. This file is read at the
beginning of the program to start with a trust database, and only new beginning of the program to start with a trust database, and only new
contact info from new relays are added to the dictionary. contact info from new relays are added to the dictionary.
You can expect it to take an hour or two the first time this is run: Now for the final part: we lookup the Contact info of every relay
>700 domains. that is currently in our Tor, and check it the existence of the
well-known file that lists the fingerprints of the relays it runs.
If it fails to provide the well-known url, we assume it's a bad
relay and add it to a list of nodes that goes on ```ExcludeNodes```
(not just ```ExcludeExitNodes```). If the Contact info is good, we add the
list of fingerprints to ```ExitNodes```, a whitelist of relays to use as exits.
```--bad_on``` We offer the users 3 levels of cleaning:
1. clean relays that have no contact ```=Empty```
2. clean relays that don't have an email in the contact (implies 1)
```=Empty,NoEmail```
3. clean relays that don't have "good" contactinfo. (implies 1)
```=Empty,NoEmail,NotGood```
The default is ```=Empty,NotGood``` ; ```NoEmail``` is inherently imperfect
in that many of the contact-as-an-email are obfuscated, but we try anyway.
To be "good" the ContactInfo must:
1. have a url for the well-known file to be fetched
2. have a file that can be fetched at the URL
3. support fetching the file with a valid SSL cert from a recognized authority
4. (not in the spec but added by Python) use a TLS/SSL version > v1
5. have a fingerprint list in the file
6. have the FP that got us the contactinfo in the fingerprint list in the file.
For usage, do ```python3 exclude_badExits.py --help` For usage, do ```python3 exclude_badExits.py --help`
""" """
# https://github.com/nusenu/trustor-example-trust-config/blob/main/trust_config
# https://github.com/nusenu/tor-relay-operator-ids-trust-information
import argparse import argparse
import os import os
import json
import sys import sys
import time import time
from io import StringIO from io import StringIO
import stem import stem
import urllib3
from stem import InvalidRequest from stem import InvalidRequest
from stem.connection import IncorrectPassword from stem.connection import IncorrectPassword
from stem.util.tor_tools import is_valid_fingerprint from stem.util.tor_tools import is_valid_fingerprint
import urllib3
from urllib3.util.ssl_match_hostname import CertificateError from urllib3.util.ssl_match_hostname import CertificateError
# list(ipaddress._find_address_range(ipaddress.IPv4Network('172.16.0.0/12')) # list(ipaddress._find_address_range(ipaddress.IPv4Network('172.16.0.0/12'))
@ -113,6 +131,13 @@ try:
except: except:
ub_ctx = RR_TYPE_TXT = RR_CLASS_IN = None ub_ctx = RR_TYPE_TXT = RR_CLASS_IN = None
from support_onions import (bAreWeConnected, icheck_torrc, lIntroductionPoints,
oGetStemController, vwait_for_controller,
yKNOWN_NODNS, zResolveDomain)
from trustor_poc import TrustorError, idns_validate
from trustor_poc import oDownloadUrlUrllib3 as oDownloadUrl
global LOG global LOG
import logging import logging
import warnings import warnings
@ -120,18 +145,17 @@ import warnings
warnings.filterwarnings('ignore') warnings.filterwarnings('ignore')
LOG = logging.getLogger() LOG = logging.getLogger()
from support_onions import (bAreWeConnected, icheck_torrc, lIntroductionPoints, try:
oGetStemController, vwait_for_controller, from torcontactinfo import TorContactInfoParser
yKNOWN_NODNS, zResolveDomain) oPARSER = TorContactInfoParser()
from support_phantompy import vsetup_logging except ImportError:
from trustor_poc import TrustorError, idns_validate oPARSER = None
from trustor_poc import oDownloadUrlUrllib3 as oDownloadUrl
LOG.info("imported HTTPSAdapter") ETC_DIR = '/usr/local/etc/tor/yaml'
ETC_DIR = '/etc/tor/yaml'
aTRUST_DB = {} aTRUST_DB = {}
aTRUST_DB_INDEX = {} aTRUST_DB_INDEX = {}
aRELAYS_DB = {}
aRELAYS_DB_INDEX = {}
aFP_EMAIL = {} aFP_EMAIL = {}
sDETAILS_URL = "https://metrics.torproject.org/rs.html#details/" sDETAILS_URL = "https://metrics.torproject.org/rs.html#details/"
# You can call this while bootstrapping # You can call this while bootstrapping
@ -145,13 +169,13 @@ oBAD_NODES[oBAD_ROOT] = {}
oBAD_NODES[oBAD_ROOT]['ExcludeNodes'] = {} oBAD_NODES[oBAD_ROOT]['ExcludeNodes'] = {}
lKNOWN_NODNS = [] lKNOWN_NODNS = []
lMAYBE_NODNS = [] tMAYBE_NODNS = set()
def lYamlBadNodes(sFile, def lYamlBadNodes(sFile,
section=sEXCLUDE_EXIT_KEY, section=sEXCLUDE_EXIT_KEY,
lWanted=['BadExit']): lWanted=['BadExit']):
global oBAD_NODES global oBAD_NODES
global lKNOWN_NODNS global lKNOWN_NODNS
global lMAYBE_NODNS global tMAYBE_NODNS
if not yaml: if not yaml:
return [] return []
@ -167,11 +191,10 @@ def lYamlBadNodes(sFile,
l = oBAD_NODES[oBAD_ROOT]['ExcludeNodes']['BadExit'] l = oBAD_NODES[oBAD_ROOT]['ExcludeNodes']['BadExit']
tMAYBE_NODNS = set(safe_load(StringIO(yKNOWN_NODNS)))
root = 'ExcludeDomains' root = 'ExcludeDomains'
if root not in oBAD_NODES[oBAD_ROOT] or not oBAD_NODES[oBAD_ROOT][root]: if root in oBAD_NODES[oBAD_ROOT] and oBAD_NODES[oBAD_ROOT][root]:
lMAYBE_NODNS = safe_load(StringIO(yKNOWN_NODNS)) tMAYBE_NODNS.update(oBAD_NODES[oBAD_ROOT][root])
else:
lMAYBE_NODNS = oBAD_NODES[oBAD_ROOT][root]
return l return l
oGOOD_NODES = {} oGOOD_NODES = {}
@ -192,12 +215,12 @@ def lYamlGoodNodes(sFile='/etc/tor/torrc-goodnodes.yaml'):
def bdomain_is_bad(domain, fp): def bdomain_is_bad(domain, fp):
global lKNOWN_NODNS global lKNOWN_NODNS
if domain in lKNOWN_NODNS: return True if domain in lKNOWN_NODNS: return True
if domain in lMAYBE_NODNS: if domain in tMAYBE_NODNS:
ip = zResolveDomain(domain) ip = zResolveDomain(domain)
if ip == '': if ip == '':
LOG.debug(f"{fp} {domain} does not resolve") LOG.debug(f"{fp} {domain} does not resolve")
lKNOWN_NODNS.append(domain) lKNOWN_NODNS.append(domain)
lMAYBE_NODNS.remove(domain) tMAYBE_NODNS.remove(domain)
return True return True
for elt in '@(){}$!': for elt in '@(){}$!':
@ -207,31 +230,79 @@ def bdomain_is_bad(domain, fp):
return False return False
tBAD_URLS = set() tBAD_URLS = set()
lAT_REPS = ['[]', ' at ', '(at)', '[at]', '<at>', '(att)', '_at_',
'~at~', '.at.', '!at!', '<a>t', '<(a)>', '|__at-|', '<:at:>',
'[__at ]', '"a t"', 'removeme at ']
lDOT_REPS = [' point ', ' dot ', '[dot]', '(dot)', '_dot_', '!dot!', '<.>',
'<:dot:>', '|dot--|',
]
lNO_EMAIL = ['<nobody at example dot com>',
'not@needed.com',
'<nobody at none of your business xyz>',
'<not-set@example.com>',
'not a person <nomail at yet dot com>',
r'<nothing/at\\mail.de>',
'@snowden',
'ano ano@fu.dk',
'anonymous',
'anonymous@buzzzz.com',
'check http://highwaytohoell.de',
'no@no.no',
'not@needed.com',
'not@re.al',
'nothanks',
'nottellingyou@mail.info',
'ur@mom.com',
'your@e-mail',
'your@email.com',
]
def sCleanEmail(s):
s = s.lower()
for elt in lAT_REPS:
s = s.replace(' ' + elt + ' ', '@').replace(elt, '@')
for elt in lDOT_REPS:
s = s.replace(elt, '.')
s = s.replace('(dash)', '-')
for elt in lNO_EMAIL:
s = s.replace(elt, '')
return s
lATS = ['abuse', 'email'] lATS = ['abuse', 'email']
lINTS = ['ciissversion', 'uplinkbw', 'signingkeylifetime', 'memory'] lINTS = ['ciissversion', 'uplinkbw', 'signingkeylifetime', 'memory']
lBOOLS = ['dnssec', 'dnsqname', 'aesni', 'autoupdate', 'dnslocalrootzone', lBOOLS = ['dnssec', 'dnsqname', 'aesni', 'autoupdate', 'dnslocalrootzone',
'sandbox', 'offlinemasterkey'] 'sandbox', 'offlinemasterkey']
def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050): def aCleanContact(a):
global tBAD_URLS
global lKNOWN_NODNS
# cleanups # cleanups
for elt in lINTS: for elt in lINTS:
if elt in a: if elt in a:
a[elt] = int(a[elt]) a[elt] = int(a[elt])
for elt in lBOOLS: for elt in lBOOLS:
if elt in a: if elt not in a: continue
if a[elt] in ['y', 'yes', 'true', 'True']: if a[elt] in ['y', 'yes', 'true', 'True']:
a[elt] = True a[elt] = True
else: else:
a[elt] = False a[elt] = False
for elt in lATS: for elt in lATS:
if elt in a: if elt not in a: continue
a[elt] = a[elt].replace('[]', '@') a[elt] = sCleanEmail(a[elt])
if 'url' in a.keys():
a['url'] = a['url'].rstrip('/')
if a['url'].startswith('http://'):
domain = a['url'].replace('http://', '')
elif a['url'].startswith('https://'):
domain = a['url'].replace('https://', '')
else:
domain = a['url']
a['url'] = 'https://' + domain
a.update({'fps': []}) a.update({'fps': []})
return a
def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050):
global tBAD_URLS
global lKNOWN_NODNS
keys = list(a.keys()) keys = list(a.keys())
a = aCleanContact(a)
if 'email' not in keys: if 'email' not in keys:
LOG.warn(f"{fp} 'email' not in {keys}")
a['email'] = '' a['email'] = ''
if 'ciissversion' not in keys: if 'ciissversion' not in keys:
aFP_EMAIL[fp] = a['email'] aFP_EMAIL[fp] = a['email']
@ -260,13 +331,10 @@ def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050)
LOG.debug(f"{fp} 'uri' but not 'url' in {keys}") LOG.debug(f"{fp} 'uri' but not 'url' in {keys}")
# drop through # drop through
c = a['url'].lstrip('https://').lstrip('http://').strip('/') domain = a['url'].replace('https://', '').replace('http://', '')
a['url'] = 'https://' +c # domain should be a unique key for contacts?
# domain should be a unique key for contacts
domain = a['url'][8:]
if bdomain_is_bad(domain, fp): if bdomain_is_bad(domain, fp):
LOG.warn(f"{domain} is bad from {a['url']}") LOG.warn(f"{domain} is bad - {a['url']}")
LOG.debug(f"{fp} is bad from {a}") LOG.debug(f"{fp} is bad from {a}")
return a return a
@ -277,7 +345,7 @@ def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050)
lKNOWN_NODNS.append(domain) lKNOWN_NODNS.append(domain)
return {} return {}
if a['proof'] not in ['uri-rsa']: if a['proof'] in ['dns-rsa']:
# only support uri for now # only support uri for now
if False and ub_ctx: if False and ub_ctx:
fp_domain = fp + '.' + domain fp_domain = fp + '.' + domain
@ -289,12 +357,13 @@ def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050)
LOG.warn(f"{fp} proof={a['proof']} not supported yet") LOG.warn(f"{fp} proof={a['proof']} not supported yet")
return a return a
LOG.debug(f"{len(keys)} contact fields for {fp}") # LOG.debug(f"{len(keys)} contact fields for {fp}")
url = f"https://{domain}/.well-known/tor-relay/rsa-fingerprint.txt" url = a['url'] + "/.well-known/tor-relay/rsa-fingerprint.txt"
try: try:
LOG.debug(f"Downloading from {domain} for {fp}") LOG.debug(f"Downloading from {domain} for {fp}")
o = oDownloadUrl(url, https_cafile, o = oDownloadUrl(url, https_cafile,
timeout=timeout, host=host, port=port) timeout=timeout, host=host, port=port,
content_type='text/plain')
# requests response: text "reason", "status_code" # requests response: text "reason", "status_code"
except AttributeError as e: except AttributeError as e:
LOG.exception(f"AttributeError downloading from {domain} {e}") LOG.exception(f"AttributeError downloading from {domain} {e}")
@ -308,7 +377,8 @@ def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050)
else: else:
LOG.warn(f"TrustorError downloading from {domain} {e.args}") LOG.warn(f"TrustorError downloading from {domain} {e.args}")
tBAD_URLS.add(a['url']) tBAD_URLS.add(a['url'])
except urllib3.exceptions.MaxRetryError as e: # noqa except (urllib3.exceptions.MaxRetryError, urllib3.exceptions.ProtocolError,) as e: # noqa
#
# maybe offline - not bad # maybe offline - not bad
LOG.warn(f"MaxRetryError downloading from {domain} {e}") LOG.warn(f"MaxRetryError downloading from {domain} {e}")
except (BaseException) as e: except (BaseException) as e:
@ -336,33 +406,45 @@ def aVerifyContact(a, fp, https_cafile, timeout=20, host='127.0.0.1', port=9050)
if not l: if not l:
LOG.warn(f"Downloading from {domain} empty for {fp}") LOG.warn(f"Downloading from {domain} empty for {fp}")
else: else:
a['fps'] = [elt for elt in l if elt and len(elt) == 40 \ a['fps'] = [elt.strip() for elt in l if elt \
and not elt.startswith('#')] and not elt.startswith('#')]
LOG.info(f"Downloaded from {domain} {len(a['fps'])} FPs") LOG.info(f"Downloaded from {domain} {len(a['fps'])} FPs")
for elt in a['fps']:
if len(elt) != 40:
LOG.warn(f"len !=40 from {domain} '{elt}'")
return a return a
def aParseContactYaml(contact, fp): def aParseContact(contact, fp):
""" """
See the Tor ContactInfo Information Sharing Specification v2 See the Tor ContactInfo Information Sharing Specification v2
https://nusenu.github.io/ContactInfo-Information-Sharing-Specification/ https://nusenu.github.io/ContactInfo-Information-Sharing-Specification/
""" """
lelts = contact.split()
a = {} a = {}
if len(lelts) % 1 != 0: if not contact:
LOG.warn(f"bad contact for {fp} odd number of components") LOG.warn(f"null contact for {fp}")
LOG.debug(f"{fp} {a}") LOG.debug(f"{fp} {contact}")
return a return {}
key = '' # shlex?
lelts = contact.split(' ')
if not lelts:
LOG.warn(f"empty contact for {fp}")
LOG.debug(f"{fp} {contact}")
return {}
for elt in lelts: for elt in lelts:
if key == '': if ':' not in elt:
key = elt # hoster:Quintex Alliance Consulting
LOG.warn(f"no : in {elt} for {contact} in {fp}")
continue continue
a[key] = elt (key , val,) = elt.split(':', 1)
key = '' if key == '':
LOG.debug(f"{fp} {len(a.keys())} fields") continue
key = key.rstrip(':')
a[key] = val
a = aCleanContact(a)
# LOG.debug(f"{fp} {len(a.keys())} fields")
return a return a
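The `key:value` splitting done by the new `aParseContact` can be illustrated in isolation. This is a simplified sketch under stated assumptions, not the function from the commit: `parse_contact` is a hypothetical name, it splits on any whitespace rather than on single spaces, and it leaves out the logging and the `aCleanContact` normalization step.

```python
def parse_contact(contact):
    """Split a CIISS-style ContactInfo string into a dict of fields.

    Hypothetical helper for illustration; not part of exclude_badExits.py.
    """
    fields = {}
    for token in contact.split():
        # Split on the first ':' only, so URL values keep their own colons.
        key, sep, value = token.partition(':')
        if not sep or not key:
            # Tokens without a key (e.g. free text inside a value) are skipped.
            continue
        fields[key] = value
    return fields


example = "email:tor[]example.com url:https://example.com proof:uri-rsa ciissversion:2"
fields = parse_contact(example)
```

Values that contain spaces, such as the `hoster:Quintex Alliance Consulting` case noted in the diff, are truncated at the first space by this approach, which is why the diff logs a warning for tokens without a colon.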
def aParseContact(contact, fp): def aParseContactYaml(contact, fp):
""" """
See the Tor ContactInfo Information Sharing Specification v2 See the Tor ContactInfo Information Sharing Specification v2
https://nusenu.github.io/ContactInfo-Information-Sharing-Specification/ https://nusenu.github.io/ContactInfo-Information-Sharing-Specification/
@ -393,7 +475,7 @@ def oMainArgparser(_=None):
CAfs = [''] CAfs = ['']
parser = argparse.ArgumentParser(add_help=True, parser = argparse.ArgumentParser(add_help=True,
epilog=__doc__) epilog=__prolog__)
parser.add_argument('--https_cafile', type=str, parser.add_argument('--https_cafile', type=str,
help="Certificate Authority file (in PEM)", help="Certificate Authority file (in PEM)",
default=CAfs[0]) default=CAfs[0])
@ -420,8 +502,8 @@ def oMainArgparser(_=None):
parser.add_argument('--bad_nodes', type=str, parser.add_argument('--bad_nodes', type=str,
default=os.path.join(ETC_DIR, 'badnodes.yaml'), default=os.path.join(ETC_DIR, 'badnodes.yaml'),
help="Yaml file of bad nodes that should also be excluded") help="Yaml file of bad nodes that should also be excluded")
parser.add_argument('--contact', type=str, default='Empty,NoEmail', parser.add_argument('--bad_on', type=str, default='Empty,NotGood',
help="comma sep list of conditions - Empty,NoEmail") help="comma sep list of conditions - Empty,NoEmail,NotGood")
parser.add_argument('--bad_contacts', type=str, parser.add_argument('--bad_contacts', type=str,
default=os.path.join(ETC_DIR, 'badcontacts.yaml'), default=os.path.join(ETC_DIR, 'badcontacts.yaml'),
help="Yaml file of bad contacts that bad FPs are using") help="Yaml file of bad contacts that bad FPs are using")
@ -443,6 +525,9 @@ def oMainArgparser(_=None):
parser.add_argument('--torrc_output', type=str, parser.add_argument('--torrc_output', type=str,
default=os.path.join(ETC_DIR, 'torrc.new'), default=os.path.join(ETC_DIR, 'torrc.new'),
help="Write the torrc configuration to a file") help="Write the torrc configuration to a file")
parser.add_argument('--relays_output', type=str,
default=os.path.join(ETC_DIR, 'relays.json'),
help="Write the download relays in json to a file")
parser.add_argument('--good_contacts', type=str, default=os.path.join(ETC_DIR, 'goodcontacts.yaml'), parser.add_argument('--good_contacts', type=str, default=os.path.join(ETC_DIR, 'goodcontacts.yaml'),
help="Write the proof data of the included nodes to a YAML file") help="Write the proof data of the included nodes to a YAML file")
return parser return parser
@ -471,12 +556,134 @@ def vwrite_goodnodes(oargs, oGOOD_NODES, ilen):
os.rename(oargs.good_nodes, bak) os.rename(oargs.good_nodes, bak)
os.rename(tmp, oargs.good_nodes) os.rename(tmp, oargs.good_nodes)
def lget_onionoo_relays(oargs):
import requests
adata = {}
if oargs.relays_output and os.path.exists(oargs.relays_output):
LOG.info(f"Getting OO relays from {oargs.relays_output}")
try:
with open(oargs.relays_output, 'rt') as ofd:
sdata = ofd.read()
adata = json.loads(sdata)
except Exception as e:
LOG.error(f"Error reading relays from {oargs.relays_output}: {e}")
adata = {}
if not adata:
surl = "https://onionoo.torproject.org/details"
LOG.info(f"Getting OO relays from {surl}")
sCAfile = oargs.https_cafile
assert os.path.exists(sCAfile), sCAfile
if True:
try:
o = oDownloadUrl(surl, sCAfile,
timeout=oargs.timeout,
host=oargs.proxy_host,
port=oargs.proxy_port,
content_type='')
if hasattr(o, 'text'):
data = o.text
else:
data = str(o.data, 'UTF-8')
except Exception as e:
# simplejson.errors.JSONDecodeError
# urllib3.exceptions import ConnectTimeoutError, NewConnectionError
# (urllib3.exceptions.MaxRetryError, urllib3.exceptions.ProtocolError,)
LOG.exception(f"JSON error {e}")
return []
else:
LOG.debug(f"Downloaded {surl} {len(data)} bytes")
adata = json.loads(data)
else:
odata = requests.get(surl, verify=sCAfile)
try:
adata = odata.json()
except Exception as e:
# simplejson.errors.JSONDecodeError
LOG.exception(f"JSON error {e}")
return []
else:
LOG.debug(f"Downloaded {surl} {len(adata)} relays")
sdata = json.dumps(adata)
if oargs.relays_output:
try:
with open(oargs.relays_output, 'wt') as ofd:
ofd.write(sdata)
except Exception as e:
LOG.warning(f"Error {oargs.relays_output} {e}")
else:
LOG.debug(f"Wrote {oargs.relays_output} {len(sdata)} bytes")
lonionoo_relays = [r for r in adata["relays"] if 'fingerprint' in r.keys()]
return lonionoo_relays
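The relay cache above is read back with `json.loads`, so whatever is written to `--relays_output` has to be valid JSON. A minimal sketch of that round trip, assuming two hypothetical helpers (`load_cached_relays` and `save_cached_relays` are illustration names, not part of this commit):

```python
import json
import os


def load_cached_relays(path):
    """Return the cached Onionoo document, or {} if it is missing or unreadable."""
    if not path or not os.path.exists(path):
        return {}
    try:
        with open(path, 'rt') as fd:
            return json.load(fd)
    except (OSError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError
        return {}


def save_cached_relays(path, adata):
    """Write the document as JSON so that load_cached_relays can read it back."""
    with open(path, 'wt') as fd:
        json.dump(adata, fd)
```

A file written with `repr()` of a dict uses single quotes and would not parse as JSON, so in this sketch it is treated as a cache miss rather than an error.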
def vsetup_logging(log_level, logfile='', stream=sys.stdout):
global LOG
add = True
try:
if 'COLOREDLOGS_LEVEL_STYLES' not in os.environ:
os.environ['COLOREDLOGS_LEVEL_STYLES'] = 'spam=22;debug=28;verbose=34;notice=220;warning=202;success=118,bold;error=124;critical=background=red'
# https://pypi.org/project/coloredlogs/
import coloredlogs
except ImportError:
coloredlogs = False
# stem interferes with the logging configuration
# from stem.util import log
logging.getLogger('stem').setLevel(30)
logging._defaultFormatter = logging.Formatter(datefmt='%m-%d %H:%M:%S')
logging._defaultFormatter.default_time_format = '%m-%d %H:%M:%S'
logging._defaultFormatter.default_msec_format = ''
kwargs = dict(level=log_level,
force=True,
format='%(levelname)s %(message)s')
if logfile:
add = logfile.startswith('+')
sub = logfile.startswith('-')
if add or sub:
logfile = logfile[1:]
kwargs['filename'] = logfile
if coloredlogs:
# https://pypi.org/project/coloredlogs/
aKw = dict(level=log_level,
logger=LOG,
stream=stream,
fmt='%(levelname)s %(message)s'
)
coloredlogs.install(**aKw)
if logfile:
oHandler = logging.FileHandler(logfile)
LOG.addHandler(oHandler)
LOG.info(f"CSetting log_level to {log_level} {stream}")
else:
logging.basicConfig(**kwargs)
if add and logfile:
oHandler = logging.StreamHandler(stream)
LOG.addHandler(oHandler)
LOG.info(f"SSetting log_level to {log_level!s}")
def vwritefinale(oargs, lNotInaRELAYS_DB):
if len(lNotInaRELAYS_DB):
LOG.warning(f"{len(lNotInaRELAYS_DB)} relays from stem were not in onionoo.torproject.org")
LOG.info(f"For info on a FP, use: https://nusenu.github.io/OrNetStats/w/relay/<FP>.html")
LOG.info(f"For info on relays, use: https://onionoo.torproject.org/details")
# https://onionoo.torproject.org/details
LOG.info(f"although it's often broken")
def iMain(lArgs): def iMain(lArgs):
global aTRUST_DB global aTRUST_DB
global aTRUST_DB_INDEX global aTRUST_DB_INDEX
global oBAD_NODES global oBAD_NODES
global oGOOD_NODES global oGOOD_NODES
global lKNOWN_NODNS global lKNOWN_NODNS
global aRELAYS_DB
global aRELAYS_DB_INDEX
parser = oMainArgparser() parser = oMainArgparser()
oargs = parser.parse_args(lArgs) oargs = parser.parse_args(lArgs)
@ -484,13 +691,21 @@ def iMain(lArgs):
if bAreWeConnected() is False: if bAreWeConnected() is False:
raise SystemExit("we are not connected") raise SystemExit("we are not connected")
if os.path.exists(oargs.proxy_ctl):
controller = oGetStemController(log_level=oargs.log_level, sock_or_pair=oargs.proxy_ctl)
else:
port = int(oargs.proxy_ctl)
controller = oGetStemController(log_level=oargs.log_level, sock_or_pair=port)
vwait_for_controller(controller, oargs.wait_boot)
sFile = oargs.torrc sFile = oargs.torrc
if sFile and os.path.exists(sFile): if sFile and os.path.exists(sFile):
icheck_torrc(sFile, oargs) icheck_torrc(sFile, oargs)
twhitelist_set = set() twhitelist_set = set()
sFile = oargs.good_contacts sFile = oargs.good_contacts
if sFile and os.path.exists(sFile): if False and sFile and os.path.exists(sFile):
try: try:
with open(sFile, 'rt') as oFd: with open(sFile, 'rt') as oFd:
aTRUST_DB = safe_load(oFd) aTRUST_DB = safe_load(oFd)
@ -511,14 +726,6 @@ def iMain(lArgs):
except Exception as e: except Exception as e:
LOG.exception(f"Error reading YAML TrustDB {sFile} {e}") LOG.exception(f"Error reading YAML TrustDB {sFile} {e}")
if os.path.exists(oargs.proxy_ctl):
controller = oGetStemController(log_level=oargs.log_level, sock_or_pair=oargs.proxy_ctl)
else:
port =int(oargs.proxy_ctl)
controller = oGetStemController(port=port)
vwait_for_controller(controller, oargs.wait_boot)
if oargs.good_contacts: if oargs.good_contacts:
good_contacts_tmp = oargs.good_contacts + '.tmp' good_contacts_tmp = oargs.good_contacts + '.tmp'
@ -542,9 +749,12 @@ def iMain(lArgs):
t = set(oGOOD_NODES[oGOOD_ROOT]['Relays']['IntroductionPoints']) t = set(oGOOD_NODES[oGOOD_ROOT]['Relays']['IntroductionPoints'])
w = set() w = set()
if 'Services' in oGOOD_NODES[oGOOD_ROOT].keys(): if 'Services' in oGOOD_NODES[oGOOD_ROOT].keys():
# 'Onions' can I use the IntroductionPoints for Services too? w = set(oGOOD_NODES[oGOOD_ROOT]['Services'])
# w = set(oGOOD_NODES[oGOOD_ROOT]['Services']) twhitelist_set.update(w)
pass if len(w) > 0:
LOG.info(f"Whitelist {len(w)} relays from Services")
w = set()
if 'Onions' in oGOOD_NODES[oGOOD_ROOT].keys(): if 'Onions' in oGOOD_NODES[oGOOD_ROOT].keys():
# Provides the descriptor for a hidden service. The **address** is the # Provides the descriptor for a hidden service. The **address** is the
# '.onion' address of the hidden service # '.onion' address of the hidden service
@ -555,7 +765,7 @@ def iMain(lArgs):
LOG.info(f"{len(w)} services will be checked from IntroductionPoints") LOG.info(f"{len(w)} services will be checked from IntroductionPoints")
t.update(lIntroductionPoints(controller, w, itimeout=oargs.points_timeout)) t.update(lIntroductionPoints(controller, w, itimeout=oargs.points_timeout))
if len(t) > 0: if len(t) > 0:
LOG.info(f"IntroductionPoints {len(t)} relays from {len(w)} services") LOG.info(f"IntroductionPoints {len(t)} relays from {len(w)} IPs for onions")
twhitelist_set.update(t) twhitelist_set.update(t)
texclude_set = set() texclude_set = set()
@ -573,10 +783,12 @@ def iMain(lArgs):
iFakeContact = 0 iFakeContact = 0
iTotalContacts = 0 iTotalContacts = 0
aBadContacts = {} aBadContacts = {}
lNotInaRELAYS_DB = []
lConds = oargs.contact.split(',') aRELAYS_DB = {elt['fingerprint'].upper(): elt for
elt in lget_onionoo_relays(oargs)
if 'fingerprint' in elt}
lConds = oargs.bad_on.split(',')
iR = 0 iR = 0
relays = controller.get_server_descriptors() relays = controller.get_server_descriptors()
for relay in relays: for relay in relays:
iR += 1 iR += 1
@ -586,6 +798,12 @@ def iMain(lArgs):
relay.fingerprint = relay.fingerprint.upper() relay.fingerprint = relay.fingerprint.upper()
sofar = f"G:{len(aTRUST_DB.keys())} U:{len(tdns_urls)} F:{iFakeContact} BF:{len(texclude_set)} GF:{len(ttrust_db_index)} TC:{iTotalContacts} #{iR}" sofar = f"G:{len(aTRUST_DB.keys())} U:{len(tdns_urls)} F:{iFakeContact} BF:{len(texclude_set)} GF:{len(ttrust_db_index)} TC:{iTotalContacts} #{iR}"
fp = relay.fingerprint
if aRELAYS_DB and fp not in aRELAYS_DB.keys():
LOG.warning(f"{fp} not in aRELAYS_DB")
lNotInaRELAYS_DB += [fp]
if not relay.exit_policy.is_exiting_allowed(): if not relay.exit_policy.is_exiting_allowed():
if sEXCLUDE_EXIT_KEY == 'ExcludeNodes': if sEXCLUDE_EXIT_KEY == 'ExcludeNodes':
pass # LOG.debug(f"{relay.fingerprint} not an exit {sofar}") pass # LOG.debug(f"{relay.fingerprint} not an exit {sofar}")
@ -602,78 +820,79 @@ def iMain(lArgs):
# dunno # dunno
relay.contact = str(relay.contact, 'UTF-8') relay.contact = str(relay.contact, 'UTF-8')
if ('Empty' in lConds and not relay.contact) or \ # fail if the contact is empty
('NoEmail' in lConds and relay.contact and 'email:' not in relay.contact): if ('Empty' in lConds and not relay.contact):
LOG.info(f"{fp} skipping empty contact - Empty {sofar}")
texclude_set.add(relay.fingerprint) texclude_set.add(relay.fingerprint)
continue continue
if not relay.contact or 'ciissversion:' not in relay.contact: contact = sCleanEmail(relay.contact)
# should be unreached 'Empty' should always be in lConds # fail if the contact has no email - unreliable
if ('NoEmail' in lConds and relay.contact and
('@' not in contact and 'email:' not in contact)):
LOG.info(f"{fp} skipping contact - NoEmail {contact} {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}")
texclude_set.add(relay.fingerprint)
continue continue
# fail if the contact does not pass
if ('NotGood' in lConds and relay.contact and
('ciissversion:' not in relay.contact)):
LOG.info(f"{fp} skipping no ciissversion in contact {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}")
texclude_set.add(relay.fingerprint)
continue
# if it has a ciissversion in contact we count it in total
iTotalContacts += 1 iTotalContacts += 1
fp = relay.fingerprint # fail if the contact does not have url: to pass
if relay.contact and 'url:' not in relay.contact: if relay.contact and 'url' not in relay.contact:
LOG.info(f"{fp} skipping bad contact - no url: {sofar}") LOG.info(f"{fp} skipping unfetchable contact - no url {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}") LOG.debug(f"{fp} {relay.contact} {sofar}")
if ('NotGood' in lConds): texclude_set.add(fp)
continue
# only proceed if 'NotGood' not in lConds:
if 'NotGood' not in lConds: continue
# fail if the contact does not parse
a = aParseContact(relay.contact, relay.fingerprint)
if not a:
LOG.warning(f"{relay.fingerprint} contact did not parse {sofar}")
texclude_set.add(fp) texclude_set.add(fp)
continue continue
c = relay.contact.lower() if 'url' in a and a['url']:
# first rough cut # fail if the contact uses a url we already know is bad
i = c.find('url:') if a['url'] in tBAD_URLS:
if i >=0: LOG.info(f"{relay.fingerprint} skipping in tBAD_URLS {a['url']} {sofar}")
c = c[i + 4:] LOG.debug(f"{relay.fingerprint} {a} {sofar}")
i = c.find(' ') # The fp is using a contact with a URL we know is bad
if i >=0: c = c[:i]
c = c.lstrip('https://').lstrip('http://').strip('/')
i = c.find('/')
if i >=0: c = c[:i]
domain = c
if domain and bdomain_is_bad(domain, fp):
LOG.info(f"{fp} skipping bad {domain} {sofar}")
LOG.debug(f"{fp} {relay.contact} {sofar}")
texclude_set.add(fp)
continue
if domain:
ip = zResolveDomain(domain)
if not ip:
LOG.warn(f"{fp} {domain} did not resolve {sofar}")
texclude_set.add(fp)
lKNOWN_NODNS.append(domain)
iFakeContact += 1 iFakeContact += 1
texclude_set.add(relay.fingerprint)
continue
domain = a['url'].replace('https://', '').replace('http://', '')
# fail if the contact uses a domain we already know does not resolve
if domain in lKNOWN_NODNS:
# The fp is using a contact with a URL we know is bogus
LOG.info(f"{relay.fingerprint} skipping in lKNOWN_NODNS {a} {sofar}")
LOG.debug(f"{relay.fingerprint} {relay} {sofar}")
iFakeContact += 1
texclude_set.add(relay.fingerprint)
continue continue
if 'dns-rsa' in relay.contact.lower(): if 'dns-rsa' in relay.contact.lower():
# skip if the contact uses a dns-rsa url we dont handle
target = f"{relay.fingerprint}.{domain}" target = f"{relay.fingerprint}.{domain}"
LOG.info(f"skipping 'dns-rsa' {target} {sofar}") LOG.info(f"skipping 'dns-rsa' {target} {sofar}")
tdns_urls.add(target) tdns_urls.add(target)
continue
elif 'proof:uri-rsa' in relay.contact.lower(): if 'proof:uri-rsa' in relay.contact.lower():
a = aParseContact(relay.contact, relay.fingerprint) # list(a.values())[0]
if not a: b = aVerifyContact(a,
LOG.warn(f"{relay.fingerprint} did not parse {sofar}")
texclude_set.add(relay.fingerprint)
continue
if 'url' in a and a['url']:
if a['url'] in tBAD_URLS:
# The fp is using a contact with a URL we know is bad
LOG.info(f"{relay.fingerprint} skipping in tBAD_URLS {a['url']} {sofar}")
LOG.debug(f"{relay.fingerprint} {a} {sofar}")
iFakeContact += 1
texclude_set.add(relay.fingerprint)
continue
domain = a['url'].replace('https://', '').replace('http://', '')
if domain in lKNOWN_NODNS:
# The fp is using a contact with a URL we know is bogus
LOG.info(f"{relay.fingerprint} skipping in lKNOWN_NODNS {a['url']} {sofar}")
LOG.debug(f"{relay.fingerprint} {a} {sofar}")
iFakeContact += 1
texclude_set.add(relay.fingerprint)
continue
b = aVerifyContact(list(a.values())[0],
relay.fingerprint, relay.fingerprint,
oargs.https_cafile, oargs.https_cafile,
timeout=oargs.timeout, timeout=oargs.timeout,
@ -697,16 +916,11 @@ def iMain(lArgs):
aBadContacts[relay.fingerprint] = b aBadContacts[relay.fingerprint] = b
continue continue
LOG.info(f"{relay.fingerprint} verified {b['url']} {sofar}") LOG.info(f"{relay.fingerprint} GOOD {b['url']} {sofar}")
# add our contact info to the trustdb # add our contact info to the trustdb
aTRUST_DB[relay.fingerprint] = b aTRUST_DB[relay.fingerprint] = b
for elt in b['fps']: for elt in b['fps']:
aTRUST_DB_INDEX[elt] = b aTRUST_DB_INDEX[elt] = b
if oargs.good_contacts and oargs.log_level <= 20:
# as we go along then clobber
with open(good_contacts_tmp, 'wt') as oFYaml:
yaml.dump(aTRUST_DB, oFYaml)
oFYaml.close()
LOG.info(f"Filtered {len(twhitelist_set)} whitelisted relays") LOG.info(f"Filtered {len(twhitelist_set)} whitelisted relays")
texclude_set = texclude_set.difference(twhitelist_set) texclude_set = texclude_set.difference(twhitelist_set)
@ -746,6 +960,8 @@ def iMain(lArgs):
# GuardNodes are readonly # GuardNodes are readonly
vwrite_goodnodes(oargs, oGOOD_NODES, len(aTRUST_DB_INDEX.keys())) vwrite_goodnodes(oargs, oGOOD_NODES, len(aTRUST_DB_INDEX.keys()))
vwritefinale(oargs, lNotInaRELAYS_DB)
retval = 0 retval = 0
try: try:
logging.getLogger('stem').setLevel(30) logging.getLogger('stem').setLevel(30)
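The `--bad_on` conditions in the loop above can be read as a small cascade: each condition named in the comma-separated list is checked in turn, and the first one that matches excludes the relay. A minimal sketch of that cascade as a pure function, assuming a hypothetical name `bad_on_reason`; it leaves out the `sCleanEmail` step and the later url and proof checks:

```python
def bad_on_reason(contact, conds):
    """Return the name of the first --bad_on condition the contact fails, or None."""
    # Empty: the relay has no ContactInfo at all
    if 'Empty' in conds and not contact:
        return 'Empty'
    # NoEmail: there is a contact, but nothing that looks like an email
    if ('NoEmail' in conds and contact
            and '@' not in contact and 'email:' not in contact):
        return 'NoEmail'
    # NotGood: there is a contact, but it has no ciissversion: field
    if 'NotGood' in conds and contact and 'ciissversion:' not in contact:
        return 'NotGood'
    return None


# The default of --bad_on in this commit is 'Empty,NotGood'
conds = 'Empty,NotGood'.split(',')
print(bad_on_reason('', conds))
print(bad_on_reason('just a nickname', conds))
```

Keeping the checks in one function like this would make the order of the conditions explicit and easy to test without a running Tor.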

@ -33,41 +33,41 @@ bHAVE_TORR = shutil.which('tor-resolve')
# in the wild we'll keep a copy here so we can avoid retesting # in the wild we'll keep a copy here so we can avoid retesting
yKNOWN_NODNS = """ yKNOWN_NODNS = """
--- ---
- 0x0.is - heraldonion.org
- a9.wtf - linkspartei.org
- aklad5.com - pineapple.cx
- artikel5ev.de
- arvanode.net
- dodo.pm
- dra-family.github.io
- eraldonion.org
- erjan.net
- galtland.network
- ineapple.cx
- lonet.sh
- moneneis.de
- olonet.sh
- or-exit-2.aa78i2efsewr0neeknk.xyz
- or.wowplanet.de
- ormycloud.org
- plied-privacy.net
- rivacysvcs.net
- redacted.org
- rification-for-nusenu.net
- rofl.cat
- rsv.ch
- sv.ch
- thingtohide.nl - thingtohide.nl
- tikel10.org
- tor.wowplanet.de
- tor-exit-2.aa78i2efsewr0neeknk.xyz - tor-exit-2.aa78i2efsewr0neeknk.xyz
- tor-exit-3.aa78i2efsewr0neeknk.xyz - tor-exit-3.aa78i2efsewr0neeknk.xyz
- torix-relays.org - tor.dlecan.com
- tse.com
- tuxli.org - tuxli.org
- w.digidow.eu - verification-for-nusenu.net
- w.cccs.de
""" """
# - 0x0.is
# - a9.wtf
# - aklad5.com
# - artikel5ev.de
# - arvanode.net
# - dodo.pm
# - erjan.net
# - galtland.network
# - lonet.sh
# - moneneis.de
# - olonet.sh
# - or-exit-2.aa78i2efsewr0neeknk.xyz
# - or.wowplanet.de
# - ormycloud.org
# - plied-privacy.net
# - rivacysvcs.net
# - redacted.org
# - rofl.cat
# - sv.ch
# - tikel10.org
# - tor.wowplanet.de
# - torix-relays.org
# - tse.com
# - w.digidow.eu
# - w.cccs.de
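Several of the old entries above look like real domains with leading characters missing, for example `eraldonion.org` next to `heraldonion.org`. One likely cause is the removed `c.lstrip('https://').lstrip('http://')` call: `str.lstrip` takes a set of characters, not a prefix, so it keeps stripping any leading `h`, `t`, `p`, `s`, `:` or `/`. A minimal sketch of the pitfall and of one alternative, assuming a hypothetical helper `domain_from_url` built on the standard library `urlsplit`:

```python
from urllib.parse import urlsplit

# lstrip strips characters from a set, so the leading 'h' of the host goes too
print('https://heraldonion.org'.lstrip('https://'))  # eraldonion.org


def domain_from_url(url):
    """Return the lowercased host part of a URL, with or without a scheme."""
    url = url.strip()
    if '://' not in url:
        # urlsplit only recognises a netloc after '//'
        url = '//' + url
    return (urlsplit(url).hostname or '').lower()


print(domain_from_url('https://heraldonion.org/'))  # heraldonion.org
```

The `a['url'].replace('https://', '').replace('http://', '')` form used in this commit avoids the character-set problem as well, although it does not remove a trailing path.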
def oMakeController(sSock='', port=9051): def oMakeController(sSock='', port=9051):
import getpass import getpass
@ -86,13 +86,15 @@ def oGetStemController(log_level=10, sock_or_pair='/run/tor/control'):
global oSTEM_CONTROLER global oSTEM_CONTROLER
if oSTEM_CONTROLER: return oSTEM_CONTROLER if oSTEM_CONTROLER: return oSTEM_CONTROLER
import stem.util.log import stem.util.log
stem.util.log.Runlevel = log_level # stem.util.log.Runlevel = 'DEBUG' = 20 # log_level
if os.path.exists(sock_or_pair): if os.path.exists(sock_or_pair):
LOG.info(f"controller from socket {sock_or_pair}") LOG.info(f"controller from socket {sock_or_pair}")
controller = Controller.from_socket_file(path=sock_or_pair) controller = Controller.from_socket_file(path=sock_or_pair)
else: else:
if ':' in sock_or_pair: if isinstance(sock_or_pair, int):
port = sock_or_pair
elif ':' in sock_or_pair:
port = sock_or_pair.split(':')[1] port = sock_or_pair.split(':')[1]
else: else:
port = sock_or_pair port = sock_or_pair
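The branch above accepts the control target in three shapes when it is not a socket path: an integer port, a `host:port` string, or a bare port string. A minimal sketch of that normalisation, assuming a hypothetical helper `control_port` that always returns an `int`:

```python
def control_port(sock_or_pair):
    """Return the control port as an int from an int, 'host:port' or 'port' value."""
    if isinstance(sock_or_pair, int):
        return sock_or_pair
    if ':' in sock_or_pair:
        # keep only the part after the last colon
        return int(sock_or_pair.rsplit(':', 1)[1])
    return int(sock_or_pair)


print(control_port(9051))
print(control_port('127.0.0.1:9051'))
```

Checking for the integer case first matters, because the `in` test on an `int` would raise a `TypeError`.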

@ -8,7 +8,6 @@ import os
import re import re
import sys import sys
import requests
from stem.control import Controller from stem.control import Controller
# from stem.util.tor_tools import * # from stem.util.tor_tools import *
from urllib3.util import parse_url as urlparse from urllib3.util import parse_url as urlparse
@ -213,6 +212,7 @@ def find_validation_candidates(controller,
return result return result
def oDownloadUrlRequests(uri, sCAfile, timeout=30, host='127.0.0.1', port=9050): def oDownloadUrlRequests(uri, sCAfile, timeout=30, host='127.0.0.1', port=9050):
import requests
# socks proxy used for outbound web requests (for validation of proofs) # socks proxy used for outbound web requests (for validation of proofs)
proxy = {'https': f"socks5h://{host}:{port}"} proxy = {'https': f"socks5h://{host}:{port}"}
# we use this UA string when connecting to webservers to fetch rsa-fingerprint.txt proof files # we use this UA string when connecting to webservers to fetch rsa-fingerprint.txt proof files
@ -372,7 +372,11 @@ from urllib3.contrib.socks import SOCKSProxyManager
# from urllib3 import Retry # from urllib3 import Retry
def oDownloadUrlUrllib3(uri, sCAfile, timeout=30, host='127.0.0.1', port=9050): def oDownloadUrlUrllib3(uri, sCAfile,
timeout=30,
host='127.0.0.1',
port=9050,
content_type=''):
"""There's no need to use requests here and it """There's no need to use requests here and it
adds too many layers on the SSL to be able to get at things adds too many layers on the SSL to be able to get at things
""" """
@ -404,8 +408,8 @@ def oDownloadUrlUrllib3(uri, sCAfile, timeout=30, host='127.0.0.1', port=9050):
if head.status >= 300: if head.status >= 300:
raise TrustorError(f"HTTP Errorcode {head.status}") raise TrustorError(f"HTTP Errorcode {head.status}")
if not head.headers['Content-Type'].startswith('text/plain'): if content_type and not head.headers['Content-Type'].startswith(content_type):
raise TrustorError(f"HTTP Content-Type != text/plain") raise TrustorError(f"HTTP Content-Type != {content_type}")
if not os.path.exists(sCAfile): if not os.path.exists(sCAfile):
raise TrustorError(f"File not found CAfile {sCAfile}") raise TrustorError(f"File not found CAfile {sCAfile}")
@ -419,8 +423,8 @@ def oDownloadUrlUrllib3(uri, sCAfile, timeout=30, host='127.0.0.1', port=9050):
raise raise
if oReqResp.status != 200: if oReqResp.status != 200:
raise TrustorError(f"HTTP Errorcode {oReqResp.status}") raise TrustorError(f"HTTP Errorcode {oReqResp.status}")
if not oReqResp.headers['Content-Type'].startswith('text/plain'): if content_type and not oReqResp.headers['Content-Type'].startswith(content_type):
raise TrustorError(f"HTTP Content-Type != text/plain") raise TrustorError(f"HTTP Content-Type != {content_type}")
# check for redirects (not allowed as per spec) # check for redirects (not allowed as per spec)
if oReqResp.geturl() != uri: if oReqResp.geturl() != uri:
@ -429,6 +433,7 @@ def oDownloadUrlUrllib3(uri, sCAfile, timeout=30, host='127.0.0.1', port=9050):
oReqResp.decode_content = True oReqResp.decode_content = True
return oReqResp return oReqResp
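The new `content_type` parameter turns the fixed `text/plain` check into an optional one: an empty string skips the check, which is what the Onionoo download relies on. A minimal sketch of that rule in isolation, assuming a hypothetical helper `content_type_ok`; unlike the code above it uses `.get`, so a missing header counts as a mismatch instead of raising `KeyError`:

```python
def content_type_ok(headers, content_type=''):
    """Return True if no type is required or the header starts with the required type."""
    if not content_type:
        # an empty requirement disables the check
        return True
    return headers.get('Content-Type', '').startswith(content_type)


print(content_type_ok({'Content-Type': 'application/json'}, ''))
print(content_type_ok({'Content-Type': 'text/html'}, 'text/plain'))
```

Using `startswith` keeps values such as `text/plain; charset=utf-8` acceptable when `text/plain` is required.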
import urllib3.connectionpool import urllib3.connectionpool
from urllib3.connection import HTTPSConnection from urllib3.connection import HTTPSConnection