[+] reformat with ruff
This commit is contained in:
parent cf9ede1dde
commit 64a898ce44

@@ -1,694 +0,0 @@
|
||||
# %% [markdown]
|
||||
# # About this Notebook
|
||||
#
|
||||
# NLP is a very hot topic right now and, as many experts believe, 2020 is going to be NLP's year. With its ever-changing dynamics the field is experiencing a boom, much as computer vision once did. Owing to this popularity Kaggle recently launched two NLP competitions, and being a lover of this hot topic I prepared myself to join my first Kaggle competition.<br><br>
|
||||
# As I joined the competitions, being a complete beginner with deep learning techniques for NLP, all my enthusiasm took a beating when I saw everyone using all kinds of BERT; everything just went over my head and I thought about quitting. But there is a special thing about Kaggle: it just hooks you. I figured I would have to learn this someday, so why not now, so I braced myself and sat on the learning curve. I wrote a kernel on the Tweet Sentiment Extraction competition that has now got a gold medal; it can be viewed here: https://www.kaggle.com/tanulsingh077/twitter-sentiment-extaction-analysis-eda-and-model <br><br>
|
||||
# After 10 days of extensive learning (covering all the latest NLP approaches), I am back here to share what I learned by writing a kernel that starts from the very basic RNNs and builds up, all the way to BERT. I invite you all to come and learn alongside me and take a step closer towards becoming an NLP expert.
|
||||
|
||||
# %% [markdown]
|
||||
# # Contents
|
||||
#
|
||||
# In this notebook I will start with the very basics of RNNs and build all the way up to the latest deep learning architectures used to solve NLP problems. It will cover the following:
|
||||
# * Simple RNN's
|
||||
# * Word Embeddings : Definition and How to get them
|
||||
# * LSTM's
|
||||
# * GRU's
|
||||
# * BI-Directional RNN's
|
||||
# * Encoder-Decoder Models (Seq2Seq Models)
|
||||
# * Attention Models
|
||||
# * Transformers - Attention is all you need
|
||||
# * BERT
|
||||
#
|
||||
# I will divide every Topic into four subsections:
|
||||
# * Basic Overview
|
||||
# * In-Depth Understanding: here I will attach links to articles and videos to learn about the topic in depth
|
||||
# * Code-Implementation
|
||||
# * Code Explanation
|
||||
#
|
||||
# This is a comprehensive kernel, and if you follow along till the end, I promise you will learn all of these techniques thoroughly.
#
# Note that the aim of this notebook is not to get a high LB score but to present a beginner's guide to understanding the deep learning techniques used for NLP. After discussing all of these ideas, I will also present a starter solution for this competition.
|
||||
|
||||
# %% [markdown]
|
||||
# **<span style="color:Red">This kernel has been a work of more than 10 days If you find my kernel useful and my efforts appreciable, Please Upvote it , it motivates me to write more Quality content**
|
||||
|
||||
# %% [code]
|
||||
import numpy as np # linear algebra
|
||||
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
|
||||
from tqdm import tqdm
|
||||
from sklearn.model_selection import train_test_split
|
||||
import tensorflow as tf
|
||||
from keras.models import Sequential
|
||||
from keras.layers import LSTM, GRU, SimpleRNN
from keras.layers import Dense, Activation, Dropout
from keras.layers import Embedding
from keras.layers import BatchNormalization
from keras.utils import np_utils
from sklearn import preprocessing, decomposition, model_selection, metrics, pipeline
from keras.layers import GlobalMaxPooling1D, Conv1D, MaxPooling1D, Flatten, Bidirectional, SpatialDropout1D
|
||||
from keras.preprocessing import sequence, text
|
||||
from keras.callbacks import EarlyStopping
|
||||
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
import seaborn as sns
|
||||
#%matplotlib inline
|
||||
from plotly import graph_objs as go
|
||||
import plotly.express as px
|
||||
import plotly.figure_factory as ff
|
||||
|
||||
# %% [markdown]
|
||||
# # Configuring TPU's
|
||||
#
|
||||
# For this version of the notebook we will be using TPUs, as we have to build a BERT model
|
||||
|
||||
# %% [code]
|
||||
# Detect hardware, return appropriate distribution strategy
|
||||
try:
|
||||
# TPU detection. No parameters necessary if TPU_NAME environment variable is
|
||||
# set: this is always the case on Kaggle.
|
||||
tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
|
||||
print('Running on TPU ', tpu.master())
|
||||
except ValueError:
|
||||
tpu = None
|
||||
|
||||
if tpu:
|
||||
tf.config.experimental_connect_to_cluster(tpu)
|
||||
tf.tpu.experimental.initialize_tpu_system(tpu)
|
||||
strategy = tf.distribute.experimental.TPUStrategy(tpu)
|
||||
else:
|
||||
# Default distribution strategy in Tensorflow. Works on CPU and single GPU.
|
||||
strategy = tf.distribute.get_strategy()
|
||||
|
||||
print("REPLICAS: ", strategy.num_replicas_in_sync)
|
||||
|
||||
# %% [code]
|
||||
train = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train.csv')
|
||||
validation = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/validation.csv')
|
||||
test = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/test.csv')
|
||||
|
||||
# %% [markdown]
|
||||
# We will drop the other label columns and approach this problem as a binary classification problem. We will also run the exercise on a smaller subsection of the dataset (only ~12,000 data points) to make the models easier to train.
|
||||
|
||||
# %% [code]
|
||||
train.drop(['severe_toxic','obscene','threat','insult','identity_hate'],axis=1,inplace=True)
|
||||
|
||||
# %% [code]
|
||||
train = train.loc[:12000,:]
|
||||
train.shape
|
||||
|
||||
# %% [markdown]
|
||||
# We will check the maximum number of words present in a comment; this will help us with padding later
|
||||
|
||||
# %% [code]
|
||||
train['comment_text'].apply(lambda x:len(str(x).split())).max()
|
||||
|
||||
# %% [markdown]
|
||||
# Writing a function for computing the AUC score on the validation set
|
||||
|
||||
# %% [code]
|
||||
def roc_auc(predictions, target):
    '''
    This method returns the AUC score when given the predictions
    and labels
    '''
    fpr, tpr, thresholds = metrics.roc_curve(target, predictions)
    roc_auc = metrics.auc(fpr, tpr)
    return roc_auc
|
||||
|
||||
# %% [markdown]
|
||||
# ### Data Preparation
|
||||
|
||||
# %% [code]
|
||||
xtrain, xvalid, ytrain, yvalid = train_test_split(train.comment_text.values, train.toxic.values,
|
||||
stratify=train.toxic.values,
|
||||
random_state=42,
|
||||
test_size=0.2, shuffle=True)
|
||||
|
||||
# %% [markdown]
|
||||
# # Before We Begin
|
||||
#
|
||||
# Before we begin: if you are a complete starter with NLP and have never worked with text data, I am attaching a few kernels that will serve as a starting point for your journey
|
||||
# * https://www.kaggle.com/arthurtok/spooky-nlp-and-topic-modelling-tutorial
|
||||
# * https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle
|
||||
#
|
||||
# If you want a more basic dataset to practice on, here is another kernel I wrote:
|
||||
# * https://www.kaggle.com/tanulsingh077/what-s-cooking
|
||||
#
|
||||
# Below are some resources to get started with basic-level neural networks; they will help us easily understand the upcoming parts
|
||||
# * https://www.youtube.com/watch?v=aircAruvnKk&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv
|
||||
# * https://www.youtube.com/watch?v=IHZwWFHWa-w&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=2
|
||||
# * https://www.youtube.com/watch?v=Ilg3gGewQ5U&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=3
|
||||
# * https://www.youtube.com/watch?v=tIeHLnjs5U8&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=4
|
||||
#
|
||||
# To learn how to visualize text data and which plots to use, view:
|
||||
# * https://www.kaggle.com/tanulsingh077/twitter-sentiment-extaction-analysis-eda-and-model
|
||||
# * https://www.kaggle.com/jagangupta/stop-the-s-toxic-comments-eda
|
||||
|
||||
# %% [markdown]
|
||||
# # Simple RNN
|
||||
#
|
||||
# ## Basic Overview
|
||||
#
|
||||
# What is an RNN?
#
# Recurrent Neural Networks (RNNs) are a type of neural network where the output from the previous step is fed as input to the current step. In traditional neural networks all the inputs and outputs are independent of each other, but in cases such as predicting the next word of a sentence the previous words are required, and hence there is a need to remember them. Thus RNNs came into existence, solving this issue with the help of a hidden state.
|
||||
#
|
||||
# Why RNN's?
|
||||
#
|
||||
# https://www.quora.com/Why-do-we-use-an-RNN-instead-of-a-simple-neural-network
|
||||
#
|
||||
# ## In-Depth Understanding
|
||||
#
|
||||
# * https://medium.com/mindorks/understanding-the-recurrent-neural-network-44d593f112a2
|
||||
# * https://www.youtube.com/watch?v=2E65LDnM2cA&list=PL1F3ABbhcqa3BBWo170U4Ev2wfsF7FN8l
|
||||
# * https://www.d2l.ai/chapter_recurrent-neural-networks/rnn.html
|
||||
#
|
||||
# ## Code Implementation
|
||||
#
|
||||
# So first I will implement the model and then I will explain the code step by step
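# %% [markdown]
# Before the Keras code, here is a minimal numpy sketch of what a single recurrent step computes, h_t = tanh(W_x x_t + W_h h_{t-1} + b), on made-up dimensions. It is only meant to make the "output of the previous step is fed back in" idea concrete; the weights and the toy "sentence" are random.

# %% [code]
# One recurrent step on toy dimensions: input size 4, hidden size 3 (illustrative only)
rng = np.random.RandomState(0)
W_x = rng.randn(3, 4)   # input-to-hidden weights
W_h = rng.randn(3, 3)   # hidden-to-hidden weights
b = np.zeros(3)

h = np.zeros(3)                            # initial hidden state
for x_t in rng.randn(5, 4):                # a toy "sentence" of 5 time steps
    h = np.tanh(W_x @ x_t + W_h @ h + b)   # the same weights are reused at every step
print(h)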
|
||||
|
||||
# %% [code]
|
||||
# using keras tokenizer here
|
||||
token = text.Tokenizer(num_words=None)
|
||||
max_len = 1500
|
||||
|
||||
token.fit_on_texts(list(xtrain) + list(xvalid))
|
||||
xtrain_seq = token.texts_to_sequences(xtrain)
|
||||
xvalid_seq = token.texts_to_sequences(xvalid)
|
||||
|
||||
# zero pad the sequences
|
||||
xtrain_pad = sequence.pad_sequences(xtrain_seq, maxlen=max_len)
|
||||
xvalid_pad = sequence.pad_sequences(xvalid_seq, maxlen=max_len)
|
||||
|
||||
word_index = token.word_index
|
||||
|
||||
# %% [code]
|
||||
#%%time
|
||||
with strategy.scope():
|
||||
# A simpleRNN without any pretrained embeddings and one dense layer
|
||||
model = Sequential()
|
||||
model.add(Embedding(len(word_index) + 1,
|
||||
300,
|
||||
input_length=max_len))
|
||||
model.add(SimpleRNN(100))
|
||||
model.add(Dense(1, activation='sigmoid'))
|
||||
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
|
||||
|
||||
model.summary()
|
||||
|
||||
# %% [code]
|
||||
model.fit(xtrain_pad, ytrain, epochs=5, batch_size=64*strategy.num_replicas_in_sync)  # multiplying the batch size by the replica count to scale to TPUs
|
||||
|
||||
# %% [code]
|
||||
scores = model.predict(xvalid_pad)
|
||||
print("Auc: %.2f%%" % (roc_auc(scores,yvalid)))
|
||||
|
||||
# %% [code]
|
||||
scores_model = []
|
||||
scores_model.append({'Model': 'SimpleRNN','AUC_Score': roc_auc(scores,yvalid)})
|
||||
|
||||
# %% [markdown]
|
||||
# ## Code Explanation
# * Tokenization<br><br>
# If you have watched the videos and gone through the links, you know that in an RNN we input a sentence word by word. We represent every word as a one-hot vector of dimension: number of words in the vocabulary + 1. <br>
# What the Keras Tokenizer does is take all the unique words in the corpus, form a dictionary with words as keys and their number of occurrences as values, and then sort that dictionary in descending order of counts. It then assigns the first word index 1, the second word index 2, and so on. So suppose the word 'the' occurred the most in the corpus; it will be assigned index 1, and the one-hot vector representing 'the' would have a 1 at position 1 and zeros everywhere else.<br>
# Try printing the first element of xtrain_seq and you will see that every word is now represented as an integer.
|
||||
|
||||
# %% [code]
|
||||
xtrain_seq[:1]
|
||||
|
||||
# %% [markdown]
|
||||
# <b>Now you might be wondering: what is padding, and why is it done?</b><br><br>
|
||||
#
|
||||
# Here is the answer :
|
||||
# * https://www.quora.com/Which-effect-does-sequence-padding-have-on-the-training-of-a-neural-network
|
||||
# * https://machinelearningmastery.com/data-preparation-variable-length-input-sequences-sequence-prediction/
|
||||
# * https://www.coursera.org/lecture/natural-language-processing-tensorflow/padding-2Cyzs
|
||||
#
|
||||
# Also, people sometimes use special tokens while tokenizing, like EOS (end of string) and BOS (beginning of string). Here is the reason why it's done:
|
||||
# * https://stackoverflow.com/questions/44579161/why-do-we-do-padding-in-nlp-tasks
|
||||
#
|
||||
#
|
||||
# The code token.word_index simply gives the vocabulary dictionary that Keras created for us
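# %% [markdown]
# To make tokenization and padding concrete, here is a tiny self-contained sketch on two made-up sentences (the sentences and variable names are purely illustrative; the real notebook applies the same calls to the Jigsaw comments above).

# %% [code]
# Minimal illustration of Tokenizer + pad_sequences on toy data
from keras.preprocessing import sequence, text

toy_corpus = ["the cat sat on the mat", "the dog ate my homework"]
toy_token = text.Tokenizer(num_words=None)
toy_token.fit_on_texts(toy_corpus)

print(toy_token.word_index)                        # most frequent word ('the') gets index 1
toy_seq = toy_token.texts_to_sequences(toy_corpus)
print(toy_seq)                                     # each sentence becomes a list of integer indices
print(sequence.pad_sequences(toy_seq, maxlen=8))   # zero-padded (at the front by default) to a fixed length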
|
||||
|
||||
# %% [markdown]
|
||||
# * Building the Neural Network
#
# To understand the dimensions of the input and output given to an RNN in Keras, here is a beautiful article: https://medium.com/@shivajbd/understanding-input-and-output-shape-in-lstm-keras-c501ee95c65e
#
# The first line, model = Sequential(), tells Keras that we will build our network sequentially. Then we first add the Embedding layer.
# The Embedding layer is itself a layer of neurons that takes as input the n-dimensional one-hot vector of every word and converts it into a 300-dimensional vector; it gives us word embeddings similar to word2vec. We could have used word2vec directly, but the embedding layer learns during training to enhance the embeddings.
# Next we add 100 SimpleRNN units without any dropout or regularization.
# Finally we add a single neuron with a sigmoid activation which takes the output of the 100 RNN units (note that we have 100 units, not layers) to predict the result, and then we compile the model using the Adam optimizer.
#
# * Comments on the model<br><br>
# We can see our model achieves a training accuracy of 1, which is just insane; we are clearly overfitting. But this was the simplest model of all, and we can tune a lot of hyperparameters (number of RNN units, batch normalization, dropout, etc.) to get a better result. The point is that we got an AUC score of about 0.82 without much effort, and we now have learnt about RNNs. Deep learning is really revolutionary.
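# %% [markdown]
# As a quick sanity check of the shapes described above, the sketch below rebuilds the same Embedding -> SimpleRNN -> Dense stack with a made-up vocabulary size and prints each layer's output shape; the numbers are illustrative only and it reuses the layers imported at the top of the notebook.

# %% [code]
# Illustrative shape flow: (batch, max_len) -> (batch, max_len, 300) -> (batch, 100) -> (batch, 1)
demo = Sequential()
demo.add(Embedding(10000, 300, input_length=max_len))   # 10000 is an assumed toy vocabulary size
demo.add(SimpleRNN(100))
demo.add(Dense(1, activation='sigmoid'))
for layer in demo.layers:
    print(layer.name, layer.output_shape)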
|
||||
|
||||
# %% [markdown]
|
||||
# # Word Embeddings
|
||||
#
|
||||
# While building our simple RNN model we talked about using word embeddings. So what are word embeddings, and how do we get them?
|
||||
# Here is the answer :
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/6Oq70/word-representation
|
||||
# * https://machinelearningmastery.com/what-are-word-embeddings/
|
||||
# <br> <br>
|
||||
# The popular approaches to getting word embeddings are pretrained GloVe, word2vec and fastText vectors. Without going into too much detail, I will explain how to create sentence vectors and how we can use them to build a machine learning model on top of them. Since I am a fan of GloVe vectors, I will be using GloVe in this notebook. You can download the GloVe vectors from http://www-nlp.stanford.edu/data/glove.840B.300d.zip or search for GloVe in Kaggle datasets and add the file.
|
||||
|
||||
# %% [code]
|
||||
# load the GloVe vectors in a dictionary:
|
||||
|
||||
embeddings_index = {}
|
||||
f = open('/kaggle/input/glove840b300dtxt/glove.840B.300d.txt','r',encoding='utf-8')
|
||||
for line in tqdm(f):
|
||||
values = line.split(' ')
|
||||
word = values[0]
|
||||
coefs = np.asarray([float(val) for val in values[1:]])
|
||||
embeddings_index[word] = coefs
|
||||
f.close()
|
||||
|
||||
print('Found %s word vectors.' % len(embeddings_index))
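# %% [markdown]
# The paragraph above mentions building sentence vectors from word vectors. The simplest recipe is to average the GloVe vectors of the words in a sentence; the sketch below does exactly that using the embeddings_index loaded above (the example sentence is made up, and words missing from GloVe are simply skipped).

# %% [code]
def sentence_vector(sentence, dim=300):
    # average the GloVe vectors of the words that have one; fall back to zeros otherwise
    vecs = [embeddings_index[w] for w in sentence.split() if w in embeddings_index]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(sentence_vector("this comment is perfectly fine").shape)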
|
||||
|
||||
# %% [markdown]
|
||||
# # LSTM's
|
||||
#
|
||||
# ## Basic Overview
|
||||
#
|
||||
# Simple RNNs were certainly better than classical ML algorithms and gave state-of-the-art results, but they failed to capture the long-term dependencies present in sentences. So in 1997 LSTMs were introduced by Hochreiter and Schmidhuber to counter these drawbacks.
|
||||
#
|
||||
# ## In Depth Understanding
|
||||
#
|
||||
# Why LSTM's?
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/PKMRR/vanishing-gradients-with-rnns
|
||||
# * https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/
|
||||
#
|
||||
# What are LSTM's?
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/KXoay/long-short-term-memory-lstm
|
||||
# * https://distill.pub/2019/memorization-in-rnns/
|
||||
# * https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
|
||||
#
|
||||
# ## Code Implementation
#
# We have already tokenized and padded our text for input to the LSTM
|
||||
|
||||
# %% [code]
|
||||
# create an embedding matrix for the words we have in the dataset
|
||||
embedding_matrix = np.zeros((len(word_index) + 1, 300))
|
||||
for word, i in tqdm(word_index.items()):
|
||||
embedding_vector = embeddings_index.get(word)
|
||||
if embedding_vector is not None:
|
||||
embedding_matrix[i] = embedding_vector
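# %% [markdown]
# As an optional diagnostic (not part of the original pipeline), it can be useful to know how much of our vocabulary is actually covered by GloVe: rows of the embedding matrix that stayed all-zero correspond to out-of-vocabulary words.

# %% [code]
covered = int((np.abs(embedding_matrix).sum(axis=1) > 0).sum())
print('Words with a GloVe vector: %d / %d' % (covered, len(word_index)))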
|
||||
|
||||
# %% [code]
|
||||
#%%time
|
||||
with strategy.scope():
|
||||
|
||||
# A simple LSTM with glove embeddings and one dense layer
|
||||
model = Sequential()
|
||||
model.add(Embedding(len(word_index) + 1,
|
||||
300,
|
||||
weights=[embedding_matrix],
|
||||
input_length=max_len,
|
||||
trainable=False))
|
||||
|
||||
model.add(LSTM(100, dropout=0.3, recurrent_dropout=0.3))
|
||||
model.add(Dense(1, activation='sigmoid'))
|
||||
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
|
||||
|
||||
model.summary()
|
||||
|
||||
# %% [code]
|
||||
model.fit(xtrain_pad, ytrain, epochs=5, batch_size=64*strategy.num_replicas_in_sync)
|
||||
|
||||
# %% [code]
|
||||
scores = model.predict(xvalid_pad)
|
||||
print("Auc: %.2f%%" % (roc_auc(scores,yvalid)))
|
||||
|
||||
# %% [code]
|
||||
scores_model.append({'Model': 'LSTM','AUC_Score': roc_auc(scores,yvalid)})
|
||||
|
||||
# %% [markdown]
|
||||
# ## Code Explanation
|
||||
#
|
||||
# As a first step we calculate the embedding matrix for our vocabulary from the pretrained GloVe vectors. Then, while building the embedding layer, we pass the embedding matrix as weights to the layer instead of training it over the vocabulary, and thus we set trainable=False.
# The rest of the model is the same as before, except that we have replaced the SimpleRNN with LSTM units.
#
# * Comments on the Model
#
# We now see that the model is not overfitting and achieves an AUC score of about 0.96, which is quite commendable; we have also closed the gap between accuracy and AUC.
# In this case we used dropout and prevented overfitting the data.
|
||||
|
||||
# %% [markdown]
|
||||
# # GRU's
|
||||
#
|
||||
# ## Basic Overview
|
||||
#
|
||||
# Introduced by Cho et al. in 2014, the GRU (Gated Recurrent Unit) aims to solve the vanishing gradient problem that comes with a standard recurrent neural network. The GRU is a variation on the LSTM: both are designed similarly, but GRUs were designed to be simpler and faster than LSTMs, and in most cases they produce equally good results, so there is no clear winner.
|
||||
#
|
||||
# ## In Depth Explanation
|
||||
#
|
||||
# * https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/agZiL/gated-recurrent-unit-gru
|
||||
# * https://www.geeksforgeeks.org/gated-recurrent-unit-networks/
|
||||
#
|
||||
# ## Code Implementation
|
||||
|
||||
# %% [code]
|
||||
#%%time
|
||||
with strategy.scope():
|
||||
    # GRU with GloVe embeddings and one dense layer
|
||||
model = Sequential()
|
||||
model.add(Embedding(len(word_index) + 1,
|
||||
300,
|
||||
weights=[embedding_matrix],
|
||||
input_length=max_len,
|
||||
trainable=False))
|
||||
model.add(SpatialDropout1D(0.3))
|
||||
model.add(GRU(300))
|
||||
model.add(Dense(1, activation='sigmoid'))
|
||||
|
||||
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
|
||||
|
||||
model.summary()
|
||||
|
||||
# %% [code]
|
||||
model.fit(xtrain_pad, ytrain, epochs=5, batch_size=64*strategy.num_replicas_in_sync)
|
||||
|
||||
# %% [code]
|
||||
scores = model.predict(xvalid_pad)
|
||||
print("Auc: %.2f%%" % (roc_auc(scores,yvalid)))
|
||||
|
||||
# %% [code]
|
||||
scores_model.append({'Model': 'GRU','AUC_Score': roc_auc(scores,yvalid)})
|
||||
|
||||
# %% [code]
|
||||
scores_model
|
||||
|
||||
# %% [markdown]
|
||||
# # Bi-Directional RNN's
|
||||
#
|
||||
# ## In Depth Explanation
|
||||
#
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/fyXnn/bidirectional-rnn
|
||||
# * https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
|
||||
# * https://d2l.ai/chapter_recurrent-modern/bi-rnn.html
|
||||
#
|
||||
# ## Code Implementation
|
||||
|
||||
# %% [code]
|
||||
#%%time
|
||||
with strategy.scope():
|
||||
# A simple bidirectional LSTM with glove embeddings and one dense layer
|
||||
model = Sequential()
|
||||
model.add(Embedding(len(word_index) + 1,
|
||||
300,
|
||||
weights=[embedding_matrix],
|
||||
input_length=max_len,
|
||||
trainable=False))
|
||||
model.add(Bidirectional(LSTM(300, dropout=0.3, recurrent_dropout=0.3)))
|
||||
|
||||
model.add(Dense(1,activation='sigmoid'))
|
||||
model.compile(loss='binary_crossentropy', optimizer='adam',metrics=['accuracy'])
|
||||
|
||||
|
||||
model.summary()
|
||||
|
||||
# %% [code]
|
||||
model.fit(xtrain_pad, ytrain, epochs=5, batch_size=64*strategy.num_replicas_in_sync)
|
||||
|
||||
# %% [code]
|
||||
scores = model.predict(xvalid_pad)
|
||||
print("Auc: %.2f%%" % (roc_auc(scores,yvalid)))
|
||||
|
||||
# %% [code]
|
||||
scores_model.append({'Model': 'Bi-directional LSTM','AUC_Score': roc_auc(scores,yvalid)})
|
||||
|
||||
# %% [markdown]
|
||||
# ## Code Explanation
|
||||
#
|
||||
# The code is the same as before; we have only wrapped the LSTM layer we used before in a Bidirectional wrapper, so it is self-explanatory. We achieve a similar accuracy and AUC score as before, and we have now covered all the typical RNN architectures.
|
||||
|
||||
# %% [markdown]
|
||||
# **We are now at the end of part 1 of this notebook, and things are about to go wild as we enter more complex, state-of-the-art models. If you have followed along from the start, read all the articles and understood everything, these complex models will be fairly easy to understand. I recommend finishing part 1 before continuing, as the upcoming techniques can be quite overwhelming.**
|
||||
|
||||
# %% [markdown]
|
||||
# # Seq2Seq Model Architecture
|
||||
#
|
||||
# ## Overview
|
||||
#
|
||||
# RNNs come in many types, and different architectures are used for different purposes. Here is a nice video explaining the different types of model architectures: https://www.coursera.org/learn/nlp-sequence-models/lecture/BO8PS/different-types-of-rnns.
# Seq2Seq is a many-to-many RNN architecture where the input is a sequence and the output is also a sequence (the input and output sequences may have different lengths). This architecture is used in many applications such as machine translation, text summarization, question answering, etc.
|
||||
#
|
||||
# ## In Depth Understanding
|
||||
#
|
||||
# I will not write a full code implementation for this; instead I will provide resources where the code has already been implemented and explained much better than I could have. (A minimal encoder-decoder skeleton is sketched after the links below, just to fix the shapes in mind.)
|
||||
#
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/HyEui/basic-models ---> A basic idea of different Seq2Seq Models
|
||||
#
|
||||
# * https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html , https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/ ---> Basic Encoder-Decoder Model and its explanation respectively
|
||||
#
|
||||
# * https://towardsdatascience.com/how-to-implement-seq2seq-lstm-model-in-keras-shortcutnlp-6f355f3e5639 ---> A More advanced Seq2seq Model and its explanation
|
||||
#
|
||||
# * https://d2l.ai/chapter_recurrent-modern/machine-translation-and-dataset.html , https://d2l.ai/chapter_recurrent-modern/encoder-decoder.html ---> Implementation of Encoder-Decoder Model from scratch
|
||||
#
|
||||
# * https://www.youtube.com/watch?v=IfsjMg4fLWQ&list=PLtmWHNX-gukKocXQOkQjuVxglSDYWsSh9&index=8&t=0s ---> Introduction to Seq2seq By fast.ai
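# %% [markdown]
# For orientation only, here is a minimal Keras encoder-decoder skeleton (untrained, with made-up vocabulary sizes); the linked tutorials above show complete, trainable versions.

# %% [code]
# Minimal encoder-decoder (seq2seq) skeleton, only to illustrate the wiring of the two LSTMs
from keras.layers import Input
from keras.models import Model

src_vocab, tgt_vocab, latent_dim = 5000, 5000, 256   # assumed toy sizes

encoder_inputs = Input(shape=(None,))
encoder_embedded = Embedding(src_vocab, latent_dim)(encoder_inputs)
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_embedded)   # keep only the final states

decoder_inputs = Input(shape=(None,))
decoder_embedded = Embedding(tgt_vocab, latent_dim)(decoder_inputs)
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True, return_state=True)(
    decoder_embedded, initial_state=[state_h, state_c])   # decoder starts from the encoder's summary
decoder_outputs = Dense(tgt_vocab, activation='softmax')(decoder_outputs)

seq2seq = Model([encoder_inputs, decoder_inputs], decoder_outputs)
seq2seq.summary()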
|
||||
|
||||
# %% [code]
|
||||
# Visualization of Results obtained from various Deep learning models
|
||||
results = pd.DataFrame(scores_model).sort_values(by='AUC_Score',ascending=False)
|
||||
results.style.background_gradient(cmap='Blues')
|
||||
|
||||
# %% [code]
|
||||
fig = go.Figure(go.Funnelarea(
|
||||
text =results.Model,
|
||||
values = results.AUC_Score,
|
||||
title = {"position": "top center", "text": "Funnel-Chart of Sentiment Distribution"}
|
||||
))
|
||||
fig.show()
|
||||
|
||||
# %% [markdown]
|
||||
# # Attention Models
|
||||
#
|
||||
# This is the toughest and most tricky part. If you are able to understand the intuition and working of an attention block, understanding transformers and transformer-based architectures like BERT will be a piece of cake. This is the part I spent the most time on, and I suggest you do the same. Please read and view the following resources in the order given to avoid getting confused, and at the end try to write out and draw an attention block in your own way (a tiny numpy sketch of dot-product attention also follows the links below):
|
||||
#
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/lecture/RDXpX/attention-model-intuition --> Only watch this video and not the next one
|
||||
# * https://towardsdatascience.com/sequence-2-sequence-model-with-attention-mechanism-9e9ca2a613a
|
||||
# * https://towardsdatascience.com/attention-and-its-different-forms-7fc3674d14dc
|
||||
# * https://distill.pub/2016/augmented-rnns/
|
||||
#
|
||||
# ## Code Implementation
|
||||
#
|
||||
# * https://www.analyticsvidhya.com/blog/2019/11/comprehensive-guide-attention-mechanism-deep-learning/ --> Basic Level
|
||||
# * https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html ---> Implementation from Scratch in Pytorch
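# %% [markdown]
# To make the core idea concrete, below is a tiny numpy sketch of scaled dot-product attention on random toy tensors: similarity scores between a query and the encoder states are turned into weights with a softmax, and the context vector is the weighted sum of the states. All shapes and values are made up purely for illustration.

# %% [code]
# Toy scaled dot-product attention: one query attending over 5 encoder states of dimension 8
d = 8
np.random.seed(0)
query = np.random.randn(d)                          # current decoder state
keys = np.random.randn(5, d)                        # encoder hidden states
values = keys                                       # in the simplest form keys and values coincide

scores = keys @ query / np.sqrt(d)                  # similarity of the query with every encoder state
weights = np.exp(scores) / np.exp(scores).sum()     # softmax -> attention weights summing to 1
context = weights @ values                          # context vector: weighted sum of encoder states
print(weights.round(3), context.shape)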
|
||||
|
||||
# %% [markdown]
|
||||
# # Transformers : Attention is all you need
|
||||
#
|
||||
# So finally we have reached the end of the learning curve and are about to start learning the technology that changed NLP completely and underlies the current state-of-the-art NLP techniques. Transformers were introduced in the paper "Attention Is All You Need" by Google. If you have understood attention models, this will be very easy. Here are transformers fully explained:
|
||||
#
|
||||
# * http://jalammar.github.io/illustrated-transformer/
|
||||
#
|
||||
# ## Code Implementation
|
||||
#
|
||||
# * http://nlp.seas.harvard.edu/2018/04/03/attention.html ---> This presents the code implementation of the architecture presented in the paper by Google
|
||||
|
||||
# %% [markdown]
|
||||
# # BERT and Its Implementation on this Competition
|
||||
#
|
||||
# As promised, I am back with resources. To understand the BERT architecture, please follow the contents in the given order:
|
||||
#
|
||||
# * http://jalammar.github.io/illustrated-bert/ ---> In Depth Understanding of BERT
|
||||
#
|
||||
# After going through the post above, you must have understood how the transformer architecture has been utilized by the current SOTA models. Now these architectures can be used in two ways:<br><br>
# 1) We can use the model for prediction on our problem using the pretrained weights, without fine-tuning or training the model for our specific task
# * E.g.: http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/ ---> Using pre-trained BERT without tuning
#
# 2) We can fine-tune these transformer models for our task by tweaking the already pre-trained weights and training on a much smaller dataset
# * E.g.: https://www.youtube.com/watch?v=hinZO--TEk4&t=2933s ---> Tuning BERT for your task
|
||||
#
|
||||
# We will be using the first example as a base for our implementation of a BERT-family model using Hugging Face and Keras, but contrary to the first example we will also fine-tune the model for our task
|
||||
#
|
||||
# Acknowledgements : https://www.kaggle.com/xhlulu/jigsaw-tpu-distilbert-with-huggingface-and-keras
|
||||
#
|
||||
#
|
||||
# Steps Involved :
|
||||
# * Data Preparation : Tokenization and encoding of data
|
||||
# * Configuring TPU's
|
||||
# * Building a Function for Model Training and adding an output layer for classification
|
||||
# * Train the model and get the results
|
||||
|
||||
# %% [code]
|
||||
# Loading Dependencies
|
||||
import os
|
||||
import tensorflow as tf
|
||||
from tensorflow.keras.layers import Dense, Input
|
||||
from tensorflow.keras.optimizers import Adam
|
||||
from tensorflow.keras.models import Model
|
||||
from tensorflow.keras.callbacks import ModelCheckpoint
|
||||
from kaggle_datasets import KaggleDatasets
|
||||
import transformers
|
||||
|
||||
from tokenizers import BertWordPieceTokenizer
|
||||
|
||||
# %% [code]
|
||||
# LOADING THE DATA
|
||||
|
||||
train1 = pd.read_csv("/kaggle/input/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train.csv")
|
||||
valid = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/validation.csv')
|
||||
test = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/test.csv')
|
||||
sub = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/sample_submission.csv')
|
||||
|
||||
# %% [markdown]
|
||||
# Encoder for the data. To understand what encode_batch does, read the Hugging Face tokenizer documentation:
# https://huggingface.co/transformers/main_classes/tokenizer.html
|
||||
|
||||
# %% [code]
|
||||
def fast_encode(texts, tokenizer, chunk_size=256, maxlen=512):
|
||||
"""
|
||||
Encoder for encoding the text into sequence of integers for BERT Input
|
||||
"""
|
||||
tokenizer.enable_truncation(max_length=maxlen)
|
||||
tokenizer.enable_padding(max_length=maxlen)
|
||||
all_ids = []
|
||||
|
||||
for i in tqdm(range(0, len(texts), chunk_size)):
|
||||
text_chunk = texts[i:i+chunk_size].tolist()
|
||||
encs = tokenizer.encode_batch(text_chunk)
|
||||
all_ids.extend([enc.ids for enc in encs])
|
||||
|
||||
return np.array(all_ids)
|
||||
|
||||
# %% [code]
|
||||
# Important data for configuration
|
||||
|
||||
AUTO = tf.data.experimental.AUTOTUNE
|
||||
|
||||
|
||||
# Configuration
|
||||
EPOCHS = 3
|
||||
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
|
||||
MAX_LEN = 192
|
||||
|
||||
# %% [markdown]
|
||||
# ## Tokenization
|
||||
#
|
||||
# For understanding please refer to hugging face documentation again
|
||||
|
||||
# %% [code]
|
||||
# First load the real tokenizer
|
||||
tokenizer = transformers.DistilBertTokenizer.from_pretrained('distilbert-base-multilingual-cased')
|
||||
# Save the loaded tokenizer locally
|
||||
tokenizer.save_pretrained('.')
|
||||
# Reload it with the huggingface tokenizers library
|
||||
fast_tokenizer = BertWordPieceTokenizer('vocab.txt', lowercase=False)
|
||||
fast_tokenizer
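# %% [markdown]
# A quick illustration of what the fast tokenizer produces for a single made-up sentence: the text becomes WordPiece tokens plus the special [CLS]/[SEP] markers, and .ids is what fast_encode collects for the model.

# %% [code]
enc = fast_tokenizer.encode("This is a toy sentence")
print(enc.tokens)
print(enc.ids)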
|
||||
|
||||
# %% [code]
|
||||
x_train = fast_encode(train1.comment_text.astype(str), fast_tokenizer, maxlen=MAX_LEN)
|
||||
x_valid = fast_encode(valid.comment_text.astype(str), fast_tokenizer, maxlen=MAX_LEN)
|
||||
x_test = fast_encode(test.content.astype(str), fast_tokenizer, maxlen=MAX_LEN)
|
||||
|
||||
y_train = train1.toxic.values
|
||||
y_valid = valid.toxic.values
|
||||
|
||||
# %% [code]
|
||||
train_dataset = (
|
||||
tf.data.Dataset
|
||||
.from_tensor_slices((x_train, y_train))
|
||||
.repeat()
|
||||
.shuffle(2048)
|
||||
.batch(BATCH_SIZE)
|
||||
.prefetch(AUTO)
|
||||
)
|
||||
|
||||
valid_dataset = (
|
||||
tf.data.Dataset
|
||||
.from_tensor_slices((x_valid, y_valid))
|
||||
.batch(BATCH_SIZE)
|
||||
.cache()
|
||||
.prefetch(AUTO)
|
||||
)
|
||||
|
||||
test_dataset = (
|
||||
tf.data.Dataset
|
||||
.from_tensor_slices(x_test)
|
||||
.batch(BATCH_SIZE)
|
||||
)
|
||||
|
||||
# %% [code]
|
||||
def build_model(transformer, max_len=512):
|
||||
"""
|
||||
    function for building and compiling the classification model on top of the transformer
|
||||
"""
|
||||
input_word_ids = Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
|
||||
sequence_output = transformer(input_word_ids)[0]
|
||||
cls_token = sequence_output[:, 0, :]
|
||||
out = Dense(1, activation='sigmoid')(cls_token)
|
||||
|
||||
model = Model(inputs=input_word_ids, outputs=out)
|
||||
    model.compile(Adam(learning_rate=1e-5), loss='binary_crossentropy', metrics=['accuracy'])
|
||||
|
||||
return model
|
||||
|
||||
# %% [markdown]
|
||||
# ## Starting Training
|
||||
#
|
||||
# If you want to use another model, just replace the model class name in transformers._____ and use it accordingly
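#
# For example (not run here), swapping DistilBERT for another multilingual transformer only changes the model class and the checkpoint name. The class below exists in the transformers library, but the exact checkpoint name is an assumption for illustration; check that a TensorFlow version of the weights is available on the Hugging Face hub before using it:
#
# ```python
# with strategy.scope():
#     # hypothetical swap: XLM-RoBERTa instead of DistilBERT (checkpoint name assumed)
#     transformer_layer = transformers.TFXLMRobertaModel.from_pretrained('jplu/tf-xlm-roberta-base')
#     model = build_model(transformer_layer, max_len=MAX_LEN)
# ```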
|
||||
|
||||
# %% [code]
|
||||
#%%time
|
||||
with strategy.scope():
|
||||
transformer_layer = (
|
||||
transformers.TFDistilBertModel
|
||||
.from_pretrained('distilbert-base-multilingual-cased')
|
||||
)
|
||||
model = build_model(transformer_layer, max_len=MAX_LEN)
|
||||
model.summary()
|
||||
|
||||
# %% [code]
|
||||
n_steps = x_train.shape[0] // BATCH_SIZE
|
||||
train_history = model.fit(
|
||||
train_dataset,
|
||||
steps_per_epoch=n_steps,
|
||||
validation_data=valid_dataset,
|
||||
epochs=EPOCHS
|
||||
)
|
||||
|
||||
# %% [code]
|
||||
n_steps = x_valid.shape[0] // BATCH_SIZE
|
||||
train_history_2 = model.fit(
|
||||
valid_dataset.repeat(),
|
||||
steps_per_epoch=n_steps,
|
||||
epochs=EPOCHS*2
|
||||
)
|
||||
|
||||
# %% [code]
|
||||
sub['toxic'] = model.predict(test_dataset, verbose=1)
|
||||
sub.to_csv('submission.csv', index=False)
|
||||
|
||||
# %% [markdown]
|
||||
# # End Notes
|
||||
#
|
||||
# This was my effort to share my learnings so that everyone can benefit from it. As this community has been very kind to me and helped me learn all of this, I want to pay it forward. I have shared all the resources I used to learn this material. Join me and make these NLP competitions your first, without being overwhelmed by the sheer number of techniques used. It took me 10 days to learn all of this; you can learn it at your own pace. Don't give in; at the end of all this you will be a different person, and it will all be worth it.
|
||||
#
|
||||
#
|
||||
# ### I am attaching more resources if you want NLP end to end:
|
||||
#
|
||||
# 1) Books
|
||||
#
|
||||
# * https://d2l.ai/
|
||||
# * Jason Brownlee's Books
|
||||
#
|
||||
# 2) Courses
|
||||
#
|
||||
# * https://www.coursera.org/learn/nlp-sequence-models/home/welcome
|
||||
# * Fast.ai NLP Course
|
||||
#
|
||||
# 3) Blogs and websites
|
||||
#
|
||||
# * Machine Learning Mastery
|
||||
# * https://distill.pub/
|
||||
# * http://jalammar.github.io/
|
||||
#
|
||||
# **<span style="color:Red">This is subtle effort of contributing towards the community, if it helped you in any way please show a token of love by upvoting**
|
||||
757 d1/mlb_player.py
@@ -1,757 +0,0 @@
|
||||
# %% [markdown]
|
||||
# <div>
|
||||
# <h1 align="center">MLB Player Digital Engagement Forecasting</h1>
|
||||
# <h1 align="center">LightGBM + CatBoost + ANN 2505f2</h1>
|
||||
# </div>
|
||||
|
||||
# %% [markdown]
|
||||
# <div class="alert alert-success">
|
||||
# </div>
|
||||
|
||||
# %% [markdown]
|
||||
# <div class="alert alert-success">
|
||||
# <h1 align="center">If you find this work useful, please don't forget upvoting :)</h1>
|
||||
# </div>
|
||||
|
||||
# %% [markdown]
|
||||
# #### Thanks to: @lhagiimn https://www.kaggle.com/lhagiimn/lightgbm-catboost-ann-2505f2
|
||||
#
|
||||
# #### https://www.kaggle.com/columbia2131/mlb-lightgbm-starter-dataset-code-en-ja
|
||||
#
|
||||
# #### https://www.kaggle.com/mlconsult/1-3816-lb-lbgm-descriptive-stats-param-tune
|
||||
#
|
||||
# #### https://www.kaggle.com/batprem/lightgbm-ann-weight-with-love
|
||||
|
||||
#
|
||||
# #### https://www.kaggle.com/ulrich07/mlb-ann-with-lags-tf-keras
|
||||
#
|
||||
|
||||
# %% [markdown]
|
||||
# <div class="alert alert-success">
|
||||
# </div>
|
||||
|
||||
# %% [markdown]
|
||||
# ## About Dataset
|
||||
|
||||
# %% [markdown]
|
||||
# Each column of train.csv is unpacked and stored as its own csv file, as shown below.
|
||||
|
||||
# %% [code] {"execution":{"iopub.status.busy":"2021-06-26T07:16:47.242749Z","iopub.execute_input":"2021-06-26T07:16:47.243324Z","iopub.status.idle":"2021-06-26T07:16:48.030215Z","shell.execute_reply.started":"2021-06-26T07:16:47.243266Z","shell.execute_reply":"2021-06-26T07:16:48.029Z"}}
|
||||
import os
|
||||
|
||||
assert os.system(r'''cp ../input/fork-of-1-35-lightgbm-ann-2505f2-c4e96a/* .''') == 0
|
||||
|
||||
# %% [code] {"execution":{"iopub.status.busy":"2021-06-26T07:16:48.031858Z","iopub.execute_input":"2021-06-26T07:16:48.032396Z","iopub.status.idle":"2021-06-26T07:16:48.799514Z","shell.execute_reply.started":"2021-06-26T07:16:48.032357Z","shell.execute_reply":"2021-06-26T07:16:48.798628Z"}}
|
||||
assert os.system(r'''ls''') == 0
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:48.801992Z","iopub.execute_input":"2021-06-26T07:16:48.802645Z","iopub.status.idle":"2021-06-26T07:16:48.813801Z","shell.execute_reply.started":"2021-06-26T07:16:48.802592Z","shell.execute_reply":"2021-06-26T07:16:48.812863Z"}}
|
||||
#%%capture
|
||||
|
||||
"""
|
||||
!pip install pandarallel
|
||||
|
||||
import gc
|
||||
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from pathlib import Path
|
||||
|
||||
from pandarallel import pandarallel
|
||||
pandarallel.initialize()
|
||||
|
||||
BASE_DIR = Path('../input/mlb-player-digital-engagement-forecasting')
|
||||
train = pd.read_csv(BASE_DIR / 'train.csv')
|
||||
|
||||
null = np.nan
|
||||
true = True
|
||||
false = False
|
||||
|
||||
for col in train.columns:
|
||||
|
||||
if col == 'date': continue
|
||||
|
||||
_index = train[col].notnull()
|
||||
train.loc[_index, col] = train.loc[_index, col].parallel_apply(lambda x: eval(x))
|
||||
|
||||
outputs = []
|
||||
for index, date, record in train.loc[_index, ['date', col]].itertuples():
|
||||
_df = pd.DataFrame(record)
|
||||
_df['index'] = index
|
||||
_df['date'] = date
|
||||
outputs.append(_df)
|
||||
|
||||
outputs = pd.concat(outputs).reset_index(drop=True)
|
||||
|
||||
outputs.to_csv(f'{col}_train.csv', index=False)
|
||||
outputs.to_pickle(f'{col}_train.pkl')
|
||||
|
||||
del outputs
|
||||
del train[col]
|
||||
gc.collect()
|
||||
"""
|
||||
|
||||
# %% [markdown] {"execution":{"iopub.status.busy":"2021-06-16T09:14:33.869464Z","iopub.execute_input":"2021-06-16T09:14:33.869905Z","iopub.status.idle":"2021-06-16T09:14:33.874766Z","shell.execute_reply.started":"2021-06-16T09:14:33.869879Z","shell.execute_reply":"2021-06-16T09:14:33.873097Z"}}
|
||||
# ## Training
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:48.81564Z","iopub.execute_input":"2021-06-26T07:16:48.816326Z","iopub.status.idle":"2021-06-26T07:16:50.081995Z","shell.execute_reply.started":"2021-06-26T07:16:48.816246Z","shell.execute_reply":"2021-06-26T07:16:50.080828Z"}}
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from pathlib import Path
|
||||
from sklearn.metrics import mean_absolute_error
|
||||
from datetime import timedelta
|
||||
from functools import reduce
|
||||
from tqdm import tqdm
|
||||
import lightgbm as lgbm
|
||||
import mlb
|
||||
import os
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:50.083534Z","iopub.execute_input":"2021-06-26T07:16:50.083899Z","iopub.status.idle":"2021-06-26T07:16:50.088159Z","shell.execute_reply.started":"2021-06-26T07:16:50.083863Z","shell.execute_reply":"2021-06-26T07:16:50.087357Z"}}
|
||||
BASE_DIR = Path('../input/mlb-player-digital-engagement-forecasting')
|
||||
TRAIN_DIR = Path('../input/mlb-pdef-train-dataset')
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:50.08951Z","iopub.execute_input":"2021-06-26T07:16:50.090053Z","iopub.status.idle":"2021-06-26T07:16:54.221868Z","shell.execute_reply.started":"2021-06-26T07:16:50.090018Z","shell.execute_reply":"2021-06-26T07:16:54.220656Z"}}
|
||||
players = pd.read_csv(BASE_DIR / 'players.csv')
|
||||
|
||||
rosters = pd.read_pickle(TRAIN_DIR / 'rosters_train.pkl')
|
||||
targets = pd.read_pickle(TRAIN_DIR / 'nextDayPlayerEngagement_train.pkl')
|
||||
scores = pd.read_pickle(TRAIN_DIR / 'playerBoxScores_train.pkl')
|
||||
scores = scores.groupby(['playerId', 'date']).sum().reset_index()
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:54.223547Z","iopub.execute_input":"2021-06-26T07:16:54.224Z","iopub.status.idle":"2021-06-26T07:16:54.243132Z","shell.execute_reply.started":"2021-06-26T07:16:54.22395Z","shell.execute_reply":"2021-06-26T07:16:54.242076Z"}}
|
||||
targets_cols = ['playerId', 'target1', 'target2', 'target3', 'target4', 'date']
|
||||
players_cols = ['playerId', 'primaryPositionName']
|
||||
rosters_cols = ['playerId', 'teamId', 'status', 'date']
|
||||
scores_cols = ['playerId', 'battingOrder', 'gamesPlayedBatting', 'flyOuts',
|
||||
'groundOuts', 'runsScored', 'doubles', 'triples', 'homeRuns',
|
||||
'strikeOuts', 'baseOnBalls', 'intentionalWalks', 'hits', 'hitByPitch',
|
||||
'atBats', 'caughtStealing', 'stolenBases', 'groundIntoDoublePlay',
|
||||
'groundIntoTriplePlay', 'plateAppearances', 'totalBases', 'rbi',
|
||||
'leftOnBase', 'sacBunts', 'sacFlies', 'catchersInterference',
|
||||
'pickoffs', 'gamesPlayedPitching', 'gamesStartedPitching',
|
||||
'completeGamesPitching', 'shutoutsPitching', 'winsPitching',
|
||||
'lossesPitching', 'flyOutsPitching', 'airOutsPitching',
|
||||
'groundOutsPitching', 'runsPitching', 'doublesPitching',
|
||||
'triplesPitching', 'homeRunsPitching', 'strikeOutsPitching',
|
||||
'baseOnBallsPitching', 'intentionalWalksPitching', 'hitsPitching',
|
||||
'hitByPitchPitching', 'atBatsPitching', 'caughtStealingPitching',
|
||||
'stolenBasesPitching', 'inningsPitched', 'saveOpportunities',
|
||||
'earnedRuns', 'battersFaced', 'outsPitching', 'pitchesThrown', 'balls',
|
||||
'strikes', 'hitBatsmen', 'balks', 'wildPitches', 'pickoffsPitching',
|
||||
'rbiPitching', 'gamesFinishedPitching', 'inheritedRunners',
|
||||
'inheritedRunnersScored', 'catchersInterferencePitching',
|
||||
'sacBuntsPitching', 'sacFliesPitching', 'saves', 'holds', 'blownSaves',
|
||||
'assists', 'putOuts', 'errors', 'chances', 'date']
|
||||
|
||||
feature_cols = ['label_playerId', 'label_primaryPositionName', 'label_teamId',
|
||||
'label_status', 'battingOrder', 'gamesPlayedBatting', 'flyOuts',
|
||||
'groundOuts', 'runsScored', 'doubles', 'triples', 'homeRuns',
|
||||
'strikeOuts', 'baseOnBalls', 'intentionalWalks', 'hits', 'hitByPitch',
|
||||
'atBats', 'caughtStealing', 'stolenBases', 'groundIntoDoublePlay',
|
||||
'groundIntoTriplePlay', 'plateAppearances', 'totalBases', 'rbi',
|
||||
'leftOnBase', 'sacBunts', 'sacFlies', 'catchersInterference',
|
||||
'pickoffs', 'gamesPlayedPitching', 'gamesStartedPitching',
|
||||
'completeGamesPitching', 'shutoutsPitching', 'winsPitching',
|
||||
'lossesPitching', 'flyOutsPitching', 'airOutsPitching',
|
||||
'groundOutsPitching', 'runsPitching', 'doublesPitching',
|
||||
'triplesPitching', 'homeRunsPitching', 'strikeOutsPitching',
|
||||
'baseOnBallsPitching', 'intentionalWalksPitching', 'hitsPitching',
|
||||
'hitByPitchPitching', 'atBatsPitching', 'caughtStealingPitching',
|
||||
'stolenBasesPitching', 'inningsPitched', 'saveOpportunities',
|
||||
'earnedRuns', 'battersFaced', 'outsPitching', 'pitchesThrown', 'balls',
|
||||
'strikes', 'hitBatsmen', 'balks', 'wildPitches', 'pickoffsPitching',
|
||||
'rbiPitching', 'gamesFinishedPitching', 'inheritedRunners',
|
||||
'inheritedRunnersScored', 'catchersInterferencePitching',
|
||||
'sacBuntsPitching', 'sacFliesPitching', 'saves', 'holds', 'blownSaves',
|
||||
'assists', 'putOuts', 'errors', 'chances','target1_mean',
|
||||
'target1_median',
|
||||
'target1_std',
|
||||
'target1_min',
|
||||
'target1_max',
|
||||
'target1_prob',
|
||||
'target2_mean',
|
||||
'target2_median',
|
||||
'target2_std',
|
||||
'target2_min',
|
||||
'target2_max',
|
||||
'target2_prob',
|
||||
'target3_mean',
|
||||
'target3_median',
|
||||
'target3_std',
|
||||
'target3_min',
|
||||
'target3_max',
|
||||
'target3_prob',
|
||||
'target4_mean',
|
||||
'target4_median',
|
||||
'target4_std',
|
||||
'target4_min',
|
||||
'target4_max',
|
||||
'target4_prob']
|
||||
feature_cols2 = ['label_playerId', 'label_primaryPositionName', 'label_teamId',
|
||||
'label_status', 'battingOrder', 'gamesPlayedBatting', 'flyOuts',
|
||||
'groundOuts', 'runsScored', 'doubles', 'triples', 'homeRuns',
|
||||
'strikeOuts', 'baseOnBalls', 'intentionalWalks', 'hits', 'hitByPitch',
|
||||
'atBats', 'caughtStealing', 'stolenBases', 'groundIntoDoublePlay',
|
||||
'groundIntoTriplePlay', 'plateAppearances', 'totalBases', 'rbi',
|
||||
'leftOnBase', 'sacBunts', 'sacFlies', 'catchersInterference',
|
||||
'pickoffs', 'gamesPlayedPitching', 'gamesStartedPitching',
|
||||
'completeGamesPitching', 'shutoutsPitching', 'winsPitching',
|
||||
'lossesPitching', 'flyOutsPitching', 'airOutsPitching',
|
||||
'groundOutsPitching', 'runsPitching', 'doublesPitching',
|
||||
'triplesPitching', 'homeRunsPitching', 'strikeOutsPitching',
|
||||
'baseOnBallsPitching', 'intentionalWalksPitching', 'hitsPitching',
|
||||
'hitByPitchPitching', 'atBatsPitching', 'caughtStealingPitching',
|
||||
'stolenBasesPitching', 'inningsPitched', 'saveOpportunities',
|
||||
'earnedRuns', 'battersFaced', 'outsPitching', 'pitchesThrown', 'balls',
|
||||
'strikes', 'hitBatsmen', 'balks', 'wildPitches', 'pickoffsPitching',
|
||||
'rbiPitching', 'gamesFinishedPitching', 'inheritedRunners',
|
||||
'inheritedRunnersScored', 'catchersInterferencePitching',
|
||||
'sacBuntsPitching', 'sacFliesPitching', 'saves', 'holds', 'blownSaves',
|
||||
'assists', 'putOuts', 'errors', 'chances','target1_mean',
|
||||
'target1_median',
|
||||
'target1_std',
|
||||
'target1_min',
|
||||
'target1_max',
|
||||
'target1_prob',
|
||||
'target2_mean',
|
||||
'target2_median',
|
||||
'target2_std',
|
||||
'target2_min',
|
||||
'target2_max',
|
||||
'target2_prob',
|
||||
'target3_mean',
|
||||
'target3_median',
|
||||
'target3_std',
|
||||
'target3_min',
|
||||
'target3_max',
|
||||
'target3_prob',
|
||||
'target4_mean',
|
||||
'target4_median',
|
||||
'target4_std',
|
||||
'target4_min',
|
||||
'target4_max',
|
||||
'target4_prob',
|
||||
'target1']
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:54.244866Z","iopub.execute_input":"2021-06-26T07:16:54.24532Z","iopub.status.idle":"2021-06-26T07:16:54.296844Z","shell.execute_reply.started":"2021-06-26T07:16:54.245257Z","shell.execute_reply":"2021-06-26T07:16:54.295689Z"}}
|
||||
player_target_stats = pd.read_csv("../input/player-target-stats/player_target_stats.csv")
|
||||
data_names=player_target_stats.columns.values.tolist()
|
||||
data_names
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:16:54.300157Z","iopub.execute_input":"2021-06-26T07:16:54.300622Z","iopub.status.idle":"2021-06-26T07:17:02.252208Z","shell.execute_reply.started":"2021-06-26T07:16:54.300578Z","shell.execute_reply":"2021-06-26T07:17:02.250423Z"}}
|
||||
# create the dataset
|
||||
train = targets[targets_cols].merge(players[players_cols], on=['playerId'], how='left')
|
||||
train = train.merge(rosters[rosters_cols], on=['playerId', 'date'], how='left')
|
||||
train = train.merge(scores[scores_cols], on=['playerId', 'date'], how='left')
|
||||
train = train.merge(player_target_stats, how='inner', left_on=["playerId"],right_on=["playerId"])
|
||||
|
||||
|
||||
# label encoding
|
||||
player2num = {c: i for i, c in enumerate(train['playerId'].unique())}
|
||||
position2num = {c: i for i, c in enumerate(train['primaryPositionName'].unique())}
|
||||
teamid2num = {c: i for i, c in enumerate(train['teamId'].unique())}
|
||||
status2num = {c: i for i, c in enumerate(train['status'].unique())}
|
||||
train['label_playerId'] = train['playerId'].map(player2num)
|
||||
train['label_primaryPositionName'] = train['primaryPositionName'].map(position2num)
|
||||
train['label_teamId'] = train['teamId'].map(teamid2num)
|
||||
train['label_status'] = train['status'].map(status2num)
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:17:02.253453Z","iopub.status.idle":"2021-06-26T07:17:02.254076Z"}}
|
||||
train_X = train[feature_cols]
|
||||
train_y = train[['target1', 'target2', 'target3', 'target4']]
|
||||
|
||||
_index = (train['date'] < 20210401)
|
||||
x_train1 = train_X.loc[_index].reset_index(drop=True)
|
||||
y_train1 = train_y.loc[_index].reset_index(drop=True)
|
||||
x_valid1 = train_X.loc[~_index].reset_index(drop=True)
|
||||
y_valid1 = train_y.loc[~_index].reset_index(drop=True)
|
||||
|
||||
# %% [code] {"execution":{"iopub.status.busy":"2021-06-26T07:17:02.255068Z","iopub.status.idle":"2021-06-26T07:17:02.255685Z"}}
|
||||
train_X = train[feature_cols2]
|
||||
train_y = train[['target1', 'target2', 'target3', 'target4']]
|
||||
|
||||
_index = (train['date'] < 20210401)
|
||||
x_train2 = train_X.loc[_index].reset_index(drop=True)
|
||||
y_train2 = train_y.loc[_index].reset_index(drop=True)
|
||||
x_valid2 = train_X.loc[~_index].reset_index(drop=True)
|
||||
y_valid2 = train_y.loc[~_index].reset_index(drop=True)
|
||||
|
||||
# %% [code] {"execution":{"iopub.status.busy":"2021-06-26T07:17:02.256629Z","iopub.status.idle":"2021-06-26T07:17:02.257215Z"}}
|
||||
train_X
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:17:02.258224Z","iopub.status.idle":"2021-06-26T07:17:02.258854Z"}}
|
||||
def fit_lgbm(x_train, y_train, x_valid, y_valid, params: dict = None, verbose=100):
    # Train a single LightGBM regressor and report the validation MAE
    oof_pred = np.zeros(len(y_valid), dtype=np.float32)
    model = lgbm.LGBMRegressor(**params)
    model.fit(x_train, y_train,
              eval_set=[(x_valid, y_valid)],
              early_stopping_rounds=verbose,
              verbose=verbose)
    # predictions on the hold-out period
    oof_pred = model.predict(x_valid)
    score = mean_absolute_error(oof_pred, y_valid)
    print('mae:', score)
    return oof_pred, model, score
|
||||
|
||||
|
||||
# training lightgbm
|
||||
|
||||
params1 = {'objective':'mae',
|
||||
'reg_alpha': 0.14947461820098767,
|
||||
'reg_lambda': 0.10185644384043743,
|
||||
'n_estimators': 3633,
|
||||
'learning_rate': 0.08046301304430488,
|
||||
'num_leaves': 674,
|
||||
'feature_fraction': 0.9101240539122566,
|
||||
'bagging_fraction': 0.9884451442950513,
|
||||
'bagging_freq': 8,
|
||||
'min_child_samples': 51}
|
||||
|
||||
params2 = {
|
||||
'objective':'mae',
|
||||
'reg_alpha': 0.1,
|
||||
'reg_lambda': 0.1,
|
||||
'n_estimators': 80,
|
||||
'learning_rate': 0.1,
|
||||
'random_state': 42,
|
||||
"num_leaves": 22
|
||||
}
|
||||
|
||||
params4 = {'objective':'mae',
|
||||
'reg_alpha': 0.016468100279441976,
|
||||
'reg_lambda': 0.09128335764019105,
|
||||
'n_estimators': 9868,
|
||||
'learning_rate': 0.10528150510326864,
|
||||
'num_leaves': 157,
|
||||
'feature_fraction': 0.5419185713426886,
|
||||
'bagging_fraction': 0.2637405128936662,
|
||||
'bagging_freq': 19,
|
||||
'min_child_samples': 71}
|
||||
|
||||
|
||||
params = {
|
||||
'objective':'mae',
|
||||
'reg_alpha': 0.1,
|
||||
'reg_lambda': 0.1,
|
||||
'n_estimators': 10000,
|
||||
'learning_rate': 0.1,
|
||||
'random_state': 42,
|
||||
"num_leaves": 100
|
||||
}
|
||||
|
||||
|
||||
# NOTE: training is slow from this point onwards
|
||||
|
||||
oof1, model1, score1 = fit_lgbm(
|
||||
x_train1, y_train1['target1'],
|
||||
x_valid1, y_valid1['target1'],
|
||||
params1
|
||||
)
|
||||
|
||||
oof2, model2, score2 = fit_lgbm(
|
||||
x_train2, y_train2['target2'],
|
||||
x_valid2, y_valid2['target2'],
|
||||
params2
|
||||
)
|
||||
|
||||
oof3, model3, score3 = fit_lgbm(
|
||||
x_train2, y_train2['target3'],
|
||||
x_valid2, y_valid2['target3'],
|
||||
params
|
||||
)
|
||||
|
||||
oof4, model4, score4 = fit_lgbm(
|
||||
x_train2, y_train2['target4'],
|
||||
x_valid2, y_valid2['target4'],
|
||||
params4
|
||||
)
|
||||
|
||||
score = (score1+score2+score3+score4) / 4
|
||||
print(f'score: {score}')
|
||||
|
||||
# %% [code]
|
||||
import pickle
|
||||
from catboost import CatBoostRegressor
|
||||
|
||||
def fit_lgbm(x_train, y_train, x_valid, y_valid, target, params: dict = None, verbose=100):
    # Despite the name, this trains (or loads cached copies of) both a LightGBM and a CatBoost model for one target
    oof_pred_lgb = np.zeros(len(y_valid), dtype=np.float32)
    oof_pred_cat = np.zeros(len(y_valid), dtype=np.float32)
|
||||
|
||||
if os.path.isfile(f'../input/mlb-lgbm-and-catboost-models/model_lgb_{target}.pkl'):
|
||||
with open(f'../input/mlb-lgbm-and-catboost-models/model_lgb_{target}.pkl', 'rb') as fin:
|
||||
model = pickle.load(fin)
|
||||
else:
|
||||
|
||||
model = lgbm.LGBMRegressor(**params)
|
||||
model.fit(x_train, y_train,
|
||||
eval_set=[(x_valid, y_valid)],
|
||||
early_stopping_rounds=verbose,
|
||||
verbose=verbose)
|
||||
|
||||
with open(f'model_lgb_{target}.pkl', 'wb') as handle:
|
||||
pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)
|
||||
|
||||
oof_pred_lgb = model.predict(x_valid)
|
||||
score_lgb = mean_absolute_error(oof_pred_lgb, y_valid)
|
||||
print('mae:', score_lgb)
|
||||
|
||||
if os.path.isfile(f'../input/mlb-lgbm-and-catboost-models/model_cb_{target}.pkl'):
|
||||
with open(f'../input/mlb-lgbm-and-catboost-models/model_cb_{target}.pkl', 'rb') as fin:
|
||||
model_cb = pickle.load(fin)
|
||||
else:
|
||||
|
||||
model_cb = CatBoostRegressor(
|
||||
n_estimators=2000,
|
||||
learning_rate=0.05,
|
||||
loss_function='MAE',
|
||||
eval_metric='MAE',
|
||||
max_bin=50,
|
||||
subsample=0.9,
|
||||
colsample_bylevel=0.5,
|
||||
verbose=100)
|
||||
|
||||
model_cb.fit(x_train, y_train, use_best_model=True,
|
||||
eval_set=(x_valid, y_valid),
|
||||
early_stopping_rounds=25)
|
||||
|
||||
with open(f'model_cb_{target}.pkl', 'wb') as handle:
|
||||
pickle.dump(model_cb, handle, protocol=pickle.HIGHEST_PROTOCOL)
|
||||
|
||||
oof_pred_cat = model_cb.predict(x_valid)
|
||||
score_cat = mean_absolute_error(oof_pred_cat, y_valid)
|
||||
print('mae:', score_cat)
|
||||
|
||||
return oof_pred_lgb, model, oof_pred_cat, model_cb, score_lgb, score_cat
|
||||
|
||||
|
||||
# training lightgbm and catboost
|
||||
params = {
|
||||
'boosting_type': 'gbdt',
|
||||
'objective':'mae',
|
||||
'subsample': 0.5,
|
||||
'subsample_freq': 1,
|
||||
'learning_rate': 0.03,
|
||||
'num_leaves': 2**11-1,
|
||||
'min_data_in_leaf': 2**12-1,
|
||||
'feature_fraction': 0.5,
|
||||
'max_bin': 100,
|
||||
'n_estimators': 2500,
|
||||
'boost_from_average': False,
|
||||
"random_seed":42,
|
||||
}
|
||||
|
||||
oof_pred_lgb2, model_lgb2, oof_pred_cat2, model_cb2, score_lgb2, score_cat2 = fit_lgbm(
|
||||
x_train1, y_train1['target2'],
|
||||
x_valid1, y_valid1['target2'],
|
||||
2, params
|
||||
)
|
||||
|
||||
oof_pred_lgb1, model_lgb1, oof_pred_cat1, model_cb1, score_lgb1, score_cat1 = fit_lgbm(
|
||||
x_train1, y_train1['target1'],
|
||||
x_valid1, y_valid1['target1'],
|
||||
1, params
|
||||
)
|
||||
|
||||
oof_pred_lgb3, model_lgb3, oof_pred_cat3, model_cb3, score_lgb3, score_cat3 = fit_lgbm(
|
||||
x_train1, y_train1['target3'],
|
||||
x_valid1, y_valid1['target3'],
|
||||
3, params
|
||||
)
|
||||
oof_pred_lgb4, model_lgb4, oof_pred_cat4, model_cb4, score_lgb4, score_cat4= fit_lgbm(
|
||||
x_train1, y_train1['target4'],
|
||||
x_valid1, y_valid1['target4'],
|
||||
4, params
|
||||
)
|
||||
|
||||
score = (score_lgb1+score_lgb2+score_lgb3+score_lgb4) / 4
|
||||
print(f'LightGBM score: {score}')
|
||||
|
||||
score = (score_cat1+score_cat2+score_cat3+score_cat4) / 4
|
||||
print(f'Catboost score: {score}')
|
||||
|
||||
# %% [markdown]
|
||||
# ## Inference
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:17:02.259872Z","iopub.status.idle":"2021-06-26T07:17:02.260506Z"}}
|
||||
players_cols = ['playerId', 'primaryPositionName']
|
||||
rosters_cols = ['playerId', 'teamId', 'status']
|
||||
scores_cols = ['playerId', 'battingOrder', 'gamesPlayedBatting', 'flyOuts',
|
||||
'groundOuts', 'runsScored', 'doubles', 'triples', 'homeRuns',
|
||||
'strikeOuts', 'baseOnBalls', 'intentionalWalks', 'hits', 'hitByPitch',
|
||||
'atBats', 'caughtStealing', 'stolenBases', 'groundIntoDoublePlay',
|
||||
'groundIntoTriplePlay', 'plateAppearances', 'totalBases', 'rbi',
|
||||
'leftOnBase', 'sacBunts', 'sacFlies', 'catchersInterference',
|
||||
'pickoffs', 'gamesPlayedPitching', 'gamesStartedPitching',
|
||||
'completeGamesPitching', 'shutoutsPitching', 'winsPitching',
|
||||
'lossesPitching', 'flyOutsPitching', 'airOutsPitching',
|
||||
'groundOutsPitching', 'runsPitching', 'doublesPitching',
|
||||
'triplesPitching', 'homeRunsPitching', 'strikeOutsPitching',
|
||||
'baseOnBallsPitching', 'intentionalWalksPitching', 'hitsPitching',
|
||||
'hitByPitchPitching', 'atBatsPitching', 'caughtStealingPitching',
|
||||
'stolenBasesPitching', 'inningsPitched', 'saveOpportunities',
|
||||
'earnedRuns', 'battersFaced', 'outsPitching', 'pitchesThrown', 'balls',
|
||||
'strikes', 'hitBatsmen', 'balks', 'wildPitches', 'pickoffsPitching',
|
||||
'rbiPitching', 'gamesFinishedPitching', 'inheritedRunners',
|
||||
'inheritedRunnersScored', 'catchersInterferencePitching',
|
||||
'sacBuntsPitching', 'sacFliesPitching', 'saves', 'holds', 'blownSaves',
|
||||
'assists', 'putOuts', 'errors', 'chances']
|
||||
|
||||
null = np.nan  # null/true/false aliases so eval() can parse the JSON-like strings in test_df below
|
||||
true = True
|
||||
false = False
|
||||
|
||||
# %% [code] {"execution":{"iopub.status.busy":"2021-06-26T07:17:02.26162Z","iopub.status.idle":"2021-06-26T07:17:02.262287Z"}}
|
||||
import pandas as pd
|
||||
import numpy as np
|
||||
from datetime import timedelta
|
||||
from tqdm import tqdm
|
||||
import gc
|
||||
from functools import reduce
|
||||
from sklearn.model_selection import StratifiedKFold
|
||||
|
||||
ROOT_DIR = "../input/mlb-player-digital-engagement-forecasting"
|
||||
|
||||
#=======================#
|
||||
def flatten(df, col):
|
||||
du = (df.pivot(index="playerId", columns="EvalDate",
|
||||
values=col).add_prefix(f"{col}_").
|
||||
rename_axis(None, axis=1).reset_index())
|
||||
return du
|
||||
#============================#
|
||||
def reducer(left, right):
|
||||
return left.merge(right, on="playerId")
|
||||
#========================
|
||||
|
||||
TGTCOLS = ["target1","target2","target3","target4"]
|
||||
def train_lag(df, lag=1):
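    # Add lagged copies of the targets: shift EvalDate forward by `lag` days in a
    # copy of the frame and left-merge it back, so every row gains
    # target{1..4}_{lag} columns holding the values observed `lag` days earlier.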
|
||||
dp = df[["playerId","EvalDate"]+TGTCOLS].copy()
|
||||
dp["EvalDate"] =dp["EvalDate"] + timedelta(days=lag)
|
||||
df = df.merge(dp, on=["playerId", "EvalDate"], suffixes=["",f"_{lag}"], how="left")
|
||||
return df
|
||||
#=================================
|
||||
def test_lag(sub):
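    # Rebuild the same 1..20-day lag features at inference time for the single
    # submission date: parse the playerId, map each recent EvalDate in LAST to
    # its lag index, then pivot one column per (target, lag) via flatten()/reducer().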
|
||||
sub["playerId"] = sub["date_playerId"].apply(lambda s: int( s.split("_")[1] ) )
|
||||
assert sub.date.nunique() == 1
|
||||
dte = sub["date"].unique()[0]
|
||||
|
||||
eval_dt = pd.to_datetime(dte, format="%Y%m%d")
|
||||
dtes = [eval_dt + timedelta(days=-k) for k in LAGS]
|
||||
mp_dtes = {eval_dt + timedelta(days=-k):k for k in LAGS}
|
||||
|
||||
sl = LAST.loc[LAST.EvalDate.between(dtes[-1], dtes[0]), ["EvalDate","playerId"]+TGTCOLS].copy()
|
||||
sl["EvalDate"] = sl["EvalDate"].map(mp_dtes)
|
||||
du = [flatten(sl, col) for col in TGTCOLS]
|
||||
du = reduce(reducer, du)
|
||||
return du, eval_dt
|
||||
#
|
||||
#===============
|
||||
|
||||
tr = pd.read_csv("../input/mlb-data/target.csv")
|
||||
print(tr.shape)
|
||||
gc.collect()
|
||||
|
||||
tr["EvalDate"] = pd.to_datetime(tr["EvalDate"])
|
||||
tr["EvalDate"] = tr["EvalDate"] + timedelta(days=-1)
|
||||
tr["EvalYear"] = tr["EvalDate"].dt.year
|
||||
|
||||
MED_DF = tr.groupby(["playerId","EvalYear"])[TGTCOLS].median().reset_index()
|
||||
MEDCOLS = ["tgt1_med","tgt2_med", "tgt3_med", "tgt4_med"]
|
||||
MED_DF.columns = ["playerId","EvalYear"] + MEDCOLS
|
||||
|
||||
LAGS = list(range(1,21))
|
||||
FECOLS = [f"{col}_{lag}" for lag in reversed(LAGS) for col in TGTCOLS]
|
||||
|
||||
for lag in tqdm(LAGS):
|
||||
tr = train_lag(tr, lag=lag)
|
||||
gc.collect()
|
||||
#===========
|
||||
tr = tr.sort_values(by=["playerId", "EvalDate"])
|
||||
print(tr.shape)
|
||||
tr = tr.dropna()
|
||||
print(tr.shape)
|
||||
tr = tr.merge(MED_DF, on=["playerId","EvalYear"])
|
||||
gc.collect()
|
||||
|
||||
X = tr[FECOLS+MEDCOLS].values
|
||||
y = tr[TGTCOLS].values
|
||||
cl = tr["playerId"].values
|
||||
|
||||
NFOLDS = 6
|
||||
skf = StratifiedKFold(n_splits=NFOLDS)
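# folds are stratified on playerId (cl), so each player's rows are split
# roughly proportionally across the six folds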
|
||||
folds = skf.split(X, cl)
|
||||
folds = list(folds)
|
||||
|
||||
import tensorflow as tf
|
||||
import tensorflow.keras.layers as L
|
||||
import tensorflow.keras.models as M
|
||||
from sklearn.metrics import mean_absolute_error, mean_squared_error
|
||||
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
|
||||
|
||||
tf.random.set_seed(777)
|
||||
|
||||
def make_model(n_in):
|
||||
inp = L.Input(name="inputs", shape=(n_in,))
|
||||
x = L.Dense(50, activation="relu", name="d1")(inp)
|
||||
x = L.Dense(50, activation="relu", name="d2")(x)
|
||||
preds = L.Dense(4, activation="linear", name="preds")(x)
|
||||
|
||||
model = M.Model(inp, preds, name="ANN")
|
||||
model.compile(loss="mean_absolute_error", optimizer="adam")
|
||||
return model
|
||||
|
||||
net = make_model(X.shape[1])
|
||||
print(net.summary())
|
||||
|
||||
oof = np.zeros(y.shape)
|
||||
nets = []
|
||||
for idx in range(NFOLDS):
|
||||
print("FOLD:", idx)
|
||||
tr_idx, val_idx = folds[idx]
|
||||
ckpt = ModelCheckpoint(f"w{idx}.h5", monitor='val_loss', verbose=1, save_best_only=True,mode='min')
|
||||
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,patience=3, min_lr=0.0005)
|
||||
es = EarlyStopping(monitor='val_loss', patience=6)
|
||||
reg = make_model(X.shape[1])
|
||||
# reg.fit(X[tr_idx], y[tr_idx], epochs=10, batch_size=35_000, validation_data=(X[val_idx], y[val_idx]),
|
||||
# verbose=1, callbacks=[ckpt, reduce_lr, es])
|
||||
reg.load_weights(f"w{idx}.h5")
|
||||
oof[val_idx] = reg.predict(X[val_idx], batch_size=50_000, verbose=1)
|
||||
nets.append(reg)
|
||||
gc.collect()
|
||||
#
|
||||
#
|
||||
|
||||
mae = mean_absolute_error(y, oof)
|
||||
rmse = mean_squared_error(y, oof, squared=False)  # squared=False returns the root mean squared error
|
||||
print("mae:", mae)
|
||||
print("mse:", mse)
|
||||
|
||||
# Historical information to use in prediction time
|
||||
bound_dt = pd.to_datetime("2021-01-01")
|
||||
LAST = tr.loc[tr.EvalDate>bound_dt].copy()
|
||||
|
||||
LAST_MED_DF = MED_DF.loc[MED_DF.EvalYear==2021].copy()
|
||||
LAST_MED_DF.drop("EvalYear", axis=1, inplace=True)
|
||||
del tr
|
||||
|
||||
#"""
|
||||
import mlb
|
||||
FE = []; SUB = [];
|
||||
|
||||
# %% [markdown]
|
||||
# <div class="alert alert-success">
|
||||
# </div>
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:17:02.263332Z","iopub.status.idle":"2021-06-26T07:17:02.263974Z"}}
|
||||
import copy
|
||||
|
||||
env = mlb.make_env() # initialize the environment
|
||||
iter_test = env.iter_test() # iterator which loops over each date in test set
|
||||
|
||||
for (test_df, sample_prediction_df) in iter_test: # make predictions here
|
||||
|
||||
sub = copy.deepcopy(sample_prediction_df.reset_index())
|
||||
sample_prediction_df = copy.deepcopy(sample_prediction_df.reset_index(drop=True))
|
||||
|
||||
    # LGBM submission
|
||||
    # create dataset
|
||||
sample_prediction_df['playerId'] = sample_prediction_df['date_playerId']\
|
||||
.map(lambda x: int(x.split('_')[1]))
|
||||
# Dealing with missing values
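    # (a pandas value compares unequal to itself only when it is NaN, so
    #  `x == x` is a compact "field is present" check for the JSON string columns below)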
|
||||
if test_df['rosters'].iloc[0] == test_df['rosters'].iloc[0]:
|
||||
test_rosters = pd.DataFrame(eval(test_df['rosters'].iloc[0]))
|
||||
else:
|
||||
test_rosters = pd.DataFrame({'playerId': sample_prediction_df['playerId']})
|
||||
for col in rosters.columns:
|
||||
if col == 'playerId': continue
|
||||
test_rosters[col] = np.nan
|
||||
|
||||
if test_df['playerBoxScores'].iloc[0] == test_df['playerBoxScores'].iloc[0]:
|
||||
test_scores = pd.DataFrame(eval(test_df['playerBoxScores'].iloc[0]))
|
||||
else:
|
||||
test_scores = pd.DataFrame({'playerId': sample_prediction_df['playerId']})
|
||||
for col in scores.columns:
|
||||
if col == 'playerId': continue
|
||||
test_scores[col] = np.nan
|
||||
test_scores = test_scores.groupby('playerId').sum().reset_index()
|
||||
test = sample_prediction_df[['playerId']].copy()
|
||||
test = test.merge(players[players_cols], on='playerId', how='left')
|
||||
test = test.merge(test_rosters[rosters_cols], on='playerId', how='left')
|
||||
test = test.merge(test_scores[scores_cols], on='playerId', how='left')
|
||||
test = test.merge(player_target_stats, how='inner', left_on=["playerId"],right_on=["playerId"])
|
||||
|
||||
|
||||
test['label_playerId'] = test['playerId'].map(player2num)
|
||||
test['label_primaryPositionName'] = test['primaryPositionName'].map(position2num)
|
||||
test['label_teamId'] = test['teamId'].map(teamid2num)
|
||||
test['label_status'] = test['status'].map(status2num)
|
||||
|
||||
test_X = test[feature_cols]
|
||||
# predict
|
||||
pred1 = model1.predict(test_X)
|
||||
|
||||
# predict
|
||||
pred_lgd1 = model_lgb1.predict(test_X)
|
||||
pred_lgd2 = model_lgb2.predict(test_X)
|
||||
pred_lgd3 = model_lgb3.predict(test_X)
|
||||
pred_lgd4 = model_lgb4.predict(test_X)
|
||||
|
||||
pred_cat1 = model_cb1.predict(test_X)
|
||||
pred_cat2 = model_cb2.predict(test_X)
|
||||
pred_cat3 = model_cb3.predict(test_X)
|
||||
pred_cat4 = model_cb4.predict(test_X)
|
||||
|
||||
test['target1'] = np.clip(pred1,0,100)
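    # the predicted target1 is written back into `test`, presumably because
    # feature_cols2 (defined earlier in the notebook) uses it as a feature for
    # the remaining targets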
|
||||
test_X = test[feature_cols2]
|
||||
|
||||
pred2 = model2.predict(test_X)
|
||||
pred3 = model3.predict(test_X)
|
||||
pred4 = model4.predict(test_X)
|
||||
|
||||
# merge submission
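    # weighted ensemble per target: 0.65 * base model + 0.25 * LightGBM
    # + 0.10 * CatBoost, with every prediction clipped to the 0-100 target range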
|
||||
sample_prediction_df['target1'] = 0.65*np.clip(pred1, 0, 100)+0.25*np.clip(pred_lgd1, 0, 100)+0.10*np.clip(pred_cat1, 0, 100)
|
||||
sample_prediction_df['target2'] = 0.65*np.clip(pred2, 0, 100)+0.25*np.clip(pred_lgd2, 0, 100)+0.10*np.clip(pred_cat2, 0, 100)
|
||||
sample_prediction_df['target3'] = 0.65*np.clip(pred3, 0, 100)+0.25*np.clip(pred_lgd3, 0, 100)+0.10*np.clip(pred_cat3, 0, 100)
|
||||
sample_prediction_df['target4'] = 0.65*np.clip(pred4, 0, 100)+0.25*np.clip(pred_lgd4, 0, 100)+0.10*np.clip(pred_cat4, 0, 100)
|
||||
sample_prediction_df = sample_prediction_df.fillna(0.)
|
||||
del sample_prediction_df['playerId']
|
||||
    # TF submission
|
||||
# Features computation at Evaluation Date
|
||||
sub_fe, eval_dt = test_lag(sub)
|
||||
sub_fe = sub_fe.merge(LAST_MED_DF, on="playerId", how="left")
|
||||
sub_fe = sub_fe.fillna(0.)
|
||||
|
||||
_preds = 0.
|
||||
for reg in nets:
|
||||
_preds += reg.predict(sub_fe[FECOLS + MEDCOLS]) / NFOLDS
|
||||
sub_fe[TGTCOLS] = np.clip(_preds, 0, 100)
|
||||
sub.drop(["date"]+TGTCOLS, axis=1, inplace=True)
|
||||
sub = sub.merge(sub_fe[["playerId"]+TGTCOLS], on="playerId", how="left")
|
||||
sub.drop("playerId", axis=1, inplace=True)
|
||||
sub = sub.fillna(0.)
|
||||
# Blending
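    # final submission: 0.35 * the lag-feature NN predictions (sub)
    # + 0.65 * the GBDT ensemble above (sample_prediction_df)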
|
||||
blend = pd.concat(
|
||||
[sub[['date_playerId']],
|
||||
(0.35*sub.drop('date_playerId', axis=1) + 0.65*sample_prediction_df.drop('date_playerId', axis=1))],
|
||||
axis=1
|
||||
)
|
||||
env.predict(blend)
|
||||
# Update Available information
|
||||
sub_fe["EvalDate"] = eval_dt
|
||||
#sub_fe.drop(MEDCOLS, axis=1, inplace=True)
|
||||
LAST = LAST.append(sub_fe)
|
||||
LAST = LAST.drop_duplicates(subset=["EvalDate","playerId"], keep="last")
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:17:02.264951Z","iopub.status.idle":"2021-06-26T07:17:02.265581Z"}}
|
||||
pd.concat(
|
||||
[sub[['date_playerId']],
|
||||
(sub.drop('date_playerId', axis=1) + sample_prediction_df.drop('date_playerId', axis=1)) / 2],
|
||||
axis=1
|
||||
)
|
||||
|
||||
# %% [code] {"jupyter":{"outputs_hidden":false},"execution":{"iopub.status.busy":"2021-06-26T07:17:02.26657Z","iopub.status.idle":"2021-06-26T07:17:02.267169Z"}}
|
||||
sample_prediction_df
|
||||
|
||||
# %% [markdown]
|
||||
# <div class="alert alert-success">
|
||||
# </div>
|
||||
1399
d1/mlb_player_v2.py
File diff suppressed because it is too large
@ -1,168 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
|
||||
# # Overview
|
||||
# The kernel shows how to use the [tf_pose_estimation](https://github.com/ildoonet/tf-pose-estimation) package in Python on a series of running videos.
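# A minimal usage sketch of the tf_pose API exercised below (added for
# orientation and kept as comments, since the package is only installed a few
# cells further down; the calls mirror the ones used later in this kernel):
#
#   import tf_pose
#   from tf_pose.estimator import TfPoseEstimator
#
#   tfpe = tf_pose.get_estimator()                               # load the default pose model
#   humans = tfpe.inference(npimg=rgb_frame, upsample_size=4.0)  # detected skeletons
#   overlay = TfPoseEstimator.draw_humans(bgr_frame, humans, imgcopy=False)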
|
||||
|
||||
# ## Libraries we need
|
||||
# Install tf_pose and pycocotools
|
||||
|
||||
# In[1]:
|
||||
|
||||
|
||||
import os
|
||||
def get_ipython():  # crude stand-in for IPython's get_ipython(): returning os makes the .system(...) calls below behave like the notebook's !-commands
|
||||
return os
|
||||
|
||||
get_ipython().system('pip install -qq https://www.github.com/ildoonet/tf-pose-estimation')
|
||||
|
||||
|
||||
# In[2]:
|
||||
|
||||
|
||||
get_ipython().system('pip install -qq pycocotools')
|
||||
|
||||
|
||||
# In[3]:
|
||||
|
||||
|
||||
get_ipython().run_line_magic('load_ext', 'autoreload')
|
||||
get_ipython().run_line_magic('autoreload', '2')
|
||||
import seaborn as sns
|
||||
import matplotlib.pyplot as plt
|
||||
plt.rcParams["figure.figsize"] = (8, 8)
|
||||
plt.rcParams["figure.dpi"] = 125
|
||||
plt.rcParams["font.size"] = 14
|
||||
plt.rcParams['font.family'] = ['sans-serif']
|
||||
plt.rcParams['font.sans-serif'] = ['DejaVu Sans']
|
||||
plt.style.use('ggplot')
|
||||
sns.set_style("whitegrid", {'axes.grid': False})
|
||||
|
||||
|
||||
# In[4]:
|
||||
|
||||
|
||||
get_ipython().run_line_magic('matplotlib', 'inline')
|
||||
import tf_pose
|
||||
import cv2
|
||||
from glob import glob
|
||||
from tqdm import tqdm_notebook
|
||||
from PIL import Image
|
||||
import numpy as np
|
||||
import os
|
||||
def video_gen(in_path):
|
||||
c_cap = cv2.VideoCapture(in_path)
|
||||
while c_cap.isOpened():
|
||||
ret, frame = c_cap.read()
|
||||
if not ret:
|
||||
break
|
||||
yield c_cap.get(cv2.CAP_PROP_POS_MSEC), frame[:, :, ::-1]
|
||||
c_cap.release()
|
||||
|
||||
|
||||
# In[5]:
|
||||
|
||||
|
||||
video_paths = glob('../input/*.mp4')
|
||||
c_video = video_gen(video_paths[0])
|
||||
for _ in range(300):
|
||||
c_ts, c_frame = next(c_video)
|
||||
plt.imshow(c_frame)
|
||||
|
||||
|
||||
# In[6]:
|
||||
|
||||
|
||||
from tf_pose.estimator import TfPoseEstimator
|
||||
from tf_pose.networks import get_graph_path, model_wh
|
||||
tfpe = tf_pose.get_estimator()
|
||||
|
||||
|
||||
# In[7]:
|
||||
|
||||
|
||||
humans = tfpe.inference(npimg=c_frame, upsample_size=4.0)
|
||||
print(humans)
|
||||
|
||||
|
||||
# In[8]:
|
||||
|
||||
|
||||
new_image = TfPoseEstimator.draw_humans(c_frame[:, :, ::-1], humans, imgcopy=False)
|
||||
fig, ax1 = plt.subplots(1, 1, figsize=(10, 10))
|
||||
ax1.imshow(new_image[:, :, ::-1])
|
||||
|
||||
|
||||
# In[9]:
|
||||
|
||||
|
||||
body_to_dict = lambda c_fig: {'bp_{}_{}'.format(k, vec_name): vec_val
|
||||
for k, part_vec in c_fig.body_parts.items()
|
||||
for vec_name, vec_val in zip(['x', 'y', 'score'],
|
||||
(part_vec.x, 1-part_vec.y, part_vec.score))}
|
||||
c_fig = humans[0]
|
||||
body_to_dict(c_fig)
|
||||
|
||||
|
||||
# In[10]:
|
||||
|
||||
|
||||
MAX_FRAMES = 200
|
||||
body_pose_list = []
|
||||
for vid_path in tqdm_notebook(video_paths, desc='Files'):
|
||||
c_video = video_gen(vid_path)
|
||||
c_ts, c_frame = next(c_video)
|
||||
out_path = '{}_out.avi'.format(os.path.split(vid_path)[1])
|
||||
out = cv2.VideoWriter(out_path,
|
||||
cv2.VideoWriter_fourcc('M','J','P','G'),
|
||||
10,
|
||||
(c_frame.shape[1], c_frame.shape[0]))
|
||||
for (c_ts, c_frame), _ in zip(c_video,
|
||||
tqdm_notebook(range(MAX_FRAMES), desc='Frames')):
|
||||
bgr_frame = c_frame[:,:,::-1]
|
||||
humans = tfpe.inference(npimg=bgr_frame, upsample_size=4.0)
|
||||
for c_body in humans:
|
||||
body_pose_list += [dict(video=out_path, time=c_ts, **body_to_dict(c_body))]
|
||||
new_image = TfPoseEstimator.draw_humans(bgr_frame, humans, imgcopy=False)
|
||||
out.write(new_image)
|
||||
out.release()
|
||||
|
||||
|
||||
# In[11]:
|
||||
|
||||
|
||||
import pandas as pd
|
||||
body_pose_df = pd.DataFrame(body_pose_list)
|
||||
body_pose_df.describe()
|
||||
|
||||
|
||||
# In[12]:
|
||||
|
||||
|
||||
fig, m_axs = plt.subplots(1, 2, figsize=(15, 5))
|
||||
for c_ax, (c_name, c_rows) in zip(m_axs, body_pose_df.groupby('video')):
|
||||
for i in range(17):
|
||||
        c_ax.plot(c_rows['time'], c_rows['bp_{}_y'.format(i)], label='y {}'.format(i))  # y-coordinate of body part i over time
|
||||
c_ax.legend()
|
||||
c_ax.set_title(c_name)
|
||||
|
||||
|
||||
# In[13]:
|
||||
|
||||
|
||||
fig, m_axs = plt.subplots(1, 2, figsize=(15, 5))
|
||||
for c_ax, (c_name, n_rows) in zip(m_axs, body_pose_df.groupby('video')):
|
||||
for i in range(17):
|
||||
c_rows = n_rows.query('bp_{}_score>0.6'.format(i)) # only keep confident results
|
||||
c_ax.plot(c_rows['bp_{}_x'.format(i)], c_rows['bp_{}_y'.format(i)], label='BP {}'.format(i))
|
||||
c_ax.legend()
|
||||
c_ax.set_title(c_name)
|
||||
|
||||
|
||||
# In[14]:
|
||||
|
||||
|
||||
body_pose_df.to_csv('body_pose.csv', index=False)
|
||||
|
||||
|
||||
# In[15]:
|
||||
@ -1,576 +0,0 @@
|
||||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
|
||||
#
|
||||
#
|
||||
# NOTE: Turn on Internet and GPU
|
||||
|
||||
# The code hidden below handles all the imports and function definitions (the heavy lifting). If you're a beginner, I'd advise skipping it for now; once you can follow the rest of the code, come back and work through each function for a deeper understanding.
|
||||
|
||||
# In[1]:
|
||||
|
||||
|
||||
# !/usr/bin/env python3
|
||||
# coding=utf-8
|
||||
# author=dave.fang@outlook.com
|
||||
# create=20171225
|
||||
|
||||
import os
|
||||
import pprint
|
||||
import cv2
|
||||
import sys
|
||||
import math
|
||||
import time
|
||||
import tempfile
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
import torch
|
||||
import torch.nn as nn
|
||||
import torch.nn.parallel
|
||||
import torch.backends.cudnn as cudnn
|
||||
import torch.optim as optim
|
||||
import torchvision.transforms as transforms
|
||||
import torchvision.datasets as datasets
|
||||
import torchvision.models as models
|
||||
|
||||
from torch.autograd import Variable
|
||||
|
||||
from scipy.ndimage.filters import gaussian_filter
|
||||
|
||||
#get_ipython().run_line_magic('matplotlib', 'inline')
|
||||
#get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")
|
||||
|
||||
# find connection in the specified sequence, center 29 is in the position 15
|
||||
limb_seq = [[2, 3], [2, 6], [3, 4], [4, 5], [6, 7], [7, 8], [2, 9], [9, 10],
|
||||
[10, 11], [2, 12], [12, 13], [13, 14], [2, 1], [1, 15], [15, 17],
|
||||
[1, 16], [16, 18], [3, 17], [6, 18]]
|
||||
|
||||
# the middle joints heatmap correspondence
|
||||
map_ids = [[31, 32], [39, 40], [33, 34], [35, 36], [41, 42], [43, 44], [19, 20], [21, 22],
|
||||
[23, 24], [25, 26], [27, 28], [29, 30], [47, 48], [49, 50], [53, 54], [51, 52],
|
||||
[55, 56], [37, 38], [45, 46]]
|
||||
|
||||
# these are the colours for the 18 body points
|
||||
colors = [[255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], [85, 255, 0], [0, 255, 0],
|
||||
[0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255],
|
||||
[170, 0, 255], [255, 0, 255], [255, 0, 170], [255, 0, 85]]
|
||||
|
||||
|
||||
class PoseEstimation(nn.Module):
|
||||
def __init__(self, model_dict):
|
||||
super(PoseEstimation, self).__init__()
|
||||
|
||||
self.model0 = model_dict['block_0']
|
||||
self.model1_1 = model_dict['block1_1']
|
||||
self.model2_1 = model_dict['block2_1']
|
||||
self.model3_1 = model_dict['block3_1']
|
||||
self.model4_1 = model_dict['block4_1']
|
||||
self.model5_1 = model_dict['block5_1']
|
||||
self.model6_1 = model_dict['block6_1']
|
||||
|
||||
self.model1_2 = model_dict['block1_2']
|
||||
self.model2_2 = model_dict['block2_2']
|
||||
self.model3_2 = model_dict['block3_2']
|
||||
self.model4_2 = model_dict['block4_2']
|
||||
self.model5_2 = model_dict['block5_2']
|
||||
self.model6_2 = model_dict['block6_2']
|
||||
|
||||
def forward(self, x):
|
||||
out1 = self.model0(x)
|
||||
|
||||
out1_1 = self.model1_1(out1)
|
||||
out1_2 = self.model1_2(out1)
|
||||
out2 = torch.cat([out1_1, out1_2, out1], 1)
|
||||
|
||||
out2_1 = self.model2_1(out2)
|
||||
out2_2 = self.model2_2(out2)
|
||||
out3 = torch.cat([out2_1, out2_2, out1], 1)
|
||||
|
||||
out3_1 = self.model3_1(out3)
|
||||
out3_2 = self.model3_2(out3)
|
||||
out4 = torch.cat([out3_1, out3_2, out1], 1)
|
||||
|
||||
out4_1 = self.model4_1(out4)
|
||||
out4_2 = self.model4_2(out4)
|
||||
out5 = torch.cat([out4_1, out4_2, out1], 1)
|
||||
|
||||
out5_1 = self.model5_1(out5)
|
||||
out5_2 = self.model5_2(out5)
|
||||
out6 = torch.cat([out5_1, out5_2, out1], 1)
|
||||
|
||||
out6_1 = self.model6_1(out6)
|
||||
out6_2 = self.model6_2(out6)
|
||||
|
||||
return out6_1, out6_2
|
||||
|
||||
|
||||
def make_layers(layer_dict):
|
||||
layers = []
|
||||
|
||||
for i in range(len(layer_dict) - 1):
|
||||
layer = layer_dict[i]
|
||||
for k in layer:
|
||||
v = layer[k]
|
||||
if 'pool' in k:
|
||||
layers += [nn.MaxPool2d(kernel_size=v[0], stride=v[1], padding=v[2])]
|
||||
else:
|
||||
conv2d = nn.Conv2d(in_channels=v[0], out_channels=v[1], kernel_size=v[2], stride=v[3], padding=v[4])
|
||||
layers += [conv2d, nn.ReLU(inplace=True)]
|
||||
layer = list(layer_dict[-1].keys())
|
||||
k = layer[0]
|
||||
v = layer_dict[-1][k]
|
||||
|
||||
conv2d = nn.Conv2d(in_channels=v[0], out_channels=v[1], kernel_size=v[2], stride=v[3], padding=v[4])
|
||||
layers += [conv2d]
|
||||
|
||||
return nn.Sequential(*layers)
|
||||
|
||||
|
||||
def get_pose_model():
|
||||
blocks = {}
|
||||
|
||||
block_0 = [{'conv1_1': [3, 64, 3, 1, 1]}, {'conv1_2': [64, 64, 3, 1, 1]}, {'pool1_stage1': [2, 2, 0]},
|
||||
{'conv2_1': [64, 128, 3, 1, 1]}, {'conv2_2': [128, 128, 3, 1, 1]}, {'pool2_stage1': [2, 2, 0]},
|
||||
{'conv3_1': [128, 256, 3, 1, 1]}, {'conv3_2': [256, 256, 3, 1, 1]}, {'conv3_3': [256, 256, 3, 1, 1]},
|
||||
{'conv3_4': [256, 256, 3, 1, 1]}, {'pool3_stage1': [2, 2, 0]}, {'conv4_1': [256, 512, 3, 1, 1]},
|
||||
{'conv4_2': [512, 512, 3, 1, 1]}, {'conv4_3_CPM': [512, 256, 3, 1, 1]},
|
||||
{'conv4_4_CPM': [256, 128, 3, 1, 1]}]
|
||||
|
||||
blocks['block1_1'] = [{'conv5_1_CPM_L1': [128, 128, 3, 1, 1]}, {'conv5_2_CPM_L1': [128, 128, 3, 1, 1]},
|
||||
{'conv5_3_CPM_L1': [128, 128, 3, 1, 1]}, {'conv5_4_CPM_L1': [128, 512, 1, 1, 0]},
|
||||
{'conv5_5_CPM_L1': [512, 38, 1, 1, 0]}]
|
||||
|
||||
blocks['block1_2'] = [{'conv5_1_CPM_L2': [128, 128, 3, 1, 1]}, {'conv5_2_CPM_L2': [128, 128, 3, 1, 1]},
|
||||
{'conv5_3_CPM_L2': [128, 128, 3, 1, 1]}, {'conv5_4_CPM_L2': [128, 512, 1, 1, 0]},
|
||||
{'conv5_5_CPM_L2': [512, 19, 1, 1, 0]}]
|
||||
|
||||
for i in range(2, 7):
|
||||
blocks['block%d_1' % i] = [{'Mconv1_stage%d_L1' % i: [185, 128, 7, 1, 3]},
|
||||
{'Mconv2_stage%d_L1' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv3_stage%d_L1' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv4_stage%d_L1' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv5_stage%d_L1' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv6_stage%d_L1' % i: [128, 128, 1, 1, 0]},
|
||||
{'Mconv7_stage%d_L1' % i: [128, 38, 1, 1, 0]}]
|
||||
blocks['block%d_2' % i] = [{'Mconv1_stage%d_L2' % i: [185, 128, 7, 1, 3]},
|
||||
{'Mconv2_stage%d_L2' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv3_stage%d_L2' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv4_stage%d_L2' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv5_stage%d_L2' % i: [128, 128, 7, 1, 3]},
|
||||
{'Mconv6_stage%d_L2' % i: [128, 128, 1, 1, 0]},
|
||||
{'Mconv7_stage%d_L2' % i: [128, 19, 1, 1, 0]}]
|
||||
|
||||
layers = []
|
||||
for block in block_0:
|
||||
# print(block)
|
||||
for key in block:
|
||||
v = block[key]
|
||||
if 'pool' in key:
|
||||
layers += [nn.MaxPool2d(kernel_size=v[0], stride=v[1], padding=v[2])]
|
||||
else:
|
||||
conv2d = nn.Conv2d(in_channels=v[0], out_channels=v[1], kernel_size=v[2], stride=v[3], padding=v[4])
|
||||
layers += [conv2d, nn.ReLU(inplace=True)]
|
||||
|
||||
models = {
|
||||
'block_0': nn.Sequential(*layers)
|
||||
}
|
||||
|
||||
for k in blocks:
|
||||
v = blocks[k]
|
||||
models[k] = make_layers(v)
|
||||
|
||||
return PoseEstimation(models)
|
||||
|
||||
|
||||
def get_paf_and_heatmap(model, img_raw, scale_search, param_stride=8, box_size=368):
|
||||
multiplier = [scale * box_size / img_raw.shape[0] for scale in scale_search]
|
||||
|
||||
heatmap_avg = torch.zeros((len(multiplier), 19, img_raw.shape[0], img_raw.shape[1])).cuda()
|
||||
paf_avg = torch.zeros((len(multiplier), 38, img_raw.shape[0], img_raw.shape[1])).cuda()
|
||||
|
||||
for i, scale in enumerate(multiplier):
|
||||
img_test = cv2.resize(img_raw, (0, 0), fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
|
||||
img_test_pad, pad = pad_right_down_corner(img_test, param_stride, param_stride)
|
||||
img_test_pad = np.transpose(np.float32(img_test_pad[:, :, :, np.newaxis]), (3, 2, 0, 1)) / 256 - 0.5
|
||||
|
||||
feed = Variable(torch.from_numpy(img_test_pad)).cuda()
|
||||
output1, output2 = model(feed)
|
||||
|
||||
print(output1.size())
|
||||
print(output2.size())
|
||||
|
||||
heatmap = nn.UpsamplingBilinear2d((img_raw.shape[0], img_raw.shape[1])).cuda()(output2)
|
||||
|
||||
paf = nn.UpsamplingBilinear2d((img_raw.shape[0], img_raw.shape[1])).cuda()(output1)
|
||||
|
||||
heatmap_avg[i] = heatmap[0].data
|
||||
paf_avg[i] = paf[0].data
|
||||
|
||||
heatmap_avg = torch.transpose(torch.transpose(torch.squeeze(torch.mean(heatmap_avg, 0)), 0, 1), 1, 2).cuda()
|
||||
heatmap_avg = heatmap_avg.cpu().numpy()
|
||||
|
||||
paf_avg = torch.transpose(torch.transpose(torch.squeeze(torch.mean(paf_avg, 0)), 0, 1), 1, 2).cuda()
|
||||
paf_avg = paf_avg.cpu().numpy()
|
||||
|
||||
return paf_avg, heatmap_avg
|
||||
|
||||
|
||||
def extract_heatmap_info(heatmap_avg, param_thre1=0.1):
|
||||
all_peaks = []
|
||||
peak_counter = 0
|
||||
|
||||
for part in range(18):
|
||||
map_ori = heatmap_avg[:, :, part]
|
||||
map_gau = gaussian_filter(map_ori, sigma=3)
|
||||
|
||||
map_left = np.zeros(map_gau.shape)
|
||||
map_left[1:, :] = map_gau[:-1, :]
|
||||
map_right = np.zeros(map_gau.shape)
|
||||
map_right[:-1, :] = map_gau[1:, :]
|
||||
map_up = np.zeros(map_gau.shape)
|
||||
map_up[:, 1:] = map_gau[:, :-1]
|
||||
map_down = np.zeros(map_gau.shape)
|
||||
map_down[:, :-1] = map_gau[:, 1:]
|
||||
|
||||
peaks_binary = np.logical_and.reduce(
|
||||
(map_gau >= map_left, map_gau >= map_right, map_gau >= map_up,
|
||||
map_gau >= map_down, map_gau > param_thre1))
|
||||
|
||||
peaks = zip(np.nonzero(peaks_binary)[1], np.nonzero(peaks_binary)[0]) # note reverse
|
||||
peaks = list(peaks)
|
||||
peaks_with_score = [x + (map_ori[x[1], x[0]],) for x in peaks]
|
||||
ids = range(peak_counter, peak_counter + len(peaks))
|
||||
peaks_with_score_and_id = [peaks_with_score[i] + (ids[i],) for i in range(len(ids))]
|
||||
|
||||
all_peaks.append(peaks_with_score_and_id)
|
||||
peak_counter += len(peaks)
|
||||
|
||||
return all_peaks
|
||||
|
||||
|
||||
def extract_paf_info(img_raw, paf_avg, all_peaks, param_thre2=0.05, param_thre3=0.5):
|
||||
connection_all = []
|
||||
special_k = []
|
||||
mid_num = 10
|
||||
|
||||
for k in range(len(map_ids)):
|
||||
score_mid = paf_avg[:, :, [x - 19 for x in map_ids[k]]]
|
||||
candA = all_peaks[limb_seq[k][0] - 1]
|
||||
candB = all_peaks[limb_seq[k][1] - 1]
|
||||
nA = len(candA)
|
||||
nB = len(candB)
|
||||
if nA != 0 and nB != 0:
|
||||
connection_candidate = []
|
||||
for i in range(nA):
|
||||
for j in range(nB):
|
||||
vec = np.subtract(candB[j][:2], candA[i][:2])
|
||||
norm = math.sqrt(vec[0] * vec[0] + vec[1] * vec[1])
|
||||
vec = np.divide(vec, norm)
|
||||
|
||||
startend = zip(np.linspace(candA[i][0], candB[j][0], num=mid_num),
|
||||
np.linspace(candA[i][1], candB[j][1], num=mid_num))
|
||||
startend = list(startend)
|
||||
|
||||
vec_x = np.array([score_mid[int(round(startend[I][1])), int(round(startend[I][0])), 0]
|
||||
for I in range(len(startend))])
|
||||
vec_y = np.array([score_mid[int(round(startend[I][1])), int(round(startend[I][0])), 1]
|
||||
for I in range(len(startend))])
|
||||
|
||||
score_midpts = np.multiply(vec_x, vec[0]) + np.multiply(vec_y, vec[1])
|
||||
score_with_dist_prior = sum(score_midpts) / len(score_midpts)
|
||||
score_with_dist_prior += min(0.5 * img_raw.shape[0] / norm - 1, 0)
|
||||
|
||||
criterion1 = len(np.nonzero(score_midpts > param_thre2)[0]) > 0.8 * len(score_midpts)
|
||||
criterion2 = score_with_dist_prior > 0
|
||||
if criterion1 and criterion2:
|
||||
connection_candidate.append(
|
||||
[i, j, score_with_dist_prior, score_with_dist_prior + candA[i][2] + candB[j][2]])
|
||||
|
||||
connection_candidate = sorted(connection_candidate, key=lambda x: x[2], reverse=True)
|
||||
connection = np.zeros((0, 5))
|
||||
for c in range(len(connection_candidate)):
|
||||
i, j, s = connection_candidate[c][0:3]
|
||||
if i not in connection[:, 3] and j not in connection[:, 4]:
|
||||
connection = np.vstack([connection, [candA[i][3], candB[j][3], s, i, j]])
|
||||
if len(connection) >= min(nA, nB):
|
||||
break
|
||||
|
||||
connection_all.append(connection)
|
||||
else:
|
||||
special_k.append(k)
|
||||
connection_all.append([])
|
||||
|
||||
return special_k, connection_all
|
||||
|
||||
|
||||
def get_subsets(connection_all, special_k, all_peaks):
|
||||
# last number in each row is the total parts number of that person
|
||||
# the second last number in each row is the score of the overall configuration
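    # e.g. a subset row looks like
    #   [idx_part_0, idx_part_1, ..., idx_part_17, total_score, n_parts]
    # where each of the first 18 entries is an index into `candidate`
    # (or -1 if that body part was not found for this person)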
|
||||
subset = -1 * np.ones((0, 20))
|
||||
candidate = np.array([item for sublist in all_peaks for item in sublist])
|
||||
|
||||
for k in range(len(map_ids)):
|
||||
if k not in special_k:
|
||||
partAs = connection_all[k][:, 0]
|
||||
partBs = connection_all[k][:, 1]
|
||||
indexA, indexB = np.array(limb_seq[k]) - 1
|
||||
|
||||
for i in range(len(connection_all[k])): # = 1:size(temp,1)
|
||||
found = 0
|
||||
subset_idx = [-1, -1]
|
||||
for j in range(len(subset)): # 1:size(subset,1):
|
||||
if subset[j][indexA] == partAs[i] or subset[j][indexB] == partBs[i]:
|
||||
subset_idx[found] = j
|
||||
found += 1
|
||||
|
||||
if found == 1:
|
||||
j = subset_idx[0]
|
||||
if (subset[j][indexB] != partBs[i]):
|
||||
subset[j][indexB] = partBs[i]
|
||||
subset[j][-1] += 1
|
||||
subset[j][-2] += candidate[partBs[i].astype(int), 2] + connection_all[k][i][2]
|
||||
elif found == 2: # if found 2 and disjoint, merge them
|
||||
j1, j2 = subset_idx
|
||||
print("found = 2")
|
||||
membership = ((subset[j1] >= 0).astype(int) + (subset[j2] >= 0).astype(int))[:-2]
|
||||
if len(np.nonzero(membership == 2)[0]) == 0: # merge
|
||||
subset[j1][:-2] += (subset[j2][:-2] + 1)
|
||||
subset[j1][-2:] += subset[j2][-2:]
|
||||
subset[j1][-2] += connection_all[k][i][2]
|
||||
subset = np.delete(subset, j2, 0)
|
||||
else: # as like found == 1
|
||||
subset[j1][indexB] = partBs[i]
|
||||
subset[j1][-1] += 1
|
||||
subset[j1][-2] += candidate[partBs[i].astype(int), 2] + connection_all[k][i][2]
|
||||
|
||||
# if find no partA in the subset, create a new subset
|
||||
elif not found and k < 17:
|
||||
row = -1 * np.ones(20)
|
||||
row[indexA] = partAs[i]
|
||||
row[indexB] = partBs[i]
|
||||
row[-1] = 2
|
||||
row[-2] = sum(candidate[connection_all[k][i, :2].astype(int), 2]) + connection_all[k][i][2]
|
||||
subset = np.vstack([subset, row])
|
||||
return subset, candidate
|
||||
|
||||
|
||||
def draw_key_point(subset, all_peaks, img_raw):
|
||||
del_ids = []
|
||||
for i in range(len(subset)):
|
||||
if subset[i][-1] < 4 or subset[i][-2] / subset[i][-1] < 0.4:
|
||||
del_ids.append(i)
|
||||
subset = np.delete(subset, del_ids, axis=0)
|
||||
|
||||
img_canvas = img_raw.copy() # B,G,R order
|
||||
|
||||
for i in range(18):
|
||||
for j in range(len(all_peaks[i])):
|
||||
cv2.circle(img_canvas, all_peaks[i][j][0:2], 4, colors[i], thickness=-1)
|
||||
|
||||
return subset, img_canvas
|
||||
|
||||
|
||||
def link_key_point(img_canvas, candidate, subset, stickwidth=4):
|
||||
for i in range(17):
|
||||
for n in range(len(subset)):
|
||||
index = subset[n][np.array(limb_seq[i]) - 1]
|
||||
if -1 in index:
|
||||
continue
|
||||
cur_canvas = img_canvas.copy()
|
||||
Y = candidate[index.astype(int), 0]
|
||||
X = candidate[index.astype(int), 1]
|
||||
mX = np.mean(X)
|
||||
mY = np.mean(Y)
|
||||
length = ((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2) ** 0.5
|
||||
angle = math.degrees(math.atan2(X[0] - X[1], Y[0] - Y[1]))
|
||||
polygon = cv2.ellipse2Poly((int(mY), int(mX)), (int(length / 2), stickwidth), int(angle), 0, 360, 1)
|
||||
cv2.fillConvexPoly(cur_canvas, polygon, colors[i])
|
||||
img_canvas = cv2.addWeighted(img_canvas, 0.4, cur_canvas, 0.6, 0)
|
||||
|
||||
return img_canvas
|
||||
|
||||
def pad_right_down_corner(img, stride, pad_value):
|
||||
h = img.shape[0]
|
||||
w = img.shape[1]
|
||||
|
||||
pad = 4 * [None]
|
||||
pad[0] = 0 # up
|
||||
pad[1] = 0 # left
|
||||
pad[2] = 0 if (h % stride == 0) else stride - (h % stride) # down
|
||||
pad[3] = 0 if (w % stride == 0) else stride - (w % stride) # right
|
||||
|
||||
img_padded = img
|
||||
pad_up = np.tile(img_padded[0:1, :, :] * 0 + pad_value, (pad[0], 1, 1))
|
||||
img_padded = np.concatenate((pad_up, img_padded), axis=0)
|
||||
pad_left = np.tile(img_padded[:, 0:1, :] * 0 + pad_value, (1, pad[1], 1))
|
||||
img_padded = np.concatenate((pad_left, img_padded), axis=1)
|
||||
pad_down = np.tile(img_padded[-2:-1, :, :] * 0 + pad_value, (pad[2], 1, 1))
|
||||
img_padded = np.concatenate((img_padded, pad_down), axis=0)
|
||||
pad_right = np.tile(img_padded[:, -2:-1, :] * 0 + pad_value, (1, pad[3], 1))
|
||||
img_padded = np.concatenate((img_padded, pad_right), axis=1)
|
||||
|
||||
return img_padded, pad
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
print(get_pose_model())
|
||||
|
||||
|
||||
# First let's download the pre-trained model.
|
||||
|
||||
# In[2]:
|
||||
|
||||
|
||||
# Using gdown to download the model directly from Google Drive
|
||||
|
||||
#assert os.system(' conda install -y gdown') == 0
|
||||
import gdown
|
||||
|
||||
|
||||
# In[3]:
|
||||
|
||||
|
||||
model = 'coco_pose_iter_440000.pth.tar'
|
||||
if not os.path.exists(model):
|
||||
url = 'https://drive.google.com/u/0/uc?export=download&confirm=f_Ix&id=0B1asvDK18cu_MmY1ZkpaOUhhRHM'
|
||||
gdown.download(
|
||||
url,
|
||||
model,
|
||||
quiet=False
|
||||
)
|
||||
|
||||
|
||||
# In[4]:
|
||||
|
||||
|
||||
state_dict = torch.load('./coco_pose_iter_440000.pth.tar')['state_dict'] # getting the pre-trained model's parameters
|
||||
# A state_dict is simply a Python dictionary object that maps each layer to its parameter tensor.
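# For example, a quick (purely illustrative) peek at what the checkpoint holds:
for layer_name, tensor in list(state_dict.items())[:5]:
    print(layer_name, tuple(tensor.shape))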
|
||||
|
||||
model_pose = get_pose_model() # building the model (see fn. defn. above). To see the architecture, see below cell.
|
||||
model_pose.load_state_dict(state_dict) # Loading the parameters (weights, biases) into the model.
|
||||
|
||||
model_pose.float() # casts all floating-point parameters and buffers to float32; the checkpoint already stores float32, so removing this makes no difference.
|
||||
|
||||
|
||||
# In[5]:
|
||||
|
||||
|
||||
arch_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0000.jpg'
|
||||
img_ori = cv2.imread(arch_image)
|
||||
plt.figure(figsize=(15, 8))
|
||||
plt.imshow(img_ori[...,::-1])
|
||||
|
||||
|
||||
# Notice that the first 10 conv layers come from VGG-19. Instead of downloading that model and copying its layers, they are simply hard-coded in get_pose_model().
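# A hedged sketch (not part of the original kernel): the same backbone could be
# taken from torchvision instead of being hard-coded. The exact layer indices in
# torchvision's VGG-19 are an assumption, so the sketch simply counts Conv2d
# modules and stops after the tenth one. It is defined here but never called.
import torch.nn as nn
from torchvision import models as tv_models

def vgg19_first_ten_convs() -> nn.Sequential:
    features = tv_models.vgg19(pretrained=True).features  # downloads ImageNet weights on first call
    layers, n_convs = [], 0
    for layer in features:
        layers.append(layer)
        if isinstance(layer, nn.Conv2d):
            n_convs += 1
            if n_convs == 10:
                break
    return nn.Sequential(*layers)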
|
||||
|
||||
# In[6]:
|
||||
|
||||
|
||||
# Run this to view the model's architecture
|
||||
#model_pose.eval()
|
||||
|
||||
|
||||
# In[7]:
|
||||
|
||||
|
||||
use_gpu = True
|
||||
|
||||
if use_gpu:
|
||||
model_pose.cuda()
|
||||
model_pose = torch.nn.DataParallel(model_pose, device_ids=range(torch.cuda.device_count()))
|
||||
cudnn.benchmark = True
|
||||
|
||||
|
||||
# In[8]:
|
||||
|
||||
|
||||
def estimate_pose(img_ori, name=None):
|
||||
if name is None:
|
||||
name = tempfile.mktemp(
|
||||
dir='/kaggle/working',
|
||||
suffix='.png',
|
||||
)
|
||||
pprint.pprint(
|
||||
['estimate_pose', dict(name=name)],
|
||||
)
|
||||
|
||||
# People might be at different scales in the image, perform inference at multiple scales to boost results
|
||||
scale_param = [0.5, 1.0, 1.5, 2.0]
|
||||
|
||||
# Predict Heatmaps for approximate joint position
|
||||
# Use Part Affinity Fields (PAF's) as guidance to link joints to form skeleton
|
||||
# PAF's are just unit vectors along the limb encoding the direction of the limb
|
||||
    # The dot product between the PAF and a candidate joint-to-joint direction is high for a real limb and low otherwise
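    # A made-up numeric illustration of that scoring (comments only, values are
    # assumptions): suppose the unit vector from a candidate elbow to a candidate
    # wrist is (1.0, 0.0) and the PAF sampled along that segment is roughly
    # (0.9, 0.1) at every point. The per-point dot products are ~0.9, so the
    # averaged score is high and the pair is accepted as a limb. For a wrong
    # pairing the sampled PAF points elsewhere, the dot products hover near
    # zero, and the candidate connection is discarded.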
|
||||
|
||||
paf_info, heatmap_info = get_paf_and_heatmap(model_pose, img_ori, scale_param)
|
||||
peaks = extract_heatmap_info(heatmap_info)
|
||||
sp_k, con_all = extract_paf_info(img_ori, paf_info, peaks)
|
||||
|
||||
subsets, candidates = get_subsets(con_all, sp_k, peaks)
|
||||
subsets, img_points = draw_key_point(subsets, peaks, img_ori)
|
||||
|
||||
    # After predicting heatmaps and PAFs, proceed to link joints correctly
|
||||
img_canvas = link_key_point(img_points, candidates, subsets)
|
||||
|
||||
|
||||
f = plt.figure(figsize=(15, 10))
|
||||
|
||||
plt.subplot(1, 2, 1)
|
||||
plt.imshow(img_points[...,::-1])
|
||||
|
||||
plt.subplot(1, 2, 2)
|
||||
plt.imshow(img_canvas[...,::-1])
|
||||
|
||||
f.savefig(name)
|
||||
|
||||
|
||||
# In[9]:
|
||||
|
||||
|
||||
test_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0000.jpg'
|
||||
img_ori = cv2.imread(test_image)
|
||||
estimate_pose(img_ori)
|
||||
|
||||
|
||||
# In[10]:
|
||||
|
||||
|
||||
test_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0010.jpg'
|
||||
img_ori = cv2.imread(test_image)
|
||||
estimate_pose(img_ori)
|
||||
|
||||
|
||||
# In[11]:
|
||||
|
||||
|
||||
test_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0020.jpg'
|
||||
img_ori = cv2.imread(test_image)
|
||||
estimate_pose(img_ori)
|
||||
|
||||
|
||||
# In[12]:
|
||||
|
||||
|
||||
test_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0030.jpg'
|
||||
img_ori = cv2.imread(test_image)
|
||||
estimate_pose(img_ori)
|
||||
|
||||
|
||||
# In[13]:
|
||||
|
||||
|
||||
test_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0040.jpg'
|
||||
img_ori = cv2.imread(test_image)
|
||||
estimate_pose(img_ori)
|
||||
|
||||
|
||||
# In[14]:
|
||||
|
||||
|
||||
test_image = '../input/indonesian-traditional-dance/tgagrakanyar/tga_0050.jpg'
|
||||
img_ori = cv2.imread(test_image)
|
||||
estimate_pose(img_ori)
|
||||
|
||||
|
||||
# In[ ]:
|
||||
@ -1,56 +0,0 @@
|
||||
import os
|
||||
|
||||
if os.system(r''' pip show alphapose''') != 0:
|
||||
t1 = r'''
|
||||
pip install pycocotools
|
||||
rm -fr /kaggle/working/AlphaPose
|
||||
pip install pyyaml==5.2
|
||||
pip install scipy==1.1.0
|
||||
git clone https://github.com/WildflowerSchools/AlphaPose
|
||||
python -m pip install cython gdown
|
||||
apt-get install libyaml-dev
|
||||
cd /kaggle/working/AlphaPose && python setup.py build develop
|
||||
'''
|
||||
|
||||
for o in t1.splitlines():
|
||||
print(o)
|
||||
assert os.system(o) == 0
|
||||
|
||||
import os
|
||||
#!git clone https://github.com/MVIG-SJTU/AlphaPose.git
|
||||
|
||||
import torch
|
||||
print(torch.__version__)
|
||||
import yaml, scipy
|
||||
print(yaml.__version__)
|
||||
print(scipy.__version__)
|
||||
|
||||
import gdown
|
||||
import os
|
||||
for o1, o2 in [
|
||||
(
|
||||
'1D47msNOOiJKvPOXlnpyzdKA3k6E97NTC',
|
||||
'/kaggle/working/AlphaPose/detector/yolo/data/yolov3-spp.weights',
|
||||
),
|
||||
(
|
||||
'1nlnuYfGNuHWZztQHXwVZSL_FvfE551pA',
|
||||
'/kaggle/working/AlphaPose/detector/tracker/data/JDE-1088x608-uncertainty',
|
||||
),
|
||||
(
|
||||
'1kQhnMRURFiy7NsdS8EFL-8vtqEXOgECn',
|
||||
'/kaggle/working/AlphaPose/pretrained_models/fast_res50_256x192.pth'
|
||||
),
|
||||
]:
|
||||
os.makedirs(os.path.split(o2)[0], exist_ok=True)
|
||||
if not os.path.exists(o2):
|
||||
gdown.download(
|
||||
'https://drive.google.com/u/0/uc?export=download&confirm=f_Ix&id=%s' % o1,
|
||||
o2,
|
||||
quiet=False
|
||||
)
|
||||
|
||||
|
||||
assert os.system(r'''
|
||||
mkdir -p /kaggle/working/test-input && mkdir -p /kaggle/working/test-output && cp /kaggle/working/AlphaPose/examples/demo/*.jpg /kaggle/working/test-input
|
||||
cd /kaggle/working/AlphaPose && python3 scripts/demo_inference.py --cfg configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml --checkpoint pretrained_models/fast_res50_256x192.pth --indir /kaggle/working/test-input --outdir /kaggle/working/test-output --save_img
|
||||
''') == 0
|
||||
@ -1,172 +0,0 @@
|
||||
# https://raw.githubusercontent.com/hafizas101/Real-time-human-pose-estimation-and-classification/master/main.py
|
||||
# From Python
|
||||
# It requires OpenCV installed for Python
|
||||
import sys
|
||||
import cv2
|
||||
import os
|
||||
from sys import platform
|
||||
import argparse
|
||||
from math import sqrt, acos, degrees, atan, degrees
|
||||
import numpy as np
|
||||
|
||||
# ----------------------------------------- Arslan Part ----------------------------------------------------------------------------------
|
||||
def get_angle(a,b):
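    # angle, in degrees, of the vector from keypoint a to keypoint b measured
    # against the positive x-axis (image y grows downward, hence del_y = a[1]-b[1]);
    # only the upper half-plane cases are handled, otherwise 0 is returned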
|
||||
#print(a)
|
||||
#print(b)
|
||||
del_y = a[1]-b[1]
|
||||
del_x = b[0]-a[0]
|
||||
if del_x == 0:
|
||||
del_x = 0.1
|
||||
#print("Del_X : "+str(del_x)+"-----Del_Y: "+str(del_y))
|
||||
angle = 0
|
||||
|
||||
if del_x > 0 and del_y > 0:
|
||||
angle = degrees(atan(del_y / del_x))
|
||||
elif del_x < 0 and del_y > 0:
|
||||
angle = degrees(atan(del_y / del_x)) + 180
|
||||
|
||||
return angle
|
||||
|
||||
# ------------------------------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
# ----------------------------------------- Maksim Part ----------------------------------------------------------------------------------
|
||||
|
||||
def angle_gor(a,b,c,d):
|
||||
ab=[a[0]-b[0],a[1]-b[1]]
|
||||
ab1=[c[0]-d[0],c[1]-d[1]]
|
||||
cos=abs(ab[0]*ab1[0]+ab[1]*ab1[1])/(sqrt(ab[0]**2+ab[1]**2)*sqrt(ab1[0]**2+ab1[1]**2))
|
||||
ang = acos(cos)
|
||||
return ang*180/np.pi
|
||||
|
||||
|
||||
def sit_ang(a,b,c,d):
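    # heuristic: if the angle between the two leg segments (computed by
    # angle_gor) falls between 40 and 120 degrees, treat the person as sitting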
|
||||
ang=angle_gor(a,b,c,d)
|
||||
s1=0
|
||||
if ang != None:
|
||||
#print("Angle",ang)
|
||||
if ang < 120 and ang>40:
|
||||
s1=1
|
||||
return s1
|
||||
|
||||
def sit_rec(a,b,c,d):
|
||||
ab = [a[0] - b[0], a[1] - b[1]]
|
||||
ab1 = [c[0] - d[0], c[1] - d[1]]
|
||||
l1=sqrt(ab[0]**2+ab[1]**2)
|
||||
l2=sqrt(ab1[0]**2+ab1[1]**2)
|
||||
s=0
|
||||
if l1!=0 and l2!=0:
|
||||
#print(l1,l2, "---------->>>")
|
||||
if l2/l1>=1.5:
|
||||
s=1
|
||||
return s
|
||||
|
||||
# ------------------------------------------------------------------------------------------------------------------------------------------
|
||||
|
||||
# ----------------------------------------------------------- OpenPose Example Code ----------------------------------------------------------
|
||||
|
||||
# Import Openpose (Windows/Ubuntu/OSX)
|
||||
dir_path = os.path.dirname(os.path.realpath(__file__))
|
||||
try:
|
||||
# Windows Import
|
||||
if platform == "win32":
|
||||
# Change these variables to point to the correct folder (Release/x64 etc.)
|
||||
sys.path.append(dir_path + '/../../python/openpose/Release');
|
||||
os.environ['PATH'] = os.environ['PATH'] + ';' + dir_path + '/../../x64/Release;' + dir_path + '/../../bin;'
|
||||
import pyopenpose as op
|
||||
else:
|
||||
# Change these variables to point to the correct folder (Release/x64 etc.)
|
||||
sys.path.append('../../python');
|
||||
# If you run `make install` (default path is `/usr/local/python` for Ubuntu), you can also access the OpenPose/python module from there. This will install OpenPose and the python library at your desired installation path. Ensure that this is in your python path in order to use it.
|
||||
# sys.path.append('/usr/local/python')
|
||||
from openpose import pyopenpose as op
|
||||
except ImportError as e:
|
||||
print('Error: OpenPose library could not be found. Did you enable `BUILD_PYTHON` in CMake and have this Python script in the right folder?')
|
||||
raise e
|
||||
|
||||
# Flags
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--image_path", default="../../../examples/media/COCO_val2014_000000000192.jpg", help="Process an image. Read all standard formats (jpg, png, bmp, etc.).")
|
||||
args = parser.parse_known_args()
|
||||
|
||||
# Custom Params (refer to include/openpose/flags.hpp for more parameters)
|
||||
params = dict()
|
||||
params["model_folder"] = "/home/nvidia/openpose/models/"
|
||||
|
||||
# Add others in path?
|
||||
for i in range(0, len(args[1])):
|
||||
curr_item = args[1][i]
|
||||
if i != len(args[1])-1: next_item = args[1][i+1]
|
||||
else: next_item = "1"
|
||||
if "--" in curr_item and "--" in next_item:
|
||||
key = curr_item.replace('-','')
|
||||
if key not in params: params[key] = "1"
|
||||
elif "--" in curr_item and "--" not in next_item:
|
||||
key = curr_item.replace('-','')
|
||||
if key not in params: params[key] = next_item
|
||||
|
||||
# Construct it from system arguments
|
||||
# op.init_argv(args[1])
|
||||
# oppython = op.OpenposePython()
|
||||
|
||||
c=0
|
||||
# Starting OpenPose
|
||||
opWrapper = op.WrapperPython()
|
||||
opWrapper.configure(params)
|
||||
opWrapper.start()
|
||||
|
||||
# ------------------------------------------------------- OUR CONTRIBUTIONS ----------------------------------------------------------------
|
||||
|
||||
cam = cv2.VideoCapture(1)
|
||||
for i in range(1000):
|
||||
# Process Image
|
||||
datum = op.Datum()
|
||||
s, im = cam.read() # captures image
|
||||
#cv2.imshow("Test Picture", im) # displays captured image
|
||||
#im=cv2.resize(im,(480,270), interpolation = cv2.INTER_AREA)
|
||||
image1 = im
|
||||
#imageToProcess = cv2.imread(args[0].image_path)
|
||||
c+=1
|
||||
if c==8:
|
||||
c=0
|
||||
datum.cvInputData = image1
|
||||
opWrapper.emplaceAndPop([datum]) # OpenPose being applied to the frame image.
|
||||
# Display Image
|
||||
#print("Body keypoints: \n" + str(datum.poseKeypoints))
|
||||
#print(datum.poseKeypoints.shape)
|
||||
if len(datum.poseKeypoints.shape)>=2:
|
||||
x1=0
|
||||
x2=0
|
||||
|
||||
for j in range(len(datum.poseKeypoints)):
|
||||
x1=0
|
||||
x2=0
|
||||
s=0
|
||||
s1=0
|
||||
ang1 = get_angle(datum.poseKeypoints[j][3], datum.poseKeypoints[j][4])
|
||||
ang2 = get_angle(datum.poseKeypoints[j][6], datum.poseKeypoints[j][7])
|
||||
if (30 < ang1 < 150):
|
||||
x1 = 1
|
||||
if (30 < ang2 < 150):
|
||||
x2 = 1
|
||||
x3 = x1+x2
|
||||
if (x3 == 1):
|
||||
print("The {} person says: HELLO !".format(j+1))
|
||||
#cv2.putText(datum.cvOutputData,'OpenPose using Python-OpenCV',(20,30), cv2.FONT_HERSHEY_SIMPLEX, 1,(255,255,255),1,cv2.LINE_AA)
|
||||
elif (x3 == 2):
|
||||
print("The {} person says: STOP PLEASE !".format(j+1))
|
||||
s += sit_rec(datum.poseKeypoints[j][9], datum.poseKeypoints[j][10],datum.poseKeypoints[j][10],datum.poseKeypoints[j][11])
|
||||
s += sit_rec(datum.poseKeypoints[j][12], datum.poseKeypoints[j][13],datum.poseKeypoints[j][13],datum.poseKeypoints[j][14])
|
||||
s1+=sit_ang(datum.poseKeypoints[j][9], datum.poseKeypoints[j][10],datum.poseKeypoints[j][10],datum.poseKeypoints[j][11])
|
||||
s1+=sit_ang(datum.poseKeypoints[j][12], datum.poseKeypoints[j][13],datum.poseKeypoints[j][13],datum.poseKeypoints[j][14])
|
||||
if s > 0 or s1>0:
|
||||
print("The {} person is sitting".format(j+1))
|
||||
if s == 0 and s1 == 0:
|
||||
print("The {} person is standing".format(j+1))
|
||||
print("___________________________")
|
||||
print(" ")
|
||||
im=cv2.resize(datum.cvOutputData,(960,540), interpolation = cv2.INTER_AREA)
|
||||
cv2.imshow("OpenPose 1.4.0 - Tutorial Python API", im)
|
||||
cv2.waitKey(1)
|
||||
|
||||
|
||||
# ------------------------------------------------------------------------------------------------------------------------------------------
|
||||
390
python/_m.py
@ -1,5 +1,5 @@
|
||||
#!/usr/bin/env python3
|
||||
#vim: set filetype=python
|
||||
# vim: set filetype=python
|
||||
|
||||
import logging
|
||||
import json
|
||||
@ -7,158 +7,184 @@ import enum
|
||||
import pathlib
|
||||
import sys
|
||||
import argparse
|
||||
#import optparse
|
||||
|
||||
# import optparse
|
||||
import dataclasses
|
||||
import subprocess
|
||||
import os
|
||||
|
||||
|
||||
|
||||
from typing import (
|
||||
Optional, Any, TypeAlias, Literal, cast, BinaryIO, Generator,
|
||||
ClassVar, Self,
|
||||
Optional,
|
||||
Any,
|
||||
TypeAlias,
|
||||
Literal,
|
||||
cast,
|
||||
BinaryIO,
|
||||
Generator,
|
||||
ClassVar,
|
||||
Self,
|
||||
)
|
||||
|
||||
logger = logging.getLogger()
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class Settings:
|
||||
project_root : pathlib.Path = pathlib.Path.cwd()
|
||||
project_root: pathlib.Path = pathlib.Path.cwd()
|
||||
|
||||
env_path : pathlib.Path = project_root / 'tmp' / 'env3'
|
||||
env_path: pathlib.Path = project_root / 'tmp' / 'env3'
|
||||
|
||||
_settings : ClassVar[Optional['Settings']] = None
|
||||
_settings: ClassVar[Optional['Settings']] = None
|
||||
|
||||
@classmethod
|
||||
def settings(cls) -> Self:
|
||||
if cls._settings is None:
|
||||
cls._settings = cls()
|
||||
@classmethod
|
||||
def settings(cls) -> Self:
|
||||
if cls._settings is None:
|
||||
cls._settings = cls()
|
||||
|
||||
return cls._settings
|
||||
|
||||
return cls._settings
|
||||
|
||||
def js(argv: list[str]) -> int:
|
||||
return subprocess.check_call([
|
||||
'sudo',
|
||||
'docker-compose',
|
||||
'--project-directory',
|
||||
Settings.settings().project_root,
|
||||
'-f',
|
||||
Settings.settings().project_root / 'docker' / 'js' / 'docker-compose.yml',
|
||||
*argv,
|
||||
])
|
||||
return subprocess.check_call(
|
||||
[
|
||||
'sudo',
|
||||
'docker-compose',
|
||||
'--project-directory',
|
||||
Settings.settings().project_root,
|
||||
'-f',
|
||||
Settings.settings().project_root / 'docker' / 'js' / 'docker-compose.yml',
|
||||
*argv,
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
def env(
|
||||
argv: Optional[list[str]] = None,
|
||||
mode: Literal['exec', 'subprocess'] = 'subprocess',
|
||||
**kwargs: Any,
|
||||
argv: Optional[list[str]] = None,
|
||||
mode: Literal['exec', 'subprocess'] = 'subprocess',
|
||||
**kwargs: Any,
|
||||
) -> Optional[subprocess.CompletedProcess[bytes]]:
|
||||
env_path = Settings.settings().env_path
|
||||
env_path = Settings.settings().env_path
|
||||
|
||||
if not env_path.exists():
|
||||
subprocess.check_call([
|
||||
sys.executable, '-m', 'venv',
|
||||
'--system-site-packages',
|
||||
str(env_path)
|
||||
])
|
||||
if not env_path.exists():
|
||||
subprocess.check_call([sys.executable, '-m', 'venv', '--system-site-packages', str(env_path)])
|
||||
|
||||
subprocess.check_call([
|
||||
env_path / 'bin' / 'python3',
|
||||
'-m', 'pip',
|
||||
'install', '-r', 'requirements.txt',
|
||||
])
|
||||
subprocess.check_call(
|
||||
[
|
||||
env_path / 'bin' / 'python3',
|
||||
'-m',
|
||||
'pip',
|
||||
'install',
|
||||
'-r',
|
||||
'requirements.txt',
|
||||
]
|
||||
)
|
||||
|
||||
if not argv is None:
|
||||
python_path = str(env_path / 'bin' / 'python3')
|
||||
if not argv is None:
|
||||
python_path = str(env_path / 'bin' / 'python3')
|
||||
|
||||
if mode == 'exec':
|
||||
os.execv(
|
||||
python_path,
|
||||
[
|
||||
python_path,
|
||||
*argv,
|
||||
],
|
||||
)
|
||||
return None
|
||||
elif mode == 'subprocess':
|
||||
return subprocess.run([
|
||||
python_path,
|
||||
*argv,
|
||||
], **kwargs)
|
||||
else:
|
||||
raise NotImplementedError
|
||||
if mode == 'exec':
|
||||
os.execv(
|
||||
python_path,
|
||||
[
|
||||
python_path,
|
||||
*argv,
|
||||
],
|
||||
)
|
||||
return None
|
||||
elif mode == 'subprocess':
|
||||
return subprocess.run(
|
||||
[
|
||||
python_path,
|
||||
*argv,
|
||||
],
|
||||
**kwargs,
|
||||
)
|
||||
else:
|
||||
raise NotImplementedError
|
||||
|
||||
return None
|
||||
|
||||
return None
|
||||
|
||||
def ruff(argv: list[str]) -> None:
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument(
|
||||
'-i',
|
||||
dest='paths',
|
||||
help='specify paths to check',
|
||||
default=[],
|
||||
action='append',
|
||||
)
|
||||
parser.add_argument(
|
||||
'-e',
|
||||
dest='exclude',
|
||||
help='rules to ignore',
|
||||
default=[],
|
||||
action='append',
|
||||
)
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument(
|
||||
'-i',
|
||||
dest='paths',
|
||||
help='specify paths to check',
|
||||
default=[],
|
||||
action='append',
|
||||
)
|
||||
parser.add_argument(
|
||||
'-e',
|
||||
dest='exclude',
|
||||
help='rules to ignore',
|
||||
default=[],
|
||||
action='append',
|
||||
)
|
||||
|
||||
options, args = parser.parse_known_args(argv)
|
||||
options, args = parser.parse_known_args(argv)
|
||||
|
||||
if len(options.paths) == 0:
|
||||
options.paths.extend([
|
||||
'.',
|
||||
'dotfiles/.local/bin/commands',
|
||||
])
|
||||
if len(options.paths) == 0:
|
||||
options.paths.extend(
|
||||
[
|
||||
'.',
|
||||
'dotfiles/.local/bin/commands',
|
||||
]
|
||||
)
|
||||
|
||||
if len(options.exclude) == 0:
|
||||
options.exclude.extend([
|
||||
'E731',
|
||||
'E713',
|
||||
'E714',
|
||||
'E703',
|
||||
])
|
||||
if len(options.exclude) == 0:
|
||||
options.exclude.extend(
|
||||
[
|
||||
'E731',
|
||||
'E713',
|
||||
'E714',
|
||||
'E703',
|
||||
]
|
||||
)
|
||||
|
||||
res = env([
|
||||
'-m',
|
||||
'ruff',
|
||||
'check',
|
||||
*args,
|
||||
'--output-format', 'json',
|
||||
'--ignore', ','.join(options.exclude),
|
||||
*options.paths,
|
||||
], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
res = env(
|
||||
[
|
||||
'-m',
|
||||
'ruff',
|
||||
'check',
|
||||
*args,
|
            '--output-format',
            'json',
            '--ignore',
            ','.join(options.exclude),
            *options.paths,
        ],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    assert not res is None

    errors = json.loads(res.stdout.decode('utf-8'))

    g: dict[str, Any] = dict()
    for o in errors:
        if not o['filename'] in g:
            g[o['filename']] = []
        g[o['filename']].append(o)

    h = {k: len(v) for k, v in g.items()}

    logger.info(json.dumps(errors, indent=4))
    logger.info(json.dumps(h, indent=4))


def inside_env() -> bool:
    try:
        import numpy

        return True
    except Exception:
        return False


# class Commands(enum.StrEnum):
#     js = 'js'
#     mypy = 'mypy'
#     env = 'env'
@ -172,83 +198,97 @@ def inside_env() -> bool:
#         argv,
#     )


def host_deps(argv: list[str]) -> None:
    if sys.platform in ['linux']:
        subprocess.check_call(
            r"""
                exec yay -S $(cat requirements-archlinux.txt)
            """,
            shell=True,
        )
    else:
        raise NotImplementedError


Command_args = [
    'js',
    'mypy',
    'env',
    'ruff',
    'm2',
    'host_deps',
]

Command: TypeAlias = Literal[
    'js',
    'mypy',
    'env',
    'ruff',
    'm2',
    'host_deps',
]


def run(argv: Optional[list[str]] = None) -> None:
    logging.basicConfig(
        level=logging.INFO,
        format=('%(levelname)s:%(name)s:%(message)s:%(process)d:%(asctime)s:%(pathname)s:%(funcName)s:%(lineno)s'),
    )

    if argv is None:
        argv = sys.argv[:]

    parser = argparse.ArgumentParser()
    parser.add_argument(
        'command',
        #'_command',
        choices=[o for o in Command_args],
        # required=True,
    )

    options, args = parser.parse_known_args(argv[1:])

    assert options.command in Command_args

    if len(args) > 0 and args[0] == '--':
        del args[0]

    # options.command = Commands(options._command)

    if options.command == 'js':
        js(args)
    elif options.command == 'host_deps':
        host_deps(args)
    elif options.command == 'env':
        env(
            args,
            mode='exec',
        )
    # elif options.command == 'mypy':
    #     if not inside_env():
    #         env(
    #             [
    #                 pathlib.Path(__file__).parent / 'm.py',
    #                 *argv[1:],
    #             ],
    #             mode='exec'
    #         )
    #     else:
    #         mypy(args)
    elif options.command == 'ruff':
        ruff(args)
    elif options.command == 'm2':
        if not inside_env():
            env(['--', '_m.py', 'm2', *args])
            return

        import python.tasks.cython

        python.tasks.cython.mypyc_build(pathlib.Path('_m.py'))
    else:
        raise NotImplementedError


if __name__ == '__main__':
    run()
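A minimal, self-contained sketch of the argument-splitting pattern used by run() above: parse_known_args() collects flags it does not recognise, and a leading '--' is stripped before the remainder is forwarded to the selected subcommand. The dispatch() wrapper and the command names here are illustrative, not part of the repository.

import argparse
import sys
from typing import Optional


def dispatch(argv: Optional[list[str]] = None) -> None:
    if argv is None:
        argv = sys.argv[:]

    parser = argparse.ArgumentParser()
    parser.add_argument('command', choices=['ruff', 'env', 'm2'])

    # unknown flags stay in `args` instead of raising an error
    options, args = parser.parse_known_args(argv[1:])

    # drop the conventional '--' separator before forwarding, as run() does
    if len(args) > 0 and args[0] == '--':
        del args[0]

    print(options.command, args)


if __name__ == '__main__':
    dispatch(['prog', 'ruff', '--fix', 'src/'])  # typically prints: ruff ['--fix', 'src/']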
306
python/cli.py
@ -10,7 +10,10 @@ import enum
import argparse
import dataclasses

from typing import (
    Optional,
    override,
)

from online.fxreader.pr34.commands_typed.logging import setup as logging_setup

@ -24,183 +27,176 @@ logger = logging.getLogger(__name__)


class Command(enum.StrEnum):
    mypy = 'mypy'
    pyright = 'pyright'
    ruff = 'ruff'
    deploy_wheel = 'deploy:wheel'
    tests = 'tests'
    meson_setup = 'meson:setup'


@dataclasses.dataclass
class Settings(
    _cli.DistSettings,
):
    base_dir: pathlib.Path = pathlib.Path(__file__).parent.parent
    build_dir: pathlib.Path = base_dir / 'tmp' / 'build'
    wheel_dir: pathlib.Path = base_dir / 'deps' / 'dist'
    env_path: pathlib.Path = cli_bootstrap.BootstrapSettings.get().env_path
    python_path: pathlib.Path = pathlib.Path(sys.executable)


class CLI(_cli.CLI):
    def __init__(self) -> None:
        self.settings = Settings()
        self._projects: dict[str, _cli.Project] = {
            'online.fxreader.pr34': _cli.Project(
                source_dir=self.settings.base_dir / 'python',
                build_dir=self.settings.base_dir / 'tmp' / 'online' / 'fxreader' / 'pr34' / 'build',
                dest_dir=self.settings.base_dir / 'tmp' / 'online' / 'fxreader' / 'pr34' / 'install',
                meson_path=self.settings.base_dir / 'python' / 'meson.build',
            )
        }

        self._dependencies: dict[str, _cli.Dependency] = dict()

    @override
    @property
    def dist_settings(self) -> _cli.DistSettings:
        return self.settings

    @override
    @property
    def projects(self) -> dict[str, _cli.Project]:
        return self._projects

    def mypy(
        self,
        argv: list[str],
    ) -> None:
        import online.fxreader.pr34.commands_typed.mypy as _mypy

        project = self._projects['online.fxreader.pr34']

        _mypy.run(
            argv,
            settings=_mypy.MypySettings(
                paths=[
                    # Settings.settings().project_root / 'dotfiles/.local/bin/commands',
                    # project.source_dir / 'm.py',
                    project.source_dir / '_m.py',
                    project.source_dir / 'online',
                    project.source_dir / 'cli.py',
                    project.source_dir / 'm.py',
                    # Settings.settings().project_root / 'deps/com.github.aiortc.aiortc/src',
                    # Settings.settings().project_root / 'm.py',
                ],
                max_errors={
                    'online/fxreader/pr34/commands_typed': 0,
                    # 'online/fxreader/pr34/commands': 0,
                    'cli.py': 0,
                    'm.py': 0,
                    '../deps/com.github.aiortc.aiortc/src/online_fxreader': 0,
                    '../deps/com.github.aiortc.aiortc/src/aiortc/contrib/signaling': 0,
                },
            ),
        )

    @override
    @property
    def dependencies(self) -> dict[str, _cli.Dependency]:
        return self._dependencies

    def run(self, argv: Optional[list[str]] = None) -> None:
        if argv is None:
            argv = copy.deepcopy(sys.argv)

        parser = argparse.ArgumentParser()
        parser.add_argument('command', choices=[o.value for o in Command])
        parser.add_argument('-p', '--project', choices=[o for o in self.projects])
        parser.add_argument(
            '-o',
            '--output_dir',
            default=None,
            help='wheel output dir for deploy:wheel',
        )
        parser.add_argument(
            '-f',
            '--force',
            default=False,
            action='store_true',
            help='remove install dir, before installing, default = false',
        )

        options, args = parser.parse_known_args(argv[1:])

        default_project: Optional[str] = None

        for k, v in self.projects.items():
            if cli_bootstrap.paths_equal(
                v.source_dir.resolve(),
                # pathlib.Path(__file__).parent.resolve(),
                pathlib.Path.cwd(),
            ):
                default_project = k

        if options.project is None:
            if not default_project is None:
                options.project = default_project
            else:
                logger.error(dict(msg='not provided project name'))
                raise NotImplementedError

        options.command = Command(options.command)

        if options.command is Command.deploy_wheel:
            assert not options.project is None

            self.deploy_wheel(
                project_name=options.project,
                argv=args,
                output_dir=options.output_dir,
                mypy=True,
                ruff=True,
                pyright=True,
            )
        elif options.command is Command.pyright:
            self.pyright(
                project_name=options.project,
                argv=args,
            )
        elif options.command is Command.ruff:
            self.ruff(
                project_name=options.project,
                argv=args,
            )
        elif options.command is Command.meson_setup:
            assert not options.project is None

            self.meson_setup(
                project_name=options.project,
                argv=args,
                force=options.force,
            )
        elif options.command is Command.mypy:
            self.mypy(
                argv=args,
            )
        elif options.command is Command.tests:
            for k, v in self.projects.items():
                subprocess.check_call(
                    [
                        sys.executable,
                        '-m',
                        'unittest',
                        'online.fxreader.pr34.tests.test_crypto',
                        *args,
                    ],
                    cwd=str(v.source_dir),
                )
        else:
            raise NotImplementedError


if __name__ == '__main__':
    CLI().run()
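The Command StrEnum above keeps the CLI's subcommand names and their dispatch in one place; options.command is converted back into the enum so identity checks with 'is' can drive the branches. Below is a compact, standalone sketch of that pattern with placeholder handlers; the handler bodies and the command set are illustrative, not taken from the repository.

import argparse
import enum


class Command(enum.StrEnum):
    mypy = 'mypy'
    ruff = 'ruff'
    tests = 'tests'


def main(argv: list[str]) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument('command', choices=[o.value for o in Command])

    options, args = parser.parse_known_args(argv)

    # convert the raw string back into the enum so `is` comparisons work
    command = Command(options.command)

    if command is Command.mypy:
        print('would run mypy with', args)
    elif command is Command.ruff:
        print('would run ruff with', args)
    elif command is Command.tests:
        print('would run unittest with', args)
    else:
        raise NotImplementedError


if __name__ == '__main__':
    main(['ruff', '--fix'])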
529
python/m.py
@ -10,329 +10,326 @@ import os
|
||||
import logging
|
||||
|
||||
|
||||
from typing import (Optional, Any,)
|
||||
from typing import (
|
||||
Optional,
|
||||
Any,
|
||||
)
|
||||
from typing_extensions import (
|
||||
Self, BinaryIO,
|
||||
Self,
|
||||
BinaryIO,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def toml_load(f: BinaryIO) -> Any:
|
||||
try:
|
||||
import tomllib
|
||||
return tomllib.load(f)
|
||||
except:
|
||||
pass
|
||||
try:
|
||||
import tomllib
|
||||
|
||||
try:
|
||||
import tomli
|
||||
return tomli.load(f)
|
||||
except:
|
||||
pass
|
||||
return tomllib.load(f)
|
||||
except:
|
||||
pass
|
||||
|
||||
try:
|
||||
import tomli
|
||||
|
||||
return tomli.load(f)
|
||||
except:
|
||||
pass
|
||||
|
||||
raise NotImplementedError
|
||||
|
||||
raise NotImplementedError
|
||||
|
||||
@dataclasses.dataclass
|
||||
class PyProject:
|
||||
path: pathlib.Path
|
||||
dependencies: dict[str, list[str]]
|
||||
early_features: Optional[list[str]] = None
|
||||
pip_find_links: Optional[list[pathlib.Path]] = None
|
||||
runtime_libdirs: Optional[list[pathlib.Path]] = None
|
||||
runtime_preload: Optional[list[pathlib.Path]] = None
|
||||
requirements: dict[str, pathlib.Path] = dataclasses.field(default_factory=lambda : dict())
|
||||
path: pathlib.Path
|
||||
dependencies: dict[str, list[str]]
|
||||
early_features: Optional[list[str]] = None
|
||||
pip_find_links: Optional[list[pathlib.Path]] = None
|
||||
runtime_libdirs: Optional[list[pathlib.Path]] = None
|
||||
runtime_preload: Optional[list[pathlib.Path]] = None
|
||||
requirements: dict[str, pathlib.Path] = dataclasses.field(default_factory=lambda: dict())
|
||||
|
||||
|
||||
def pyproject_load(
|
||||
d: pathlib.Path,
|
||||
d: pathlib.Path,
|
||||
) -> PyProject:
|
||||
with io.open(d, 'rb') as f:
|
||||
content = toml_load(f)
|
||||
with io.open(d, 'rb') as f:
|
||||
content = toml_load(f)
|
||||
|
||||
assert isinstance(content, dict)
|
||||
assert isinstance(content, dict)
|
||||
|
||||
dependencies : dict[str, list[str]] = dict()
|
||||
dependencies: dict[str, list[str]] = dict()
|
||||
|
||||
dependencies['default'] = content['project']['dependencies']
|
||||
dependencies['default'] = content['project']['dependencies']
|
||||
|
||||
if (
|
||||
'optional-dependencies' in content['project']
|
||||
):
|
||||
assert isinstance(
|
||||
content['project']['optional-dependencies'],
|
||||
dict
|
||||
)
|
||||
if 'optional-dependencies' in content['project']:
|
||||
assert isinstance(content['project']['optional-dependencies'], dict)
|
||||
|
||||
for k, v in content['project']['optional-dependencies'].items():
|
||||
assert isinstance(v, list)
|
||||
assert isinstance(k, str)
|
||||
for k, v in content['project']['optional-dependencies'].items():
|
||||
assert isinstance(v, list)
|
||||
assert isinstance(k, str)
|
||||
|
||||
dependencies[k] = v
|
||||
dependencies[k] = v
|
||||
|
||||
res = PyProject(
|
||||
path=d,
|
||||
dependencies=dependencies,
|
||||
)
|
||||
|
||||
res = PyProject(
|
||||
path=d,
|
||||
dependencies=dependencies,
|
||||
)
|
||||
tool_name = 'online.fxreader.pr34'.replace('.', '-')
|
||||
|
||||
tool_name = 'online.fxreader.pr34'.replace('.', '-')
|
||||
if 'tool' in content and isinstance(content['tool'], dict) and tool_name in content['tool'] and isinstance(content['tool'][tool_name], dict):
|
||||
if 'early_features' in content['tool'][tool_name]:
|
||||
res.early_features = content['tool'][tool_name]['early_features']
|
||||
|
||||
if (
|
||||
'tool' in content and
|
||||
isinstance(
|
||||
content['tool'], dict
|
||||
) and
|
||||
tool_name in content['tool'] and
|
||||
isinstance(
|
||||
content['tool'][tool_name],
|
||||
dict
|
||||
)
|
||||
):
|
||||
if 'early_features' in content['tool'][tool_name]:
|
||||
res.early_features = content['tool'][tool_name]['early_features']
|
||||
if 'pip_find_links' in content['tool'][tool_name]:
|
||||
res.pip_find_links = [d.parent / pathlib.Path(o) for o in content['tool'][tool_name]['pip_find_links']]
|
||||
|
||||
if 'pip_find_links' in content['tool'][tool_name]:
|
||||
res.pip_find_links = [
|
||||
d.parent / pathlib.Path(o)
|
||||
for o in content['tool'][tool_name]['pip_find_links']
|
||||
]
|
||||
if 'runtime_libdirs' in content['tool'][tool_name]:
|
||||
res.runtime_libdirs = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in content['tool'][tool_name]['runtime_libdirs']
|
||||
]
|
||||
|
||||
if 'runtime_libdirs' in content['tool'][tool_name]:
|
||||
res.runtime_libdirs = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in content['tool'][tool_name]['runtime_libdirs']
|
||||
]
|
||||
if 'runtime_preload' in content['tool'][tool_name]:
|
||||
res.runtime_preload = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in content['tool'][tool_name]['runtime_preload']
|
||||
]
|
||||
|
||||
if 'runtime_preload' in content['tool'][tool_name]:
|
||||
res.runtime_preload = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in content['tool'][tool_name]['runtime_preload']
|
||||
]
|
||||
if 'requirements' in content['tool'][tool_name]:
|
||||
assert isinstance(content['tool'][tool_name]['requirements'], dict)
|
||||
|
||||
if 'requirements' in content['tool'][tool_name]:
|
||||
assert isinstance(content['tool'][tool_name]['requirements'], dict)
|
||||
res.requirements = {
|
||||
k: d.parent / pathlib.Path(v)
|
||||
# pathlib.Path(o)
|
||||
for k, v in content['tool'][tool_name]['requirements'].items()
|
||||
}
|
||||
|
||||
res.requirements = {
|
||||
k : d.parent / pathlib.Path(v)
|
||||
# pathlib.Path(o)
|
||||
for k, v in content['tool'][tool_name]['requirements'].items()
|
||||
}
|
||||
return res
|
||||
|
||||
return res
|
||||
|
||||
@dataclasses.dataclass
|
||||
class BootstrapSettings:
|
||||
env_path: pathlib.Path
|
||||
python_path: pathlib.Path
|
||||
base_dir: pathlib.Path
|
||||
python_version: Optional[str] = dataclasses.field(
|
||||
default_factory=lambda : os.environ.get(
|
||||
'PYTHON_VERSION',
|
||||
'%d.%d' % (
|
||||
sys.version_info.major,
|
||||
sys.version_info.minor,
|
||||
),
|
||||
).strip()
|
||||
)
|
||||
uv_args: list[str] = dataclasses.field(
|
||||
default_factory=lambda : os.environ.get(
|
||||
'UV_ARGS',
|
||||
'--offline',
|
||||
).split(),
|
||||
)
|
||||
env_path: pathlib.Path
|
||||
python_path: pathlib.Path
|
||||
base_dir: pathlib.Path
|
||||
python_version: Optional[str] = dataclasses.field(
|
||||
default_factory=lambda: os.environ.get(
|
||||
'PYTHON_VERSION',
|
||||
'%d.%d'
|
||||
% (
|
||||
sys.version_info.major,
|
||||
sys.version_info.minor,
|
||||
),
|
||||
).strip()
|
||||
)
|
||||
uv_args: list[str] = dataclasses.field(
|
||||
default_factory=lambda: os.environ.get(
|
||||
'UV_ARGS',
|
||||
'--offline',
|
||||
).split(),
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def get(
|
||||
cls,
|
||||
base_dir: Optional[pathlib.Path] = None,
|
||||
) -> Self:
|
||||
if base_dir is None:
|
||||
base_dir = pathlib.Path.cwd()
|
||||
@classmethod
|
||||
def get(
|
||||
cls,
|
||||
base_dir: Optional[pathlib.Path] = None,
|
||||
) -> Self:
|
||||
if base_dir is None:
|
||||
base_dir = pathlib.Path.cwd()
|
||||
|
||||
env_path = base_dir / '.venv'
|
||||
python_path = env_path / 'bin' / 'python3'
|
||||
env_path = base_dir / '.venv'
|
||||
python_path = env_path / 'bin' / 'python3'
|
||||
|
||||
return cls(
|
||||
base_dir=base_dir,
|
||||
env_path=env_path,
|
||||
python_path=python_path,
|
||||
)
|
||||
|
||||
return cls(
|
||||
base_dir=base_dir,
|
||||
env_path=env_path,
|
||||
python_path=python_path,
|
||||
)
|
||||
|
||||
def env_bootstrap(
|
||||
bootstrap_settings: BootstrapSettings,
|
||||
pyproject: PyProject,
|
||||
bootstrap_settings: BootstrapSettings,
|
||||
pyproject: PyProject,
|
||||
) -> None:
|
||||
pip_find_links : list[pathlib.Path] = []
|
||||
pip_find_links: list[pathlib.Path] = []
|
||||
|
||||
if not pyproject.pip_find_links is None:
|
||||
pip_find_links.extend(pyproject.pip_find_links)
|
||||
if not pyproject.pip_find_links is None:
|
||||
pip_find_links.extend(pyproject.pip_find_links)
|
||||
|
||||
pip_find_links_args = sum([
|
||||
['-f', str(o),]
|
||||
for o in pip_find_links
|
||||
], [])
|
||||
pip_find_links_args = sum(
|
||||
[
|
||||
[
|
||||
'-f',
|
||||
str(o),
|
||||
]
|
||||
for o in pip_find_links
|
||||
],
|
||||
[],
|
||||
)
|
||||
|
||||
features : list[str] = []
|
||||
features: list[str] = []
|
||||
|
||||
if pyproject.early_features:
|
||||
features.extend(pyproject.early_features)
|
||||
if pyproject.early_features:
|
||||
features.extend(pyproject.early_features)
|
||||
|
||||
requirements_python_version: Optional[str] = None
|
||||
if not bootstrap_settings.python_version is None:
|
||||
requirements_python_version = bootstrap_settings.python_version.replace('.', '_')
|
||||
requirements_python_version: Optional[str] = None
|
||||
if not bootstrap_settings.python_version is None:
|
||||
requirements_python_version = bootstrap_settings.python_version.replace('.', '_')
|
||||
|
||||
requirements_name = '_'.join(sorted(features))
|
||||
|
||||
if requirements_python_version:
|
||||
requirements_name += '_' + requirements_python_version
|
||||
|
||||
requirements_path: Optional[pathlib.Path] = None
|
||||
|
||||
if requirements_name in pyproject.requirements:
|
||||
requirements_path = pyproject.requirements[requirements_name]
|
||||
else:
|
||||
requirements_path = pyproject.path.parent / 'requirements.txt'
|
||||
|
||||
requirements_in: list[str] = []
|
||||
|
||||
requirements_in.extend(['uv', 'pip', 'build', 'setuptools', 'meson-python', 'pybind11'])
|
||||
|
||||
if pyproject.early_features:
|
||||
early_dependencies = sum([pyproject.dependencies[o] for o in pyproject.early_features], [])
|
||||
|
||||
logger.info(
|
||||
dict(
|
||||
early_dependencies=early_dependencies,
|
||||
)
|
||||
)
|
||||
|
||||
requirements_in.extend(early_dependencies)
|
||||
# if len(early_dependencies) > 0:
|
||||
# subprocess.check_call([
|
||||
# bootstrap_settings.python_path,
|
||||
# '-m',
|
||||
# 'uv', 'pip', 'install',
|
||||
# *pip_find_links_args,
|
||||
# # '-f', str(pathlib.Path(__file__).parent / 'deps' / 'dist'),
|
||||
# *bootstrap_settings.uv_args,
|
||||
# *early_dependencies,
|
||||
# ])
|
||||
|
||||
if not requirements_path.exists():
|
||||
with tempfile.NamedTemporaryFile(
|
||||
mode='w',
|
||||
prefix='requirements',
|
||||
suffix='.in',
|
||||
) as f:
|
||||
f.write('\n'.join(requirements_in))
|
||||
f.flush()
|
||||
|
||||
subprocess.check_call(
|
||||
[
|
||||
'uv',
|
||||
'pip',
|
||||
'compile',
|
||||
'--generate-hashes',
|
||||
*pip_find_links_args,
|
||||
# '-p',
|
||||
# bootstrap_settings.python_path,
|
||||
*bootstrap_settings.uv_args,
|
||||
'-o',
|
||||
str(requirements_path),
|
||||
f.name,
|
||||
]
|
||||
)
|
||||
|
||||
uv_python_version: list[str] = []
|
||||
|
||||
if not bootstrap_settings.python_version is None:
|
||||
uv_python_version.extend(
|
||||
[
|
||||
'-p',
|
||||
bootstrap_settings.python_version,
|
||||
]
|
||||
)
|
||||
|
||||
subprocess.check_call(
|
||||
[
|
||||
'uv',
|
||||
'venv',
|
||||
*uv_python_version,
|
||||
*pip_find_links_args,
|
||||
# '--seed',
|
||||
*bootstrap_settings.uv_args,
|
||||
str(bootstrap_settings.env_path),
|
||||
]
|
||||
)
|
||||
|
||||
subprocess.check_call(
|
||||
[
|
||||
'uv',
|
||||
'pip',
|
||||
'install',
|
||||
*pip_find_links_args,
|
||||
'-p',
|
||||
bootstrap_settings.python_path,
|
||||
'--require-hashes',
|
||||
*bootstrap_settings.uv_args,
|
||||
'-r',
|
||||
str(requirements_path),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
requirements_name = '_'.join(sorted(features))
|
||||
def paths_equal(a: pathlib.Path | str, b: pathlib.Path | str) -> bool:
|
||||
return os.path.abspath(str(a)) == os.path.abspath(str(b))
|
||||
|
||||
if requirements_python_version:
|
||||
requirements_name += '_' + requirements_python_version
|
||||
|
||||
requirements_path : Optional[pathlib.Path] = None
|
||||
|
||||
if requirements_name in pyproject.requirements:
|
||||
requirements_path = pyproject.requirements[requirements_name]
|
||||
else:
|
||||
requirements_path = pyproject.path.parent / 'requirements.txt'
|
||||
|
||||
requirements_in : list[str] = []
|
||||
|
||||
requirements_in.extend([
|
||||
'uv', 'pip', 'build', 'setuptools', 'meson-python', 'pybind11'
|
||||
])
|
||||
|
||||
if pyproject.early_features:
|
||||
early_dependencies = sum([
|
||||
pyproject.dependencies[o]
|
||||
for o in pyproject.early_features
|
||||
], [])
|
||||
|
||||
logger.info(dict(
|
||||
early_dependencies=early_dependencies,
|
||||
))
|
||||
|
||||
requirements_in.extend(early_dependencies)
|
||||
# if len(early_dependencies) > 0:
|
||||
# subprocess.check_call([
|
||||
# bootstrap_settings.python_path,
|
||||
# '-m',
|
||||
# 'uv', 'pip', 'install',
|
||||
# *pip_find_links_args,
|
||||
# # '-f', str(pathlib.Path(__file__).parent / 'deps' / 'dist'),
|
||||
# *bootstrap_settings.uv_args,
|
||||
# *early_dependencies,
|
||||
# ])
|
||||
|
||||
if not requirements_path.exists():
|
||||
with tempfile.NamedTemporaryFile(
|
||||
mode='w',
|
||||
prefix='requirements',
|
||||
suffix='.in',
|
||||
) as f:
|
||||
f.write(
|
||||
'\n'.join(requirements_in)
|
||||
)
|
||||
f.flush()
|
||||
|
||||
subprocess.check_call([
|
||||
'uv',
|
||||
'pip',
|
||||
'compile',
|
||||
'--generate-hashes',
|
||||
*pip_find_links_args,
|
||||
# '-p',
|
||||
# bootstrap_settings.python_path,
|
||||
*bootstrap_settings.uv_args,
|
||||
'-o', str(requirements_path),
|
||||
f.name,
|
||||
])
|
||||
|
||||
uv_python_version: list[str] = []
|
||||
|
||||
if not bootstrap_settings.python_version is None:
|
||||
uv_python_version.extend([
|
||||
'-p', bootstrap_settings.python_version,
|
||||
])
|
||||
|
||||
subprocess.check_call([
|
||||
'uv', 'venv',
|
||||
*uv_python_version,
|
||||
*pip_find_links_args,
|
||||
# '--seed',
|
||||
*bootstrap_settings.uv_args,
|
||||
str(bootstrap_settings.env_path)
|
||||
])
|
||||
|
||||
subprocess.check_call([
|
||||
'uv',
|
||||
'pip',
|
||||
'install',
|
||||
*pip_find_links_args,
|
||||
'-p',
|
||||
bootstrap_settings.python_path,
|
||||
'--require-hashes',
|
||||
*bootstrap_settings.uv_args,
|
||||
'-r', str(requirements_path),
|
||||
])
|
||||
|
||||
|
||||
def paths_equal(
|
||||
a: pathlib.Path | str,
|
||||
b: pathlib.Path | str
|
||||
) -> bool:
|
||||
return (
|
||||
os.path.abspath(str(a)) ==
|
||||
os.path.abspath(str(b))
|
||||
)
|
||||
|
||||
def run(
|
||||
d: Optional[pathlib.Path] = None,
|
||||
cli_path: Optional[pathlib.Path] = None,
|
||||
d: Optional[pathlib.Path] = None,
|
||||
cli_path: Optional[pathlib.Path] = None,
|
||||
) -> None:
|
||||
if cli_path is None:
|
||||
cli_path = pathlib.Path(__file__).parent / 'cli.py'
|
||||
if cli_path is None:
|
||||
cli_path = pathlib.Path(__file__).parent / 'cli.py'
|
||||
|
||||
if d is None:
|
||||
d = pathlib.Path(__file__).parent / 'pyproject.toml'
|
||||
if d is None:
|
||||
d = pathlib.Path(__file__).parent / 'pyproject.toml'
|
||||
|
||||
bootstrap_settings = BootstrapSettings.get()
|
||||
bootstrap_settings = BootstrapSettings.get()
|
||||
|
||||
pyproject : PyProject = pyproject_load(
|
||||
d
|
||||
)
|
||||
pyproject: PyProject = pyproject_load(d)
|
||||
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
if not bootstrap_settings.env_path.exists():
|
||||
env_bootstrap(
|
||||
bootstrap_settings=bootstrap_settings,
|
||||
pyproject=pyproject,
|
||||
)
|
||||
if not bootstrap_settings.env_path.exists():
|
||||
env_bootstrap(
|
||||
bootstrap_settings=bootstrap_settings,
|
||||
pyproject=pyproject,
|
||||
)
|
||||
|
||||
logger.info([sys.executable, sys.argv, bootstrap_settings.python_path])
|
||||
logger.info([sys.executable, sys.argv, bootstrap_settings.python_path])
|
||||
|
||||
if not paths_equal(sys.executable, bootstrap_settings.python_path):
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
*sys.argv,
|
||||
]
|
||||
)
|
||||
if not paths_equal(sys.executable, bootstrap_settings.python_path):
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
*sys.argv,
|
||||
],
|
||||
)
|
||||
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
str(cli_path),
|
||||
*sys.argv[1:],
|
||||
],
|
||||
)
|
||||
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
str(
|
||||
cli_path
|
||||
),
|
||||
*sys.argv[1:],
|
||||
]
|
||||
)
|
||||
|
||||
if __name__ == '__main__':
|
||||
run(
|
||||
d=pathlib.Path(__file__).parent / 'pyproject.toml',
|
||||
cli_path=pathlib.Path(__file__).parent / 'cli.py',
|
||||
)
|
||||
run(
|
||||
d=pathlib.Path(__file__).parent / 'pyproject.toml',
|
||||
cli_path=pathlib.Path(__file__).parent / 'cli.py',
|
||||
)
|
||||
|
||||
File diff suppressed because it is too large
@ -1,27 +1,28 @@
__all__ = ('parse_args',)

import sys
import argparse

from typing import (
    Optional,
)


def parse_args(
    parser: argparse.ArgumentParser,
    args: Optional[list[str]] = None,
) -> tuple[argparse.Namespace, list[str]]:
    if args is None:
        args = sys.argv[1:]

    argv: list[str] = []

    for i, o in enumerate(args):
        if o == '--':
            argv.extend(args[i + 1 :])

            del args[i:]

            break

    return parser.parse_args(args), argv
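A brief usage sketch of the parse_args() helper above, assuming the function is in scope; the flag names are made up for illustration. Everything after '--' comes back untouched in the second return value, ready to be forwarded to a wrapped tool.

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-v', '--verbose', action='store_true')

options, passthrough = parse_args(
    parser,
    ['-v', '--', '--color', 'always'],
)

assert options.verbose is True
assert passthrough == ['--color', 'always']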
@ -1,14 +1,23 @@
import logging
import asyncio

from typing import (
    Any,
)

logger = logging.getLogger(__name__)


def handle_task_result(fut: asyncio.Future[Any]) -> None:
    try:
        fut.result()

        logger.debug(
            dict(fut=fut, msg='done'),
            stacklevel=2,
        )
    except:
        logger.exception(
            '',
            stacklevel=2,
        )
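A minimal sketch of how a done-callback like handle_task_result() above is typically attached, so that exceptions from background tasks are logged instead of being silently dropped; might_fail() and main() are illustrative only, and handle_task_result() is assumed to be in scope.

import asyncio
import logging

logging.basicConfig(level=logging.DEBUG)


async def might_fail() -> None:
    raise RuntimeError('boom')


async def main() -> None:
    task = asyncio.create_task(might_fail())
    # handle_task_result() retrieves the result, so the failure is logged
    # via logger.exception instead of surfacing as "exception never retrieved"
    task.add_done_callback(handle_task_result)

    await asyncio.sleep(0)
    await asyncio.sleep(0)


asyncio.run(main())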
File diff suppressed because it is too large
@ -11,532 +11,519 @@ import os
|
||||
import logging
|
||||
|
||||
|
||||
from typing import (Optional, Any, cast, Type, TypeVar,)
|
||||
from typing import (
|
||||
Optional,
|
||||
Any,
|
||||
cast,
|
||||
Type,
|
||||
TypeVar,
|
||||
)
|
||||
from typing_extensions import (
|
||||
Self, BinaryIO, overload,
|
||||
Self,
|
||||
BinaryIO,
|
||||
overload,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def toml_load(f: BinaryIO) -> Any:
|
||||
try:
|
||||
import tomllib
|
||||
return tomllib.load(f)
|
||||
except:
|
||||
pass
|
||||
try:
|
||||
import tomllib
|
||||
|
||||
try:
|
||||
import tomli
|
||||
return tomli.load(f)
|
||||
except:
|
||||
pass
|
||||
return tomllib.load(f)
|
||||
except:
|
||||
pass
|
||||
|
||||
try:
|
||||
import tomli
|
||||
|
||||
return tomli.load(f)
|
||||
except:
|
||||
pass
|
||||
|
||||
raise NotImplementedError
|
||||
|
||||
raise NotImplementedError
|
||||
|
||||
@dataclasses.dataclass
|
||||
class PyProject:
|
||||
@dataclasses.dataclass
|
||||
class Module:
|
||||
name: str
|
||||
meson: Optional[pathlib.Path] = None
|
||||
tool: dict[str, Any] = dataclasses.field(default_factory=lambda : dict())
|
||||
@dataclasses.dataclass
|
||||
class Module:
|
||||
name: str
|
||||
meson: Optional[pathlib.Path] = None
|
||||
tool: dict[str, Any] = dataclasses.field(default_factory=lambda: dict())
|
||||
|
||||
path: pathlib.Path
|
||||
dependencies: dict[str, list[str]]
|
||||
early_features: Optional[list[str]] = None
|
||||
pip_find_links: Optional[list[pathlib.Path]] = None
|
||||
runtime_libdirs: Optional[list[pathlib.Path]] = None
|
||||
runtime_preload: Optional[list[pathlib.Path]] = None
|
||||
requirements: dict[str, pathlib.Path] = dataclasses.field(default_factory=lambda: dict())
|
||||
|
||||
path: pathlib.Path
|
||||
dependencies: dict[str, list[str]]
|
||||
early_features: Optional[list[str]] = None
|
||||
pip_find_links: Optional[list[pathlib.Path]] = None
|
||||
runtime_libdirs: Optional[list[pathlib.Path]] = None
|
||||
runtime_preload: Optional[list[pathlib.Path]] = None
|
||||
requirements: dict[str, pathlib.Path] = dataclasses.field(default_factory=lambda : dict())
|
||||
modules: list[Module] = dataclasses.field(
|
||||
default_factory=lambda: [],
|
||||
)
|
||||
|
||||
modules: list[Module] = dataclasses.field(
|
||||
default_factory=lambda : [],
|
||||
)
|
||||
tool: dict[str, Any] = dataclasses.field(
|
||||
default_factory=lambda: dict(),
|
||||
)
|
||||
|
||||
tool: dict[str, Any] = dataclasses.field(
|
||||
default_factory=lambda : dict(),
|
||||
)
|
||||
|
||||
Key = TypeVar('Key')
|
||||
Value = TypeVar('Value')
|
||||
|
||||
|
||||
@overload
|
||||
def check_dict(
|
||||
value: Any,
|
||||
KT: Type[Key],
|
||||
VT: Type[Value],
|
||||
value: Any,
|
||||
KT: Type[Key],
|
||||
VT: Type[Value],
|
||||
) -> dict[Key, Value]: ...
|
||||
|
||||
|
||||
@overload
|
||||
def check_dict(
|
||||
value: Any,
|
||||
KT: Type[Key],
|
||||
value: Any,
|
||||
KT: Type[Key],
|
||||
) -> dict[Key, Any]: ...
|
||||
|
||||
|
||||
def check_dict(
|
||||
value: Any,
|
||||
KT: Type[Key],
|
||||
VT: Optional[Type[Value]] = None,
|
||||
value: Any,
|
||||
KT: Type[Key],
|
||||
VT: Optional[Type[Value]] = None,
|
||||
) -> dict[Key, Value]:
|
||||
assert isinstance(value, dict)
|
||||
value2 = cast(dict[Any, Any], value)
|
||||
assert isinstance(value, dict)
|
||||
value2 = cast(dict[Any, Any], value)
|
||||
|
||||
assert all([
|
||||
isinstance(k, KT) and (
|
||||
VT is None or
|
||||
isinstance(v, VT)
|
||||
)
|
||||
for k, v in value2.items()
|
||||
])
|
||||
assert all([isinstance(k, KT) and (VT is None or isinstance(v, VT)) for k, v in value2.items()])
|
||||
|
||||
if VT is None:
|
||||
return cast(
|
||||
dict[Key, Any],
|
||||
value,
|
||||
)
|
||||
else:
|
||||
return cast(
|
||||
dict[Key, Value],
|
||||
value,
|
||||
)
|
||||
|
||||
if VT is None:
|
||||
return cast(
|
||||
dict[Key, Any],
|
||||
value,
|
||||
)
|
||||
else:
|
||||
return cast(
|
||||
dict[Key, Value],
|
||||
value,
|
||||
)
|
||||
|
||||
@overload
|
||||
def check_list(
|
||||
value: Any,
|
||||
VT: Type[Value],
|
||||
value: Any,
|
||||
VT: Type[Value],
|
||||
) -> list[Value]: ...
|
||||
|
||||
|
||||
@overload
|
||||
def check_list(
|
||||
value: Any,
|
||||
value: Any,
|
||||
) -> list[Any]: ...
|
||||
|
||||
|
||||
def check_list(
|
||||
value: Any,
|
||||
VT: Optional[Type[Value]] = None,
|
||||
value: Any,
|
||||
VT: Optional[Type[Value]] = None,
|
||||
) -> list[Value] | list[Any]:
|
||||
assert isinstance(value, list)
|
||||
value2 = cast(list[Any], value)
|
||||
assert isinstance(value, list)
|
||||
value2 = cast(list[Any], value)
|
||||
|
||||
assert all([
|
||||
(
|
||||
VT is None or
|
||||
isinstance(o, VT)
|
||||
)
|
||||
for o in value2
|
||||
])
|
||||
assert all([(VT is None or isinstance(o, VT)) for o in value2])
|
||||
|
||||
if VT is None:
|
||||
return cast(
|
||||
list[Any],
|
||||
value,
|
||||
)
|
||||
else:
|
||||
return cast(
|
||||
list[Value],
|
||||
value,
|
||||
)
|
||||
|
||||
if VT is None:
|
||||
return cast(
|
||||
list[Any],
|
||||
value,
|
||||
)
|
||||
else:
|
||||
return cast(
|
||||
list[Value],
|
||||
value,
|
||||
)
|
||||
|
||||
def pyproject_load(
|
||||
d: pathlib.Path,
|
||||
d: pathlib.Path,
|
||||
) -> PyProject:
|
||||
with io.open(d, 'rb') as f:
|
||||
content = toml_load(f)
|
||||
with io.open(d, 'rb') as f:
|
||||
content = toml_load(f)
|
||||
|
||||
assert isinstance(content, dict)
|
||||
assert isinstance(content, dict)
|
||||
|
||||
dependencies : dict[str, list[str]] = dict()
|
||||
dependencies: dict[str, list[str]] = dict()
|
||||
|
||||
dependencies['default'] = content['project']['dependencies']
|
||||
dependencies['default'] = content['project']['dependencies']
|
||||
|
||||
if (
|
||||
'optional-dependencies' in content['project']
|
||||
):
|
||||
assert isinstance(
|
||||
content['project']['optional-dependencies'],
|
||||
dict
|
||||
)
|
||||
if 'optional-dependencies' in content['project']:
|
||||
assert isinstance(content['project']['optional-dependencies'], dict)
|
||||
|
||||
for k, v in check_dict(
|
||||
check_dict(
|
||||
check_dict(
|
||||
content,
|
||||
str,
|
||||
# Any,
|
||||
)['project'],
|
||||
str,
|
||||
# Any,
|
||||
)['optional-dependencies'],
|
||||
str,
|
||||
list[Any],
|
||||
).items():
|
||||
# assert isinstance(v, list)
|
||||
# assert isinstance(k, str)
|
||||
for k, v in check_dict(
|
||||
check_dict(
|
||||
check_dict(
|
||||
content,
|
||||
str,
|
||||
# Any,
|
||||
)['project'],
|
||||
str,
|
||||
# Any,
|
||||
)['optional-dependencies'],
|
||||
str,
|
||||
list[Any],
|
||||
).items():
|
||||
# assert isinstance(v, list)
|
||||
# assert isinstance(k, str)
|
||||
|
||||
dependencies[k] = v
|
||||
dependencies[k] = v
|
||||
|
||||
res = PyProject(
|
||||
path=d,
|
||||
dependencies=dependencies,
|
||||
)
|
||||
|
||||
res = PyProject(
|
||||
path=d,
|
||||
dependencies=dependencies,
|
||||
)
|
||||
tool_name = 'online.fxreader.pr34'.replace('.', '-')
|
||||
|
||||
tool_name = 'online.fxreader.pr34'.replace('.', '-')
|
||||
if 'tool' in content:
|
||||
res.tool = check_dict(
|
||||
content['tool'],
|
||||
str,
|
||||
)
|
||||
|
||||
if (
|
||||
'tool' in content
|
||||
):
|
||||
res.tool = check_dict(
|
||||
content['tool'],
|
||||
str,
|
||||
)
|
||||
if 'tool' in content and isinstance(content['tool'], dict) and tool_name in content['tool'] and isinstance(content['tool'][tool_name], dict):
|
||||
pr34_tool = check_dict(
|
||||
check_dict(
|
||||
content['tool'],
|
||||
str,
|
||||
)[tool_name],
|
||||
str,
|
||||
)
|
||||
|
||||
if (
|
||||
'tool' in content and
|
||||
isinstance(
|
||||
content['tool'], dict
|
||||
) and
|
||||
tool_name in content['tool'] and
|
||||
isinstance(
|
||||
content['tool'][tool_name],
|
||||
dict
|
||||
)
|
||||
):
|
||||
pr34_tool = check_dict(
|
||||
check_dict(
|
||||
content['tool'],
|
||||
str,
|
||||
)[tool_name],
|
||||
str
|
||||
)
|
||||
if 'early_features' in pr34_tool:
|
||||
res.early_features = pr34_tool['early_features']
|
||||
|
||||
if 'early_features' in pr34_tool:
|
||||
res.early_features = pr34_tool['early_features']
|
||||
if 'pip_find_links' in pr34_tool:
|
||||
res.pip_find_links = [d.parent / pathlib.Path(o) for o in pr34_tool['pip_find_links']]
|
||||
|
||||
if 'pip_find_links' in pr34_tool:
|
||||
res.pip_find_links = [
|
||||
d.parent / pathlib.Path(o)
|
||||
for o in pr34_tool['pip_find_links']
|
||||
]
|
||||
if 'runtime_libdirs' in pr34_tool:
|
||||
res.runtime_libdirs = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in pr34_tool['runtime_libdirs']
|
||||
]
|
||||
|
||||
if 'runtime_libdirs' in pr34_tool:
|
||||
res.runtime_libdirs = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in pr34_tool['runtime_libdirs']
|
||||
]
|
||||
if 'runtime_preload' in pr34_tool:
|
||||
res.runtime_preload = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in pr34_tool['runtime_preload']
|
||||
]
|
||||
|
||||
if 'runtime_preload' in pr34_tool:
|
||||
res.runtime_preload = [
|
||||
d.parent / pathlib.Path(o)
|
||||
# pathlib.Path(o)
|
||||
for o in pr34_tool['runtime_preload']
|
||||
]
|
||||
if 'requirements' in pr34_tool:
|
||||
res.requirements = {
|
||||
k: d.parent / pathlib.Path(v)
|
||||
# pathlib.Path(o)
|
||||
for k, v in check_dict(pr34_tool['requirements'], str, str).items()
|
||||
}
|
||||
|
||||
if 'requirements' in pr34_tool:
|
||||
res.requirements = {
|
||||
k : d.parent / pathlib.Path(v)
|
||||
# pathlib.Path(o)
|
||||
for k, v in check_dict(
|
||||
pr34_tool['requirements'],
|
||||
str,
|
||||
str
|
||||
).items()
|
||||
}
|
||||
if 'modules' in pr34_tool:
|
||||
modules = check_list(pr34_tool['modules'])
|
||||
# res.modules = []
|
||||
|
||||
if 'modules' in pr34_tool:
|
||||
modules = check_list(
|
||||
pr34_tool['modules']
|
||||
)
|
||||
# res.modules = []
|
||||
for o in modules:
|
||||
assert isinstance(o, dict)
|
||||
assert 'name' in o and isinstance(o['name'], str)
|
||||
|
||||
for o in modules:
|
||||
assert isinstance(o, dict)
|
||||
assert 'name' in o and isinstance(o['name'], str)
|
||||
module = PyProject.Module(
|
||||
name=o['name'],
|
||||
)
|
||||
|
||||
module = PyProject.Module(
|
||||
name=o['name'],
|
||||
)
|
||||
if 'meson' in o:
|
||||
assert 'meson' in o and isinstance(o['meson'], str)
|
||||
|
||||
if 'meson' in o:
|
||||
assert 'meson' in o and isinstance(o['meson'], str)
|
||||
module.meson = pathlib.Path(o['meson'])
|
||||
|
||||
module.meson = pathlib.Path(o['meson'])
|
||||
if 'tool' in o:
|
||||
module.tool.update(
|
||||
check_dict(
|
||||
o['tool'],
|
||||
str,
|
||||
)
|
||||
)
|
||||
|
||||
if 'tool' in o:
|
||||
module.tool.update(
|
||||
check_dict(
|
||||
o['tool'],
|
||||
str,
|
||||
)
|
||||
)
|
||||
res.modules.append(module)
|
||||
|
||||
res.modules.append(module)
|
||||
return res
|
||||
|
||||
return res
|
||||
|
||||
@dataclasses.dataclass
|
||||
class BootstrapSettings:
|
||||
env_path: pathlib.Path
|
||||
python_path: pathlib.Path
|
||||
base_dir: pathlib.Path
|
||||
python_version: Optional[str] = dataclasses.field(
|
||||
default_factory=lambda : os.environ.get(
|
||||
'PYTHON_VERSION',
|
||||
'%d.%d' % (
|
||||
sys.version_info.major,
|
||||
sys.version_info.minor,
|
||||
),
|
||||
).strip()
|
||||
)
|
||||
pip_check_conflicts: Optional[bool] = dataclasses.field(
|
||||
default_factory=lambda : os.environ.get(
|
||||
'PIP_CHECK_CONFLICTS',
|
||||
json.dumps(True)
|
||||
) in [json.dumps(True)],
|
||||
)
|
||||
uv_args: list[str] = dataclasses.field(
|
||||
default_factory=lambda : os.environ.get(
|
||||
'UV_ARGS',
|
||||
'--offline',
|
||||
).split(),
|
||||
)
|
||||
env_path: pathlib.Path
|
||||
python_path: pathlib.Path
|
||||
base_dir: pathlib.Path
|
||||
python_version: Optional[str] = dataclasses.field(
|
||||
default_factory=lambda: os.environ.get(
|
||||
'PYTHON_VERSION',
|
||||
'%d.%d'
|
||||
% (
|
||||
sys.version_info.major,
|
||||
sys.version_info.minor,
|
||||
),
|
||||
).strip()
|
||||
)
|
||||
pip_check_conflicts: Optional[bool] = dataclasses.field(
|
||||
default_factory=lambda: os.environ.get('PIP_CHECK_CONFLICTS', json.dumps(True)) in [json.dumps(True)],
|
||||
)
|
||||
uv_args: list[str] = dataclasses.field(
|
||||
default_factory=lambda: os.environ.get(
|
||||
'UV_ARGS',
|
||||
'--offline',
|
||||
).split(),
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def get(
|
||||
cls,
|
||||
base_dir: Optional[pathlib.Path] = None,
|
||||
) -> Self:
|
||||
if base_dir is None:
|
||||
base_dir = pathlib.Path.cwd()
|
||||
@classmethod
|
||||
def get(
|
||||
cls,
|
||||
base_dir: Optional[pathlib.Path] = None,
|
||||
) -> Self:
|
||||
if base_dir is None:
|
||||
base_dir = pathlib.Path.cwd()
|
||||
|
||||
env_path: Optional[pathlib.Path] = None
|
||||
if 'ENV_PATH' in os.environ:
|
||||
env_path = pathlib.Path(os.environ['ENV_PATH'])
|
||||
else:
|
||||
env_path = base_dir / '.venv'
|
||||
env_path: Optional[pathlib.Path] = None
|
||||
if 'ENV_PATH' in os.environ:
|
||||
env_path = pathlib.Path(os.environ['ENV_PATH'])
|
||||
else:
|
||||
env_path = base_dir / '.venv'
|
||||
|
||||
python_path = env_path / 'bin' / 'python3'
|
||||
python_path = env_path / 'bin' / 'python3'
|
||||
|
||||
return cls(
|
||||
base_dir=base_dir,
|
||||
env_path=env_path,
|
||||
python_path=python_path,
|
||||
)
|
||||
|
||||
return cls(
|
||||
base_dir=base_dir,
|
||||
env_path=env_path,
|
||||
python_path=python_path,
|
||||
)
|
||||
|
||||
class requirements_name_get_t:
|
||||
@dataclasses.dataclass
|
||||
class res_t:
|
||||
not_compiled : pathlib.Path
|
||||
compiled: pathlib.Path
|
||||
name: str
|
||||
@dataclasses.dataclass
|
||||
class res_t:
|
||||
not_compiled: pathlib.Path
|
||||
compiled: pathlib.Path
|
||||
name: str
|
||||
|
||||
|
||||
def requirements_name_get(
|
||||
source_dir: pathlib.Path,
|
||||
python_version: Optional[str],
|
||||
features: list[str],
|
||||
requirements: dict[str, pathlib.Path],
|
||||
source_dir: pathlib.Path,
|
||||
python_version: Optional[str],
|
||||
features: list[str],
|
||||
requirements: dict[str, pathlib.Path],
|
||||
) -> requirements_name_get_t.res_t:
|
||||
requirements_python_version: Optional[str] = None
|
||||
if not python_version is None:
|
||||
requirements_python_version = \
|
||||
python_version.replace('.', '_')
|
||||
requirements_python_version: Optional[str] = None
|
||||
if not python_version is None:
|
||||
requirements_python_version = python_version.replace('.', '_')
|
||||
|
||||
requirements_name = '_'.join(sorted(features))
|
||||
requirements_name = '_'.join(sorted(features))
|
||||
|
||||
if requirements_python_version:
|
||||
requirements_name += '_' + requirements_python_version
|
||||
if requirements_python_version:
|
||||
requirements_name += '_' + requirements_python_version
|
||||
|
||||
requirements_path : Optional[pathlib.Path] = None
|
||||
requirements_path: Optional[pathlib.Path] = None
|
||||
|
||||
if requirements_name in requirements:
|
||||
requirements_path = requirements[requirements_name]
|
||||
else:
|
||||
requirements_path = source_dir / 'requirements.txt'
|
||||
if requirements_name in requirements:
|
||||
requirements_path = requirements[requirements_name]
|
||||
else:
|
||||
requirements_path = source_dir / 'requirements.txt'
|
||||
|
||||
requirements_path_in = requirements_path.parent / (
|
||||
requirements_path.stem + '.in'
|
||||
)
|
||||
requirements_path_in = requirements_path.parent / (requirements_path.stem + '.in')
|
||||
|
||||
requirements_in : list[str] = []
|
||||
requirements_in: list[str] = []
|
||||
|
||||
return requirements_name_get_t.res_t(
|
||||
not_compiled=requirements_path_in,
|
||||
compiled=requirements_path,
|
||||
name=requirements_name,
|
||||
)
|
||||
|
||||
return requirements_name_get_t.res_t(
|
||||
not_compiled=requirements_path_in,
|
||||
compiled=requirements_path,
|
||||
name=requirements_name,
|
||||
)
|
||||
|
||||
def env_bootstrap(
|
||||
bootstrap_settings: BootstrapSettings,
|
||||
pyproject: PyProject,
|
||||
bootstrap_settings: BootstrapSettings,
|
||||
pyproject: PyProject,
|
||||
) -> None:
|
||||
pip_find_links : list[pathlib.Path] = []
|
||||
pip_find_links: list[pathlib.Path] = []
|
||||
|
||||
if not pyproject.pip_find_links is None:
|
||||
pip_find_links.extend(pyproject.pip_find_links)
|
||||
if not pyproject.pip_find_links is None:
|
||||
pip_find_links.extend(pyproject.pip_find_links)
|
||||
|
||||
pip_find_links_args = sum([
|
||||
['-f', str(o),]
|
||||
for o in pip_find_links
|
||||
], cast(list[str], []))
|
||||
pip_find_links_args = sum(
|
||||
[
|
||||
[
|
||||
'-f',
|
||||
str(o),
|
||||
]
|
||||
for o in pip_find_links
|
||||
],
|
||||
cast(list[str], []),
|
||||
)
|
||||
|
||||
features : list[str] = []
|
||||
features: list[str] = []
|
||||
|
||||
if pyproject.early_features:
|
||||
features.extend(pyproject.early_features)
|
||||
if pyproject.early_features:
|
||||
features.extend(pyproject.early_features)
|
||||
|
||||
requirements_name_get_res = requirements_name_get(
|
||||
python_version=bootstrap_settings.python_version,
|
||||
features=features,
|
||||
requirements=pyproject.requirements,
|
||||
source_dir=pyproject.path.parent,
|
||||
)
|
||||
requirements_path = requirements_name_get_res.compiled
|
||||
requirements_name_get_res = requirements_name_get(
|
||||
python_version=bootstrap_settings.python_version,
|
||||
features=features,
|
||||
requirements=pyproject.requirements,
|
||||
source_dir=pyproject.path.parent,
|
||||
)
|
||||
requirements_path = requirements_name_get_res.compiled
|
||||
|
||||
requirements_in : list[str] = []
|
||||
requirements_in: list[str] = []
|
||||
|
||||
requirements_in.extend([
|
||||
'uv', 'pip', 'build', 'setuptools', 'meson-python', 'pybind11'
|
||||
])
|
||||
requirements_in.extend(['uv', 'pip', 'build', 'setuptools', 'meson-python', 'pybind11'])
|
||||
|
||||
if pyproject.early_features:
|
||||
early_dependencies = sum([
|
||||
pyproject.dependencies[o]
|
||||
for o in pyproject.early_features
|
||||
], cast(list[str], []))
|
||||
if pyproject.early_features:
|
||||
early_dependencies = sum([pyproject.dependencies[o] for o in pyproject.early_features], cast(list[str], []))
|
||||
|
||||
logger.info(dict(
|
||||
requirements_name_get_res=requirements_name_get_res,
|
||||
early_dependencies=early_dependencies,
|
||||
))
|
||||
logger.info(
|
||||
dict(
|
||||
requirements_name_get_res=requirements_name_get_res,
|
||||
early_dependencies=early_dependencies,
|
||||
)
|
||||
)
|
||||
|
||||
requirements_in.extend(early_dependencies)
|
||||
# if len(early_dependencies) > 0:
|
||||
# subprocess.check_call([
|
||||
# bootstrap_settings.python_path,
|
||||
# '-m',
|
||||
# 'uv', 'pip', 'install',
|
||||
# *pip_find_links_args,
|
||||
# # '-f', str(pathlib.Path(__file__).parent / 'deps' / 'dist'),
|
||||
# *bootstrap_settings.uv_args,
|
||||
# *early_dependencies,
|
||||
# ])
|
||||
requirements_in.extend(early_dependencies)
|
||||
# if len(early_dependencies) > 0:
|
||||
# subprocess.check_call([
|
||||
# bootstrap_settings.python_path,
|
||||
# '-m',
|
||||
# 'uv', 'pip', 'install',
|
||||
# *pip_find_links_args,
|
||||
# # '-f', str(pathlib.Path(__file__).parent / 'deps' / 'dist'),
|
||||
# *bootstrap_settings.uv_args,
|
||||
# *early_dependencies,
|
||||
# ])
|
||||
|
||||
if not requirements_path.exists():
|
||||
with tempfile.NamedTemporaryFile(
|
||||
mode='w',
|
||||
prefix='requirements',
|
||||
suffix='.in',
|
||||
) as f:
|
||||
f.write(
|
||||
'\n'.join(requirements_in)
|
||||
)
|
||||
f.flush()
|
||||
if not requirements_path.exists():
|
||||
with tempfile.NamedTemporaryFile(
|
||||
mode='w',
|
||||
prefix='requirements',
|
||||
suffix='.in',
|
||||
) as f:
|
||||
f.write('\n'.join(requirements_in))
|
||||
f.flush()
|
||||
|
||||
subprocess.check_call([
|
||||
'uv',
|
||||
'pip',
|
||||
'compile',
|
||||
'--generate-hashes',
|
||||
*pip_find_links_args,
|
||||
# '-p',
|
||||
# bootstrap_settings.python_path,
|
||||
*bootstrap_settings.uv_args,
|
||||
'-o', str(requirements_path),
|
||||
f.name,
|
||||
])
|
||||
subprocess.check_call(
|
||||
[
|
||||
'uv',
|
||||
'pip',
|
||||
'compile',
|
||||
'--generate-hashes',
|
||||
*pip_find_links_args,
|
||||
# '-p',
|
||||
# bootstrap_settings.python_path,
|
||||
*bootstrap_settings.uv_args,
|
||||
'-o',
|
||||
str(requirements_path),
|
||||
f.name,
|
||||
]
|
||||
)
|
||||
|
||||
uv_python_version: list[str] = []
|
||||
uv_python_version: list[str] = []
|
||||
|
||||
if not bootstrap_settings.python_version is None:
|
||||
uv_python_version.extend([
|
||||
'-p', bootstrap_settings.python_version,
|
||||
])
|
||||
if not bootstrap_settings.python_version is None:
|
||||
uv_python_version.extend(
|
||||
[
|
||||
'-p',
|
||||
bootstrap_settings.python_version,
|
||||
]
|
||||
)
|
||||
|
||||
subprocess.check_call([
|
||||
'uv', 'venv',
|
||||
*uv_python_version,
|
||||
*pip_find_links_args,
|
||||
# '--seed',
|
||||
*bootstrap_settings.uv_args,
|
||||
str(bootstrap_settings.env_path)
|
||||
])
|
||||
subprocess.check_call(
|
||||
[
|
||||
'uv',
|
||||
'venv',
|
||||
*uv_python_version,
|
||||
*pip_find_links_args,
|
||||
# '--seed',
|
||||
*bootstrap_settings.uv_args,
|
||||
str(bootstrap_settings.env_path),
|
||||
]
|
||||
)
|
||||
|
||||
subprocess.check_call([
|
||||
'uv',
|
||||
'pip',
|
||||
'install',
|
||||
*pip_find_links_args,
|
||||
'-p',
|
||||
bootstrap_settings.python_path,
|
||||
'--require-hashes',
|
||||
*bootstrap_settings.uv_args,
|
||||
'-r', str(requirements_path),
|
||||
])
|
||||
subprocess.check_call(
|
||||
[
|
||||
'uv',
|
||||
'pip',
|
||||
'install',
|
||||
*pip_find_links_args,
|
||||
'-p',
|
||||
bootstrap_settings.python_path,
|
||||
'--require-hashes',
|
||||
*bootstrap_settings.uv_args,
|
||||
'-r',
|
||||
str(requirements_path),
|
||||
]
|
||||
)
|
||||
|
||||
if bootstrap_settings.pip_check_conflicts:
|
||||
subprocess.check_call([
|
||||
bootstrap_settings.python_path,
|
||||
'-m',
|
||||
'online.fxreader.pr34.commands',
|
||||
'pip_check_conflicts',
|
||||
])
|
||||
if bootstrap_settings.pip_check_conflicts:
|
||||
subprocess.check_call(
|
||||
[
|
||||
bootstrap_settings.python_path,
|
||||
'-m',
|
||||
'online.fxreader.pr34.commands',
|
||||
'pip_check_conflicts',
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
def paths_equal(a: pathlib.Path | str, b: pathlib.Path | str) -> bool:
|
||||
return os.path.abspath(str(a)) == os.path.abspath(str(b))
|
||||
|
||||
def paths_equal(
|
||||
a: pathlib.Path | str,
|
||||
b: pathlib.Path | str
|
||||
) -> bool:
|
||||
return (
|
||||
os.path.abspath(str(a)) ==
|
||||
os.path.abspath(str(b))
|
||||
)
|
||||
|
||||
def run(
|
||||
d: Optional[pathlib.Path] = None,
|
||||
cli_path: Optional[pathlib.Path] = None,
|
||||
d: Optional[pathlib.Path] = None,
|
||||
cli_path: Optional[pathlib.Path] = None,
|
||||
) -> None:
|
||||
if cli_path is None:
|
||||
cli_path = pathlib.Path(__file__).parent / 'cli.py'
|
||||
if cli_path is None:
|
||||
cli_path = pathlib.Path(__file__).parent / 'cli.py'
|
||||
|
||||
if d is None:
|
||||
d = pathlib.Path(__file__).parent / 'pyproject.toml'
|
||||
if d is None:
|
||||
d = pathlib.Path(__file__).parent / 'pyproject.toml'
|
||||
|
||||
bootstrap_settings = BootstrapSettings.get()
|
||||
bootstrap_settings = BootstrapSettings.get()
|
||||
|
||||
pyproject : PyProject = pyproject_load(
|
||||
d
|
||||
)
|
||||
pyproject: PyProject = pyproject_load(d)
|
||||
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
|
||||
if not bootstrap_settings.env_path.exists():
|
||||
env_bootstrap(
|
||||
bootstrap_settings=bootstrap_settings,
|
||||
pyproject=pyproject,
|
||||
)
|
||||
if not bootstrap_settings.env_path.exists():
|
||||
env_bootstrap(
|
||||
bootstrap_settings=bootstrap_settings,
|
||||
pyproject=pyproject,
|
||||
)
|
||||
|
||||
logger.info([sys.executable, sys.argv, bootstrap_settings.python_path])
|
||||
logger.info([sys.executable, sys.argv, bootstrap_settings.python_path])
|
||||
|
||||
if not paths_equal(sys.executable, bootstrap_settings.python_path):
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
*sys.argv,
|
||||
]
|
||||
)
|
||||
if not paths_equal(sys.executable, bootstrap_settings.python_path):
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
*sys.argv,
|
||||
],
|
||||
)
|
||||
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
str(cli_path),
|
||||
*sys.argv[1:],
|
||||
],
|
||||
)
|
||||
|
||||
os.execv(
|
||||
str(bootstrap_settings.python_path),
|
||||
[
|
||||
str(bootstrap_settings.python_path),
|
||||
str(
|
||||
cli_path
|
||||
),
|
||||
*sys.argv[1:],
|
||||
]
|
||||
)
|
||||
|
||||
if __name__ == '__main__':
|
||||
run()
|
||||
run()
|
||||
|
||||
@ -4,88 +4,95 @@ import os
import cryptography.hazmat.primitives.kdf.scrypt
import cryptography.exceptions

from typing import (
    Literal,
    overload,
    Optional,
)


class PasswordUtils:
    @overload
    @classmethod
    def secret_hash(
        cls,
        secret: str | bytes,
        mode: Literal['base64'],
        salt: Optional[bytes] = None,
    ) -> tuple[str, str]: ...

    @overload
    @classmethod
    def secret_hash(
        cls,
        secret: str | bytes,
        mode: Literal['bytes'],
        salt: Optional[bytes] = None,
    ) -> tuple[bytes, bytes]: ...

    @classmethod
    def secret_hash(
        cls,
        secret: str | bytes,
        mode: Literal['bytes', 'base64'],
        salt: Optional[bytes] = None,
    ) -> tuple[str, str] | tuple[bytes, bytes]:
        if salt is None:
            salt = os.urandom(16)

        if isinstance(secret, str):
            secret = secret.encode('utf-8')
        # derive
        kdf = cls._scrypt_init(salt=salt)

        hashed_secret = kdf.derive(secret)

        if mode == 'bytes':
            return (salt, hashed_secret)
        elif mode == 'base64':
            res_tuple = tuple(
                (
                    base64.b64encode(o).decode('utf-8')
                    for o in (
                        salt,
                        hashed_secret,
                    )
                )
            )
            return (res_tuple[0], res_tuple[1])
        else:
            raise NotImplementedError

    @classmethod
    def _scrypt_init(cls, salt: bytes) -> cryptography.hazmat.primitives.kdf.scrypt.Scrypt:
        return cryptography.hazmat.primitives.kdf.scrypt.Scrypt(
            salt=salt,
            length=32,
            n=2**14,
            r=8,
            p=1,
        )

    @classmethod
    def secret_check(
        cls,
        secret: str | bytes,
        salt: str | bytes,
        hashed_secret: str | bytes,
    ) -> bool:
        if isinstance(salt, str):
            salt = base64.b64decode(salt)

        if isinstance(secret, str):
            secret = secret.encode('utf-8')

        if isinstance(hashed_secret, str):
            hashed_secret = base64.b64decode(hashed_secret)

        kdf = cls._scrypt_init(salt=salt)

        try:
            kdf.verify(secret, hashed_secret)
            return True
        except cryptography.exceptions.InvalidKey:
            return False

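# Usage sketch for the helpers above (illustrative; the secrets are made up):
# secret_hash() returns the salt and the derived hash, and secret_check()
# re-derives the hash and verifies it against the stored pair.
salt_b64, hash_b64 = PasswordUtils.secret_hash('correct horse battery staple', mode='base64')
assert PasswordUtils.secret_check('correct horse battery staple', salt_b64, hash_b64)
assert not PasswordUtils.secret_check('wrong password', salt_b64, hash_b64)
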
@ -1,35 +1,39 @@
import os
import logging

from typing import (
    Optional,
)

logger = logging.getLogger(__name__)


class DebugPy:
    @classmethod
    def set_trace(
        cls,
        host: Optional[str] = None,
        port: Optional[int] = None,
        wait: Optional[bool] = None,
    ) -> None:
        if host is None:
            host = '127.0.0.1'
        if port is None:
            port = 4444
        if wait is None:
            wait = True

        import debugpy

        if os.environ.get('DEBUGPY_RUNNING') != 'true':
            logger.info('debugpy init')
            import debugpy

            debugpy.listen((host, port))
            os.environ['DEBUGPY_RUNNING'] = 'true'

            if wait:
                debugpy.wait_for_client()
                debugpy.breakpoint()

        logger.info('debugpy done')

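# Usage sketch (illustrative): attach a debugpy client on a non-default port
# without blocking the process while waiting for it.
def _example_attach_debugger() -> None:
    DebugPy.set_trace(host='127.0.0.1', port=5678, wait=False)
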
@ -1,16 +1,14 @@
import logging
from typing import (
    Optional,
)


def setup(level: Optional[int] = None) -> None:
    if level is None:
        level = logging.INFO

    logging.basicConfig(
        level=level,
        format=('%(levelname)s:%(name)s:%(message)s:%(process)d:%(asctime)s:%(pathname)s:%(funcName)s:%(lineno)s'),
    )

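# Usage sketch (illustrative): opt into debug-level output; the default
# remains logging.INFO when no level is passed.
def _example_verbose_logging() -> None:
    setup(logging.DEBUG)
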
@ -9,208 +9,232 @@ import logging
import sys
import argparse

from pydantic import (
    Field,
)

from typing import (
    ClassVar,
    Generator,
    Annotated,
    Optional,
    Any,
)


logger = logging.getLogger(__name__)


@pydantic.dataclasses.dataclass
class MypyFormatEntry:
    name: str
    value: str

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, type(self)):
            raise NotImplementedError

        return self.value == other.value


class MypyFormat:
    vscode: ClassVar[MypyFormatEntry] = MypyFormatEntry(name='vscode', value='vscode')
    json: ClassVar[MypyFormatEntry] = MypyFormatEntry(name='json', value='json')

    @classmethod
    def from_value(cls, value: str) -> MypyFormatEntry:
        for e in cls.entries():
            if value == e.value:
                return e

        raise NotImplementedError

    @classmethod
    def entries(
        cls,
    ) -> Generator[
        MypyFormatEntry,
        None,
        None,
    ]:
        for o in dir(cls):
            e = getattr(cls, o)
            if not isinstance(e, MypyFormatEntry):
                continue

            yield e


class MypySettings(pydantic_settings.BaseSettings):
    model_config = pydantic_settings.SettingsConfigDict(
        env_prefix='online_fxreader_pr34_mypy_',
        case_sensitive=False,
    )

    config_path: pathlib.Path = pathlib.Path.cwd() / '.mypy.ini'
    max_errors: dict[str, int] = dict()
    paths: Annotated[list[pathlib.Path], Field(default_factory=lambda: ['.'])]


def run(
    argv: Optional[list[str]] = None,
    settings: Optional[MypySettings] = None,
) -> None:
    if argv is None:
        argv = []

    if settings is None:
        settings = MypySettings.model_validate(dict())

    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-q',
        '--quiet',
        dest='quiet',
        action='store_true',
        help='do not print anything if the program is correct according to max_errors limits',
        default=False,
    )
    parser.add_argument(
        '-i',
        dest='paths',
        help='specify paths to check',
        default=[],
        action='append',
    )
    parser.add_argument(
        '-f',
        '--format',
        dest='_format',
        help='output format of errors',
        default=MypyFormat.json.value,
        choices=[o.value for o in MypyFormat.entries()],
    )
    options, args = parser.parse_known_args(argv)

    if len(args) > 0 and args[0] == '--':
        del args[0]

    options.format = MypyFormat.from_value(options._format)

    if len(options.paths) == 0:
        options.paths.extend(settings.paths)

    started_at = datetime.datetime.now()

    mypy_cmd = [
        sys.executable,
        '-m',
        'mypy',
        '--config-file',
        str(settings.config_path),
        '--strict',
        '-O',
        'json',
        *args,
        *options.paths,
    ]

    logger.info(dict(cmd=mypy_cmd))

    res = subprocess.run(
        mypy_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    done_at = datetime.datetime.now()

    try:
        assert not res.returncode is None

        errors = sorted(
            [json.loads(o) for o in res.stdout.decode('utf-8').splitlines() if not o.strip() == ''],
            key=lambda x: (
                x.get('file', ''),
                x.get('line', 0),
            ),
        )

        if not options.quiet:
            if (len(res.stderr)) > 0:
                logger.error(res.stderr.decode('utf-8'))
    except:
        logger.exception('')
        logger.error(res.stdout.decode('utf-8'))
        logger.error(res.stderr.decode('utf-8'))
        sys.exit(res.returncode)

    g: dict[str, Any] = dict()
    for o in errors:
        if not o['file'] in g:
            g[o['file']] = []
        g[o['file']].append(o)

    h = {
        k: len(v)
        for k, v in sorted(
            list(g.items()),
            key=lambda x: x[0],
        )
    }

    mentioned_paths = marisa_trie.Trie(list(h))

    violated_limits: dict[str, str] = dict()

    for k, v in settings.max_errors.items():
        matching_paths = mentioned_paths.keys(k)
        total_errors = sum([h[o] for o in matching_paths], 0)

        if total_errors > v:
            violated_limits[k] = '%s - [%s]: has %d errors > %d' % (
                k,
                ', '.join(matching_paths),
                total_errors,
                v,
            )

    if len(violated_limits) > 0 or not options.quiet:
        if options.format == MypyFormat.vscode:
            for o in errors:
                sys.stdout.write(
                    '[%s] %s:%d,%d %s - %s - %s\n'
                    % (
                        o['severity'],
                        o['file'],
                        o['line'],
                        o['column'],
                        o['message'],
                        o['hint'],
                        o['code'],
                    )
                )
                sys.stdout.flush()
            # logger.info(json.dumps(errors, indent=4))
        else:
            logger.info(json.dumps(errors, indent=4))

    # if len(violated_limits) > 0:
    #     logger.info(json.dumps(violated_limits, indent=4))
    logger.info(
        json.dumps(
            dict(
                max_errors=settings.max_errors,
                violated_limits=violated_limits,
                histogram=h,
                elapsed=(done_at - started_at).total_seconds(),
            ),
            indent=4,
        )
    )

    if len(violated_limits) > 0:
        sys.exit(1)


if __name__ == '__main__':
    from . import logging as _logging

    _logging.setup()
    run(sys.argv[1:])

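# Usage sketch (illustrative, mirroring the __main__ block above): check one
# directory, emit vscode-style lines, and stay quiet unless a max_errors
# limit from MypySettings is exceeded.
def _example_mypy_check() -> None:
    run(['-q', '-i', 'src/', '-f', 'vscode'])
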
@ -11,112 +11,115 @@ import dataclasses

logger = logging.getLogger(__name__)

from typing import (
    overload,
    Optional,
    Literal,
    Any,
    Annotated,
)

from .cli_bootstrap import PyProject


@overload
def shutil_which(
    name: str,
    raise_on_failure: Literal[True],
) -> str: ...


@overload
def shutil_which(
    name: str,
    raise_on_failure: bool,
) -> Optional[str]: ...


def shutil_which(
    name: str,
    raise_on_failure: bool,
) -> Optional[str]:
    res = shutil.which(name)
    if res is None and raise_on_failure:
        raise NotImplementedError
    else:
        return res


def runtime_libdirs_init(
    project: PyProject,
) -> None:
    if sys.platform == 'linux':
        ld_library_path: list[pathlib.Path] = [
            o
            for o in [
                *[o.absolute() for o in (project.runtime_libdirs if project.runtime_libdirs else [])],
                *[pathlib.Path(o) for o in os.environ.get('LD_LIBRARY_PATH', '').split(os.path.pathsep) if o != ''],
            ]
        ]

        ld_library_path_present: list[pathlib.Path] = []

        for o in ld_library_path:
            if not o.exists():
                logger.warning(
                    dict(
                        ld_library_path=o,
                        msg='not found',
                    )
                )

            ld_library_path_present.append(o)

        os.environ.update(LD_LIBRARY_PATH=os.path.pathsep.join([str(o) for o in ld_library_path_present]))

        for preload_path in project.runtime_preload or []:
            for preload_found in glob.glob(str(preload_path.parent / ('lib%s.so' % preload_path.name))):
                logger.info(
                    dict(
                        preload_path=preload_path,
                        preload_found=preload_found,
                        # lib_path=o,
                        msg='load_library',
                    )
                )

                ctypes.cdll.LoadLibrary(preload_found)
    else:
        raise NotImplementedError


class interfaces_index_t:
    @dataclasses.dataclass
    class Interface:
        @dataclasses.dataclass
        class AddrInfo:
            family: str
            local: str

        name: Annotated[
            str,
            pydantic.Field(
                alias='ifname',
            ),
        ]
        addr_info: list[AddrInfo]


def interfaces_index() -> list[interfaces_index_t.Interface]:
    res = (
        pydantic.RootModel[list[interfaces_index_t.Interface]]
        .model_validate_json(
            subprocess.check_output(
                [
                    'ip',
                    '-j',
                    'addr',
                ]
            ).decode('utf-8')
        )
        .root
    )

    return res

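# Usage sketch (illustrative; Linux only, needs the iproute2 'ip' tool): map
# each interface name to its addresses as parsed by interfaces_index().
def _example_interface_addresses() -> dict[str, list[str]]:
    return {o.name: [a.local for a in o.addr_info] for o in interfaces_index()}
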
File diff suppressed because it is too large
@ -6,22 +6,23 @@ from typing import Any
from typing_extensions import Protocol
from abc import abstractmethod

C = typing.TypeVar('C', bound='Comparable')


class Comparable(Protocol):
    @abstractmethod
    def __eq__(self, other: Any) -> bool:
        pass

    @abstractmethod
    def __lt__(self: C, other: C) -> bool:
        pass

    def __gt__(self: C, other: C) -> bool:
        return (not self < other) and self != other

    def __le__(self: C, other: C) -> bool:
        return self < other or self == other

    def __ge__(self: C, other: C) -> bool:
        return not self < other

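# Minimal sketch of a concrete Comparable (illustrative only): implementing
# __eq__ and __lt__ is enough, the remaining comparisons are derived above.
class _Version(Comparable):
    def __init__(self, major: int, minor: int) -> None:
        self.major = major
        self.minor = minor

    def __eq__(self, other: Any) -> bool:
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other: '_Version') -> bool:
        return (self.major, self.minor) < (other.major, other.minor)


assert _Version(1, 2) < _Version(1, 10)
assert _Version(1, 10) >= _Version(1, 2)
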
@ -5,121 +5,107 @@ import pprint


async def f1():
    devices = await bleak.BleakScanner.discover()
    return devices


async def f2(device, timeout=None):
    if timeout is None:
        timeout = 1.0

    assert isinstance(timeout, float) and timeout >= 1e-8

    p = await bleak.BleakClient(
        device,
        timeout=timeout,
    ).__aenter__()
    return p


async def f3(client):
    t1 = [dict(service=o.__dict__, characteristics=[o2.__dict__ for o2 in o.characteristics]) for o in client.services]
    return t1


async def f5(
    name_check=None,
):
    t2 = []

    attempt = 0

    while True:
        t1 = await f1()
        pprint.pprint([o.__dict__ for o in t1])

        if not name_check is None:
            assert inspect.isfunction(name_check)

            t5 = {i: o.details[0].name() for i, o in enumerate(t1)}

            t2.extend([t1[k] for k, v in t5.items() if isinstance(v, str) and name_check(v)])
        else:
            t2.extend(t1)

        if len(t2) > 0:
            break

        attempt += 1
        print('\rattempt #%d' % attempt, end='')

    return t2


async def f4(
    timeout=None,
    characteristics=None,
    operations=None,
    name_check=None,
):
    if isinstance(name_check, str):
        assert name_check in [
            'watch fit',
        ]
        name_check2 = lambda current_name: name_check.lower() in current_name.lower()
    else:
        name_check2 = name_check

    assert not name_check2 is None

    if characteristics is None:
        characteristics = [
            '0000ffd1-0000-1000-8000-00805f9b34fb',
        ]

    t2 = await f5(
        name_check=name_check2,
    )

    if len(t2) == 0:
        print('not found')
        return

    t3 = None
    try:
        t3 = await f2(t2[0], timeout=timeout)
        t4 = await f3(t3)
        pprint.pprint(t4)

        if not operations is None and inspect.isfunction(operations):
            await operations(
                client=t3,
                t4=t4,
            )
        else:
            t6 = {}
            for o in characteristics:
                try:
                    t7 = await t3.read_gatt_char(o)
                except Exception as exception:
                    print(traceback.format_exc())
                    t7 = None
                t6[o] = t7
            pprint.pprint(t6)
    finally:
        if not t3 is None:
            await t3.disconnect()

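# Usage sketch (illustrative): scan until a device whose advertised name
# contains 'watch fit' shows up, connect, and dump the default characteristic.
def _example_read_watch() -> None:
    import asyncio

    asyncio.run(f4(name_check='watch fit', timeout=5.0))
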
@ -10,162 +10,149 @@ import threading
import cython
import datetime

from typing import Any, Optional, TypeVar, Type, cast

# from scoping import scoping as s


def test(
    _id: int,
    T: float,
    a: numpy.ndarray[Any, numpy.dtype[numpy.int32]],
) -> None:
    with cython.nogil:
        # if True:
        started_at = datetime.datetime.now()
        print('started')

        def elapsed() -> float:
            return (datetime.datetime.now() - started_at).total_seconds()

        # a = 0
        while elapsed() < T:
            # a += 1
            for k in range(1024 * 1024):
                a[_id] += 1

        print(['done', started_at, elapsed(), a[_id]])


M = TypeVar('M', bound=Type[Any])


def build(content: str, module: M) -> M:
    import pathlib
    import tempfile
    import hashlib
    import Cython.Build.Inline

    sha256sum = hashlib.sha256(content.encode('utf-8')).digest().hex()

    output_dir = (pathlib.Path('.') / 'tmp' / 'cython' / sha256sum).absolute()

    if not output_dir.exists() or True:
        os.makedirs(str(output_dir), exist_ok=True)

    source_path = output_dir / ('_%s.pyx' % sha256sum)
    if not source_path.exists():
        with io.open(str(source_path), 'w') as f:
            f.write(content)

    t1 = Cython.Build.Inline._get_build_extension()
    t1.extensions = Cython.Build.cythonize(str(source_path))
    t1.build_temp = str(pathlib.Path('/'))
    t1.build_lib = str(output_dir)
    # t2 = Cython.Build.Inline.Extension(
    #     name=sha256sum,
    # )
    t1.run()

    return cast(M, Cython.Build.Inline.load_dynamic('_%s' % sha256sum, glob.glob(str(output_dir / ('_%s*.so' % sha256sum)))[0]))

    raise NotImplementedError


def mypyc_build(file_path: pathlib.Path) -> Any:
    import pathlib
    import tempfile
    import hashlib
    import mypyc.build
    import Cython.Build.Inline

    assert isinstance(file_path, pathlib.Path)

    # sha256sum = hashlib.sha256(content.encode('utf-8')).digest().hex()

    # output_dir = (pathlib.Path('.') / 'tmp' / 'cython' / sha256sum).absolute()
    output_dir = pathlib.Path('.') / 'tmp' / 'mypyc'
    sha256sum = file_path.stem
    lib_pattern = file_path.parent / ('%s.cpython*.so' % sha256sum)
    lib_dir = pathlib.Path('.')

    def lib_path_glob(path: str | pathlib.Path) -> Optional[pathlib.Path]:
        res: list[str] = glob.glob(str(path))

        if len(res) == 0:
            return None
        else:
            return pathlib.Path(res[0])

    need_build: bool = False

    lib_path: Optional[pathlib.Path] = None

    lib_path = lib_path_glob(lib_pattern)

    if not lib_path is None:
        t2 = file_path.stat()
        t3 = lib_path.stat()
        if t3.st_mtime < t2.st_mtime:
            need_build = True

        del t2
        del t3
    else:
        need_build = True

    if need_build:
        for o in [
            output_dir,
            output_dir / 'build' / file_path.parent,
        ]:
            os.makedirs(str(o), exist_ok=True)
        # source_path = output_dir / ('_%s.py' % sha256sum)
        source_path = file_path
        # with io.open(str(source_path), 'w') as f:
        #     f.write(content)

        t1 = Cython.Build.Inline._get_build_extension()
        t1.extensions = mypyc.build.mypycify([str(source_path)], target_dir=str(output_dir / 'build'))
        t1.build_temp = str(output_dir)
        t1.build_lib = str(lib_dir)
        # t2 = Cython.Build.Inline.Extension(
        #     name=sha256sum,
        # )
        t1.run()

    lib_path = lib_path_glob(lib_pattern)

    return Cython.Build.Inline.load_dynamic(
        #'_%s' % sha256sum,
        # t1.extensions[0].name,
        file_path.stem,
        str(lib_path),
    )

    raise NotImplementedError


class Source:
    @staticmethod
    def test2(_a: numpy.ndarray[Any, numpy.dtype[numpy.int64]], _id: numpy.dtype[numpy.int32] | int, T: float = 16) -> int:
        raise NotImplementedError


source = build(
    r"""
cimport cython

@cython.boundscheck(False)
@ -226,52 +213,52 @@ def test2(long long [:] _a, int _id, double T=16) -> int:

    return _a[_id]

""",
    Source,
)


def test_cython(N: int = 4, T: int = 16) -> None:
    # a = [0] * N
    a = numpy.zeros((N,), dtype=numpy.int64)

    t = [
        threading.Thread(
            target=functools.partial(
                source.test2,
                a,
                k,
                T,
            )
        )
        for k in range(N)
    ]

    for o in t:
        o.start()
    for o in t:
        o.join()

    # cython_module['test2'](a, 0)


def test_mypyc(N: int = 4, W: int = 35) -> None:
    cython2 = mypyc_build((pathlib.Path(__file__).parent / 'cython2.py').relative_to(pathlib.Path.cwd()))

    # from .cython2 import fib

    # a = [0] * N
    t = [
        threading.Thread(
            target=functools.partial(
                cython2.fib,
                W,
            )
        )
        for k in range(N)
    ]

    for o in t:
        o.start()
    for o in t:
        o.join()

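# Usage sketch (illustrative; timings depend on the local build): both helpers
# start N Python threads running the compiled functions defined above.
def _example_compiled_threads() -> None:
    test_cython(N=2, T=2)
    test_mypyc(N=2, W=30)
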
@ -1,10 +1,12 @@
import time


def fib(n: int) -> int:
    if n <= 1:
        return n
    else:
        return fib(n - 2) + fib(n - 1)


t0 = time.time()
fib(32)

@ -5,378 +5,334 @@ import os


def kernel_1_sample_scrap(
    max_articles=None,
):
    if max_articles is None:
        max_articles = 1

    with requests.get(
        'https://dev.to',
    ) as p:
        t10 = p.content.decode('utf-8')
        t11 = pyquery.PyQuery(t10)
        t13 = t11('.crayons-story__title > a')
        t12 = [pyquery.PyQuery(o).attr('href') for o in t13]
        pprint.pprint(t12)
        t14 = ['https://dev.to/%s' % o for o in t12]

    t8 = []
    for t7 in t14[:max_articles]:
        with requests.get(
            t7,
        ) as p:
            t1 = p.content.decode('utf-8')
            t2 = pyquery.PyQuery(t1)
            t3 = t2('.comment__content')
            t6 = []
            for o in t3:
                t4 = pyquery.PyQuery(o)
                t5 = t4('.comment__header > a').attr['href']
                t9 = t4('.comment__body').text()
                t6.append(
                    dict(
                        author=t5,
                        text=t9,
                    )
                )

            # pprint.pprint(t3)
            pprint.pprint(t6)
            t8.append(
                dict(
                    article=t7,
                    comments=t6,
                )
            )

    pprint.pprint(t8)

    return dict(
        t1=t1,
        t2=t2,
        t3=t3,
        t6=t6,
        t8=t8,
        t12=t12,
    )

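# Usage sketch (illustrative): performs live HTTP requests against https://dev.to
# and returns the intermediate selections alongside the scraped comments.
def _example_scrape() -> None:
    res = kernel_1_sample_scrap(max_articles=2)
    pprint.pprint(len(res['t8']))
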
def kernel_2():
    import numpy as np  # linear algebra
    import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
    from tqdm import tqdm
    from sklearn.model_selection import train_test_split
    import tensorflow as tf
    from keras.models import Sequential
    from keras.layers.recurrent import LSTM, GRU, SimpleRNN
    from keras.layers.core import Dense, Activation, Dropout
    from keras.layers.embeddings import Embedding
    from keras.layers.normalization import BatchNormalization
    from keras.utils import np_utils
    from sklearn import preprocessing, decomposition, model_selection, metrics, pipeline
    from keras.layers import GlobalMaxPooling1D, Conv1D, MaxPooling1D, Flatten, Bidirectional, SpatialDropout1D
    from keras.preprocessing import sequence, text
    from keras.callbacks import EarlyStopping

    import matplotlib.pyplot as plt
    import seaborn as sns

    # %matplotlib inline
    from plotly import graph_objs as go
    import plotly.express as px
    import plotly.figure_factory as ff

    # %% [markdown]
    # # Configuring TPU's
    #
    # For this version of the notebook we will be using TPUs, as we have to build a BERT model.

    # %% [code]
    # Detect hardware, return appropriate distribution strategy
    try:
        # TPU detection. No parameters necessary if TPU_NAME environment variable is
        # set: this is always the case on Kaggle.
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        print('Running on TPU ', tpu.master())
    except ValueError:
        tpu = None

    if tpu:
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.experimental.TPUStrategy(tpu)
    else:
        # Default distribution strategy in Tensorflow. Works on CPU and single GPU.
        strategy = tf.distribute.get_strategy()

    print('REPLICAS: ', strategy.num_replicas_in_sync)

    # %% [code]
    train = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/jigsaw-toxic-comment-train.csv')
    validation = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/validation.csv')
    test = pd.read_csv('/kaggle/input/jigsaw-multilingual-toxic-comment-classification/test.csv')

    # %% [markdown]
    # We will drop the other columns and approach this problem as a binary classification problem. We will also run the exercise on a smaller subset of the dataset (only 12000 data points) to make the models easier to train.

    # %% [code]
    train.drop(['severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate'], axis=1, inplace=True)

    # %% [code]
    train = train.loc[:12000, :]
    train.shape

    # %% [markdown]
    # We will check the maximum number of words that can be present in a comment; this will help us with padding later.

    # %% [code]
    train['comment_text'].apply(lambda x: len(str(x).split())).max()

    # %% [markdown]
    # ### Data Preparation

    # %% [code]
    xtrain, xvalid, ytrain, yvalid = train_test_split(
        train.comment_text.values, train.toxic.values, stratify=train.toxic.values, random_state=42, test_size=0.2, shuffle=True
    )

    # %% [markdown]
    # # Before We Begin
    #
    # If you are a complete starter with NLP and have never worked with text data, I am attaching a few kernels that will serve as a starting point for your journey:
    # * https://www.kaggle.com/arthurtok/spooky-nlp-and-topic-modelling-tutorial
    # * https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle
    #
    # If you want a more basic dataset to practice with, here is another kernel which I wrote:
    # * https://www.kaggle.com/tanulsingh077/what-s-cooking
    #
    # Below are some resources to get started with basic neural networks; they will help us easily understand the upcoming parts:
    # * https://www.youtube.com/watch?v=aircAruvnKk&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv
    # * https://www.youtube.com/watch?v=IHZwWFHWa-w&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=2
    # * https://www.youtube.com/watch?v=Ilg3gGewQ5U&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=3
    # * https://www.youtube.com/watch?v=tIeHLnjs5U8&list=PL_h2yd2CGtBHEKwEH5iqTZH85wLS-eUzv&index=4
    #
    # For learning how to visualize text data and what to use, view:
    # * https://www.kaggle.com/tanulsingh077/twitter-sentiment-extaction-analysis-eda-and-model
    # * https://www.kaggle.com/jagangupta/stop-the-s-toxic-comments-eda

    # %% [markdown]
    # # Simple RNN
    #
    # ## Basic Overview
    #
    # What is an RNN?
    #
    # Recurrent Neural Networks (RNNs) are a type of neural network where the output from the previous step is fed as input to the current step. In traditional neural networks, all inputs and outputs are independent of each other, but in cases where we need to predict the next word of a sentence, the previous words are required, and hence there is a need to remember them. Thus RNNs came into existence; they solve this issue with the help of a hidden layer.
    #
    # Why RNNs?
    #
    # https://www.quora.com/Why-do-we-use-an-RNN-instead-of-a-simple-neural-network
    #
    # ## In-Depth Understanding
    #
    # * https://medium.com/mindorks/understanding-the-recurrent-neural-network-44d593f112a2
    # * https://www.youtube.com/watch?v=2E65LDnM2cA&list=PL1F3ABbhcqa3BBWo170U4Ev2wfsF7FN8l
    # * https://www.d2l.ai/chapter_recurrent-neural-networks/rnn.html
    #
    # ## Code Implementation
    #
    # So first I will implement the model and then explain the code step by step.

    # %% [code]
    # using keras tokenizer here
    token = text.Tokenizer(num_words=None)
    max_len = 1500

    token.fit_on_texts(list(xtrain) + list(xvalid))
    xtrain_seq = token.texts_to_sequences(xtrain)
    xvalid_seq = token.texts_to_sequences(xvalid)

    # zero pad the sequences
    xtrain_pad = sequence.pad_sequences(xtrain_seq, maxlen=max_len)
    xvalid_pad = sequence.pad_sequences(xvalid_seq, maxlen=max_len)

    word_index = token.word_index

    # %% [code]
    # %%time
    with strategy.scope():
        # A simpleRNN without any pretrained embeddings and one dense layer
        model = Sequential()
        model.add(Embedding(len(word_index) + 1, 300, input_length=max_len))
        model.add(SimpleRNN(100))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

        model.summary()

    return dict(
        model=model,
        xtrain_pad=xtrain_pad,
        strategy=strategy,
        xvalid_pad=xvalid_pad,
        xtrain_seq=xtrain_seq,
        token=token,
        max_len=max_len,
        xtrain=xtrain,
        xvalid=xvalid,
        ytrain=ytrain,
        yvalid=yvalid,
    )

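# Small self-contained sketch of the tokenize-then-pad step used above
# (illustrative; the exact indices depend on the fitted corpus):
def _example_tokenizer() -> None:
    from keras.preprocessing import sequence, text

    tok = text.Tokenizer(num_words=None)
    tok.fit_on_texts(['the cat sat', 'the dog sat'])
    print(tok.word_index)  # e.g. {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4}
    seqs = tok.texts_to_sequences(['the cat sat on the mat'])
    print(sequence.pad_sequences(seqs, maxlen=8))  # left-padded with zeros; unseen words are dropped
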
def kernel_3(
|
||||
o_2,
|
||||
nb_epochs=None,
|
||||
o_2,
|
||||
nb_epochs=None,
|
||||
):
|
||||
if nb_epochs is None:
|
||||
nb_epochs = 5
|
||||
if nb_epochs is None:
|
||||
nb_epochs = 5
|
||||
|
||||
# %% [markdown]
|
||||
# Writing a function for getting auc score for validation
|
||||
# %% [markdown]
|
||||
# Writing a function for getting auc score for validation
|
||||
|
||||
# %% [code]
|
||||
def roc_auc(predictions,target):
|
||||
import sklearn.metrics
|
||||
'''
|
||||
# %% [code]
|
||||
def roc_auc(predictions, target):
|
||||
import sklearn.metrics
|
||||
|
||||
"""
|
||||
This methods returns the AUC Score when given the Predictions
|
||||
and Labels
|
||||
'''
|
||||
"""
|
||||
|
||||
fpr, tpr, thresholds = sklearn.metrics.roc_curve(target, predictions)
|
||||
roc_auc = sklearn.metrics.auc(fpr, tpr)
|
||||
return roc_auc
|
||||
fpr, tpr, thresholds = sklearn.metrics.roc_curve(target, predictions)
|
||||
roc_auc = sklearn.metrics.auc(fpr, tpr)
|
||||
return roc_auc
|
||||
|
||||
    # %% [code]
    if os.path.exists('model.h5'):
        o_2['model'].load_weights('model.h5')
    else:
        o_2['model'].fit(
            o_2['xtrain_pad'],
            o_2['ytrain'],
            epochs=nb_epochs,
            batch_size=64 * o_2['strategy'].num_replicas_in_sync,
        )  # Multiplying the batch size by the number of replicas to run on TPUs
        o_2['model'].save_weights('model.h5')

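    # Note: with tf.distribute strategies the batch size passed to fit() is the global
    # batch size, so 64 * num_replicas_in_sync keeps roughly 64 examples per TPU core.
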
    # %% [code]
    scores = o_2['model'].predict(o_2['xvalid_pad'])
    print('AUC: %.2f%%' % (roc_auc(scores, o_2['yvalid']) * 100))

    # %% [code]
    scores_model = []
    scores_model.append({'Model': 'SimpleRNN', 'AUC_Score': roc_auc(scores, o_2['yvalid'])})

    # %% [markdown]
    # ## Code Explanation
    # * Tokenization<br><br>
    # If you have watched the videos and followed the links above, you know that an RNN is fed a sentence word by word, with every word represented as a one-hot vector of dimension (number of words in the vocabulary + 1). <br>
    # The Keras Tokenizer takes all the unique words in the corpus and builds a dictionary with the words as keys and their number of occurrences as values, then sorts it in descending order of counts. The most frequent word is assigned index 1, the next index 2, and so on. So if 'the' occurred most often in the corpus, it is assigned index 1, and the one-hot vector representing 'the' has a 1 at position 1 and zeros everywhere else.<br>
    # Try printing the first two elements of xtrain_seq and you will see that every word is now represented by an integer; a small illustration follows the next cell.
    # %% [code]
    print(o_2['xtrain_seq'][:1])

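    # %% [code]
    # A minimal, hypothetical illustration of the Tokenizer behaviour described above
    # (a toy corpus, not the competition data); the exact indices depend on word counts.
    from keras.preprocessing import text as keras_text

    toy_token = keras_text.Tokenizer(num_words=None)
    toy_token.fit_on_texts(['the cat sat', 'the dog sat'])
    print(toy_token.word_index)                        # e.g. {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4}
    print(toy_token.texts_to_sequences(['the dog']))   # e.g. [[1, 4]]

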
def kernel_4(
    o_2,
    input_texts=None,
):
    import keras.preprocessing.sequence

    if input_texts is None:
        input_texts = [
            'blahb blahb blah',
            'Hello World!',
            'This is very good!',
            'A very non toxic comment! This is so polite and polished one!',
        ]

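    # Score each input text with the trained model: tokenize, pad to max_len, predict,
    # and collect (text, score) pairs in t6.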
    t6 = []
    for o in input_texts:
        t1 = o
        t2 = o_2['token'].texts_to_sequences([t1])
        t3 = keras.preprocessing.sequence.pad_sequences(t2, maxlen=o_2['max_len'])
        t4 = o_2['model'].predict(t3)
        t6.append(
            dict(
                text=o,
                score=t4[0][0],
            )
        )
        pprint.pprint(
            dict(
                t1=t1,
                t2=t2,
                t3=t3,
                t4=t4,
            )
        )
    pprint.pprint(t6)

    return dict(
        t6=t6,
    )


def kernel_5(
    o_1=None,
    o_2=None,
):
    if o_1 is None:
        o_1 = kernel_1_sample_scrap(max_articles=50)

    if o_2 is None:
        o_2 = kernel_2()
    o_3 = kernel_3(o_2=o_2, nb_epochs=1)

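    # sum(list_of_lists, []) flattens the per-article 'comments' lists scraped by
    # kernel_1_sample_scrap into one flat list of comment texts.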
    t1 = sum([[o['text'] for o in o2['comments']] for o2 in o_1['t8']], [])

    t2 = kernel_4(o_2=o_2, input_texts=t1)

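    # sorted() is ascending by default, so the highest-scoring comments
    # (presumably the most toxic, per the model) end up last in t3.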
    t3 = sorted(
        t2['t6'],
        key=lambda x: x['score'],
    )
    pprint.pprint(t3)

@@ -3,34 +3,34 @@ import unittest
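# Presumably secret_check re-hashes the candidate secret with the stored salt and
# compares digests; with a fixed salt the digest asserted below is deterministic.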

class TestCrypto(unittest.TestCase):
    def test_password_utils(self) -> None:
        salt = b'asdfasdfasdf'

        secret = 'blah'

        hash_res = crypto.PasswordUtils.secret_hash(
            secret,
            mode='bytes',
            salt=salt,
        )
        self.assertEqual(
            hash_res,
            (
                salt,
                b'\xdak\xd15\xfa\x8e\xc8\r\xc3\xd2c\xf1m\xb0\xbf\xe6\x98\x01$!j\xc8\xc0Hh\x84\xea,\x91\x8b\x08\xce',
            ),
        )

        check_res = crypto.PasswordUtils.secret_check(
            secret,
            *hash_res,
        )

        self.assertTrue(check_res)

        self.assertFalse(
            crypto.PasswordUtils.secret_check(
                secret + 'asdfasdfsdf',
                *hash_res,
            )
        )